Resolving Frequent Alert Log Errors Caused by an Oracle 12.2 UNDO Mode Bug
This is an original article by the site owner. All rights reserved; reproduction without permission is prohibited.
Database environment:
Hardware: IBM P750+ IBM S824
Operating system: AIX 7100-10 64-bit
Oracle versions: GI 12.2.0.1.180116 + DB 12.1.0 and 12.2.0.1.180116
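Incident analysis
This is a newly deployed RAC cluster at a provincial forestry department. Once the business systems went live, we began monitoring performance and checking the alert log for errors, because the customer had done essentially no compatibility testing of their applications on 12c beforehand. After one day of monitoring we found problems in the database. The alert log showed the following: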
2018-06-25T09:17:56.725402+08:00
Errors in file /u01/app/base/diag/rdbms/orcl/orcl1/trace/orcl1_ora_33489860.trc (incident=702644) (PDBNAME=XXLYTOA):
ORA-00600: 内部错误代码, 参数: [ktuisc:xid], [10], [20], [3859267], [596606], [], [], [], [], [], [], []
XXLYTOA(6):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2018-06-25T09:19:52.441529+08:00
Thread 1 advanced to log sequence 3294 (LGWR switch)
Current log# 2 seq# 3294 mem# 0: +DATA/ORCL/ONLINELOG/group_2.537.974733751
Current log# 2 seq# 3294 mem# 1: +FRA/ORCL/ONLINELOG/group_2.2926.974733753
2018-06-25T09:19:52.721530+08:00
Archived Log entry 7234 added for T-1.S-3293 ID 0x598efc2b LAD:1
2018-06-25T09:23:31.390799+08:00
XXSLFH(4):Resize operation completed for file# 66, old size 814080K, new size 824320K
2018-06-25T09:33:56.947278+08:00
Errors in file /u01/app/base/diag/rdbms/orcl/orcl1/trace/orcl1_j001_18154438.trc (incident=704818) (PDBNAME=XXLYTOA):
ORA-00600: internal error code, arguments: [ktuisc:xid], [10], [9], [3999594], [596751], [], [], [], [], [], [], []
XXLYTOA(6):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/base/diag/rdbms/orcl/orcl1/trace/orcl1_j001_18154438.trc (incident=704819) (PDBNAME=XXLYTOA):
ORA-00600: internal error code, arguments: [ktuisc:xid], [10], [9], [3999594], [596751], [], [], [], [], [], [], []
XXLYTOA(6):Incident details in: /u01/app/base/diag/rdbms/orcl/orcl1/incident/incdir_704819/orcl1_j001_18154438_i704819.trc
2018-06-25T09:34:00.598873+08:00
XXLYTOA(6):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2018-06-25T09:34:00.677721+08:00
Errors in file /u01/app/base/diag/rdbms/orcl/orcl1/trace/orcl1_j001_18154438.trc:
ORA-00600: internal error code, arguments: [ktuisc:xid], [10], [9], [3999594], [596751], [], [], [], [], [], [], []
2018-06-25T09:34:00.706309+08:00
Dumping diagnostic data in directory=[cdmp_20180625093400], requested by (instance=1, osid=97125247 (J001)), summary=[incident=704819].
2018-06-25T09:34:00.717778+08:00
opidrv aborting process J001 ospid (18154438) as a result of ORA-600
2018-06-25T09:34:01.797987+08:00
Thread 1 advanced to log sequence 3295 (LGWR switch)
Current log# 1 seq# 3295 mem# 0: +DATA/ORCL/ONLINELOG/group_1.538.974733751
Current log# 1 seq# 3295 mem# 1: +FRA/ORCL/ONLINELOG/group_1.2925.974733751
2018-06-25T09:34:02.039167+08:00
Archived Log entry 7236 added for T-1.S-3294 ID 0x598efc2b LAD:1
2018-06-25T09:55:13.428207+08:00
Dumping diagnostic data in directory=[cdmp_20180625095513], requested by (instance=2, osid=48497465 (J001)), summary=[incident=333320].
2018-06-25T09:57:56.928960+08:00
Thread 1 advanced to log sequence 3296 (LGWR switch)
Current log# 5 seq# 3296 mem# 0: +DATA/ORCL/ONLINELOG/group_5.480.975145887
Current log# 5 seq# 3296 mem# 1: +FRA/ORCL/ONLINELOG/group_5.2613.975145945
2018-06-25T09:57:57.168971+08:00
Archived Log entry 7238 added for T-1.S-3295 ID 0x598efc2b LAD:1
2018-06-25T10:03:14.371626+08:00
XXLYTOA(6):Resize operation completed for file# 72, old size 1802240K, new size 1812480K
2018-06-25T10:03:24.050272+08:00
XZAJDB(5):Resize operation completed for file# 18, old size 1024000K, new size 1029120K
2018-06-25T10:13:54.327156+08:00
Errors in file /u01/app/base/diag/rdbms/orcl/orcl1/trace/orcl1_j000_16647074.trc (incident=739758) (PDBNAME=HNLYTOA):
.......................
2018-06-25T13:34:11.790361+08:00
opidrv aborting process J000 ospid (34800226) as a result of ORA-600
2018-06-25T13:35:44.762895+08:00
Dumping diagnostic data in directory=[cdmp_20180625133544], requested by (instance=2, osid=110428255 (J001)), summary=[incident=330563].
2018-06-25T13:45:17.666708+08:00
alter pluggable database HNLYTOA close immediate instances=all
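To summarize, the errors observed were:
1. A large number of ORA-00600 [ktuisc:xid] errors.
2. Every JOB process produced a core dump.
3. Gathering statistics failed for all of the affected tables and tablespaces.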
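To get a rough picture of which PDBs are hit hardest, the alert log can be tallied with a few lines of awk. This is only a sketch: the function name count_ktuisc_by_pdb is hypothetical, and it assumes that each ORA-00600 line belongs to the most recent "Errors in file" header that names a PDB, as in the excerpt above. It counts error lines, not distinct incidents, so a single incident that is logged several times is counted several times.

```shell
#!/bin/sh
# Hypothetical helper: tally ORA-00600 [ktuisc:xid] lines per PDB in an alert log.
# Usage: count_ktuisc_by_pdb /path/to/alert_<SID>.log
count_ktuisc_by_pdb() {
    awk '
        # Remember the PDB named on the most recent line carrying PDBNAME=...
        match($0, /PDBNAME=[A-Za-z0-9_]+/) {
            pdb = substr($0, RSTART + 8, RLENGTH - 8)
        }
        # Count every ORA-00600 line whose first argument is [ktuisc:xid].
        /ORA-00600/ && /\[ktuisc:xid\]/ {
            count[(pdb == "" ? "(unknown)" : pdb)]++
        }
        END { for (p in count) print count[p], p }
    ' "$1" | sort -rn
}
```

Each output line is a count followed by a PDB name, highest count first.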
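Because the errors were so frequent, the database generated a large number of trace files that consumed a lot of disk space. As a stopgap, we changed the max_dump_file_size parameter to limit the size of each dump, then used the ADRCI IPS commands to package the relevant incident IDs so that the trace files could be analyzed.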
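The two stopgap steps can be sketched as small shell wrappers around sqlplus and adrci. The function names, the 100M cap, SCOPE = MEMORY (chosen here because the change was described as temporary) and the /tmp output directory are all assumptions for illustration; the ADR home and incident number in the usage note are taken from the alert log excerpt above.

```shell
#!/bin/sh
# Hypothetical helpers; run as an OS user that can connect "/ as sysdba".

# Cap the size of each trace file on all RAC instances.
# $1 = size limit, e.g. 100M (illustrative value, not from the original post)
limit_trace_size() {
    sqlplus -s / as sysdba <<SQL
ALTER SYSTEM SET max_dump_file_size = '$1' SCOPE = MEMORY SID = '*';
SQL
}

# Package one incident into a zip file for analysis or upload to Oracle Support.
# $1 = ADR home relative to the ADR base, $2 = incident ID, $3 = output directory
pack_incident() {
    adrci exec="set homepath $1; ips pack incident $2 in $3"
}
```

For example, `limit_trace_size 100M` followed by `pack_incident diag/rdbms/orcl/orcl1 704819 /tmp` would cap the trace size and package incident 704819 from instance orcl1.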
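Conclusion
Before the migration, the database character set was ZHS16GBK. The new database runs Oracle Database 12.2 in multitenant mode, with a CDB character set of AL32UTF8 and a PDB character set of ZHS16GBK. We first suspected an Oracle 12.2 bug triggered by the mismatch between the PDB and CDB character sets, so we tried relocating the PDB to a CDB that uses the same character set as the PDB. That reduced the error frequency somewhat but did not fix the root cause. After consulting the MOS document "CREATE INDEX = ORA-600[ktuisc:xid] in a Multitenant Environment" (Doc ID 2393664.1) and analyzing the relevant trace files, we finally switched the CDB undo mode from local undo to shared undo, which resolved the problem.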
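The checks and the undo mode switch described above can be sketched as follows. This is not the exact procedure used in the incident, which the post does not record: the function names are hypothetical, the switch needs a full outage of the CDB, and on RAC it is assumed that all other instances have been stopped before the remaining one is restarted in upgrade mode.

```shell
#!/bin/sh
# Hypothetical helpers; run as an OS user that can connect "/ as sysdba" to CDB$ROOT.

# Show the character set of the current container and whether local undo is enabled.
show_cdb_settings() {
    sqlplus -s / as sysdba <<'SQL'
SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
SELECT property_value FROM database_properties WHERE property_name = 'LOCAL_UNDO_ENABLED';
SQL
}

# Switch the CDB from local undo mode to shared undo mode.
# Assumption: on RAC, every other instance is already shut down.
switch_to_shared_undo() {
    sqlplus -s / as sysdba <<'SQL'
SHUTDOWN IMMEDIATE
STARTUP UPGRADE
ALTER DATABASE LOCAL UNDO OFF;
SHUTDOWN IMMEDIATE
STARTUP
SELECT property_value FROM database_properties WHERE property_name = 'LOCAL_UNDO_ENABLED';
SQL
}
```

After the restart, the final query should report FALSE if the CDB is running in shared undo mode.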