环境说明:
DB:Oracle 11.2.0.4.0 OS:Redhat 7.6
问题现象:
数据库告警日志每天会有大量的ORA-32701错误,内容如下:
Sat Apr 03 22:09:48 2021 Errors in file /oracle/db/diag/rdbms/sytrnt/trnt1/trace/trnt1_dia0_126699.trc (incident=264060): ORA-32701: Possible hangs up to hang ID=34 detected DIA0 terminating blocker (ospid: 160308 sid: 2960 ser#: 8539) of hang with ID = 34 requested by master DIA0 process on instance 1 Hang Resolution Reason: Although the number of affected sessions did not justify automatic hang resolution initially, this previously ignored hang was automatically resolved. by terminating the process ospid:160308
影响范围:
AWR快照无法自动生成和手动生成,
尝试手动收集AWR快照,卡住很久,对应等待事件为"enq: WF - contention";
EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
Oracle默认每小时生成一次快照,11g中,默认保存时间为8天,而在10g中,默认保存时间为7天
select * from dba_hist_wr_control;
查看快照生成情况:
col BEGIN_INTERVAL_TIME for a50 col FLUSH_ELAPSED for a30 select SNAP_ID,BEGIN_INTERVAL_TIME,FLUSH_ELAPSED from dba_hist_snapshot order by snap_id;
问题分析:
查看trnt1_dia0_126699.trc日志:
Incident 264050 created, dump file: /oracle/db/diag/rdbms/sytrnt/trnt1/incident/incdir_264050/trnt1_dia0_126699_i264050.trc ORA-32701: Possible hangs up to hang ID=29 detected
查看trnt1_dia0_126699_i264050.trc日志:
*** 2021-04-03 17:09:39.961 Resolvable Hangs in the System Root Chain Total Hang Hang Hang Inst Root #hung #hung Hang Hang Resolution ID Type Status Num Sess Sess Sess Conf Span Action ----- ---- -------- ---- ----- ----- ----- ------ ------ ------------------- 29 HANG RSLNPEND 1 2960 2 2 HIGH GLOBAL Terminate Process Hang Resolution Reason: Although the number of affected sessions did not justify automatic hang resolution initially, this previously ignored hang was automatically resolved. Previous SESSION termination was unsuccessful. PROCESS termination will be attempted.
inst# SessId Ser# OSPID PrcNm Event ----- ------ ----- --------- ----- ----- 2 3162 21973 139799 M000 enq: WF - contention 1 2960 3479 151457 M000 not in wait
查看当前SQL:
*** 2021-04-03 17:09:40.084 current sql: insert into wrh$_sql_bind_metadata (snap_id, dbid, sql_id, name, position, dup_position, datatype, datatype_string, character_sid, precision, scale, max_length) SELECT /*+ ordered use_nl(bnd) index(bnd sql_id) */ :lah_snap_id, :dbid, bnd.sql_id, name, position, dup_position, datatype, dataty
完整SQL如下:
insert into wrh$_sql_bind_metadata (snap_id, dbid, sql_id, name, position, dup_position, datatype, datatype_string, character_sid, precision, scale, max_length) SELECT /*+ ordered use_nl(bnd) index(bnd sql_id) */ :lah_snap_id, :dbid, bnd.sql_id, name, position, dup_position, datatype, datatype_string, character_sid, precision, scale, max_length FROM x$kewrattrnew new, x$kewrsqlidtab tab, v$sql_bind_capture bnd WHERE new.str1_kewrattr = tab.s;
查看SQL对应执行计划如下:
13Plan hash value: 4222011306 14 15---------------------------------------------------------------------------------------------- 16| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 17---------------------------------------------------------------------------------------------- 18| 0 | INSERT STATEMENT | | | | 2 (100)| | 19| 1 | LOAD TABLE CONVENTIONAL | | | | | | 20| 2 | NESTED LOOPS | | 100 | 11500 | 2 (100)| 00:00:01 | 21|* 3 | HASH JOIN | | 100 | 4800 | 0 (0)| | 22| 4 | FIXED TABLE FULL | X$KEWRATTRNEW | 100 | 3400 | 0 (0)| | 23| 5 | FIXED TABLE FULL | X$KEWRSQLIDTAB | 100 | 1400 | 0 (0)| | 24|* 6 | FIXED TABLE FIXED INDEX| X$KQLFBC (ind:2) | 1 | 67 | 0 (0)| | 25---------------------------------------------------------------------------------------------- 26 27Predicate Information (identified by operation id): 28--------------------------------------------------- 29 30 3 - access("NEW"."STR1_KEWRATTR"="TAB"."SQLID_KEWRSIE") 31 6 - filter(("INST_ID"=USERENV('INSTANCE') AND "TAB"."SQLID_KEWRSIE"="KQLFBC_SQLID" 32 AND "TAB"."CHILDADDR_KEWRSIE"="KQLFBC_CADD"))
查看X$KQLFBC数据量很大:
select count(*) from X$KQLFBC; ---长时间无返回结果
查看MOS:
Error ORA-32701 'On Current SQL: insert into wrh$_sql_bind_metadata' (文档 ID 2226216.1)
原因:
CAUSE 视图v$sqlbind_capture对应于固定表X$KQLFBC,该表主要用于存储与数据绑定相关的变量。 View v$sqlbind_capture corresponds to fixed table X$KQLFBC table which is mainly used to store variables associated with the binding of data. 在使用大量绑定变量的大型数据库中可以注意到此错误。 This error can be noticed in large databases using large amount of binding variables.
解决方案:
1. Collect statistics on following fixed table:
exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW');
exec dbms_stats.gather_table_stats('SYS', 'X$KEWRSQLIDTAB');
立即生效
exec dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KEWRATTRNEW',no_invalidate => FALSE);
exec dbms_stats.gather_table_stats(ownname => 'SYS',tabname => 'X$KEWRSQLIDTAB',no_invalidate => FALSE);
经测试,收集统计信息后,并没有解决问题,AWR还是无法生成快照。
Or
2.重新启动数据库将释放X$KQLFBC表数据(需要停机窗口,谨慎操作)
2. Restarting the database will release of X$KQLFBC table data
Or
3.定期刷新共享池(需要停机窗口,谨慎操作)
3. Flush shared_pool on a regular basis
alter system flush shared_pool;
Or
4.针对数据量较大的基表,也可以通过设置参数来屏蔽相关数据写入到awr中。
alter system set "_awr_disabled_flush_tables" = 'wrh$_sql_bind_metadata';
其中 wrh$_sql_bind_metadata 可以替换成其他导致 AWR 无法正常完成的收集任务的基表名称。
#####chenjuchao 20210830 21:45#####