1 数据库alert日志频繁告警,每分钟报一次。根据告警信息,认为是事务在进行回滚导致的锁冲突。
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
2 查看恢复使用的回滚段
SYS@XXDB3 >select b.name,b.inst#,b.status$,a.ktuxeusn,a.ktuxeslt,a.ktuxesqn,a.ktuxesiz,a.ktuxesta
from x$ktuxe a, undo$ b where a.ktuxecfl like '%DEAD%' and a.ktuxeusn = b.us#;
NAME INST# STATUS$ KTUXEUSN KTUXESLT KTUXESQN KTUXESIZ KTUXESTA
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------------
_SYSSMU60_3444085615$ 2 2 60 15 6684846 0 INACTIVE
_SYSSMU606_4051243031$ 2 2 606 13 139201 0 INACTIVE
_SYSSMU883_22107100$ 3 3 883 9 13962437 2731264 ACTIVE
根据如上信息,将事务信息进程转换。
活动的会话 _SYSSMU883_22107100$ 将 883 9 13962437 转换为 十六进制 373.9.D50CC5
3 查看死的事务及其恢复进度,还有270万数据块还没有回滚完成。
SYS@XXDB3 >select ktuxeusn,ktuxeslt,ktuxesqn,ktuxesta,ktuxesiz from x$ktuxe where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%' order by ktuxesiz ;
KTUXEUSN KTUXESLT KTUXESQN KTUXESTA KTUXESIZ
---------- ---------- ---------- ---------------- ----------
883 9 13962437 ACTIVE 2728325
4 通过v$fast_start_transactions视图查看事务恢复信息,及进度信息
SYS@XXDB3 >select t.USN,t.SLT,t.SEQ,t.STATE,t.UNDOBLOCKSDONE,t.UNDOBLOCKSTOTAL,t.PID,t.CPUTIME,t.XID
from v$fast_start_transactions t;
USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL PID CPUTIME XID
---------- ---------- ---------- ---------------- -------------- --------------- ---------- ---------- ----------------
1 1 28197245 RECOVERED 18582 18582 30429 010001007D41AE01
79 0 18624142 RECOVERED 2131645 2131645 115595 4F0000008E2E1C01
883 9 13962437 RECOVERING 269974 2775242 23418 73030900C50CD500
26 7 9559527 RECOVERED 570 570 1 1A000700E7DD9100
38 4 5652761 RECOVERING 85313 86927 26852095 2600040019415600
95 8 10571398 RECOVERED 175 175 1 5F000800864EA100
6 rows selected.
或通过如下语句进行查找
SYS@XXDB3 >select t.USN,t.SLT,t.SEQ,t.STATE,t.UNDOBLOCKSDONE,t.UNDOBLOCKSTOTAL,t.PID,t.CPUTIME,t.XID
from v$fast_start_transactions t
2 where t.USN=883 and t.SLT=9;
USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL PID CPUTIME XID
---------- ---------- ---------- ---------------- -------------- --------------- ---------- ---------- ----------------
883 9 13962437 RECOVERING 270889 2775242 23526 73030900C50CD500
5 通过dump undo header 寻找事务使用的undo segment
执行的SQL语句
alter system dump undo header '_SYSSMU883_22107100$';
生成的文件
-rw-r----- 1 oracle asmadmin 124 Nov 18 14:15 XXDB3_ora_159663.trm
-rw-r----- 1 oracle asmadmin 39321 Nov 18 14:15 XXDB3_ora_159663.trc
查看生成文件的内容,找到异常的数据块,state=10 意味活动事务,DBA为数据块的地址

查询对应dba号,并转换为十进制。
SYS@XXDB3 >select to_number('9e45f736','xxxxxxxxxxxxxxxxxx') from dual;
TO_NUMBER('9E45F736','XXXXXXXXXXXXXXXXXX')
------------------------------------------
2655385398
6 根据DBA号, 查看数据文件的ID 及数据块
SYS@XXDB3 >select dbms_utility.data_block_address_file(2655385398) from dual;
DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(2655385398)
------------------------------------------------
633
SYS@XXDB3 >select dbms_utility.data_block_address_block(2655385398) from dual;
DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(2655385398)
-------------------------------------------------
390966
7 dump对应的数据块,查看对象名
alter system dump datafile 633 block 390966;
根据生成trace的如下信息,可以发现出问题的在索引上,对象为 seg/obj: 0x1944ff
Block header dump: 0x9e45f736
Object id on Block? Y
seg/obj: 0x1944ff csc: 0xe41.e7d3dd9a itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x9e45f680 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0053.006.003c9137 0xc804028d.f4da.01 CBU- 0 scn 0x0e2e.7a50997c
0x02 0x0053.01a.003c633b 0xc843541a.f4db.0f C-U- 0 scn 0x0e41.e6e06a6e
Leaf block dump
===============
header address 140709705595492=0x7ff98804ba64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 137
kdxcofbo 310=0x136
kdxcofeo 3924=0xf54
kdxcoavs 3614
kdxlespl 0
kdxlende 0
kdxlenxt 2755946911=0xa444699f
kdxleprv 2760175590=0xa484efe6
kdxledsz 10
kdxlebksz 8032
row#0[4403] flag: ------, lock: 0, len=30, data:(10): 00 19 44 d8 2e 24 6a f9 00 2f
col 0; len 7; (7): 78 76 0c 01 01 01 01
col 1; len 9; (9): c8 0b 01 01 01 01 1d 32 2c
row#1[4433] flag: ------, lock: 0, len=30, data:(10): 00 19 44 d8 37 e0 09 e9 00 6b
col 0; len 7; (7): 78 76 0c 01 01 01 01
col 1; len 9; (9): c8 0b 01 01 01 01 1d 32 2d
row#2[4463] flag: ------, lock: 0, len=30, data:(10): 00 19 44 d8 38 5f c1 70 00 55
将十六进制的对象ID转换为十进制的
SYS@XXDB3 >select to_number('1944ff','xxxxxxxxxxxxxxxxxx') from dual;
TO_NUMBER('1944ff','XXXXXXXXXXXXXXXXXX')
------------------------------------------
1656063
根据对象ID,查看对象名
SYS@XXDB3 >select a.OWNER,a.OBJECT_NAME,a.OBJECT_TYPE from dba_objects a where a.OBJECT_ID='1656063';
OWNER OBJECT_NAME OBJECT_TYPE
-------------------- ------------------------------ ------------------------------
TESTUSER PK_A_USER_STOP INDEX
根据索引找表名:
OWNER INDEX_NAME TABLE_OWNER TABLE_NAME
-------------------- -------------------- -------------------- --------------------
TESTUSER PK_A_USER_STOP TESTUSER A_USER_STOP
SYS@XXDB3 >select OBJ#,DATAOBJ#,OWNER#,NAME,NAMESPACE,SUBNAME,TYPE# from obj$
where obj#=1656063;
OBJ# DATAOBJ# OWNER# NAME NAMESPACE SUBNAME TYPE#
---------- ---------- ---------- ------------------------------ ---------- ------------------------------ ----------
1656063 1656063 89 PK_A_USER_STOP 4 1
8 经过4天的等待,数据库总算回滚完相关事务,数据库也没有相关告警信息
SYS@XXDB3 >select sysdate from dual;
SYSDATE
------------------
22-NOV-21
SYS@XXDB3 >select ktuxeusn,ktuxeslt,ktuxesqn,ktuxesta,ktuxesiz from x$ktuxe where ktuxesta <> 'INACTIVE' and ktuxecfl like '%DEAD%' order by ktuxesiz ;
no rows selected
经过查看在21日 5:25以后,数据库再未报任何告警。
Sun Nov 21 05:25:46 2021
Transaction recovery: lock conflict caught and ignored
Sun Nov 21 05:25:46 2021
Thread 3 advanced to log sequence 702806 (LGWR switch)
Current log# 27 seq# 702806 mem# 0: +DATA_P/XXDB3 /onlinelog/group_27.507.1074248035
Current log# 27 seq# 702806 mem# 1: +DATA_P/XXDB3 /onlinelog/group_27.508.1074248039
9 原因分析
咨询相关业务人员,相关人员在执行表分区,将执行了2天的工作强制终止,才是导致此次事件的根本原因。
故在长时间运行的工作,强制终止,数据库进行大量回滚操作,有可能会产生未知的报错,还需谨慎。
10 第二种解决方法,不推荐使用。
根据Instance Terminates With ORA-07445 [ktundo] Errors From SMON Process (Doc ID 2736978.1) 文档,
可以通过设置10513事件,禁用SMON回滚会话,并重建相关undo表空间来解决。但设置此事件,
将会导致数据库所有会话都不能 回滚,有可能会导致数据方面的问题,故如非特别情况,不建议使用。

