1 备份报错
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 05/27/2022 09:41:13
ORA-15028: ASM file '+ARCH/cjdb/archivelog/2022_05_25/thread_1_seq_585427.1532.1105599629' not dropped; currently being accessed
2 使用lsof命令,查看归档日志是否被占用,经查询,数据库节点1占用了此归档日志。
ASMCMD> lsof -G ARCH
DB_Name Instance_Name Path
cjdb cjdb1 +arch/cjdb/archivelog/2022_05_25/thread_1_seq_585427.1532.1105599629
3 在asm实例中,针对ORA-15028,生成trace文件,来分析根本原因。
SYS@+ASM1 > select pid, program from v$process where program like '%DIAG%';
PID PROGRAM
---------- ------------------------------------------------
6 oracle@dbssvra (DIAG)
SYS@+ASM1 >select pid, spid, pname from v$process where pname like '%DIAG%';
PID SPID PNAME
---------- ------------------------ -----
6 129616 DIAG
SYS@+ASM1 >alter system set events '15028 trace name systemstate_global level 267';
System altered.
4 在asmcmd命令行中删除被占用的归档日志,让数据库自动生成trace文件
ASMCMD> rm thread_1_seq_585427.1532.1105599629
ORA-15032: not all alterations performed
ORA-15028: ASM file '+ARCH/cjdb/ARCHIVELOG/2022_05_25/thread_1_seq_585427.1532.1105599629' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)
5 关闭ORA-15028的trace文件
SYS@+ASM1 >alter system set events '15028 trace name systemstate_global off';
System altered.
6 查看生成的trace文件
-rw-r----- 1 grid oinstall 6645 May 27 10:22 +ASM1_diag_129616.trm
-rw-r----- 1 grid oinstall 194058 May 27 10:22 +ASM1_diag_129616.trc
-rw-r----- 1 grid oinstall 21385428 May 27 10:22 +ASM1_diag_129616_20220527102235.trc
7 经查看,在 +ASM1_diag_129616.trc文件中未找到有用的信息,但在系统生成的dump文件中( +ASM1_diag_129616_20220527102235.trc),有相关进程信息。为了快速找到相关信息,建议打开文件,
直接搜索thread_1_seq_585427.1532.1105599629,就会出现如下信息。
SO: 0xbcd63970, type: 132, owner: 0xba3f5248, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x1107dee20, name=ASM file, file=kff2.h LINE:898, pg=0
(kffil) netnm: cjdb1:cjdb, mapid: 20187978
gnum: 8.1267240719, fnum: 1532.1105599629, flgs: 0x182
family: 0.0, parent: 0.0
opennm: +ARCH/cjdb/archivelog/2022_05_25/thread_1_seq_585427.1532.1105599629
openflags: 0x80000000
fullnm: +arch/cjdb/archivelog/2022_05_25/thread_1_seq_585427.1532.1105599629
redun: 0x11, fdflg: 0x0, blksiz: 512, fsiz: 2806702
ftype: 4, extsz: [4294967295,0,0]
extents: 1371, start: 60, count: 1311, xp: 0x0
base: 14302, lxcnt: 1
enq: 0xfd0d1db0, cod: 0x0
unlock[0].au: 0, unlock[0].disk: 0, unlock[0].flags 0x0
unlock[1].au: 0, unlock[1].disk: 0, unlock[1].flags 0x0
unlock[2].au: 0, unlock[2].disk: 0, unlock[2].flags 0x0
unlock[3].au: 0, unlock[3].disk: 0, unlock[3].flags 0x0
unlock[4].au: 0, unlock[4].disk: 0, unlock[4].flags 0x0
unlock[5].au: 0, unlock[5].disk: 0, unlock[5].flags 0x0
strpsz: 1048576, strpwdth: 1
lnk: 0xbcd63ff0, 0xbcd63ff0
aba: 0, 0
client pid: 1180 osid: 174392
8 根据操作系统OSPID,查询数据库的sid和serial#
SYS@cjdb1 >select SID,SERIAL# from v$session where PADDR in (select ADDR from v$process where SPID=&b);
Enter value for b: 174392
old 1: select SID,SERIAL# from v$session where PADDR in (select ADDR from v$process where SPID=&b)
new 1: select SID,SERIAL# from v$session where PADDR in (select ADDR from v$process where SPID=174392)
SID SERIAL#
---------- ----------
2813 40761
9 根据查询出的会话SID,来查看是哪个用户及其执行的SQL
SYS@cjdb1 >select b.SID,b.SERIAL#,b.USERNAME,b.SCHEMANAME,b.OSUSER,b.SQL_ID from v$session b where b.SID='2813';
SID SERIAL# USERNAME SCHEMANAME OSUSER SQL_ID
---------- ---------- ---------------- ------------------------------ ------------------------------ -------------
2813 40761 SJZT SJZT Ruby 2d1z203tkk3fu
10 根据SQL_ID,查看执行的SQL,经确定,认为是业务模块,使用数据库的logminer进程数据挖掘,导致的
归档日志不释放,咨询相关业务模块,经确认,可以杀相关会话。
SQL_ID 2d1z203tkk3fu, child number 0
-------------------------------------
SELECT SCN, TIMESTAMP, XID, OPERATION_CODE, ROLLBACK, SQL_REDO, ROW_ID,
DATA_OBJ#, CSF, PXID, RBASQN, RBABLK, RBABYTE, TX_NAME from
V$LOGMNR_CONTENTS
Plan hash value: 2156073882
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)|
|* 1 | FIXED TABLE FULL| X$LOGMNR_CONTENTS | 1 | 2294 | 0 (0)|
---------------------------------------------------------------------------
11 杀相关会话,验证归档日志是否可以删除,经确认,可以进行删除。
在对应的数据库节点,执行如下命令,杀相关会话
alter system disconnect session '2813,40761' immediate;
等待2分钟后,在asmcmd命令行,执行如下命令进行删除,可以正常删除了。
ASMCMD> rm thread_1_seq_585427.1532.1105599629
ASMCMD> ls thread_1_seq_585427.1532.1105599629
ASMCMD-8002: entry 'thread_1_seq_585427.1532.1105599629' does not exist in directory '+ARCH/cjdb/ARCHIVELOG/2022_05_25/'
12 本文参考了Oracle官方文档为:
《 Get ORA-15028 when delete datafile on ASM (Doc ID 2083351.1)》 和 《 RMAN Receives ORA-15028 When Trying To Delete Archivelog File From ASM Diskgroup (Doc ID 1466848.1)》