During a routine health check for a customer last week, I found this error in the alert log: ORA-07445: exception encountered: core dump [PC:0x90000000D017E10] [SIGSEGV]
ORA-07445: exception encountered: core dump [PC:0x90000000D017E10] [SIGSEGV] [ADDR:0xFFFFFFFFDFFFB00] [PC:0x90000000D017E10] [Address not mapped to object]
2014/11/18 04:39:20 Incident details in: /u01/oracle/diag/rdbms/jzh/jzh2/incident/incdir_432484/jzh2_ora_16778166_i432484.trc
2014/11/18 04:39:20 Use ADRCI or Support Workbench to package the incident.
2014/11/18 04:39:20 See Note 411.1 at My Oracle Support for error and packaging details.
2014/11/18 04:39:23 Dumping diagnostic data in directory=[cdmp_20141118043923], requested by (instance=2, osid=16778166), summary=[incident=432484].
2014/11/18 04:39:25 Sweep [inc][432484]: completed
2014/11/18 04:39:25 Sweep [inc2][432484]: completed
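The "Incident details in:" line above is what points to the trace file to open next. As a minimal sketch, assuming the alert log lines carry a leading timestamp exactly as in this excerpt (other alert log layouts put the timestamp on its own line and would need a different pattern), the trace paths could be pulled out like this. The function name `find_incident_traces` is hypothetical, made up for illustration.

```python
import re

# Matches lines shaped like the excerpt above (assumed layout):
# 2014/11/18 04:39:20 Incident details in: /path/to/file.trc
INCIDENT_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Incident details in: (?P<path>\S+)$"
)

def find_incident_traces(alert_text):
    """Return (timestamp, trace_path) pairs for every incident line found."""
    hits = []
    for line in alert_text.splitlines():
        m = INCIDENT_RE.match(line.strip())
        if m:
            hits.append((m.group("ts"), m.group("path")))
    return hits

sample = """\
2014/11/18 04:39:20 Incident details in: /u01/oracle/diag/rdbms/jzh/jzh2/incident/incdir_432484/jzh2_ora_16778166_i432484.trc
2014/11/18 04:39:20 Use ADRCI or Support Workbench to package the incident.
"""

for ts, path in find_incident_traces(sample):
    print(ts, path)
```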
Checking the trace file jzh2_ora_16778166_i432484.trc:
*** 2014-11-18 04:39:20.986
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
700001b4ec35250 287 package body SYS.DBMS_BACKUP_RESTORE
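The part to focus on is the call stack. The highlighted entry, package body SYS.DBMS_BACKUP_RESTORE, appears to be backup related. Next, look at the call stack trace: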
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+40 bl 107c4a874 000000000 ? 000000001 ?
000000003 ? 000000001 ?
000000000 ? 000000001 ?
000000003 ? 000000001 ?
ksedst1()+112 call skdstdst() 13A78E053D5C8E98 ?
4820384100000000 ?
110849E30 ? 000002004 ?
1106C99A8 ? 10A389A84 ?
000000000 ? 1106C99A8 ?
ksedst()+40 call ksedst1() 303106C99A8 ? 002050033 ?
10A389A78 ? 7000000000262 ?
000000000 ? 000000000 ?
10A389078 ? 000000000 ?
dbkedDefDump()+1516 call ksedst() 00000000D ? 110000B14 ?
000000000 ? 110000B14 ?
000000000 ? 00000000B ?
F1000F0A00340000 ?
300000003 ?
ksedmp()+72 call dbkedDefDump() 300000000 ? 064306634 ?
90000000D017E10 ? 000000001 ?
00000000B ? 11084AB10 ?
11084ADC0 ? 000000001 ?
ssexhd()+2672 call ksedmp() 109E0CBF0 ? 200000002 ?
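Each line reads "caller ... call callee" and the innermost frame is listed first, so the actual call order is ssexhd -> ksedmp -> dbkedDefDump -> ksedst -> ksedst1 -> skdstdst. These frames belong to the diagnostic dump itself. Continuing down: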
**** At frame 60 recursion pattern of size 2 found, for return address
90000000d022d50 suppressing printing.
**** At frame 14748 recursion pattern broken, last return was -----> recursion pattern broken at frame 14748
90000000d017fd8
skgfrls()+492 call 90000000cff2b80 1054647D0 ? 10A258658 ?
00000001A ? 000000002 ?
FFFFFFFFFFFAEF4 ? 000000000 ?
000000000 ? 700000000023600 ?
ksfq_rls()+160 call skgfrls() 1054551C4 ? 700001B6DB04418 ?
700001A3AD57258 ? 000000002 ?
000000001 ? 700001B6A0BCC98 ?
11085AF58 ? 700000000003668 ?
ksfqxdes()+1368 call ksfq_rls() 000000000 ? 1106597D8 ?
700001B6DB043B8 ? 000000005 ?
700001B6A0BCC98 ? 110000250 ?
FFFFFFFFFFF44F0 ?
8842408200000002 ?
krbdrel()+148 call ksfqxdes() 110B3C2F0 ? 000000000 ?
110983E18 ? 000000000 ?
000000000 ? 000000000 ?
110974628 ? 000000003 ?
krbdgdal()+480 call krbdrel() 000000000 ? 110B2A388 ?
000000000 ? 000000010 ?
FFFFFFFFFFF4CF0 ? 110B29870 ?
FFFFFFFFFFF4D10 ?
FFFFFFFFFFF4C40 ?
krbidvda()+480 call krbdgdal() 11065D870 ? 000000000 ?
FFFFFFFFFFF5180 ?
2220004100000000 ?
000000001 ? 000000400 ?
110B29890 ? 000000044 ?
pevm_icd_call_commo call krbidvda() 110974628 ? 700001B6A0BCC98 ?
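The problem occurred in the sequence krbidvda -> krbdgdal -> krbdrel -> ksfqxdes -> ksfq_rls -> skgfrls (the trace lists these innermost first). ksfq_rls is "release the sequential device that was allocated earlier", so the failure happened while releasing the device. RMAN can hit this when it releases a channel. Continuing down: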
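Reading the call order out of the trace by hand is error prone, because each frame line names the caller first and the innermost frame appears at the top. The following is a rough sketch, assuming frame lines look like the ones in this trace ("name()+offset call callee ..."); the helper name `listed_frames` is hypothetical, and continuation lines holding only argument values are skipped.

```python
import re

# A frame line starts with the caller name, optionally followed by "()+offset",
# then the call type ("call" or "bl") and the callee entry point.
FRAME_RE = re.compile(
    r"^(?P<caller>[A-Za-z_]\w*)(?:\(\)\+\d+)?\s+(?:call|bl)\s+(?P<callee>\S+)"
)

def listed_frames(trace_text):
    """Return caller names in the order the trace lists them (innermost first)."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(m.group("caller"))
    return frames

sample = """\
skgfrls()+492 call 90000000cff2b80 1054647D0 ? 10A258658 ?
00000001A ? 000000002 ?
ksfq_rls()+160 call skgfrls() 1054551C4 ? 700001B6DB04418 ?
ksfqxdes()+1368 call ksfq_rls() 000000000 ? 1106597D8 ?
FFFFFFFFFFF44F0 ?
krbdrel()+148 call ksfqxdes() 110B3C2F0 ? 000000000 ?
krbdgdal()+480 call krbdrel() 000000000 ? 110B2A388 ?
krbidvda()+480 call krbdgdal() 11065D870 ? 000000000 ?
"""

frames = listed_frames(sample)
# Reverse the listing to get outermost caller first.
print(" -> ".join(reversed(frames)))
# prints: krbidvda -> krbdgdal -> krbdrel -> ksfqxdes -> ksfq_rls -> skgfrls
```

Argument-only lines such as "FFFFFFFFFFF44F0 ?" begin with hex text but are never followed by "call" or "bl", so the pattern ignores them.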
client details:
O/S info: user: oraprd, term: , ospid: 55312806
machine: jzh2 program: rman@jzh2 (TNS V1-V3)
client info: rman channel=ch02
application name: backup incr datafile, hash value=1167306624
action name: 0000066 STARTED4, hash value=2243181322
Current Wait Stack:
0: waiting for 'Backup: MML shutdown'
=0x0, =0x0, =0x0
wait_id=1815 seq_num=2056 snap_id=1
wait times: snap=2.196096 sec, exc=2.196096 sec, total=2.196096 sec
wait times: max=infinite, heur=2.196096 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x5a0
--------------------------------------------------
[2 samples, 04:39:21 - 04:39:22]
waited for 'Backup: MML shutdown', seq_num: 2056
p1: ''=0x0
p2: ''=0x0
p3: ''=0x0
time_waited: >= 1 sec (still in wait)
[1 sample, 04:39:20]
not in wait at each sample
[117 samples, 04:37:23 - 04:39:19]
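RMAN channel ch02 was waiting for 'Backup: MML shutdown', with wait samples covering 04:37:23 - 04:39:19. The ORA-07445 was reported in the alert log and the trace file at the following time: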
2014/11/18 04:39:20 ORA-07445: exception encountered: core dump [PC:0x90000000D017E10] [SIGSEGV] [ADDR:0xFFFFFFFFDFFFB00] [PC:0x90000000D017E10] [Address not mapped to object] []
At this point we can conclude that the error was triggered when RMAN released the channel.
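The timing comparison behind this conclusion can be sketched in a few lines. This is only an illustration under assumptions: the sample-window lines have the "[N samples, HH:MM:SS - HH:MM:SS]" shape seen in the trace, all times fall on the same day, and the helper names `parse_windows` and `window_containing` are hypothetical.

```python
import re
from datetime import datetime

# Matches "[117 samples, 04:37:23 - 04:39:19]" and "[1 sample, 04:39:20]".
WINDOW_RE = re.compile(
    r"\[(\d+) samples?, (\d{2}:\d{2}:\d{2})(?: - (\d{2}:\d{2}:\d{2}))?\]"
)

def parse_windows(text):
    """Return (sample_count, start, end) tuples with datetime.time values."""
    windows = []
    for count, start, end in WINDOW_RE.findall(text):
        t0 = datetime.strptime(start, "%H:%M:%S").time()
        # A single-sample window has no end time; reuse the start.
        t1 = datetime.strptime(end or start, "%H:%M:%S").time()
        windows.append((int(count), t0, t1))
    return windows

def window_containing(windows, when):
    """Return the first window whose time range includes `when`, else None."""
    for w in windows:
        if w[1] <= when <= w[2]:
            return w
    return None

history = """\
[2 samples, 04:39:21 - 04:39:22]
[1 sample, 04:39:20]
[117 samples, 04:37:23 - 04:39:19]
"""

error_time = datetime.strptime(
    "2014/11/18 04:39:20", "%Y/%m/%d %H:%M:%S"
).time()
windows = parse_windows(history)
hit = window_containing(windows, error_time)
print(hit)
```

With the values from this trace, the error timestamp lands in the single-sample window at 04:39:20, immediately after the 117-sample window ends and just before the 'Backup: MML shutdown' samples begin.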
My Oracle Support (MOS) attributes this to bug 17240394. Oracle gives neither a cause nor a fix, and in general this error can be ignored.