客户反映说一个库前几天因为磁盘损坏，导致操作系统宕机。几经折腾终于把盘恢复了，却又发现数据库无法启动了。并且已经使用trace里面的backup controlfile重建了控制文件，但还是无法正常启动。

检查之后， mount 实例，尝试 recover database 并 open 数据库 :

recover database until cancel using backup controlfile;

因为controlfile是重建出来的，所以当前的controlfile并不知道哪个在线日志是current的，需要手动指定。

Thu Apr 16 13:01:14 2015 ALTER DATABASE RECOVER database until cancel using backup controlfile Thu Apr 16 13:01:14 2015 Media Recovery Start WARNING! Recovering data file 1 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 2 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. ... WARNING! Recovering data file 75 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. ORA-279 signalled during: ALTER DATABASE RECOVER database until cancel usin... Thu Apr 16 13:02:31 2015 ALTER DATABASE RECOVER LOGFILE 'E:\datafile\redo01.log' Thu Apr 16 13:02:31 2015 Media Recovery Log E:\datafile\redo01.log Errors with log E:\datafile\redo01.log ORA-339 signalled during: ALTER DATABASE RECOVER LOGFILE 'E:\datafile\red... Thu Apr 16 13:02:31 2015 ALTER DATABASE RECOVER CANCEL ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ... Thu Apr 16 13:03:02 2015 ALTER DATABASE RECOVER database until cancel using backup controlfile Thu Apr 16 13:03:02 2015 Media Recovery Start WARNING! Recovering data file 1 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 2 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. ... WARNING! Recovering data file 75 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. ORA-279 signalled during: ALTER DATABASE RECOVER database until cancel usin... Thu Apr 16 13:03:16 2015 ALTER DATABASE RECOVER LOGFILE 'E:\datafile\redo02.log' Thu Apr 16 13:03:16 2015 Media Recovery Log E:\datafile\redo02.log Errors with log E:\datafile\redo02.log ORA-339 signalled during: ALTER DATABASE RECOVER LOGFILE 'E:\datafile\red... Thu Apr 16 13:03:16 2015 ALTER DATABASE RECOVER CANCEL ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ... Thu Apr 16 13:03:34 2015 ALTER DATABASE RECOVER database until cancel using backup controlfile Thu Apr 16 13:03:34 2015 Media Recovery Start WARNING! Recovering data file 1 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 2 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. ... WARNING! Recovering data file 75 from a fuzzy file. If not the current file it might be an online backup taken without entering the begin backup command. ORA-279 signalled during: ALTER DATABASE RECOVER database until cancel usin... Thu Apr 16 13:03:49 2015 ALTER DATABASE RECOVER LOGFILE 'E:\datafile\redo03.log' Thu Apr 16 13:03:49 2015 Media Recovery Log E:\datafile\redo03.log Incomplete recovery applied all redo ever generated. Recovery completed through change 14081497748113 Media Recovery Complete Completed: ALTER DATABASE RECOVER LOGFILE 'E:\datafile\red

多次尝试后发现redo03.log是所需要的current group member，尝试alter database open resetlogs，但是出现ORA-00600 [4000]错误:

Thu Apr 16 08:00:40 2015 Errors in file d:\oracle\admin\kf2\udump\kf2_ora_144.trc: ORA-00600: 内部错误代码，参数 : [4000], [3], [], [], [], [], [], [] Thu Apr 16 08:00:40 2015 Errors in file d:\oracle\admin\kf2\udump\kf2_ora_144.trc: ORA-00704: 引导程序进程失败 ORA-00704: 引导程序进程失败 ORA-00600: 内部错误代码，参数 : [4000], [3], [], [], [], [], [], [] Thu Apr 16 08:00:40 2015 Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Thu Apr 16 08:00:40 2015 Errors in file d:\oracle\admin\kf2\bdump\kf2_pmon_5172.trc: ORA-00704: bootstrap process failure

以上ORA-00600: [4000], [3],说明在使用usn=3的回滚段rollback数据块时发现rollback segment存在错误，且伴随有ORA-00704: bootstrap process failure错误，说明需要回滚的数据块是bootstrap需要的自举对象。

一般来说bootstrap object需要做rollback或cleanup，而apply undo数据时，我们是无法使用_corrupted_rollback_segments,_offline_rollback_segments或10513事件来阻止ORA-00600: [4000]的发生。

不过还是抱着试一试的心理尝试一下：

*._allow_resetlogs_corruption=true event=('10510 trace name context forever,level 1','10511 trace name context forever,level 2','10512 trace name context forever,level 1','10513 trace name context forever,level 2') *._corrupted_rollback_segments=(_SYSSMU3$) *._offline_rollback_segments=(_SYSSMU3$)

再次尝试启动数据库：

SMON: enabling cache recovery Thu Apr 16 08:10:20 2015 Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5148.trc: ORA-00600: 内部错误代码，参数 : [4000], [3], [], [], [], [], [], [] Thu Apr 16 08:10:20 2015 Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5148.trc: ORA-00704: 引导程序进程失败 ORA-00704: 引导程序进程失败 ORA-00600: 内部错误代码，参数 : [4000], [3], [], [], [], [], [], []

确实还是无法避免bootstrap对象发生ORA-00600:[4000]错误。没办法，要想恢复这个数据库就必须要解决这个bootstrap对象。

先来看一下ORA-00600:[4000]内部错误的trace日志:

Dump file d:\oracle\admin\kf2\udump\kf2_ora_5148.trc Thu Apr 16 08:10:08 2015 ORACLE V9.2.0.8.0 - Production vsnsta=0 vsnsql=12 vsnxtr=3 Windows 2000 Version 5.2 Service Pack 2, CPU type 586 Oracle9i Enterprise Edition Release 9.2.0.8.0 - Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.8.0 - Production Windows 2000 Version 5.2 Service Pack 2, CPU type 586 Instance name: kf2 Redo thread mounted by this instance: 1 Oracle process number: 12 Windows thread id: 5148, image: ORACLE.EXE *** SESSION ID 9.3) 2015-04-16 08:10:08.515 Start recovery at thread 1 ckpt scn 14081497668136 logseq 1 block 2 *** 2015-04-16 08:10:10.265 Media Recovery Log D:\ORACLE\ORA92\RDBMS\ARC00001.001 *** 2015-04-16 08:10:10.281 Media Recovery Log D:\ORACLE\ORA92\RDBMS\ARC00001.001 *** 2015-04-16 08:10:18.015 *** 2015-04-16 08:10:20.140 ksedmp: internal or fatal error ORA-00600: 内部错误代码，参数 : [4000], [3], [], [], [], [], [], [] Current SQL statement for this session: select ctime, mtime, stime from obj$ where obj# = :1

引发错误的数据块头信息：

Block header dump: 0x0040007a Object id on Block? Y seg/obj: 0x12 csc: 0xcce.9aabd24f itc: 1 flg: - typ: 1 - DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0003.000.0001a69c 0x0080002d.33df.01 --U- 1 fsc 0x0000.9aabd250 data_block_dump,data header at 0x34270044

通过上面的trace文件内容，我们知道：

1. 引发ORA-00600:[4000], [3]错误的语句是”select ctime, mtime, stime from obj$ where obj# = :1″。这是一条常用的递归SQL语句，查询的对象是非常重要的bootstrap自举对象OBJ$基表，这说明需要cleanup的块是OBJ$表上的。

2. 引发ORA-00600:[4000], [3]错误的数据块是1号数据文件的122块，seg/obj为0×12，块类型为Data，且存在有一条ITL entry:

SQL> select DBMS_UTILITY.data_block_address_file (TO_NUMBER ('0040007a','xxxxxxxx')) file_no, 2 DBMS_UTILITY.data_block_address_block (TO_NUMBER ('0040007a','xxxxxxxx')) block_no 3 from dual; FILE_NO BLOCK_NO ---------- ---------- 1 122 SQL> Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0003.000.0001a69c 0x0080002d.33df.01 --U- 1 fsc 0x0000.9aabd250

因为该致命内部错误涉及到的对象是重要的Bootstrap表OBJ$，无法使用传统的例如_corrupted_rollback_segments,_offline_rollback_segments或10513事件来阻止ORA-00600: [4000]的发生，必须使用块修改工具BBED来修改存在问题的数据块将ITL事务槽的FLAG从U修改为C(Commit)，手工提交该事务。

事务状态标识：

TRANSACTION_COMMITED = 0x08;TRANSACTION_UPBOUND = 0x02;TRANSACTION_ACTIVE = 0x01;

Flag= -U- 即TRANSACTION_UPBOUND时，flag值为0×02，需要将该字节修改为TRANSACTION_COMMITED = 0×08;

下面用bbed修改system01.dbf文件。注意修改前一定要先备份。

E:\datafile>bbed filename=system01.dbf password=blockedit blocksize=8192 mode=edit BBED: Release 2.0.0.0.0 - Limited Production on 星期四 4 月 16 09:58:02 2015 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. ************* !!! For Oracle Internal Use only !!! *************** BBED> map File: system01.dbf (0) Block: 1 Dba:0x00000000 ------------------------------------------------------------ Data File Header struct kcvfh, 360 bytes @0 ub4 tailchk @8188 BBED> set block 122 BLOCK# 122 BBED> map File: system01.dbf (0) Block: 122 Dba:0x00000000 ------------------------------------------------------------ KTB Data Block (Table/Cluster) struct kcbh, 20 bytes @0 struct ktbbh, 48 bytes @20 struct kdbh, 14 bytes @68 struct kdbt[1], 4 bytes @82 sb2 kdbr[108] @86 ub1 freespace[859] @302 ub1 rowdata[7027] @1161 ub4 tailchk @8188 BBED> p ktbbh struct ktbbh, 48 bytes @20 ub1 ktbbhtyp @20 0x01 (KDDBTDATA) union ktbbhsid, 4 bytes @24 ub4 ktbbhsg1 @24 0x00000012 ub4 ktbbhod1 @24 0x00000012 struct ktbbhcsc, 8 bytes @28 ub4 kscnbas @28 0x9aabd24f ub2 kscnwrp @32 0x0cce b2 ktbbhict @36 1 ub1 ktbbhflg @38 0x02 (NONE) ub1 ktbbhfsl @39 0x00 ub4 ktbbhfnx @40 0x00000000 struct ktbbhitl[0], 24 bytes @44 struct ktbitxid, 8 bytes @44 ub2 kxidusn @44 0x0003 ub2 kxidslt @46 0x0000 ub4 kxidsqn @48 0x0001a69c struct ktbituba, 8 bytes @52 ub4 kubadba @52 0x0080002d ub2 kubaseq @56 0x33df ub1 kubarec @58 0x01 ub2 ktbitflg @60 0x2001 (KTBFUPB) union _ktbitun, 2 bytes @62 b2 _ktbitfsc @62 0 ub2 _ktbitwrp @62 0x0000 ub4 ktbitbas @64 0x9aabd250 BBED> set offset 61 OFFSET 61 BBED> set count 16 COUNT 16 BBED> d File: system01.dbf (0) Block: 122 Offsets: 61 to 76 Dba:0x00000000 ------------------------------------------------------------------------ 20000050 d2ab9a00 016c00ff ffea0045 <32 bytes per line> BBED> m /x 0x80 File: system01.dbf (0) Block: 122 Offsets: 61 to 76 Dba:0x00000000 ------------------------------------------------------------------------ 80000050 d2ab9a00 016c00ff ffea0045 <32 bytes per line> BBED> sum apply Check value for File 0, Block 122: current = 0xb0d6, required = 0xb0d6 BBED>

再次尝试打开数据库，出现 ORA-00600:[2256] 错误：

Thu Apr 16 14:43:57 2015 Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5744.trc: ORA-00600: 内部错误代码，参数 : [2256], [3278], [3221225472], [3278], [3221225539], [], [], [] Thu Apr 16 14:43:57 2015 Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5744.trc: ORA-00600: 内部错误代码，参数 : [2256], [3278], [3221225472], [3278], [3221225539], [], [], [] Thu Apr 16 14:43:57 2015 Error 600 happened during db open, shutting down database USER: terminating instance due to error 600

一般ORA-00600:[2256]错误可以通过10015 ADJUST_SCN事件来推进SCN解决。

MOS里面对ORA-00600:[2256]的各个参数解释如下：

Arg [a] Current SCN WRAP Arg Current SCN BASE Arg [c] dependent SCN WRAP Arg [d] dependent SCN BASE Arg [e] Where present this is the DBA where the dependent SCN came from.

我们的日志中的各个参数值如下：

ORA-00600: 内部错误代码，参数 : [2662], [3278], [2594951966], [3278], [2595857063], [4235250], [], []

根据level的计算规则算出需要推进的level大小：

Level = Arg[c] * 4 + Arg[d]/1024/1024/1024 = 13115

alter session set events '10015 trace name adjust_scn level 13115';

再次open数据库，alert日志如下：
Thu Apr 16 13:47:49 2015 alter database open resetlogs Thu Apr 16 13:47:49 2015 RESETLOGS is being done without consistancy checks. This may result in a corrupted database. The database should be recreated. RESETLOGS after incomplete recovery UNTIL CHANGE 14081497748203 Resetting resetlogs activation ID 1527068138 (0x5b0535ea) Online log 1 of thread 1 was previously cleared Online log 2 of thread 1 was previously cleared Thu Apr 16 13:47:51 2015 Assigning activation ID 1527030262 (0x5b04a1f6) Thread 1 opened at log sequence 1   Current log# 3 seq# 1 mem# 0: E:\DATAFILE\REDO03.LOG   Current log# 3 seq# 1 mem# 1: E:\DATAFILE\REDO06.LOG Successful open of redo thread 1 Thu Apr 16 13:47:52 2015 SMON: enabling cache recovery Thu Apr 16 13:47:52 2015 Debugging event used to advance scn to 14082124021760 Dictionary check beginning Dictionary check complete Thu Apr 16 13:47:55 2015 SMON: enabling tx recovery Thu Apr 16 13:47:55 2015 Database Characterset is ZHS16CGB231280 Hex dump of Absolute File 1, Block 44180 in trace file d:\oracle\admin\kf2\udump\kf2_ora_5052.trc *** Corrupt block relative dba: 0x0040ac94 (file 1, block 44180) Fractured block found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x0040ac94 last change scn: 0x0cce.9ab97c09 seq: 0x1 flg: 0x06 consistency value in tail: 0xae1f0601 check value in block header: 0xc9d7, computed block checksum: 0xc30f spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x0040ac94 (file 1, block 44180) found same corrupted data Thu Apr 16 13:47:55 2015 Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5052.trc: ORA-01578: ORACLE 数据块损坏（文件号 1 ，块号 44180 ） ORA-01110: 数据文件 1: 'E:\DATAFILE\SYSTEM01.DBF' Error 1578 happened during db open, shutting down database USER: terminating instance due to error 1578

可以看到，这时的报错已经不一样了，这是因为system01.dbf文件中还有其他的坏块从而导致down库的。

用dbv检查system01.dbf文件：
E:\datafile>dbv file='E:\datafile\SYSTEM01.dbf' blocksize=8192 DBVERIFY: Release 9.2.0.8.0 - Production on 星期四 4 月 16 13:59:11 2015 Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved. DBVERIFY - 验证正在开始 : FILE = E:\datafile\SYSTEM01.dbf Block Checking: DBA = 4194426, Block Type = KTB-managed data block data header at 0x1162244 kdbchk: row locked by non-existent transaction         table=0   slot=46         lockid=1   ktbbhitc=1 页 122 失败，校验代码为 6101 汇入的页 43268 - 可能是介质损坏 *** Corrupt block relative dba: 0x0040a904 (file 1, block 43268) Fractured block found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x0040a904 last change scn: 0x0cce.9ab97baa seq: 0x1 flg: 0x06 consistency value in tail: 0xaf060601 check value in block header: 0x43af, computed block checksum: 0xd8a5 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** 汇入的页 44052 - 可能是介质损坏 *** Corrupt block relative dba: 0x0040ac14 (file 1, block 44052) Fractured block found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x0040ac14 last change scn: 0x0cce.9aaba70d seq: 0x1 flg: 0x06 consistency value in tail: 0x748e0601 check value in block header: 0x7766, computed block checksum: 0x91c3 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** 汇入的页 44180 - 可能是介质损坏 *** Corrupt block relative dba: 0x0040ac94 (file 1, block 44180) Fractured block found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x0040ac94 last change scn: 0x0cce.9ab97c09 seq: 0x1 flg: 0x06 consistency value in tail: 0xae1f0601 check value in block header: 0xc9d7, computed block checksum: 0xc30f spare1: 0x0, spare2: 0x0, spare3: 0x0 *** 汇入的页 47652 - 可能是介质损坏 *** Corrupt block relative dba: 0x0040ba24 (file 1, block 47652) Fractured block found during dbv: Data in bad block - type: 6 format: 2 rdba: 0x0040ba24 last change scn: 0x0cce.9aabaf06 seq: 0x1 flg: 0x06 consistency value in tail: 0x7baa0601 check value in block header: 0xf2dc, computed block checksum: 0x9ec4 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** DBVERIFY - 验证完成检查的页总数           ： 128000 处理的页总数（数据）： 31569 失败的页总数（数据）： 1 处理的页总数（索引）： 5268 失败的页总数（索引）： 0 处理的页总数（其它）： 1546 处理的总页数 ( 段 )  : 0 失败的总页数 ( 段 )  : 0 空的页总数              ： 89613 标记为损坏的总页数： 4 汇入的页总数             ： 4 Highest block SCN            : 14081498655836 (3278.2595859548) E:\datafile>

检查出有4个坏块（43268，44180，44052，47652）。

再次用bbed工具修复坏块：
E:\datafile>bbed filename=system01.dbf password=blockedit blocksize=8192 mode=edit BBED: Release 2.0.0.0.0 - Limited Production on 星期四 4 月 16 15:03:53 2015 Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved. ************* !!! For Oracle Internal Use only !!! *************** BBED> set block 43268         BLOCK#          43268 BBED> p kcbh struct kcbh, 20 bytes                       @0    ub1 type_kcbh                            @0        0x06    ub1 frmt_kcbh                            @1        0x02    ub1 spare1_kcbh                          @2        0x00    ub1 spare2_kcbh                          @3        0x00    ub4 rdba_kcbh                            @4        0x0040a904     ub4 bas_kcbh                             @8        0x9ab97baa    ub2 wrp_kcbh                             @12       0x0cce     ub1 seq_kcbh                             @14       0x01    ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)    ub2 chkval_kcbh                          @16       0x43af    ub2 spare3_kcbh                          @18       0x0000 BBED> p tailchk ub4 tailchk                                 @8188     0xaf060601

正常情况下tail check=scn base+block type+scn seq，因此taikchk的值应该是7baa + 06 + 01,即[url=] 0×7baa0601[/url] 。而我们的当前值是 0xaf060601 。

根据little-endian的规则，0×7baa0601应该写成：01 06 aa 7b
BBED> m /x 0106aa7b offset 8188 File: system01.dbf (0) Block: 43268            Offsets: 8188 to 8191           Dba:0x00000000 ------------------------------------------------------------------------ 0106aa7b <32 bytes per line> BBED> sum apply Check value for File 0, Block 43268: current = 0x4fa6, required = 0x4fa6 BBED> verify DBVERIFY - 验证正在启动 FILE =system01.dbf BLOCK = 43268 DBVERIFY - 验证完成检查的总块数： 1 已处理的总块数（数据）： 1 无法处理的总块数（数据）： 0 已处理的总块数（索引）： 0 无法处理的总块数（索引）： 0 空的总块数： 0 标记为损坏的总数块： 0 汇入的块总数： 0

验证后，43268的块已经修复。用同样的方法将另外3个块修复。

再次打开数据库：
SQL> alter database open resetlogs; 数据库已更改。 SQL>

终于启动起来了。

不过还需要重建undo表空间，并立马将数据exp导出备份好。事实上，数据字典还是有所损坏，部分数据查询时还有错误，在此不再赘述。

一次异常宕机后数据库启动报ora-600[4000]错误恢复