很久以前某次银行生产环境环境data gurad添加表空间数据文件故障(UNNAMED00011)

一 环境:  data gurad 主备机之间使用物理standby,数据库版本 11.2.0.1.0,aix 平台



二 故障描述:主库使用以下语句添加表空间的数据文件后,备库没有同步创建对应数据文件:

        ALTER TABLESPACE OCR ADD DATAFILE '/orcldata/ocr7.dbf' SIZE 30000M 

             备库查询日志使用情况:SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#,

         发现日志从80929到80947都没有应用

            检查进程:SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY; 结果结果无MRP0进程

            检查备库是否有没有传递过来的日志:SELECT * FROM V$ARCHIVE_GAP; 结果为no rows 即所有主库产生的日志 

          都传递到备库, 在主备库之间使用archive log list 和在归档目录下ls -l检查两边日志都一样

            select  DEST_ID,DEST_NAME,error from v$archive_dest;--无结果

      以上情况说明:归档传输没有问题,在备库端无法应用日志。

三 解决过程:


   检查参数

   主库:show parameter standby_file_management  

    standby_file_management=auto


   备库:show parameter standby_file_management  


    standby_file_management=manual


   修改备库参数standby_file_management为auto,重启备库。

   vi  /orcldata/initorcl.int

    standby_file_management=auto

   startup pfile='/orcldata/initorcl.int' mount;

  试图重新启动日志应用:

   alter database recover managed standby database disconnect from session;

   执行成功:database altered

   但是在检查进程,和日志应用情况,发现依然没有应用

   SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

   ---APPLIED为no的日志依然从80929开始。


   SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;--  无MRP0进程

   

  启动数据库到open状态,alter database open,报错如下:


Beginning standby crash recovery.

Serial Media Recovery started

Managed Standby Recovery starting Real Time Apply

Standby crash recovery aborted due to error 1111.

Errors in file /oracle/diag/rdbms/orcl2/orcl/trace/orcl_ora_16515082.t

ORA-01111: name for data file 11 is unknown - rename to correct file

ORA-01110: data file 11: '/oracle/db/dbs/UNNAMED00011'

ORA-01157: cannot identify/lock data file 11 - see DBWR trace file

ORA-01111: name for data file 11 is unknown - rename to correct file

ORA-01110: data file 11: '/oracle/db/dbs/UNNAMED00011'

Completed standby crash recovery.




发现出现异常文件:/oracle/db/dbs/UNNAMED00011,ls该文件又没有结果

$ ls -l /oracle/db/dbs/UNNAMED00011

ls: 0653-341 The file /oracle/db/dbs/UNNAMED00011 does not exist.


根据报错:ORA-01111提示,在主库检查数据文件

select name,file# from v$datafile;

发现11号文件对应的就是刚才添加的数据文件:/orcldata/ocr7.dbf



在备库同样检查:select name,file# from v$datafile;


发现11好文件对应的是:/oracle/db/dbs/UNNAMED00011


由此判断:主库添加的数据文件/orcldata/ocr7.dbf在备库被应用错误,产生了/oracle/db/dbs/UNNAMED00011文件。


根据以上情况,初步判断主库无法应用日志是因为在备库没有产生正确的文件名,视图通过rename文件名,修改正确,

 重新启动备库日志应用应该没有问题,详见以下步骤:

alter system set STANDBY_FILE_MANAGEMENT=AUTO;


rename 不正常的文件/oracle/db/dbs/UNNAMED00011到正确的文件:

ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011'  TO '/orcldata/ocr7.dbf';

执行不成功:oracle提示无法找到/orcldata/ocr7.dbf,这是由于系统上没有/orcldata/ocr7.dbf文件。

在操作系统touch一个文件 /orcldata/ocr7.dbf


再次执行ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011'  TO '/orcldata/ocr7.dbf';


修改成功,再次尝试开始启动日志应用:


alter database recover managed standby database disconnect from session;

   执行不成功:提示11号数据文件需要恢复。

   但是在检查进程,和日志应用情况,发现依然没有应用

   SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

  ---APPLIED为no的日志依然从80929开始。

   SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;--  无MRP0进程

   

至此,陷入僵局,根据提示使用以下命令恢复数据库 recover database,recover datafile 11都失败。


后根据错误号查询metlink,根据oracle官网提示,出现这种情况的解决方案是:当备库日志应用出现异常文件

 /oracle/db/dbs/UNNAMED00011


应该通过以下方法解决 

-----------------------------------------------------------------------------------------------------------------------------------

ALTER SYSTEM SET standby_file_management='MANUAL' SCOPE=BOTH;

alter database create datafile  '/oracle/db/dbs/UNNAMED00011' as '/orcldata/ocr7.dbf';

 --即通过异常文件创建出需要应用的正确文件名。

ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=BOTH;

ALTER DATABASE RECOVER  managed standby database disconnect from session 


-----------------------------------------------------------------------------------------------------------------------------------



由于之前通过rename数据文件名:

ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011'  TO '/orcldata/ocr7.dbf';

再次检查,select name,file# from v$datafile;

发现11好文件对应已经是正常的文件:/orcldata/ocr7.dbf,即无法通过官网提示的解决方案操作。


试图通过rename 操作复用以前的情况:ALTER DATABASE RENAME FILE '/orcldata/ocr7.dbf'  

TO ' /oracle/db/dbs/UNNAMED00011';

提示不成功,因为没有文件/oracle/db/dbs/UNNAMED00011,该文件在操作系统下本来不存在。


再尝试将/oracle/db/dbs/UNNAMED00011 rename 到另外一个文件 /orcldata/ocr8.dbf,然后

将/orcldata/ocr8.dbf rename 到 /orcldata/ocr7.dbf,方案依然失败,如下:


ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011'  TO '/orcldata/ocr8.dbf';

ALTER DATABASE RENAME FILE '/orcldata/ocr7.dbf'  TO '/orcldata/ocr8.dbf';


最终解决办法,见(四 解决方案 2)


四 最终解决方案


1 当主库添加数据文件,备库无法应用,在$ORACLE_HOME/dbs目录下产生UNNAMED000N文件时候,参考以下解决方案:


启动备库到日志应用状态报错:


Sun Jul 5 23:28:23 2009

Media Recovery Log /opt/oracle/archivelog/1_47_689973859.dbf

Media Recovery Log /opt/oracle/archivelog/1_48_689973859.dbf

Media Recovery Log /opt/oracle/archivelog/1_49_689973859.dbf

WARNING: File being created with same name as in Primary

Existing file may be overwritten

File #5 added to control file as 'UNNAMED00005'.

Originally created as:

'/opt/oracle/oradata/mmstest/test01.dbf'

Recovery was unable to create the file as:

'/opt/oracle/oradata/mmstest/test01.dbf'

Errors with log /opt/oracle/archivelog/1_49_689973859.dbf


出现此种情况,进一步的告警日志可能会报出如下错误:



Sun Jul 5 23:28:28 2009

Errors in file /opt/oracle/admin/mmstest/bdump/mmstest_mrp0_32062.trc:

ORA-19502: write error on file "/opt/oracle/oradata/mmstest/test01.dbf", blockno 1024 (blocksize=8192)

ORA-27072: File I/O error

Linux Error: 9: Bad file descriptor

Additional information: 4

Additional information: 1024

Additional information: 397312

Some recovered datafiles maybe left media fuzzy

Media recovery may continue but open resetlogs may fail

Sun Jul 5 23:28:29 2009

Errors in file /opt/oracle/admin/mmstest/bdump/mmstest_mrp0_32062.trc:

ORA-19502: write error on file "/opt/oracle/oradata/mmstest/test01.dbf", blockno 1024 (blocksize=8192)

ORA-27072: File I/O error

Linux Error: 9: Bad file descriptor

Additional information: 4

Additional information: 1024

Additional information: 397312

以及尝试recover时可能再次出现:


Mon Jul 6 01:36:30 2009

Errors in file /opt/oracle/admin/mmstest/bdump/mmstest_mrp0_32589.trc:

ORA-01111: name for data file 5 is unknown - rename to correct file

ORA-01110: data file 5: '/opt/oracle/product/10.2.0/dbs/UNNAMED00005'

ORA-01157: cannot identify/lock data file 5 - see DBWR trace file

ORA-01111: name for data file 5 is unknown - rename to correct file

ORA-01110: data file 5: '/opt/oracle/product/10.2.0/dbs/UNNAMED00005'


出现这些错误时MRP进程会停止工作,恢复中断:


Mon Jul 6 01:36:30 2009

MRP0: Background Media Recovery process shutdown (mmstest)


在修正相关的问题之后,我们可以进行如下一系列的操作来恢复这些错误:


SQL> alter system set standby_file_management=manual;


System altered.


SQL> alter database create datafile 

2 '/opt/oracle/product/10.2.0/dbs/UNNAMED00005' as '/opt/oracle/oradata/mmstest/test01.dbf';


Database altered.


SQL> alter system set standby_file_management=auto;


System altered.


SQL> recover managed standby database disconnect from session;

Media recovery complete.


此时备库的恢复得以继续:

Mon Jul 6 01:41:14 2009

ALTER SYSTEM SET standby_file_management='MANUAL' SCOPE=MEMORY;

Mon Jul 6 01:42:13 2009

alter database create datafile

'/opt/oracle/product/10.2.0/dbs/UNNAMED00005' as '/opt/oracle/oradata/mmstest/test01.dbf'

Mon Jul 6 01:42:14 2009

Completed: alter database create datafile

'/opt/oracle/product/10.2.0/dbs/UNNAMED00005' as '/opt/oracle/oradata/mmstest/test01.dbf'

Mon Jul 6 01:42:26 2009

ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=MEMORY;

Mon Jul 6 01:42:40 2009

ALTER DATABASE RECOVER managed standby database disconnect from session

Mon Jul 6 01:42:40 2009

Attempt to start background Managed Standby Recovery process (mmstest)

MRP0 started with pid=16, OS id=32607

Mon Jul 6 01:42:41 2009

MRP0: Background Managed Standby Recovery process started (mmstest)

Managed Standby Recovery not using Real Time Apply

parallel recovery started with 3 processes

Media Recovery Log /opt/oracle/archivelog/1_49_689973859.dbf

Mon Jul 6 01:42:47 2009

Completed: ALTER DATABASE RECOVER managed standby database disconnect from session

Mon Jul 6 01:43:02 2009

Media Recovery Log /opt/oracle/archivelog/1_50_689973859.dbf

Mon Jul 6 01:43:17 2009

Media Recovery Log /opt/oracle/archivelog/1_51_689973859.dbf

Mon Jul 6 01:43:32 2009

Media Recovery Log /opt/oracle/archivelog/1_52_689973859.dbf

Mon Jul 6 01:43:45 2009

Media Recovery Log /opt/oracle/archivelog/1_53_689973859.dbf


正常情况下的配置及文件创建,其提示应该类似如下过程:

Mon Jul 6 01:53:28 2009

WARNING: File being created with same name as in Primary

Existing file may be overwritten

Recovery created file /opt/oracle/oradata/mmstest/wztest02.dbf

Successfully added datafile 7 to media recovery

Datafile #7: '/opt/oracle/oradata/mmstest/wztest02.dbf'

Media Recovery Log /opt/oracle/archivelog/1_80_689973859.dbf


在这个测试环境中,是由于空间不足导致的文件创建失败。


注意,在以上步骤中,如果standby_file_management设置为AUTO时,执行create命令会遇到如下错误:

SQL> alter database rename 

2 file '/opt/oracle/product/10.2.0/dbs/UNNAMED00005' to '/opt/oracle/oradata/mmstest/test01.dbf';


alter database rename

*

ERROR at line 1:

ORA-01511: error in renaming log/data files

ORA-01275: Operation RENAME is not allowed if standby file management is automatic.


2 由于报错提示一个数据文件异常,可以采取恢复备库单个数据文件的办法。

  主库先备份对应数据文件的备份copy,再创建一个standby控制文件

              在备库端,先使用主库产生的standby控制文件,重命名到参数文件中记录的控制文件指向:

           /orcldata/control01.ctl和/orcldata/control02.ctl

              启动备库到mount状态,恢复11号数据文件 recover datafile 11

            

具体步骤是:


主库端:

主库先备份一个数据文件的映像


rman target /

backup as copy datafile '/orcldata/ocr7.dbf' format='/orcldata/ocr7_stb';


sqlplus / as sysdba


主库生成standby控制文件:


alter database create standby controlfile as '/orcldata/control_stb.ctl';


ftp control_stb.ctl和 ocr7_stb到备库


备库:


备份备库原有的控制文件

cp /orcldata/control01.ctl /oradata 

cp /orcllog/control02.ctl  /oradata 

使用主库生成的statdby控制文件:control_stb.ctl 


cp /orcldata/control_stb.ctl /orcldata/control01.ctl

cp /orcldata/control_stb.ctl  /orcllog/control02.ctl


启动数据库到mount:

startup pfile='/orcldata/initorcl.int' mount;


查询需要恢复的数据文件:

select  ERROR  from  v$recover_file;

提示 11号文件需要恢复


recover datafile 11


提示需要使用当前的日志文件,再执行以下命令


alter database recover managed standby database using current logfile disconnect from session;

 

提示需要先取消应用

 

alter database recover managed standby database cancel


再执行recover datafile 11


--recover managed standby database; 


成功,日志逐渐恢复

ok....



查看alter日志动态变化,逐个应用归档日志,约过半小时,没有应用的日志全部应用完毕


在主库切换日志,备库正常应用,至此解决问题。


请使用浏览器的分享功能分享到微信等