一 环境: data gurad 主备机之间使用物理standby,数据库版本 11.2.0.1.0,aix 平台
二 故障描述:主库使用以下语句添加表空间的数据文件后,备库没有同步创建对应数据文件:
ALTER TABLESPACE OCR ADD DATAFILE '/orcldata/ocr7.dbf' SIZE 30000M
备库查询日志使用情况:SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#,
发现日志从80929到80947都没有应用
检查进程:SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY; 结果结果无MRP0进程
检查备库是否有没有传递过来的日志:SELECT * FROM V$ARCHIVE_GAP; 结果为no rows 即所有主库产生的日志
都传递到备库, 在主备库之间使用archive log list 和在归档目录下ls -l检查两边日志都一样
select DEST_ID,DEST_NAME,error from v$archive_dest;--无结果
以上情况说明:归档传输没有问题,在备库端无法应用日志。
三 解决过程:
检查参数
主库:show parameter standby_file_management
standby_file_management=auto
备库:show parameter standby_file_management
standby_file_management=manual
修改备库参数standby_file_management为auto,重启备库。
vi /orcldata/initorcl.int
standby_file_management=auto
startup pfile='/orcldata/initorcl.int' mount;
试图重新启动日志应用:
alter database recover managed standby database disconnect from session;
执行成功:database altered
但是在检查进程,和日志应用情况,发现依然没有应用
SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
---APPLIED为no的日志依然从80929开始。
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;-- 无MRP0进程
启动数据库到open状态,alter database open,报错如下:
Beginning standby crash recovery.
Serial Media Recovery started
Managed Standby Recovery starting Real Time Apply
Standby crash recovery aborted due to error 1111.
Errors in file /oracle/diag/rdbms/orcl2/orcl/trace/orcl_ora_16515082.t
ORA-01111: name for data file 11 is unknown - rename to correct file
ORA-01110: data file 11: '/oracle/db/dbs/UNNAMED00011'
ORA-01157: cannot identify/lock data file 11 - see DBWR trace file
ORA-01111: name for data file 11 is unknown - rename to correct file
ORA-01110: data file 11: '/oracle/db/dbs/UNNAMED00011'
Completed standby crash recovery.
发现出现异常文件:/oracle/db/dbs/UNNAMED00011,ls该文件又没有结果
$ ls -l /oracle/db/dbs/UNNAMED00011
ls: 0653-341 The file /oracle/db/dbs/UNNAMED00011 does not exist.
根据报错:ORA-01111提示,在主库检查数据文件
select name,file# from v$datafile;
发现11号文件对应的就是刚才添加的数据文件:/orcldata/ocr7.dbf
在备库同样检查:select name,file# from v$datafile;
发现11好文件对应的是:/oracle/db/dbs/UNNAMED00011
由此判断:主库添加的数据文件/orcldata/ocr7.dbf在备库被应用错误,产生了/oracle/db/dbs/UNNAMED00011文件。
根据以上情况,初步判断主库无法应用日志是因为在备库没有产生正确的文件名,视图通过rename文件名,修改正确,
重新启动备库日志应用应该没有问题,详见以下步骤:
alter system set STANDBY_FILE_MANAGEMENT=AUTO;
rename 不正常的文件/oracle/db/dbs/UNNAMED00011到正确的文件:
ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011' TO '/orcldata/ocr7.dbf';
执行不成功:oracle提示无法找到/orcldata/ocr7.dbf,这是由于系统上没有/orcldata/ocr7.dbf文件。
在操作系统touch一个文件 /orcldata/ocr7.dbf
再次执行ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011' TO '/orcldata/ocr7.dbf';
修改成功,再次尝试开始启动日志应用:
alter database recover managed standby database disconnect from session;
执行不成功:提示11号数据文件需要恢复。
但是在检查进程,和日志应用情况,发现依然没有应用
SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
---APPLIED为no的日志依然从80929开始。
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;-- 无MRP0进程
至此,陷入僵局,根据提示使用以下命令恢复数据库 recover database,recover datafile 11都失败。
后根据错误号查询metlink,根据oracle官网提示,出现这种情况的解决方案是:当备库日志应用出现异常文件
/oracle/db/dbs/UNNAMED00011
应该通过以下方法解决
-----------------------------------------------------------------------------------------------------------------------------------
ALTER SYSTEM SET standby_file_management='MANUAL' SCOPE=BOTH;
alter database create datafile '/oracle/db/dbs/UNNAMED00011' as '/orcldata/ocr7.dbf';
--即通过异常文件创建出需要应用的正确文件名。
ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=BOTH;
ALTER DATABASE RECOVER managed standby database disconnect from session
-----------------------------------------------------------------------------------------------------------------------------------
由于之前通过rename数据文件名:
ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011' TO '/orcldata/ocr7.dbf';
再次检查,select name,file# from v$datafile;
发现11好文件对应已经是正常的文件:/orcldata/ocr7.dbf,即无法通过官网提示的解决方案操作。
试图通过rename 操作复用以前的情况:ALTER DATABASE RENAME FILE '/orcldata/ocr7.dbf'
TO ' /oracle/db/dbs/UNNAMED00011';
提示不成功,因为没有文件/oracle/db/dbs/UNNAMED00011,该文件在操作系统下本来不存在。
再尝试将/oracle/db/dbs/UNNAMED00011 rename 到另外一个文件 /orcldata/ocr8.dbf,然后
将/orcldata/ocr8.dbf rename 到 /orcldata/ocr7.dbf,方案依然失败,如下:
ALTER DATABASE RENAME FILE '/oracle/db/dbs/UNNAMED00011' TO '/orcldata/ocr8.dbf';
ALTER DATABASE RENAME FILE '/orcldata/ocr7.dbf' TO '/orcldata/ocr8.dbf';
最终解决办法,见(四 解决方案 2)
四 最终解决方案
1 当主库添加数据文件,备库无法应用,在$ORACLE_HOME/dbs目录下产生UNNAMED000N文件时候,参考以下解决方案:
启动备库到日志应用状态报错:
Sun Jul 5 23:28:23 2009
Media Recovery Log /opt/oracle/archivelog/1_47_689973859.dbf
Media Recovery Log /opt/oracle/archivelog/1_48_689973859.dbf
Media Recovery Log /opt/oracle/archivelog/1_49_689973859.dbf
WARNING: File being created with same name as in Primary
Existing file may be overwritten
File #5 added to control file as 'UNNAMED00005'.
Originally created as:
'/opt/oracle/oradata/mmstest/test01.dbf'
Recovery was unable to create the file as:
'/opt/oracle/oradata/mmstest/test01.dbf'
Errors with log /opt/oracle/archivelog/1_49_689973859.dbf
出现此种情况,进一步的告警日志可能会报出如下错误:
Sun Jul 5 23:28:28 2009
Errors in file /opt/oracle/admin/mmstest/bdump/mmstest_mrp0_32062.trc:
ORA-19502: write error on file "/opt/oracle/oradata/mmstest/test01.dbf", blockno 1024 (blocksize=8192)
ORA-27072: File I/O error
Linux Error: 9: Bad file descriptor
Additional information: 4
Additional information: 1024
Additional information: 397312
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Sun Jul 5 23:28:29 2009
Errors in file /opt/oracle/admin/mmstest/bdump/mmstest_mrp0_32062.trc:
ORA-19502: write error on file "/opt/oracle/oradata/mmstest/test01.dbf", blockno 1024 (blocksize=8192)
ORA-27072: File I/O error
Linux Error: 9: Bad file descriptor
Additional information: 4
Additional information: 1024
Additional information: 397312
以及尝试recover时可能再次出现:
Mon Jul 6 01:36:30 2009
Errors in file /opt/oracle/admin/mmstest/bdump/mmstest_mrp0_32589.trc:
ORA-01111: name for data file 5 is unknown - rename to correct file
ORA-01110: data file 5: '/opt/oracle/product/10.2.0/dbs/UNNAMED00005'
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01111: name for data file 5 is unknown - rename to correct file
ORA-01110: data file 5: '/opt/oracle/product/10.2.0/dbs/UNNAMED00005'
出现这些错误时MRP进程会停止工作,恢复中断:
Mon Jul 6 01:36:30 2009
MRP0: Background Media Recovery process shutdown (mmstest)
在修正相关的问题之后,我们可以进行如下一系列的操作来恢复这些错误:
SQL> alter system set standby_file_management=manual;
System altered.
SQL> alter database create datafile
2 '/opt/oracle/product/10.2.0/dbs/UNNAMED00005' as '/opt/oracle/oradata/mmstest/test01.dbf';
Database altered.
SQL> alter system set standby_file_management=auto;
System altered.
SQL> recover managed standby database disconnect from session;
Media recovery complete.
此时备库的恢复得以继续:
Mon Jul 6 01:41:14 2009
ALTER SYSTEM SET standby_file_management='MANUAL' SCOPE=MEMORY;
Mon Jul 6 01:42:13 2009
alter database create datafile
'/opt/oracle/product/10.2.0/dbs/UNNAMED00005' as '/opt/oracle/oradata/mmstest/test01.dbf'
Mon Jul 6 01:42:14 2009
Completed: alter database create datafile
'/opt/oracle/product/10.2.0/dbs/UNNAMED00005' as '/opt/oracle/oradata/mmstest/test01.dbf'
Mon Jul 6 01:42:26 2009
ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=MEMORY;
Mon Jul 6 01:42:40 2009
ALTER DATABASE RECOVER managed standby database disconnect from session
Mon Jul 6 01:42:40 2009
Attempt to start background Managed Standby Recovery process (mmstest)
MRP0 started with pid=16, OS id=32607
Mon Jul 6 01:42:41 2009
MRP0: Background Managed Standby Recovery process started (mmstest)
Managed Standby Recovery not using Real Time Apply
parallel recovery started with 3 processes
Media Recovery Log /opt/oracle/archivelog/1_49_689973859.dbf
Mon Jul 6 01:42:47 2009
Completed: ALTER DATABASE RECOVER managed standby database disconnect from session
Mon Jul 6 01:43:02 2009
Media Recovery Log /opt/oracle/archivelog/1_50_689973859.dbf
Mon Jul 6 01:43:17 2009
Media Recovery Log /opt/oracle/archivelog/1_51_689973859.dbf
Mon Jul 6 01:43:32 2009
Media Recovery Log /opt/oracle/archivelog/1_52_689973859.dbf
Mon Jul 6 01:43:45 2009
Media Recovery Log /opt/oracle/archivelog/1_53_689973859.dbf
正常情况下的配置及文件创建,其提示应该类似如下过程:
Mon Jul 6 01:53:28 2009
WARNING: File being created with same name as in Primary
Existing file may be overwritten
Recovery created file /opt/oracle/oradata/mmstest/wztest02.dbf
Successfully added datafile 7 to media recovery
Datafile #7: '/opt/oracle/oradata/mmstest/wztest02.dbf'
Media Recovery Log /opt/oracle/archivelog/1_80_689973859.dbf
在这个测试环境中,是由于空间不足导致的文件创建失败。
注意,在以上步骤中,如果standby_file_management设置为AUTO时,执行create命令会遇到如下错误:
SQL> alter database rename
2 file '/opt/oracle/product/10.2.0/dbs/UNNAMED00005' to '/opt/oracle/oradata/mmstest/test01.dbf';
alter database rename
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01275: Operation RENAME is not allowed if standby file management is automatic.
2 由于报错提示一个数据文件异常,可以采取恢复备库单个数据文件的办法。
主库先备份对应数据文件的备份copy,再创建一个standby控制文件
在备库端,先使用主库产生的standby控制文件,重命名到参数文件中记录的控制文件指向:
/orcldata/control01.ctl和/orcldata/control02.ctl
启动备库到mount状态,恢复11号数据文件 recover datafile 11
具体步骤是:
主库端:
主库先备份一个数据文件的映像
rman target /
backup as copy datafile '/orcldata/ocr7.dbf' format='/orcldata/ocr7_stb';
sqlplus / as sysdba
主库生成standby控制文件:
alter database create standby controlfile as '/orcldata/control_stb.ctl';
ftp control_stb.ctl和 ocr7_stb到备库
备库:
备份备库原有的控制文件
cp /orcldata/control01.ctl /oradata
cp /orcllog/control02.ctl /oradata
使用主库生成的statdby控制文件:control_stb.ctl
cp /orcldata/control_stb.ctl /orcldata/control01.ctl
cp /orcldata/control_stb.ctl /orcllog/control02.ctl
启动数据库到mount:
startup pfile='/orcldata/initorcl.int' mount;
查询需要恢复的数据文件:
select ERROR from v$recover_file;
提示 11号文件需要恢复
recover datafile 11
提示需要使用当前的日志文件,再执行以下命令
alter database recover managed standby database using current logfile disconnect from session;
提示需要先取消应用
alter database recover managed standby database cancel
再执行recover datafile 11
--recover managed standby database;
成功,日志逐渐恢复
ok....
查看alter日志动态变化,逐个应用归档日志,约过半小时,没有应用的日志全部应用完毕
在主库切换日志,备库正常应用,至此解决问题。