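1 System overview
The business system was migrating to SSD storage, which required replacing every disk. After the data disks, the OCR disk, and the voting disk had been replaced, the cluster was restarted on node 1 for verification. The database instance resource showed OFFLINE, and the cluster command srvctl could not start it either, failing with ORA-15017: diskgroup "OCR" cannot be mounted.
2 Start the database and check the cluster status: the database resource is OFFLINE
Start the database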
SYS@test1 >startup
ORACLE instance started.
Total System Global Area 2.3517E+11 bytes
Fixed Size 2267464 bytes
Variable Size 5.0466E+10 bytes
Database Buffers 1.8415E+11 bytes
Redo Buffers 553766912 bytes
Database mounted.
Database opened.
Check that the database is running
[grid@testdb1 ~]$ ps -ef|grep smon
root 13667 1 8 Mar26 ? 12:56:12 /opt/app/11.2.0.4/grid/bin/osysmond.bin
grid 15764 1 0 Mar26 ? 00:00:17 asm_smon_+ASM1
grid 17875 1 0 Mar26 ? 00:26:00 ora_smon_test1
grid 163774 163708 0 15:22 pts/1 00:00:00 grep smon
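The check above can be wrapped in a small helper so that the grep process itself is never mistaken for the instance. This is a minimal sketch: the function name smon_running is made up for illustration, and it assumes the process name is the last field of each ps -ef line, as in the listing above.

```shell
#!/bin/sh
# Report whether the SMON background process of a given instance appears in
# `ps -ef` style output read from stdin.
# smon_running is a hypothetical helper name, not an Oracle tool.
# Usage on a live node: ps -ef | smon_running test1
smon_running() {
    sid="$1"
    # Anchor the process name at the end of the line so that "test1" does not
    # also match "test10" and the "grep smon" line is not counted.
    if grep -q "[[:space:]]ora_smon_${sid}\$"; then
        echo "running"
    else
        echo "not running"
    fi
}

# Sample input copied from the listing above.
sample='root 13667 1 8 Mar26 ? 12:56:12 /opt/app/11.2.0.4/grid/bin/osysmond.bin
grid 15764 1 0 Mar26 ? 00:00:17 asm_smon_+ASM1
grid 17875 1 0 Mar26 ? 00:26:00 ora_smon_test1
grid 163774 163708 0 15:22 pts/1 00:00:00 grep smon'

printf '%s\n' "$sample" | smon_running test1
```

A result of "running" here, combined with an OFFLINE state in the cluster listing, points to a problem in the cluster registration rather than in the instance itself.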
Check the cluster status: the database instance on node 1 is reported as OFFLINE even though it is running
ora.test.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE ONLINE testdb2 Open
3 ONLINE ONLINE testdb3 Open
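A listing like this is easy to scan automatically for rows whose target is ONLINE while the actual state is not. The sketch below assumes the whitespace-separated layout shown above (row number, TARGET, STATE, then optional server and details), which is the layout produced by crsctl stat res -t; the function name target_state_mismatch is an illustrative choice.

```shell
#!/bin/sh
# Print "<resource> <row> <state>" for every row whose TARGET is ONLINE
# but whose STATE is something else.
# target_state_mismatch is a hypothetical helper name.
# Usage on a live node: crsctl stat res -t | target_state_mismatch
target_state_mismatch() {
    awk '
        # Resource header line, for example ora.test.db
        /^ora\./ { res = $1; next }
        # Numbered rows: <n> <TARGET> <STATE> [server] [details]
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" {
            print res, $1, $3
        }
    '
}

# Sample input copied from the listing above.
sample='ora.test.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE ONLINE testdb2 Open
3 ONLINE ONLINE testdb3 Open'

printf '%s\n' "$sample" | target_state_mismatch
```

For the sample, only row 1 of ora.test.db is reported; an empty result means every numbered row matches its target.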
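3 Shut down the database and start it with the cluster command. The error shows that the OCR disk group is missing; this disk group had been replaced and dropped one hour earlier.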
SYS@test1 >shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SYS@test1 >exit
Try to start it with the cluster command; the start fails
[grid@testdb1 ~]$ srvctl start instance -d test -i test1
PRCR-1013 : Failed to start resource ora.test.db
PRCR-1064 : Failed to start resource ora.test.db on node testdb1
CRS-5017: The resource action "ora.OCR.dg start" encountered the following error:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCR" cannot be mounted
ORA-15040: diskgroup is incomplete
. For details refer to "(:CLSN00107:)" in "/opt/app/11.2.0.4/grid/log/testdb1/agent/crsd/oraagent_grid//oraagent_grid.log".
CRS-2674: Start of 'ora.OCR.dg' on 'testdb1' failed
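When an error stack like this is long, listing the distinct error codes in the order they appear makes it quicker to look each one up. The following is a rough sketch; the function name error_codes is hypothetical, and it only recognizes the PRCR, CRS, and ORA prefixes that occur in the output above.

```shell
#!/bin/sh
# Print each distinct PRCR-/CRS-/ORA- code found on stdin, in order of
# first appearance. error_codes is a hypothetical helper name.
# Usage on a live node: srvctl start instance -d test -i test1 2>&1 | error_codes
error_codes() {
    awk '{
        line = $0
        # Walk through the line, consuming one code per iteration.
        while (match(line, /(PRCR|CRS|ORA)-[0-9]+/)) {
            code = substr(line, RSTART, RLENGTH)
            if (!(code in seen)) { seen[code] = 1; print code }
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# Sample input copied from the srvctl output above.
sample=$(cat <<'EOF'
PRCR-1013 : Failed to start resource ora.test.db
PRCR-1064 : Failed to start resource ora.test.db on node testdb1
CRS-5017: The resource action "ora.OCR.dg start" encountered the following error:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCR" cannot be mounted
ORA-15040: diskgroup is incomplete
. For details refer to "(:CLSN00107:)" in "/opt/app/11.2.0.4/grid/log/testdb1/agent/crsd/oraagent_grid//oraagent_grid.log".
CRS-2674: Start of 'ora.OCR.dg' on 'testdb1' failed
EOF
)

printf '%s\n' "$sample" | error_codes
```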
4 Check the agent log named in the error message: it is still looking for the OCR disk group
2022-03-26 02:23:40.443: [ora.asm][1424000768]{1:20237:2} [check] AsmProxyAgent::createAsmOcrKeys host: testdb1 oracleSid_lc: +asm1 oracleSid_uc: +ASM1 tempHost1: testdb1.+asm1
[ OCRAPI][1424000768]a_is_valid_user_group: User [oracle] does not match with init user
[ OCRUTL][1424000768]u_fill_errorbuf: Error Info : [User [oracle] does not match with initialized user]
[ OCRAPI][1424000768]a_create_n_set_key:THE SECURITY ATTRIBUTE PASSED is invalid
2022-03-26 02:23:40.454: [ OCRAPI][1424000768]a_batch_execute: Check batch exec failed [5]
2022-03-26 02:23:40.454: [ OCRAPI][1424000768]a_batch_execute: Failed to execute batch. Return [5]
2022-03-26 02:23:40.454: [ora.asm][1424000768]{1:20237:2} [check] OcrBatch Execute failed 5
2022-03-26 02:23:40.455: [ora.asm][1424000768]{1:20237:2} [check] AsmProxyAgent::createAsmOcrKeys failed
[ OCRUTL][1424000768]u_check_tag: Tag check failed on 0th comparison for [PROCRKEY]. Ptr passed was [40446f0] Byte was [0]
2022-03-26 02:23:40.455: [ora.asm][1424000768]{1:20237:2} [check] AsmProxyAgent::createAsmOcrKeys exit }
2022-03-26 02:23:40.455: [ora.asm][1424000768]{1:20237:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2022-03-26 02:23:40.477: [ora.OCR.dg][787871488]{1:20237:207} [start] ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCR" cannot be mounted
ORA-15040: diskgroup is incomplete
2022-03-26 02:23:40.478: [ USRTHRD][1424000768]{1:20237:2} CrsCmd::ClscrsCmdData::destroy
2022-03-26 02:23:40.478: [ora.OCR.dg][787871488]{1:20237:207} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.OCR.dg start" encountered the following error:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCR" cannot be mounted
ORA-15040: diskgroup is incomplete
. For details refer to "(:CLSN00107:)" in "/opt/app/11.2.0.4/grid/log/testdb1/agent/crsd/oraagent_grid//oraagent_grid.log".
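The same ORA-15017 message repeats several times in the agent log, so it can help to reduce the log to the set of disk groups that failed to mount. This is a minimal sketch that relies only on the message text seen above; the function name failed_diskgroups is made up for illustration.

```shell
#!/bin/sh
# Print the distinct disk group names reported by ORA-15017 on stdin.
# failed_diskgroups is a hypothetical helper name.
# Usage on a live node: failed_diskgroups < oraagent_grid.log
failed_diskgroups() {
    sed -n 's/.*ORA-15017: diskgroup "\([^"]*\)" cannot be mounted.*/\1/p' | sort -u
}

# Sample input copied from the log excerpt above.
sample=$(cat <<'EOF'
2022-03-26 02:23:40.477: [ora.OCR.dg][787871488]{1:20237:207} [start] ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCR" cannot be mounted
ORA-15040: diskgroup is incomplete
CRS-5017: The resource action "ora.OCR.dg start" encountered the following error:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCR" cannot be mounted
ORA-15040: diskgroup is incomplete
EOF
)

printf '%s\n' "$sample" | failed_diskgroups
```

For this excerpt the only name reported is OCR, the disk group that was dropped during the storage replacement.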
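5 Root cause analysis
The output above shows that the cluster still requires the OCR disk group: starting the instance through srvctl first tries to start the resource ora.OCR.dg, and that start fails. The disk group no longer exists, because it was dropped during the storage replacement and its contents were migrated to a new disk group. The likely cause is therefore that the database was registered in the cluster against the old OCR disk group, which leaves the cluster status wrong and prevents the cluster command from starting the database. About 20 minutes of searching the official documentation turned up nothing applicable. Since the cluster registration still pointed to the old OCR disk group, the fix tried was to remove the cluster registration of the database and its instances and then recreate it. Testing confirmed that this restored normal operation.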
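One way to confirm this kind of diagnosis before removing anything is to look at the start dependencies of the database resource, which crsctl stat res ora.test.db -p prints as a START_DEPENDENCIES attribute. The sketch below extracts the disk group resources from that attribute. The sample profile is an illustrative assumption, not output captured from this system, and the DATA disk group name and the function name diskgroup_deps are placeholders.

```shell
#!/bin/sh
# List the disk group resources (ora.<NAME>.dg) named in the
# START_DEPENDENCIES attribute of a resource profile read from stdin.
# diskgroup_deps is a hypothetical helper name.
# Usage on a live node: crsctl stat res ora.test.db -p | diskgroup_deps
diskgroup_deps() {
    awk '/^START_DEPENDENCIES=/ {
        line = $0
        while (match(line, /ora\.[A-Za-z0-9_]+\.dg/)) {
            dg = substr(line, RSTART, RLENGTH)
            if (!(dg in seen)) { seen[dg] = 1; print dg }
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# ASSUMPTION: illustrative profile, not taken from the affected cluster.
sample='NAME=ora.test.db
TYPE=ora.database.type
START_DEPENDENCIES=hard(ora.DATA.dg,ora.OCR.dg) weak(type:ora.listener.type) pullup(ora.DATA.dg,ora.OCR.dg)
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.OCR.dg)'

printf '%s\n' "$sample" | diskgroup_deps
```

If a dropped disk group such as ora.OCR.dg shows up in the result, the database resource cannot start until that dependency is gone, which is consistent with the CRS-5017 error above.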
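5.1 Shut down all database instances
5.2 Remove the database resource registration with the following commands
Remove (instances first, then the database, because srvctl can no longer find the instances once the database entry is gone):
cd /opt/app/11.2.0.4/grid/bin
./srvctl remove instance -d test -i test1
./srvctl remove instance -d test -i test2
./srvctl remove instance -d test -i test3
./srvctl remove database -d test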
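5.3 Add the database resource registration back with the following commands
Add ($ORACLE_HOME must point to the database home, not the Grid home):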
srvctl add database -d test -o $ORACLE_HOME
srvctl add instance -d test -i test1 -n testdb1
srvctl add instance -d test -i test2 -n testdb2
srvctl add instance -d test -i test3 -n testdb3
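For a cluster with more nodes, typing the remove and add commands by hand is error-prone, so it can be useful to generate the full sequence first and review it before running anything. The following is a dry-run sketch that only prints commands. The function name reregister_plan and the instance:node argument format are illustrative choices, and /path/to/db_home is a placeholder for the real database home.

```shell
#!/bin/sh
# Print (do not execute) the srvctl commands that remove and recreate the
# registration of a database and its instances.
# reregister_plan is a hypothetical helper name.
# Arguments: <db_name> <db_home> <instance:node>...
reregister_plan() {
    db="$1"
    home="$2"
    shift 2
    # Remove the instances before the database entry itself.
    for pair in "$@"; do
        echo "srvctl remove instance -d $db -i ${pair%%:*}"
    done
    echo "srvctl remove database -d $db"
    # Recreate the database entry, then one instance per node.
    echo "srvctl add database -d $db -o $home"
    for pair in "$@"; do
        echo "srvctl add instance -d $db -i ${pair%%:*} -n ${pair#*:}"
    done
}

# /path/to/db_home is a placeholder, not the path used on this system.
reregister_plan test /path/to/db_home test1:testdb1 test2:testdb2 test3:testdb3
```

Because the function only echoes text, the plan can be saved to a file, checked against the instance and node names in use, and executed line by line once all instances are shut down.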
5.4 Start the database with the cluster command and check the cluster status: it is back to normal
ora.test.db
1 ONLINE ONLINE testdb1 Open
2 ONLINE ONLINE testdb2 Open
3 ONLINE ONLINE testdb3 Open