1 System Overview

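The business system was moving to new SSD storage, which meant replacing every disk. After the data disks, the OCR, and the voting disks had been replaced, the cluster was restarted on node 1 for verification. The database instance resource showed OFFLINE, and the cluster command srvctl could not start it either, failing with ORA-15017: diskgroup "OCR" cannot be mounted.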

2 Start the database and check the cluster status: the database resource is OFFLINE

Start the database:

SYS@test1 >startup

ORACLE instance started.

Total System Global Area 2.3517E+11 bytes

Fixed Size 2267464 bytes

Variable Size 5.0466E+10 bytes

Database Buffers 1.8415E+11 bytes

Redo Buffers 553766912 bytes

Database mounted.

Database opened.

Check that the database instance is running:

[grid@testdb1 ~]$ ps -ef|grep smon

root 13667 1 8 Mar26 ? 12:56:12 /opt/app/11.2.0.4/grid/bin/osysmond.bin

grid 15764 1 0 Mar26 ? 00:00:17 asm_smon_+ASM1

grid 17875 1 0 Mar26 ? 00:26:00 ora_smon_test1

grid 163774 163708 0 15:22 pts/1 00:00:00 grep smon

Check the cluster status. The database instance on node 1 is reported in the wrong state:

ora.test.db

1 ONLINE OFFLINE Instance Shutdown

2 ONLINE ONLINE testdb2 Open

3 ONLINE ONLINE testdb3 Open
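
A listing like the one above can be filtered so that only the problem rows remain. The following is a minimal sketch, assuming the input has the same layout as the excerpt (a resource name line starting with "ora.", followed by numbered rows of index, TARGET, STATE). The function name offline_instances is hypothetical and is not part of any Oracle tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list numbered resource rows whose STATE is not ONLINE.
# Reads `crsctl status resource -t` style text on stdin.
offline_instances() {
  awk '
    /^ora\./ { res = $1; next }           # remember the current resource name
    $1 ~ /^[0-9]+$/ && $3 != "ONLINE" {   # numbered row: index, TARGET, STATE
      print res, $1, $3
    }
  '
}

# Assumed usage on a cluster node:
#   crsctl status resource -t | offline_instances
```

Rows without a leading index (as used for local resources) are skipped by this sketch, so it only covers cluster resources such as ora.test.db.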

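3 Shut down the database and start it with the cluster command. The start fails because the OCR disk group no longer exists: it had been replaced and dropped an hour earlier.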

SYS@test1 >shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SYS@test1 >exit

Try to start the instance with the cluster command; it fails:

[grid@testdb1 ~]$ srvctl start instance -d test -i test1

PRCR-1013 : Failed to start resource ora.test.db

PRCR-1064 : Failed to start resource ora.test.db on node testdb1

CRS-5017: The resource action "ora.OCR.dg start" encountered the following error:

ORA-15032: not all alterations performed

ORA-15017: diskgroup "OCR" cannot be mounted

ORA-15040: diskgroup is incomplete

. For details refer to "(:CLSN00107:)" in "/opt/app/11.2.0.4/grid/log/testdb1/agent/crsd/oraagent_grid//oraagent_grid.log".

CRS-2674: Start of 'ora.OCR.dg' on 'testdb1' failed
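
When the error stack is this long, it can help to reduce it to the distinct error codes before searching for them. This is a small sketch, assuming only the ORA-, CRS-, PRCR- and PRCD- prefixes matter; the function name error_codes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct Oracle error codes found on stdin.
error_codes() {
  # -o prints only the matched text; LC_ALL=C keeps the sort order predictable
  grep -oE '(ORA|CRS|PRCR|PRCD)-[0-9]+' | LC_ALL=C sort -u
}

# Assumed usage:
#   srvctl start instance -d test -i test1 2>&1 | error_codes
```

For the output shown above, the chain ends at ORA-15017 and ORA-15040, which both point at the disk group rather than at the database itself.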

4 Check the trace file named in the error message: the agent is still looking for the OCR disk group

2022-03-26 02:23:40.443: [ora.asm][1424000768]{1:20237:2} [check] AsmProxyAgent::createAsmOcrKeys host: testdb1 oracleSid_lc: +asm1 oracleSid_uc: +ASM1 tempHost1: testdb1.+asm1

[ OCRAPI][1424000768]a_is_valid_user_group: User [oracle] does not match with init user

[ OCRUTL][1424000768]u_fill_errorbuf: Error Info : [User [oracle] does not match with initialized user]

[ OCRAPI][1424000768]a_create_n_set_key:THE SECURITY ATTRIBUTE PASSED is invalid

2022-03-26 02:23:40.454: [ OCRAPI][1424000768]a_batch_execute: Check batch exec failed [5]

2022-03-26 02:23:40.454: [ OCRAPI][1424000768]a_batch_execute: Failed to execute batch. Return [5]

2022-03-26 02:23:40.454: [ora.asm][1424000768]{1:20237:2} [check] OcrBatch Execute failed 5

2022-03-26 02:23:40.455: [ora.asm][1424000768]{1:20237:2} [check] AsmProxyAgent::createAsmOcrKeys failed

[ OCRUTL][1424000768]u_check_tag: Tag check failed on 0th comparison for [PROCRKEY]. Ptr passed was [40446f0] Byte was [0]

2022-03-26 02:23:40.455: [ora.asm][1424000768]{1:20237:2} [check] AsmProxyAgent::createAsmOcrKeys exit }

2022-03-26 02:23:40.455: [ora.asm][1424000768]{1:20237:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0

2022-03-26 02:23:40.477: [ora.OCR.dg][787871488]{1:20237:207} [start] ORA-15032: not all alterations performed

ORA-15017: diskgroup "OCR" cannot be mounted

ORA-15040: diskgroup is incomplete

2022-03-26 02:23:40.478: [ USRTHRD][1424000768]{1:20237:2} CrsCmd::ClscrsCmdData::destroy

2022-03-26 02:23:40.478: [ora.OCR.dg][787871488]{1:20237:207} [start] clsnUtils::error Exception type=2 string=

CRS-5017: The resource action "ora.OCR.dg start" encountered the following error:

ORA-15032: not all alterations performed

ORA-15017: diskgroup "OCR" cannot be mounted

ORA-15040: diskgroup is incomplete

. For details refer to "(:CLSN00107:)" in "/opt/app/11.2.0.4/grid/log/testdb1/agent/crsd/oraagent_grid//oraagent_grid.log".
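
The agent log interleaves entries from several resources, so the lines for one resource are easier to read when isolated. A minimal sketch, assuming every relevant entry carries the resource name in square brackets as in the excerpt above; the function name resource_log_lines is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the log lines tagged with a given resource.
# $1: resource name exactly as it appears in the tag, e.g. ora.OCR.dg
resource_log_lines() {
  # -F treats the tag as a fixed string, so the brackets and dots are literal
  grep -F "[$1]"
}

# Assumed usage, with the log path taken from the CRS-5017 message:
#   resource_log_lines ora.OCR.dg < /opt/app/11.2.0.4/grid/log/testdb1/agent/crsd/oraagent_grid/oraagent_grid.log
```

Continuation lines without a tag (for example the bare ORA-15017 line) are dropped by this filter, so it is best used to locate the timestamp and then read the surrounding lines in the file itself.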

5 Root Cause Analysis

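The output above shows that the cluster still requires the OCR disk group, but that disk group was dropped during the storage replacement and its contents were moved to a new disk group. The likely cause is that the database's cluster registration still refers to the old OCR disk group, which leaves the resource in an abnormal state and prevents srvctl from starting the database. About 20 minutes of searching the official documentation turned up nothing applicable. Since the problem appeared to be registration information tied to the old OCR disk group, the approach taken was to remove the cluster registration of the database and its instances and then recreate it. Testing confirmed that this restored normal operation.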
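
One way to check this hypothesis before removing anything is to look at the dependencies recorded in the database resource profile. The sketch below assumes the profile is printed as NAME=value lines, with dependencies in START_DEPENDENCIES and STOP_DEPENDENCIES and disk groups named in the form ora.NAME.dg. The function name depends_on_diskgroup is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a resource profile on stdin still
# references the given ASM disk group in its dependency attributes.
# $1: disk group name, e.g. OCR
depends_on_diskgroup() {
  grep -E '^(START|STOP)_DEPENDENCIES=' | grep -qF "ora.$1.dg"
}

# Assumed usage on a cluster node:
#   crsctl status resource ora.test.db -p | depends_on_diskgroup OCR \
#     && echo "ora.test.db still depends on the OCR disk group"
```

If the check succeeds, it supports the reading that the database resource, rather than the instance itself, is what ties the startup to the dropped disk group.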

5.1 Shut down all database instances

5.2 Remove the database resource registration with the following commands

Remove:

cd /opt/app/11.2.0.4/grid/bin

./srvctl remove database -d test

./srvctl remove instance -d test -i test1

./srvctl remove instance -d test -i test2

./srvctl remove instance -d test -i test3

5.3 Add the database resource registration back with the following commands

Add:

srvctl add database -d test -o $ORACLE_HOME

srvctl add instance -d test -i test1 -n testdb1

srvctl add instance -d test -i test2 -n testdb2

srvctl add instance -d test -i test3 -n testdb3
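
With more instances, typing each registration command by hand invites mistakes. The following sketch only prints the commands so they can be reviewed before being run. The function name print_reregister_commands and the instance:node argument format are hypothetical; the srvctl options are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the srvctl commands that re-register
# a database and its instances.
# $1: database name; remaining arguments: instance:node pairs
print_reregister_commands() {
  db=$1
  shift
  # $ORACLE_HOME is printed literally and expands only when the command is run
  echo "srvctl add database -d $db -o \$ORACLE_HOME"
  for pair in "$@"; do
    inst=${pair%%:*}   # text before the colon
    node=${pair#*:}    # text after the colon
    echo "srvctl add instance -d $db -i $inst -n $node"
  done
}

print_reregister_commands test test1:testdb1 test2:testdb2 test3:testdb3
```

The -o option expects the database home, so it is worth confirming which environment is loaded before running the printed commands, since $ORACLE_HOME in the grid user's shell may point at the Grid Infrastructure home instead.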

5.4 Start the database with the cluster command and check the cluster status: it is back to normal

ora.test.db

1 ONLINE ONLINE testdb1 Open

2 ONLINE ONLINE testdb2 Open

3 ONLINE ONLINE testdb3 Open

After the OCR and voting disks were replaced and the cluster was restarted, the cluster state of the database resource was OFFLINE.