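A customer's 11.2 RAC environment on Linux x86-64 developed a fault after a power outage, and the RAC stack could no longer start automatically.

This installment covers a new problem that appeared after the environment was reinstalled.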

Diagnosing 11.2 RAC Automatic Startup Errors: http://yangtingkun.itpub.net/post/468/518656

Diagnosing 11.2 RAC Automatic Startup Errors (Part 2): http://yangtingkun.itpub.net/post/468/518805

Diagnosing 11.2 RAC Automatic Startup Errors (Part 3): http://yangtingkun.itpub.net/post/468/518834

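The previous installment traced the problem to a large number of I/O-related errors reported at the operating system level, and the customer eventually rebuilt the entire RAC environment. The I/O problems persisted during the reinstall. Working with the hardware engineers to inspect the storage finally revealed that one storage link was down and that a critical service on one of the controllers had not started.

Once those issues were fixed, the I/O errors did not reappear and the RAC environment was built successfully. I assumed the problem was fully resolved, but recently this RAC environment has started restarting on its own frequently.

The database alert log shows the following errors: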

Mon Jun 20 22:50:49 2011
NOTE: ASMB terminating
Errors in file /oracle/diag/rdbms/spsp/SPSP1/trace/SPSP1_asmb_15270.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2 Serial number: 3
Errors in file /oracle/diag/rdbms/spsp/SPSP1/trace/SPSP1_asmb_15270.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2 Serial number: 3
ASMB (ospid: 15270): terminating the instance due to error 15064
Mon Jun 20 22:50:51 2011
ORA-1092 : opitsk aborting process
Termination issued to instance processes. Waiting for the processes to exit
Mon Jun 20 22:50:59 2011
Instance termination failed to kill one or more processes
Instance terminated by ASMB, pid = 15270

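Clearly the ASM instance failed, which broke communication between the database instance and the ASM instance. The ASM instance alert log shows: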

Mon Jun 20 22:50:49 2011
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Mon Jun 20 22:50:49 2011
NOTE: client exited [14631]
NOTE: force a map free for map id 2
Mon Jun 20 22:50:50 2011
Received an instance abort message from instance 2
Mon Jun 20 22:50:50 2011
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
Please check instance 2 alert and LMON trace files for detail.
LMS0 (ospid: 14565): terminating the instance due to error 481
Mon Jun 20 22:50:51 2011
ORA-1092 : opitsk aborting process
Mon Jun 20 22:50:51 2011
License high water mark = 18
Termination issued to instance processes. Waiting for the processes to exit
Instance termination failed to kill one or more processes
Instance terminated by LMS0, pid = 14565
USER (ospid: 3069): terminating the instance
Mon Jun 20 22:51:03 2011
Termination issued to instance processes. Waiting for the processes to exit
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 3069

This looks like a clusterware problem again, one that brought the ASM instance down. The clusterware alert log shows:

2011-06-20 22:49:07.305
[cssd(14245)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file ORCL:VOL will be considered not functional in 99320 milliseconds
2011-06-20 22:49:57.441
[cssd(14245)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file ORCL:VOL will be considered not functional in 49190 milliseconds
2011-06-20 22:50:27.505
[cssd(14245)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file ORCL:VOL will be considered not functional in 19130 milliseconds
2011-06-20 22:50:47.545
[cssd(14245)]CRS-1604:CSSD voting file is offline: ORCL:VOL; details at (:CSSNM00058:) in /oracle/product/11g/grid/log/oracle-01/cssd/ocssd.log.
2011-06-20 22:50:47.546
[cssd(14245)]CRS-1606:The number of voting files available, 0, is less than the minimum number of voting files required, 1, resulting in CSSD termination to ensure data integrity; details at (:CSSNM00018:) in /oracle/product/11g/grid/log/oracle-01/cssd/ocssd.log
2011-06-20 22:50:47.546
[cssd(14245)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /oracle/product/11g/grid/log/oracle-01/cssd/ocssd.log
2011-06-20 22:50:47.581
[cssd(14245)]CRS-1652:Starting clean up of CRSD resources.
2011-06-20 22:50:48.824
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5016:Process "/oracle/product/11g/grid/opmn/bin/onsctli" spawned by agent "/oracle/product/11g/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:49.435
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5016:Process "/oracle/product/11g/grid/bin/lsnrctl" spawned by agent "/oracle/product/11g/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:49.441
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5016:Process "/oracle/product/11g/grid/bin/lsnrctl" spawned by agent "/oracle/product/11g/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:49.445
[cssd(14245)]CRS-1654:Clean up of CRSD resources finished successfully.
2011-06-20 22:50:49.446
[cssd(14245)]CRS-1655:CSSD on node oracle-01 detected a problem and started to shutdown.
2011-06-20 22:50:49.459
[/oracle/product/11g/grid/bin/orarootagent.bin(14756)]CRS-5822:Agent '/oracle/product/11g/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:8} in /oracle/product/11g/grid/log/oracle-01/agent/crsd/orarootagent_root/orarootagent_root.log.
2011-06-20 22:50:49.460
[/oracle/product/11g/grid/bin/oraagent.bin(15085)]CRS-5822:Agent '/oracle/product/11g/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:5:4} in /oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2011-06-20 22:50:49.472
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5822:Agent '/oracle/product/11g/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:2:7} in /oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log.
2011-06-20 22:50:49.611
[cssd(14245)]CRS-1660:The CSS daemon shutdown has completed
2011-06-20 22:50:49.622
[ohasd(14459)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle-01'.
2011-06-20 22:50:50.648
[crsd(3045)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109:) in /oracle/product/11g/grid/log/oracle-01/crsd/crsd.log.
2011-06-20 22:50:50.932
[ohasd(14459)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle-01'.
2011-06-20 22:50:50.944
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:50.946
[ohasd(14459)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle-01'.
2011-06-20 22:50:51.127
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:51.129
[ohasd(14459)]CRS-2765:Resource 'ora.asm' has failed on server 'oracle-01'.
2011-06-20 22:50:51.309
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:51.455
[ohasd(14459)]CRS-2765:Resource 'ora.cssd' has failed on server 'oracle-01'.
2011-06-20 22:50:51.491
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:51.673
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:52.660
[crsd(3055)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109:) in /oracle/product/11g/grid/log/oracle-01/crsd/crsd.log.
2011-06-20 22:50:52.984
[cssd(3071)]CRS-1713:CSSD daemon is started in clustered mode
2011-06-20 22:50:53.664
[ohasd(14459)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle-01'.
2011-06-20 22:50:58.936
[ohasd(14459)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'oracle-01'.

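The ASM instance shut down because I/O to the voting disk timed out, and that timeout is most likely caused by yet another I/O problem. Next, check ocssd.log: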

2011-06-20 22:47:35.621: [ CSSD][1095420224]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 8010 msecs
2011-06-20 22:47:37.749: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
.
.
.
2011-06-20 22:47:43.633: [ CSSD][1099643200]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 16020 msecs
2011-06-20 22:47:46.767: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
.
.
.
2011-06-20 22:47:52.655: [ CSSD][1095420224]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 25040 msecs
2011-06-20 22:47:54.783: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
.
.
.
2011-06-20 22:48:00.671: [ CSSD][1095420224]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 33050 msecs
2011-06-20 22:48:04.803: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
2011-06-20 22:48:04.803: [ CSSD][1092266304]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-20 22:48:08.687: [ CSSD][1095420224]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 41070 msecs
2011-06-20 22:48:08.811: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
.
.
.
2011-06-20 22:48:16.703: [ CSSD][1095420224]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 49090 msecs
2011-06-20 22:48:16.827: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
.
.
.
2011-06-20 22:48:24.248: [ CSSD][1099643200]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 56630 msecs
2011-06-20 22:48:26.847: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
.
.
.
2011-06-20 22:48:33.638: [ CSSD][1099643200]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 66020 msecs
2011-06-20 22:48:35.865: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
.
.
.
2011-06-20 22:50:38.988: [ CSSD][1095420224]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 191350 msecs
2011-06-20 22:50:42.117: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
2011-06-20 22:50:42.117: [ CSSD][1092266304]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-20 22:50:47.003: [ CSSD][1095420224]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 199360 msecs
2011-06-20 22:50:47.127: [ CSSD][1092266304]clssnmSendingThread: sending status msg to all nodes
2011-06-20 22:50:47.128: [ CSSD][1092266304]clssnmSendingThread: sent 5 status msgs to all nodes
2011-06-20 22:50:47.545: [ CSSD][1085339968](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200900 ms for voting file ORCL:VOL)
2011-06-20 22:50:47.546: [ CSSD][1085339968]clssnmvDiskAvailabilityChange: voting file ORCL:VOL now offline
2011-06-20 22:50:47.546: [ CSSD][1085339968](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2011-06-20 22:50:47.546: [ SKGFD][1082186048]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab000e060 for disk :ORCL:VOL:

2011-06-20 22:50:47.546: [ CSSD][1085339968]###################################
2011-06-20 22:50:47.546: [ CSSD][1085339968]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2011-06-20 22:50:47.546: [ CSSD][1085339968]###################################
2011-06-20 22:50:47.546: [ SKGFD][1083763008]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab0056990 for disk :ORCL:VOL:

2011-06-20 22:50:47.546: [ CSSD][1085339968](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2011-06-20 22:50:47.546: [ CSSD][1085339968]

----- Call Stack Trace -----
2011-06-20 22:50:47.546: [ CSSD][1085339968]calling call entry argument values in hex
2011-06-20 22:50:47.546: [ CSSD][1085339968]location type point (? means dubious value)
2011-06-20 22:50:47.546: [ CSSD][1085339968]-------------------- -------- -------------------- ----------------------------
2011-06-20 22:50:47.552: [ CSSD][1085339968]clssscExit()+726 call kgdsdst() 000000000 ? 000000000 ?
2011-06-20 22:50:47.552: [ CSSD][1085339968] 040B0A568 ? 000000001 ?
2011-06-20 22:50:47.552: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clssnmvDiskCheck()+ call clssscExit() 005237F80 ? 000000002 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]3220 040B0A568 ? 000000001 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clssnmvDiskPingMoni call clssnmvDiskCheck() 005237F80 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]torThread()+404 040B0F0B8 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clssscthrdmain()+25 call clssnmvDiskPingMoni 005237F80 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]3 torThread() 040B0F0B8 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]start_thread()+199 call clssscthrdmain() 005237F80 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 2AAAAC074780 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clone()+109 call start_thread() 040B0F940 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 2AAAAC074780 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]0000000000000000 call clone() 040B0F940 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 2AAAAC074780 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?

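As the log shows, the problem arose because no I/O to the voting disk completed within the timeout threshold of roughly 200 seconds (the log reports "No I/O completions for 200900 ms"), so CSSD aborted and the cluster stack on this node was forcibly restarted.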
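The size of the timeout can be cross-checked against the CRS-1615, CRS-1614 and CRS-1613 warnings in the clusterware alert log, since each one reports both the elapsed percentage and the time remaining. The following is a minimal sketch of that arithmetic; the regular expression and the function name `estimate_max_interval_ms` are illustrative, not part of any Oracle tool.

```python
import re

# Sample lines copied from the clusterware alert log above.
ALERT_LINES = [
    "[cssd(14245)]CRS-1615:No I/O has completed after 50% of the maximum interval. "
    "Voting file ORCL:VOL will be considered not functional in 99320 milliseconds",
    "[cssd(14245)]CRS-1614:No I/O has completed after 75% of the maximum interval. "
    "Voting file ORCL:VOL will be considered not functional in 49190 milliseconds",
    "[cssd(14245)]CRS-1613:No I/O has completed after 90% of the maximum interval. "
    "Voting file ORCL:VOL will be considered not functional in 19130 milliseconds",
]

# Captures the elapsed percentage and the remaining milliseconds.
PATTERN = re.compile(
    r"after (\d+)% of the maximum interval\..*?not functional in (\d+) milliseconds"
)


def estimate_max_interval_ms(line):
    """Estimate the full timeout as remaining / (1 - elapsed fraction)."""
    match = PATTERN.search(line)
    if match is None:
        return None
    percent_elapsed = int(match.group(1))
    remaining_ms = int(match.group(2))
    return remaining_ms * 100 / (100 - percent_elapsed)


estimates = [estimate_max_interval_ms(line) for line in ALERT_LINES]
for value in estimates:
    print(f"estimated maximum interval: {value / 1000:.1f} s")
```

Each estimate comes out slightly under 200 seconds, presumably because the warning is written a little after the percentage mark is crossed. That agrees with the 200900 ms reported by clssnmvDiskCheck in ocssd.log.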

Check the operating system messages for the same time window:

Jun 20 22:47:48 Oracle-01 kernel: sd 6:0:2:4: SCSI error: return code = 0x00020000
Jun 20 22:47:48 Oracle-01 kernel: end_request: I/O error, dev sdq, sector 142754
Jun 20 22:47:48 Oracle-01 kernel: device-mapper: multipath: Failing path 65:0.
Jun 20 22:47:48 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:47:48 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:47:48 Oracle-01 multipathd: 65:0: mark as failed
Jun 20 22:47:48 Oracle-01 multipathd: mpath3: remaining active paths: 3
Jun 20 22:47:53 Oracle-01 multipathd: sdq: readsector0 checker reports path is up
Jun 20 22:47:53 Oracle-01 multipathd: 65:0: reinstated
Jun 20 22:47:53 Oracle-01 multipathd: mpath3: remaining active paths: 4
Jun 20 22:47:53 Oracle-01 multipathd: sds: readsector0 checker reports path is down
Jun 20 22:47:53 Oracle-01 multipathd: checker failed path 65:32 in map mpath5
Jun 20 22:47:53 Oracle-01 multipathd: mpath5: remaining active paths: 3
Jun 20 22:47:53 Oracle-01 kernel: device-mapper: multipath: Failing path 65:32.
Jun 20 22:47:53 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:47:53 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:47:53 Oracle-01 multipathd: dm-10: add map (uevent)
Jun 20 22:47:53 Oracle-01 multipathd: dm-10: devmap already registered
Jun 20 22:47:58 Oracle-01 multipathd: sds: readsector0 checker reports path is up
Jun 20 22:47:58 Oracle-01 multipathd: 65:32: reinstated
Jun 20 22:47:58 Oracle-01 multipathd: mpath5: remaining active paths: 4
Jun 20 22:47:58 Oracle-01 multipathd: dm-10: add map (uevent)
Jun 20 22:47:58 Oracle-01 multipathd: dm-10: devmap already registered
Jun 20 22:48:00 Oracle-01 kernel: sd 6:0:2:4: SCSI error: return code = 0x00020000
Jun 20 22:48:00 Oracle-01 kernel: end_request: I/O error, dev sdq, sector 142786
Jun 20 22:48:00 Oracle-01 kernel: device-mapper: multipath: Failing path 65:0.
Jun 20 22:48:00 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:48:00 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:48:00 Oracle-01 multipathd: 65:0: mark as failed
Jun 20 22:48:00 Oracle-01 multipathd: mpath3: remaining active paths: 3
.
.
.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:1): Abort command issued -- 1 1fdf13 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:2): Abort command issued -- 1 1fe172 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:1): Abort command issued -- 1 1fe17e 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:4): Abort command issued -- 1 1fe181 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:4): Abort command issued -- 1 1fe185 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:4): Abort command issued -- 1 1fe180 2002.
Jun 20 22:53:09 Oracle-01 multipathd: sdq: readsector0 checker reports path is down
Jun 20 22:53:09 Oracle-01 multipathd: checker failed path 65:0 in map mpath3
Jun 20 22:53:09 Oracle-01 kernel: sd 6:0:2:4: timing out command, waited 300s
Jun 20 22:53:09 Oracle-01 kernel: device-mapper: multipath: Failing path 65:0.
Jun 20 22:53:09 Oracle-01 multipathd: mpath3: remaining active paths: 3
Jun 20 22:53:09 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:53:09 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:53:14 Oracle-01 multipathd: sdq: readsector0 checker reports path is up
Jun 20 22:53:14 Oracle-01 multipathd: 65:0: reinstated
Jun 20 22:53:14 Oracle-01 multipathd: mpath3: remaining active paths: 4
Jun 20 22:53:14 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:53:14 Oracle-01 multipathd: dm-5: devmap already registered

At the same time, the operating system also logged I/O and multipath errors.
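With a long syslog it helps to count how often each path fails and is reinstated, because a path that keeps flapping points at the link or controller behind it. Below is a rough sketch of such a summary. It assumes the multipathd message formats seen in the excerpt above, and the function name `summarize_flaps` is made up for illustration.

```python
import re
from collections import Counter

# A few multipathd lines copied from the syslog excerpt above.
SYSLOG_LINES = [
    "Jun 20 22:47:48 Oracle-01 multipathd: 65:0: mark as failed",
    "Jun 20 22:47:48 Oracle-01 multipathd: mpath3: remaining active paths: 3",
    "Jun 20 22:47:53 Oracle-01 multipathd: 65:0: reinstated",
    "Jun 20 22:47:53 Oracle-01 multipathd: checker failed path 65:32 in map mpath5",
    "Jun 20 22:47:58 Oracle-01 multipathd: 65:32: reinstated",
    "Jun 20 22:48:00 Oracle-01 multipathd: 65:0: mark as failed",
]

# A path failure is reported in two different forms in this log.
FAILED = re.compile(
    r"multipathd: (?:(\d+:\d+): mark as failed|checker failed path (\d+:\d+) in map)"
)
REINSTATED = re.compile(r"multipathd: (\d+:\d+): reinstated")


def summarize_flaps(lines):
    """Count failure and reinstatement events per major:minor path."""
    failed, reinstated = Counter(), Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            failed[match.group(1) or match.group(2)] += 1
            continue
        match = REINSTATED.search(line)
        if match:
            reinstated[match.group(1)] += 1
    return failed, reinstated


failed, reinstated = summarize_flaps(SYSLOG_LINES)
for path in sorted(failed):
    print(path, "failed", failed[path], "reinstated", reinstated[path])
```

In practice the list would be replaced by the lines of /var/log/messages. A path with many fail and reinstate pairs in a short window, such as 65:0 (sdq) here, is the first candidate to trace back to its HBA port and storage controller.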

Use the dmesg command to view the error messages recorded by the operating system:

[root@Oracle-01 log]# dmesg|grep error
sd 6:0:2:3: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdp, sector 223618
end_request: I/O error, dev sdat, sector 651264
end_request: I/O error, dev sdat, sector 652288
end_request: I/O error, dev sdat, sector 653312
end_request: I/O error, dev sdat, sector 654336
end_request: I/O error, dev sdas, sector 1024
end_request: I/O error, dev sdas, sector 2048
end_request: I/O error, dev sdas, sector 3072
end_request: I/O error, dev sdas, sector 0
end_request: I/O error, dev sdar, sector 1024
.
.
.
end_request: I/O error, dev sdf, sector 2048
end_request: I/O error, dev sdf, sector 3072
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142722
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142850
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142722
sd 6:0:2:4: SCSI error: return code = 0x00020000
.
.
.
end_request: I/O error, dev sdp, sector 5617186
sd 6:0:2:2: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdo, sector 1474050
sd 6:0:2:3: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdp, sector 1564770
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142722
sd 6:0:2:3: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdp, sector 177538
sd 6:0:2:2: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdo, sector 1555986

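Clearly the root cause still lies at the operating system and hardware level; it appears the storage and multipath problems have not been completely resolved.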
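To see which devices the dmesg errors concentrate on, the output can be tallied per block device and per SCSI address. This is a small sketch under the assumption that the lines follow the two formats shown above; the function name `count_errors` is hypothetical. It also extracts the host byte from the SCSI return code. In 0x00020000 that byte is 0x02, which to my knowledge corresponds to DID_BUS_BUSY in the Linux SCSI headers.

```python
import re
from collections import Counter

# A few lines copied from the dmesg output above.
DMESG_LINES = [
    "sd 6:0:2:3: SCSI error: return code = 0x00020000",
    "end_request: I/O error, dev sdp, sector 223618",
    "end_request: I/O error, dev sdat, sector 651264",
    "sd 6:0:2:4: SCSI error: return code = 0x00020000",
    "end_request: I/O error, dev sdq, sector 142722",
    "sd 6:0:2:4: SCSI error: return code = 0x00020000",
    "end_request: I/O error, dev sdq, sector 142850",
]

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
SCSI_ERROR = re.compile(
    r"sd (\d+:\d+:\d+:\d+): SCSI error: return code = (0x[0-9a-fA-F]+)"
)


def count_errors(lines):
    """Tally I/O errors per device and SCSI errors per (address, host byte)."""
    by_device, by_scsi = Counter(), Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            by_device[match.group(1)] += 1
            continue
        match = SCSI_ERROR.search(line)
        if match:
            # The host byte occupies bits 16-23 of the return code.
            host_byte = (int(match.group(2), 16) >> 16) & 0xFF
            by_scsi[(match.group(1), host_byte)] += 1
    return by_device, by_scsi


by_device, by_scsi = count_errors(DMESG_LINES)
print(by_device.most_common())
print(by_scsi.most_common())
```

Feeding the full dmesg output through the same function shows whether the errors cluster on one SCSI target (here the 6:0:2:x addresses) or are spread across all paths, which helps narrow the fault to a single link or controller.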

Diagnosing 11.2 RAC Automatic Startup Errors (Part 4)