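A customer's 11.2 RAC environment on Linux x86-64 failed after a power outage and could no longer start automatically.
This post covers a new problem that appeared after the environment was rebuilt.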
11.2 RAC automatic startup error diagnosis: http://yangtingkun.itpub.net/post/468/518656
11.2 RAC automatic startup error diagnosis (2): http://yangtingkun.itpub.net/post/468/518805
11.2 RAC automatic startup error diagnosis (3): http://yangtingkun.itpub.net/post/468/518834
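The previous diagnosis traced the problem to a large number of I/O errors reported at the operating system level, and the customer eventually rebuilt the entire RAC environment. The I/O errors persisted during the rebuild. A check of the storage together with the hardware engineer finally found that one storage link was down and that a key service on one of the controllers had not started.
Once these problems were fixed, the I/O errors stopped and the RAC environment was built successfully. The matter seemed settled, but the environment has recently begun restarting on its own, and frequently.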
The database alert log shows the following errors:
Mon Jun 20 22:50:49 2011
NOTE: ASMB terminating
Errors in file /oracle/diag/rdbms/spsp/SPSP1/trace/SPSP1_asmb_15270.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2 Serial number: 3
Errors in file /oracle/diag/rdbms/spsp/SPSP1/trace/SPSP1_asmb_15270.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2 Serial number: 3
ASMB (ospid: 15270): terminating the instance due to error 15064
Mon Jun 20 22:50:51 2011
ORA-1092 : opitsk aborting process
Termination issued to instance processes. Waiting for the processes to exit
Mon Jun 20 22:50:59 2011
Instance termination failed to kill one or more processes
Instance terminated by ASMB, pid = 15270
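Clearly the ASM instance failed, which cut the communication between the database instance and the ASM instance. The ASM instance alert log shows: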
Mon Jun 20 22:50:49 2011
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Mon Jun 20 22:50:49 2011
NOTE: client exited [14631]
NOTE: force a map free for map id 2
Mon Jun 20 22:50:50 2011
Received an instance abort message from instance 2
Mon Jun 20 22:50:50 2011
Received an instance abort message from instance 2
Please check instance 2 alert and LMON trace files for detail.
Please check instance 2 alert and LMON trace files for detail.
LMS0 (ospid: 14565): terminating the instance due to error 481
Mon Jun 20 22:50:51 2011
ORA-1092 : opitsk aborting process
Mon Jun 20 22:50:51 2011
License high water mark = 18
Termination issued to instance processes. Waiting for the processes to exit
Instance termination failed to kill one or more processes
Instance terminated by LMS0, pid = 14565
USER (ospid: 3069): terminating the instance
Mon Jun 20 22:51:03 2011
Termination issued to instance processes. Waiting for the processes to exit
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 3069
So the clusterware is again at fault and caused the ASM instance to shut down. Next, the clusterware logs:
2011-06-20 22:49:07.305
[cssd(14245)]CRS-1615:No I/O has completed after 50% of the maximum interval.
Voting file ORCL:VOL will be considered not functional in 99320 milliseconds
2011-06-20 22:49:57.441
[cssd(14245)]CRS-1614:No I/O has completed after 75% of the maximum interval.
Voting file ORCL:VOL will be considered not functional in 49190 milliseconds
2011-06-20 22:50:27.505
[cssd(14245)]CRS-1613:No I/O has completed after 90% of the maximum interval.
Voting file ORCL:VOL will be considered not functional in 19130 milliseconds
2011-06-20 22:50:47.545
[cssd(14245)]CRS-1604:CSSD voting file is offline: ORCL:VOL; details at
(:CSSNM00058:) in /oracle/product/11g/grid/log/oracle-01/cssd/ocssd.log.
2011-06-20 22:50:47.546
[cssd(14245)]CRS-1606:The number of voting files available, 0, is less than the
minimum number of voting files required, 1, resulting in CSSD termination to
ensure data integrity; details at (:CSSNM00018:) in /oracle/product/11g/grid/log/oracle-01/cssd/ocssd.log
2011-06-20 22:50:47.546
[cssd(14245)]CRS-1656:The CSS daemon is terminating due to a fatal error;
Details at (:CSSSC00012:) in
/oracle/product/11g/grid/log/oracle-01/cssd/ocssd.log
2011-06-20 22:50:47.581
[cssd(14245)]CRS-1652:Starting clean up of CRSD resources.
2011-06-20 22:50:48.824
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5016:Process
"/oracle/product/11g/grid/opmn/bin/onsctli" spawned by agent
"/oracle/product/11g/grid/bin/oraagent.bin" for action
"check" failed: details at "(:CLSN00010:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:49.435
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5016:Process
"/oracle/product/11g/grid/bin/lsnrctl" spawned by agent
"/oracle/product/11g/grid/bin/oraagent.bin" for action
"check" failed: details at "(:CLSN00010:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:49.441
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5016:Process
"/oracle/product/11g/grid/bin/lsnrctl" spawned by agent
"/oracle/product/11g/grid/bin/oraagent.bin" for action
"check" failed: details at "(:CLSN00010:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:49.445
[cssd(14245)]CRS-1654:Clean up of CRSD resources finished successfully.
2011-06-20 22:50:49.446
[cssd(14245)]CRS-1655:CSSD on node oracle-01 detected a problem and started to
shutdown.
2011-06-20 22:50:49.459
[/oracle/product/11g/grid/bin/orarootagent.bin(14756)]CRS-5822:Agent
'/oracle/product/11g/grid/bin/orarootagent_root' disconnected from server.
Details at (:CRSAGF00117:) {0:1:8} in
/oracle/product/11g/grid/log/oracle-01/agent/crsd/orarootagent_root/orarootagent_root.log.
2011-06-20 22:50:49.460
[/oracle/product/11g/grid/bin/oraagent.bin(15085)]CRS-5822:Agent
'/oracle/product/11g/grid/bin/oraagent_oracle' disconnected from server.
Details at (:CRSAGF00117:) {0:5:4} in
/oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2011-06-20 22:50:49.472
[/oracle/product/11g/grid/bin/oraagent.bin(14753)]CRS-5822:Agent
'/oracle/product/11g/grid/bin/oraagent_grid' disconnected from server. Details
at (:CRSAGF00117:) {0:2:7} in /oracle/product/11g/grid/log/oracle-01/agent/crsd/oraagent_grid/oraagent_grid.log.
2011-06-20 22:50:49.611
[cssd(14245)]CRS-1660:The CSS daemon shutdown has completed
2011-06-20 22:50:49.622
[ohasd(14459)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle-01'.
2011-06-20 22:50:50.648
[crsd(3045)]CRS-0805:Cluster Ready Service aborted due to failure to
communicate with Cluster Synchronization Service with error [3]. Details at
(:CRSD00109:) in /oracle/product/11g/grid/log/oracle-01/crsd/crsd.log.
2011-06-20 22:50:50.932
[ohasd(14459)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle-01'.
2011-06-20 22:50:50.944
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource
"+ASM" failed: details at "(:CLSN00006:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:50.946
[ohasd(14459)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle-01'.
2011-06-20 22:50:51.127
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource
"+ASM" failed: details at "(:CLSN00006:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:51.129
[ohasd(14459)]CRS-2765:Resource 'ora.asm' has failed on server 'oracle-01'.
2011-06-20 22:50:51.309
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource
"+ASM" failed: details at "(:CLSN00006:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:51.455
[ohasd(14459)]CRS-2765:Resource 'ora.cssd' has failed on server 'oracle-01'.
2011-06-20 22:50:51.491
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource
"+ASM" failed: details at "(:CLSN00006:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:51.673
[/oracle/product/11g/grid/bin/oraagent.bin(14984)]CRS-5011:Check of resource
"+ASM" failed: details at "(:CLSN00006:)" in
"/oracle/product/11g/grid/log/oracle-01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2011-06-20 22:50:52.660
[crsd(3055)]CRS-0805:Cluster Ready Service aborted due to failure to
communicate with Cluster Synchronization Service with error [3]. Details at
(:CRSD00109:) in /oracle/product/11g/grid/log/oracle-01/crsd/crsd.log.
2011-06-20 22:50:52.984
[cssd(3071)]CRS-1713:CSSD daemon is started in clustered mode
2011-06-20 22:50:53.664
[ohasd(14459)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle-01'.
2011-06-20 22:50:58.936
[ohasd(14459)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server
'oracle-01'.
The ASM instance shut down because reads of the voting disk timed out, and such a timeout is most likely caused by an I/O problem. Next, ocssd.log:
2011-06-20 22:47:35.621: [ CSSD][1095420224]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 8010 msecs
2011-06-20 22:47:37.749: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
.
.
.
2011-06-20 22:47:43.633: [ CSSD][1099643200]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 16020 msecs
2011-06-20 22:47:46.767: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
.
.
.
2011-06-20 22:47:52.655: [ CSSD][1095420224]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 25040 msecs
2011-06-20 22:47:54.783: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
.
.
.
2011-06-20 22:48:00.671: [ CSSD][1095420224]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 33050 msecs
2011-06-20 22:48:04.803: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
2011-06-20 22:48:04.803: [ CSSD][1092266304]clssnmSendingThread: sent 5 status
msgs to all nodes
2011-06-20 22:48:08.687: [ CSSD][1095420224]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 41070 msecs
2011-06-20 22:48:08.811: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
.
.
.
2011-06-20 22:48:16.703: [ CSSD][1095420224]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 49090 msecs
2011-06-20 22:48:16.827: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
.
.
.
2011-06-20 22:48:24.248: [ CSSD][1099643200]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 56630 msecs
2011-06-20 22:48:26.847: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
.
.
.
2011-06-20 22:48:33.638: [ CSSD][1099643200]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 66020 msecs
2011-06-20 22:48:35.865: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
.
.
.
2011-06-20 22:50:38.988: [ CSSD][1095420224]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 191350 msecs
2011-06-20 22:50:42.117: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
2011-06-20 22:50:42.117: [ CSSD][1092266304]clssnmSendingThread: sent 5 status
msgs to all nodes
2011-06-20 22:50:47.003: [ CSSD][1095420224]clssscMonitorThreads
clssnmvDiskPingThread not scheduled for 199360 msecs
2011-06-20 22:50:47.127: [ CSSD][1092266304]clssnmSendingThread: sending status
msg to all nodes
2011-06-20 22:50:47.128: [ CSSD][1092266304]clssnmSendingThread: sent 5 status
msgs to all nodes
2011-06-20 22:50:47.545: [ CSSD][1085339968](:CSSNM00058:)clssnmvDiskCheck: No
I/O completions for 200900 ms for voting file ORCL:VOL)
2011-06-20 22:50:47.546: [ CSSD][1085339968]clssnmvDiskAvailabilityChange:
voting file ORCL:VOL now offline
2011-06-20 22:50:47.546: [ CSSD][1085339968](:CSSNM00018:)clssnmvDiskCheck:
Aborting, 0 of 1 configured voting disks available, need 1
2011-06-20 22:50:47.546: [ SKGFD][1082186048]Lib
:ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab000e060
for disk :ORCL:VOL:
2011-06-20 22:50:47.546: [
CSSD][1085339968]###################################
2011-06-20 22:50:47.546: [ CSSD][1085339968]clssscExit: CSSD aborting from
thread clssnmvDiskPingMonitorThread
2011-06-20 22:50:47.546: [ CSSD][1085339968]###################################
2011-06-20 22:50:47.546: [ SKGFD][1083763008]Lib
:ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab0056990
for disk :ORCL:VOL:
2011-06-20 22:50:47.546: [ CSSD][1085339968](:CSSSC00012:)clssscExit: A
fatal error occurred and the CSS daemon is terminating abnormally
2011-06-20 22:50:47.546: [ CSSD][1085339968]
----- Call Stack Trace -----
2011-06-20 22:50:47.546: [ CSSD][1085339968]calling call entry argument values
in hex
2011-06-20 22:50:47.546: [ CSSD][1085339968]location type point (? means
dubious value)
2011-06-20 22:50:47.546: [ CSSD][1085339968]-------------------- --------
-------------------- ----------------------------
2011-06-20 22:50:47.552: [ CSSD][1085339968]clssscExit()+726 call kgdsdst()
000000000 ? 000000000 ?
2011-06-20 22:50:47.552: [ CSSD][1085339968] 040B0A568 ? 000000001 ?
2011-06-20 22:50:47.552: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clssnmvDiskCheck()+ call
clssscExit() 005237F80 ? 000000002 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]3220 040B0A568 ? 000000001 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clssnmvDiskPingMoni call
clssnmvDiskCheck() 005237F80 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]torThread()+404 040B0F0B8 ?
000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clssscthrdmain()+25 call
clssnmvDiskPingMoni 005237F80 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]3 torThread() 040B0F0B8 ? 000000000
?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]start_thread()+199 call
clssscthrdmain() 005237F80 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 2AAAAC074780 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]clone()+109 call start_thread()
040B0F940 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 2AAAAC074780 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968]0000000000000000 call clone()
040B0F940 ? 2AAAAC074780 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 2AAAAC074780 ? 000000000 ?
2011-06-20 22:50:47.553: [ CSSD][1085339968] 000000001 ? 000000003 ?
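As the log shows, the voting disk could not be accessed for longer than the timeout threshold (about 200 seconds; clssnmvDiskCheck reports no I/O completions for 200900 ms), so CSSD aborted and the node was forcibly restarted.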
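The length of the timeout can be cross-checked from the CRS-1615, CRS-1614 and CRS-1613 warnings in the clusterware log above: each one reports the percentage of the maximum interval that has already elapsed and the number of milliseconds remaining. The following is a minimal Python sketch of that arithmetic, assuming the wrapped log lines have been joined into single strings; the name `estimate_timeout_ms` is hypothetical and not part of any Oracle tool.

```python
import re

# Warning texts copied from the clusterware log above, with wrapped lines joined.
WARNINGS = [
    "CRS-1615:No I/O has completed after 50% of the maximum interval. "
    "Voting file ORCL:VOL will be considered not functional in 99320 milliseconds",
    "CRS-1614:No I/O has completed after 75% of the maximum interval. "
    "Voting file ORCL:VOL will be considered not functional in 49190 milliseconds",
    "CRS-1613:No I/O has completed after 90% of the maximum interval. "
    "Voting file ORCL:VOL will be considered not functional in 19130 milliseconds",
]

PATTERN = re.compile(r"after (\d+)% of the maximum interval.*?in (\d+) milliseconds")


def estimate_timeout_ms(line):
    """Estimate the full timeout from the elapsed percentage and the remaining ms."""
    m = PATTERN.search(line)
    if m is None:
        return None
    elapsed_pct = int(m.group(1))
    remaining_ms = int(m.group(2))
    # remaining = total * (1 - elapsed fraction), so total = remaining / (1 - fraction)
    return remaining_ms * 100 / (100 - elapsed_pct)


estimates = [estimate_timeout_ms(w) for w in WARNINGS]
for warning, estimate in zip(WARNINGS, estimates):
    print(warning.split(":")[0], round(estimate), "ms")
```

The three estimates are 198640, 196760 and 191300 ms, all close to 200 seconds and in line with the 200900 ms reported by clssnmvDiskCheck. They fall slightly short of 200000 ms, presumably because each warning is written a little after its threshold has been crossed.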
The operating system log for the same period:
Jun 20 22:47:48 Oracle-01 kernel: sd 6:0:2:4: SCSI error: return code =
0x00020000
Jun 20 22:47:48 Oracle-01 kernel: end_request: I/O error, dev sdq, sector
142754
Jun 20 22:47:48 Oracle-01 kernel: device-mapper: multipath: Failing path 65:0.
Jun 20 22:47:48 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:47:48 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:47:48 Oracle-01 multipathd: 65:0: mark as failed
Jun 20 22:47:48 Oracle-01 multipathd: mpath3: remaining active paths: 3
Jun 20 22:47:53 Oracle-01 multipathd: sdq: readsector0 checker reports path is
up
Jun 20 22:47:53 Oracle-01 multipathd: 65:0: reinstated
Jun 20 22:47:53 Oracle-01 multipathd: mpath3: remaining active paths: 4
Jun 20 22:47:53 Oracle-01 multipathd: sds: readsector0 checker reports path is
down
Jun 20 22:47:53 Oracle-01 multipathd: checker failed path 65:32 in map mpath5
Jun 20 22:47:53 Oracle-01 multipathd: mpath5: remaining active paths: 3
Jun 20 22:47:53 Oracle-01 kernel: device-mapper: multipath: Failing path 65:32.
Jun 20 22:47:53 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:47:53 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:47:53 Oracle-01 multipathd: dm-10: add map (uevent)
Jun 20 22:47:53 Oracle-01 multipathd: dm-10: devmap already registered
Jun 20 22:47:58 Oracle-01 multipathd: sds: readsector0 checker reports path is
up
Jun 20 22:47:58 Oracle-01 multipathd: 65:32: reinstated
Jun 20 22:47:58 Oracle-01 multipathd: mpath5: remaining active paths: 4
Jun 20 22:47:58 Oracle-01 multipathd: dm-10: add map (uevent)
Jun 20 22:47:58 Oracle-01 multipathd: dm-10: devmap already registered
Jun 20 22:48:00 Oracle-01 kernel: sd 6:0:2:4: SCSI error: return code =
0x00020000
Jun 20 22:48:00 Oracle-01 kernel: end_request: I/O error, dev sdq, sector 142786
Jun 20 22:48:00 Oracle-01 kernel: device-mapper: multipath: Failing path 65:0.
Jun 20 22:48:00 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:48:00 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:48:00 Oracle-01 multipathd: 65:0: mark as failed
Jun 20 22:48:00 Oracle-01 multipathd: mpath3: remaining active paths: 3
.
.
.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:1): Abort
command issued -- 1 1fdf13 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:2): Abort
command issued -- 1 1fe172 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:1): Abort
command issued -- 1 1fe17e 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:4): Abort
command issued -- 1 1fe181 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:4): Abort
command issued -- 1 1fe185 2002.
Jun 20 22:53:09 Oracle-01 kernel: qla2xxx 0000:09:00.0: scsi(6:2:4): Abort
command issued -- 1 1fe180 2002.
Jun 20 22:53:09 Oracle-01 multipathd: sdq: readsector0 checker reports path is
down
Jun 20 22:53:09 Oracle-01 multipathd: checker failed path 65:0 in map mpath3
Jun 20 22:53:09 Oracle-01 kernel: sd 6:0:2:4: timing out command, waited 300s
Jun 20 22:53:09 Oracle-01 kernel: device-mapper: multipath: Failing path 65:0.
Jun 20 22:53:09 Oracle-01 multipathd: mpath3: remaining active paths: 3
Jun 20 22:53:09 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:53:09 Oracle-01 multipathd: dm-5: devmap already registered
Jun 20 22:53:14 Oracle-01 multipathd: sdq: readsector0 checker reports path is
up
Jun 20 22:53:14 Oracle-01 multipathd: 65:0: reinstated
Jun 20 22:53:14 Oracle-01 multipathd: mpath3: remaining active paths: 4
Jun 20 22:53:14 Oracle-01 multipathd: dm-5: add map (uevent)
Jun 20 22:53:14 Oracle-01 multipathd: dm-5: devmap already registered
At the same moment, the operating system also logged I/O and multipath errors.
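To see how often each path flaps, the multipathd messages can be tallied per path. The sketch below is one possible way to do this in Python, using a few lines copied from the excerpt above as sample input; the function name `summarize_flaps` and the regular expressions are illustrative assumptions based only on the message formats visible in this log.

```python
import re
from collections import Counter

# Sample multipathd lines copied from the syslog excerpt above.
SYSLOG = """\
Jun 20 22:47:48 Oracle-01 multipathd: 65:0: mark as failed
Jun 20 22:47:48 Oracle-01 multipathd: mpath3: remaining active paths: 3
Jun 20 22:47:53 Oracle-01 multipathd: 65:0: reinstated
Jun 20 22:47:53 Oracle-01 multipathd: mpath3: remaining active paths: 4
Jun 20 22:47:53 Oracle-01 multipathd: checker failed path 65:32 in map mpath5
Jun 20 22:47:53 Oracle-01 multipathd: mpath5: remaining active paths: 3
Jun 20 22:47:58 Oracle-01 multipathd: 65:32: reinstated
Jun 20 22:47:58 Oracle-01 multipathd: mpath5: remaining active paths: 4
Jun 20 22:48:00 Oracle-01 multipathd: 65:0: mark as failed
Jun 20 22:48:00 Oracle-01 multipathd: mpath3: remaining active paths: 3
"""

# Two message shapes report a failed path; one reports a recovered path.
FAILED = re.compile(
    r"multipathd: (?:(\d+:\d+): mark as failed|checker failed path (\d+:\d+) in map \S+)"
)
REINSTATED = re.compile(r"multipathd: (\d+:\d+): reinstated")


def summarize_flaps(text):
    """Return {major:minor: (fail count, reinstate count)} for each path seen."""
    fails, reinstates = Counter(), Counter()
    for line in text.splitlines():
        m = FAILED.search(line)
        if m:
            fails[m.group(1) or m.group(2)] += 1
            continue
        m = REINSTATED.search(line)
        if m:
            reinstates[m.group(1)] += 1
    paths = sorted(set(fails) | set(reinstates))
    return {p: (fails[p], reinstates[p]) for p in paths}


summary = summarize_flaps(SYSLOG)
for path, (failed, reinstated) in summary.items():
    print(f"path {path}: failed {failed}x, reinstated {reinstated}x")
```

A path that fails and is reinstated every few seconds, as 65:0 (sdq) does here, points to an unstable link rather than a path that is permanently down.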
The dmesg command shows the errors recorded by the operating system:
[root@Oracle-01 log]# dmesg|grep error
sd 6:0:2:3: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdp, sector 223618
end_request: I/O error, dev sdat, sector 651264
end_request: I/O error, dev sdat, sector 652288
end_request: I/O error, dev sdat, sector 653312
end_request: I/O error, dev sdat, sector 654336
end_request: I/O error, dev sdas, sector 1024
end_request: I/O error, dev sdas, sector 2048
end_request: I/O error, dev sdas, sector 3072
end_request: I/O error, dev sdas, sector 0
end_request: I/O error, dev sdar, sector 1024
.
.
.
end_request: I/O error, dev sdf, sector 2048
end_request: I/O error, dev sdf, sector 3072
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142722
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142850
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142722
sd 6:0:2:4: SCSI error: return code = 0x00020000
.
.
.
end_request: I/O error, dev sdp, sector 5617186
sd 6:0:2:2: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdo, sector 1474050
sd 6:0:2:3: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdp, sector 1564770
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142722
sd 6:0:2:3: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdp, sector 177538
sd 6:0:2:2: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdo, sector 1555986
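Clearly the root cause still lies at the operating system and hardware level. The storage and multipath problems have evidently not been fully resolved.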
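With this many errors, a per-device count makes it easier to tell the hardware engineer which devices to examine first. Below is a minimal Python sketch, assuming the `end_request: I/O error, dev ..., sector ...` format shown in the dmesg output above; the sample input is a handful of lines copied from that output, and `errors_per_device` is a hypothetical helper name.

```python
import re
from collections import Counter

# A few lines copied from the dmesg output above.
DMESG = """\
sd 6:0:2:3: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdp, sector 223618
end_request: I/O error, dev sdat, sector 651264
end_request: I/O error, dev sdq, sector 142722
sd 6:0:2:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdq, sector 142850
end_request: I/O error, dev sdp, sector 177538
end_request: I/O error, dev sdo, sector 1555986
"""

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def errors_per_device(text):
    """Count I/O error lines per block device name."""
    counts = Counter()
    for dev, _sector in IO_ERROR.findall(text):
        counts[dev] += 1
    return counts


counts = errors_per_device(DMESG)
for dev, n in counts.most_common():
    print(dev, n)
```

On the real system the same function could be fed the full dmesg output instead of the sample string, and the device names could then be matched against the multipath maps to find which storage paths they belong to.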