On Oracle RAC, from 11.2 onward, errors related to "Duplicate voting file found" sometimes appear. The logs below show how to analyze them.
Problem 1:
ocssd.log:
-----------
2010-12-13 17:13:08.855: [ CLSF][1112013120]Opened hdl:0x2aaab0148ee0 for dev:/dev/raw/raw1:
2010-12-13 17:13:08.855: [ CSSD][1112013120]clssnmvStatusBlkInit: myinfo nodename enode1, uniqueness 1292256281
2010-12-13 17:13:08.855: [ CSSD][1112013120]clssnmvDiskAvailabilityChange: voting file /dev/raw/raw1 now online
2010-12-13 17:13:08.856: [ CSSD][1099827520]clssnmvDiskKillCheck: Aborting, killed by install operation
2010-12-13 17:13:08.856: [ CSSD][1099827520]###################################
2010-12-13 17:13:08.856: [ CSSD][1099827520]clssscExit: CSSD aborting from thread clssnmvKillBlockThread
2010-12-13 17:13:08.856: [ CSSD][1099827520]###################################
Problem 2:
2013-02-24 17:21:41.776: [ CSSD][17]clssnmvDiskVerify: Successful discovery of 3 disks
2013-02-24 17:21:41.776: [ CSSD][17]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2013-02-24 17:21:41.776: [ CSSD][17]clssnmCompleteVFDiscovery: Completing voting file discovery
2013-02-24 17:21:41.777: [ CSSD][17]clssnmvDiskStateChange: state from discovered to deconfigured disk /dev/rdsk/emcpower15
2013-02-24 17:21:41.777: [ CSSD][17]clssnmvDiskStateChange: state from discovered to deconfigured disk /dev/rdsk/emcpower14
2013-02-24 17:21:41.777: [ CSSD][17]clssnmvDiskStateChange: state from discovered to deconfigured disk /dev/rdsk/emcpower13
2013-02-24 17:21:41.777: [ CSSD][17]clssnmvVerifyCommittedConfigVFs: Insufficient voting files found, found 0 of 3 configured, needed 2 voting files
2013-02-24 17:21:41.778: [ CSSD][17](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 0, id ae6026ad-91804f21-bfdc9227-aced3ee1 not found
2013-02-24 17:21:41.778: [ CSSD][17](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 1, id b848bcd9-24684fa7-bfadc40b-506de4ab not found
2013-02-24 17:21:41.778: [ CSSD][17](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 2, id 2694f601-b61d4fbb-bfee0a0c-9a32246e not found
2013-02-24 17:21:41.779: [ CSSD][17]ASSERT clssnm1.c 3375
2013-02-24 17:21:41.779: [ CSSD][17](:CSSNM00021:)clssnmCompleteVFDiscovery: Found 0 voting files, but 2 are required. Terminating due to insufficient configured voting files
2013-02-24 17:21:41.779: [ CSSD][17]###################################
2013-02-24 17:21:41.779: [ CSSD][17]clssscExit: CSSD aborting from thread clssnmvDDiscThread
2013-02-24 17:21:41.779: [ CSSD][17]###################################
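The most likely trigger is a change to the storage device mappings or the addition of new LUNs.
The problem is caused by duplicate devices being discovered for a voting disk. This usually happens in environments with multipath disks. Starting with 11gR2, when duplicate devices are found for a voting disk, both devices are discarded, which leaves CSSD without a usable voting disk and unable to start.
The log shows the following: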
2010-12-13 17:04:44.039: [ SKGFD][1085995328]Discovery with str::
2010-12-13 17:04:44.039: [ SKGFD][1085995328]UFS discovery with ::
2010-12-13 17:04:44.040: [ SKGFD][1085995328]Fetching UFS disk :/dev/raw/raw1:
2010-12-13 17:04:44.040: [ SKGFD][1085995328]Fetching UFS disk :/dev/raw/raw2:
......
2010-12-13 17:04:44.040: [ SKGFD][1085995328]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str ::
2010-12-13 17:04:44.041: [ SKGFD][1085995328]Fetching asmlib disk :ORCL:OCR_VOTE01:
...
2010-12-13 17:04:44.058: [ CSSD][1085995328]clssnmvDiskVerify: Successful discovery for disk /dev/raw/raw1, UID 846056ee-71db4f64-bf0206ce-cb441c4f, Pending CIN 0:1292254381:0, Committed CIN 0:1292254381:0
2010-12-13 17:04:44.058: [ CSSD][1085995328]clssnmvDiskVerify: discovered a potential voting file
2010-12-13 17:04:44.058: [ SKGFD][1085995328]Handle 0x94e3a00 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCR_VOTE01:
2010-12-13 17:04:44.058: [ CLSF][1085995328]Opened hdl:0x94e4660 for dev:ORCL:OCR_VOTE01:
2010-12-13 17:04:44.059: [ CSSD][1085995328]clssnmFindVF: found VF by vdin in the discovered queue
2010-12-13 17:04:44.059: [ CSSD][1085995328]clssnmFindVF: Duplicate voting file found in the queue of previously discovered disks queued(/dev/raw/raw1|[846056ee-71db4f64-bf0206ce-cb441c4f]), found(|[846056ee-71db4f64-bf0206ce-cb441c4f])
2010-12-13 17:04:44.059: [ CSSD][1085995328]clssnmvDiskDestroy: removing the voting disk
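This is a configuration error: raw devices were created and mapped to the underlying devices used by ASM, and they are owned by the grid user. When CSSD starts, it discovers both the raw devices and the ASMLib disks. /dev/raw/raw1 and OCR_VOTE01 point to the same disk. During discovery, /dev/raw/raw1 is found first, so OCR_VOTE01 is dropped as the duplicate. However, the voting disk lives on the ASM disk and is not in raw-device format. When /dev/raw/raw1 is used as the voting disk, CSSD cannot read the correct contents from it, so it reports "clssnmvDiskKillCheck: Aborting, killed by install operation" and cannot continue.
Second log: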
2013-02-24 17:21:41.464: [ CSSD][17]clssnmFindVF: found VF by vdin in the discovered queue
2013-02-24 17:21:41.464: [ CSSD][17]clssnmFindVF: Duplicate voting file found in the queue of previously discovered disks queued(/dev/rdsk/emcpower13|[ae6026ad-91804f21-bfdc9227-aced3ee1]), found(/dev/rdsk/c0d13s0|[ae6026ad-91804f21-bfdc9227-aced3ee1]), is not corrupted
2013-02-24 17:21:41.465: [ CSSD][17]clssnmvDiskCreate: Found a duplicate voting file /dev/rdsk/emcpower13 in the discovery queue which appears to be the same physical device as the newly discovered disk /dev/rdsk/c0d13s0. Rejecting both these files
2013-02-24 17:21:41.465: [ CSSD][17]clssnmvDiskDestroy: removing the voting disk /dev/rdsk/c0d13s0
2013-02-24 17:21:41.465: [ SKGFD][17]Lib :UFS:: closing handle 10124a910 for disk :/dev/rdsk/c0d13s0:

> ls -ltr /dev/rdsk/c0d13s0
lrwxrwxrwx 1 root root 66 Dec 19 16:31 /dev/rdsk/c0d13s0 -> ../../devices/virtual-devices@100/channel-devices@200/disk@d:a,raw
> ls -l /devices/virtual-devices@100/channel-devices@200/disk@d:a,raw
crw-rw-r-- 1 grid dba 150, 112 Dec 19 22:27 /devices/virtual-devices@100/channel-devices@200/disk@d:a,raw
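This is related to the multipath setup: /dev/rdsk/emcpower13 and /dev/rdsk/c0d13s0 both point to the same device. The ownership and permissions on the underlying device of /dev/rdsk/c0d13s0 are set incorrectly (it is accessible to the grid user), so both devices are discovered and then both are discarded.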
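When ocssd.log is large, it can help to pull out only the duplicate pairs that clssnmFindVF reports. The following is a minimal sketch, assuming the log keeps each entry on a single line in the format shown in the excerpts above; the function name `list_duplicate_vfs` is made up for illustration and is not an Oracle tool.

```shell
#!/bin/sh
# list_duplicate_vfs LOGFILE   (hypothetical helper, not part of Grid Infrastructure)
# For every "Duplicate voting file found" entry, print the device that was
# already queued, the device that was found afterwards, and the shared UID.
# Assumes the "queued(path|[uid]), found(path|[uid])" layout seen above.
list_duplicate_vfs() {
    grep 'Duplicate voting file found' "$1" |
    sed -n 's/.*queued(\([^|]*\)|\[\([^]]*\)]), found(\([^|]*\)|.*/queued=\1 found=\3 uid=\2/p'
}
```

For example, `list_duplicate_vfs ocssd.log` run against the second log would report the pair /dev/rdsk/emcpower13 and /dev/rdsk/c0d13s0. An empty `found=` field, as in the first log, means the log entry itself did not print a path for the second device.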
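Solution:
For the first log, remove the raw device mappings, or change the ownership/permissions of one set of the multipath devices on all nodes so that the grid user can no longer see those devices: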
# chown root:root /dev/raw/raw[1-2]
# chmod 600 /dev/raw/raw[1-2]
For the second log:
# chown root:sys /devices/virtual-devices@100/channel-devices@200/disk@d:a,raw
# chmod 600 /devices/virtual-devices@100/channel-devices@200/disk@d:a,raw

After the change, it looks like:

> ls -l /devices/virtual-devices@100/channel-devices@200/disk@d:a,raw
crw------- 1 root sys 150, 112 Dec 19 22:27 /devices/virtual-devices@100/channel-devices@200/disk@d:a,raw

Perform the same changes for the other voting disks if using normal or high redundancy.
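Because the change has to be repeated for every duplicate path on every node, a small check can confirm that none was missed. This is only a sketch: `hidden_from_grid` is a hypothetical helper name, and it assumes the usual `ls -l` column layout (mode in the first field, numeric owner in the third with `-n`).

```shell
#!/bin/sh
# hidden_from_grid PATH [UID]   (hypothetical helper for illustration)
# Succeeds when PATH, with symlinks followed, is owned by UID (default 0,
# i.e. root) and grants no permissions at all to group or others.
# Returns 1 if the check fails and 2 if PATH cannot be listed.
hidden_from_grid() {
    want_uid=${2:-0}
    line=$(ls -ldLn "$1") || return 2
    mode=$(printf '%s\n' "$line" | awk '{print $1}')
    uid=$(printf '%s\n' "$line" | awk '{print $3}')
    [ "$uid" = "$want_uid" ] || return 1
    case $mode in
        ????------*) return 0 ;;   # e.g. crw------- : group/other bits all clear
        *)           return 1 ;;
    esac
}
```

A typical use would be `hidden_from_grid /dev/rdsk/c0d13s0 || echo "still visible"`, run on each node for each duplicate path.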
Restart the cluster
# Note: run the following as root, from the Grid installation directory
crsctl stop has -f
crsctl start has
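After the restart, `crsctl query css votedisk` lists the voting files CSSD is using. The sketch below counts how many of them are reported ONLINE. It assumes the state appears as the second whitespace-separated column of each voting file row (as in typical 11.2 output); `count_online_votedisks` is a made-up helper name.

```shell
#!/bin/sh
# count_online_votedisks   (hypothetical helper for illustration)
# Reads `crsctl query css votedisk` output on stdin and prints the number
# of rows whose second column is ONLINE. The column layout is an assumption.
count_online_votedisks() {
    awk '$2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Usage would be `crsctl query css votedisk | count_online_votedisks`. In the second case above, where 3 voting files are configured, the count should be back to 3 once the duplicate paths are hidden.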
Translated with reference to the MOS document: Clusterware Fails to Start due to CSSD Fails to start with "Duplicate voting file found" (Doc ID 1274309.1)