Symptom:
After the operating system on Oracle RAC node 1 was rebooted, CRS would not start.
It reported the following error:
2022-06-15 19:41:22.077: [cssd(29365)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /oracle/grid/product/11.0/log/cjc-db-01/cssd/ocssd.log
Checking the log
/oracle/grid/product/11.0/log/cjc-db-01/cssd/ocssd.log contained no further details.
The voting files could not be found. Was something wrong with the shared storage?
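Analysis:
1. Compared the permissions of the shared disk files on the two nodes: they are identical.
Shared-storage file permissions (this listing was taken on cjc-db-01):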
root@cjc-db-01:/oradata#ll -rth
total 159G
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:23 cjcdata5
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:24 cjcdata3
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata7
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata4
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata6
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata1
-rw-rw---- 1 grid asmadmin 6.0G Jun 16 09:25 ocr2
-rw-rw---- 1 grid asmadmin 6.0G Jun 16 09:25 ocr3
-rw-rw---- 1 grid asmadmin 6.0G Jun 16 09:25 ocr1
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata2
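Comparing two long ls listings by eye is error-prone. A minimal sketch, assuming POSIX sh and GNU coreutils stat: the hypothetical helper perm_snapshot prints mode, owner, group and path for every regular file in a directory, so the output from both nodes can be compared with diff. The /tmp file names in the usage comments are placeholders.

```shell
#!/bin/sh
# Sketch (assumes GNU coreutils stat): print "mode owner group path" for each
# regular file in a directory so the listings from both nodes can be diffed.
perm_snapshot() {
    dir=${1:-/oradata}
    for f in "$dir"/*; do
        # Skip non-regular entries (and the literal pattern if the dir is empty).
        [ -f "$f" ] || continue
        # %A = symbolic mode, %U/%G = owner/group names, %n = path
        stat -c '%A %U %G %n' "$f"
    done
}

# Usage (hypothetical output files), run on each node:
#   perm_snapshot /oradata > /tmp/perm.$(hostname)
# then copy one file to the other node and compare:
#   diff /tmp/perm.cjc-db-01 /tmp/perm.cjc-db-02
```

Since both nodes mount the same NFS export at the same path, the path column should match exactly, so any diff line points at a real difference.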
Compared the NAS mount options on the two nodes: they are identical.
root@cjc-db-01:/oradata#mount|grep oradata
192.168.0.10:/cjc_db_oradata_01_nfs on /oradata type nfs (rw,relatime,sync,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.10,mountvers=3,mountport=2050,mountproto=tcp,local_lock=none,addr=192.168.0.10)

root@cjc-db-02:/root#mount|grep oradata
192.168.0.10:/cjc_db_oradata_01_nfs on /oradata type nfs (rw,relatime,sync,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.10,mountvers=3,mountport=2050,mountproto=tcp,local_lock=none,addr=192.168.0.10)
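The option strings above are long enough that a one-character difference is easy to miss. A minimal sketch, assuming POSIX sh and awk: the hypothetical helper nfs_opts reads `mount` output on stdin and prints the options of one mount point sorted, one per line, so the two nodes' lists can be compared with diff. The /tmp file names are placeholders.

```shell
#!/bin/sh
# Sketch: split the option list of one mount point into sorted lines.
# Reads `mount` output on stdin; the mount point defaults to /oradata.
nfs_opts() {
    mp=${1:-/oradata}
    # In `mount` output, field 3 is the mount point and the last field is the
    # option list wrapped in parentheses.
    awk -v mp="$mp" '$3 == mp { print $NF }' | tr -d '()' | tr ',' '\n' | sort
}

# Usage (hypothetical output files), on each node:
#   mount | nfs_opts /oradata > /tmp/opts.$(hostname)
#   diff /tmp/opts.cjc-db-01 /tmp/opts.cjc-db-02
```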
2. Are the voting files and the OCR healthy?
Logged in to node 2 and checked the cluster's voting files and OCR: both are fine.
Check the OCR:
grid@cjc-db-02:/home/grid$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2920
         Available space (kbytes) :     259200
         ID                       :  550682119
         Device/File Name         :       +SYS
                                    Device/File integrity check succeeded
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
Check the voting disks:
grid@cjc-db-02:/home/grid$ crsctl query css votedisk
##  STATE    File Universal Id                File Name       Disk group
--  -----    -----------------                ---------       ---------
 1. ONLINE   64c8b1ebd5fd4f19bfa1c7ef88397359 (/dev/rac/ocr1) [SYS]
 2. ONLINE   1953c55c29bd4fa9bf3d04d7fe001112 (/dev/rac/ocr2) [SYS]
 3. ONLINE   7d719d3170a34ff7bfff5cc5abae15cb (/dev/rac/ocr3) [SYS]
Located 3 voting disk(s).
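Wait, that's not right: why are the shared disks listed under /dev/rac? Shouldn't they be under /oradata?
Logged in to node 2 and checked the ASM disk paths: all shared disks are under /dev/rac.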
select path from v$asm_disk;
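The paths CSS is configured with can also be tested directly against the local filesystem. A minimal sketch, assuming POSIX sh, a grep that supports -o (GNU and BSD grep do) and paths without spaces: the hypothetical helper check_vote_paths reads saved `crsctl query css votedisk` output and reports each path as OK or MISSING. Since CSS is down on node 1, the listing would be captured on node 2 and copied over; /tmp/votedisk.txt is a placeholder name.

```shell
#!/bin/sh
# Sketch: extract the "(/path)" column from `crsctl query css votedisk`
# output on stdin and test each path on the local node. [ -e ] follows
# symbolic links, so both a missing link and a link whose target is gone
# are reported as MISSING. Returns 1 if any path is missing.
check_vote_paths() {
    rc=0
    # Only match parentheses that start with "/", so "disk(s)" is ignored.
    for p in $(grep -o '(/[^)]*)' | tr -d '()'); do
        if [ -e "$p" ]; then
            echo "OK      $p"
        else
            echo "MISSING $p"
            rc=1
        fi
    done
    return $rc
}

# Usage (hypothetical file name):
#   node 2: crsctl query css votedisk > /tmp/votedisk.txt
#   node 1: check_vote_paths < /tmp/votedisk.txt
```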
Could these be symbolic links?
Checking the symbolic links on node 2:
root@cjc-db-02:/root#ll -rth /dev/rac/
total 0
lrwxrwxrwx 1 root root 14 Dec 12  2019 ocr1 -> /oradata/ocr1
lrwxrwxrwx 1 root root 14 Dec 12  2019 ocr2 -> /oradata/ocr2
lrwxrwxrwx 1 root root 14 Dec 12  2019 ocr3 -> /oradata/ocr3
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata1 -> /oradata/cjcdata1
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata2 -> /oradata/cjcdata2
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata3 -> /oradata/cjcdata3
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata4 -> /oradata/cjcdata4
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata5 -> /oradata/cjcdata5
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata6 -> /oradata/cjcdata6
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata7 -> /oradata/cjcdata7
Checking the symbolic links on node 1: there are none.
root@cjc-db-01:/root#ll -rth /dev/rac*
ls: cannot access /dev/rac*: No such file or directory
The shell history shows no command that deleted the links.
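Node 2 still has the complete set of links, so its mapping can serve as the reference for node 1. A minimal sketch, assuming GNU find (for -printf): the hypothetical helper link_map prints each symbolic link directly inside a directory as a "name target" pair, sorted by name. The /tmp file name is a placeholder.

```shell
#!/bin/sh
# Sketch (assumes GNU find): list the symbolic links directly in a directory
# as "name target" pairs, sorted by name.
link_map() {
    dir=${1:-/dev/rac}
    # %f = link name, %l = target exactly as stored in the link (not resolved)
    find "$dir" -mindepth 1 -maxdepth 1 -type l -printf '%f %l\n' | sort
}

# Usage (hypothetical file name):
#   node 2:  link_map /dev/rac > /tmp/rac_links.txt
#   node 1 (as root, after copying the file and creating /dev/rac):
#     while read -r name target; do ln -s "$target" "/dev/rac/$name"; done < /tmp/rac_links.txt
```

Replaying the saved pairs keeps the link names and targets identical to node 2 instead of retyping them.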
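Solution:
First, recreate the symbolic links.
Create the /dev/rac directory, set its permissions, and create the links.
Note that the link paths must match those on the other node exactly.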
mkdir -p /dev/rac
ln -s /oradata/cjcdata1 /dev/rac/cjcdata1
ln -s /oradata/cjcdata2 /dev/rac/cjcdata2
ln -s /oradata/cjcdata3 /dev/rac/cjcdata3
ln -s /oradata/cjcdata4 /dev/rac/cjcdata4
ln -s /oradata/cjcdata5 /dev/rac/cjcdata5
ln -s /oradata/cjcdata6 /dev/rac/cjcdata6
ln -s /oradata/cjcdata7 /dev/rac/cjcdata7
ln -s /oradata/ocr1 /dev/rac/ocr1
ln -s /oradata/ocr2 /dev/rac/ocr2
ln -s /oradata/ocr3 /dev/rac/ocr3
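The individual ln commands can also be written as a loop that is safe to re-run. A minimal sketch, assuming POSIX sh and an ln that supports -sfn (GNU and BSD ln do): recreate_links is a hypothetical helper name, and the disk names in the usage comment are the ones from this cluster.

```shell
#!/bin/sh
# Sketch: create (or refresh) one symbolic link per disk name.
# mkdir -p does not fail if the directory exists, and ln -sfn replaces an
# existing link instead of erroring out. Caution: -f would also replace a
# regular file of the same name in the link directory.
# Arguments: link directory, source directory, then the disk names.
recreate_links() {
    link_dir=$1
    src_dir=$2
    shift 2
    mkdir -p "$link_dir" || return 1
    for name in "$@"; do
        ln -sfn "$src_dir/$name" "$link_dir/$name" || return 1
    done
}

# Usage, as root on node 1:
#   recreate_links /dev/rac /oradata cjcdata1 cjcdata2 cjcdata3 cjcdata4 \
#       cjcdata5 cjcdata6 cjcdata7 ocr1 ocr2 ocr3
```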
Verify the permissions and names:
root@cjc-db-01:/dev#ll -rht rac/*
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata1 -> /oradata/cjcdata1
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata2 -> /oradata/cjcdata2
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata3 -> /oradata/cjcdata3
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata4 -> /oradata/cjcdata4
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata5 -> /oradata/cjcdata5
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata6 -> /oradata/cjcdata6
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata7 -> /oradata/cjcdata7
lrwxrwxrwx 1 root root 14 Jun 15 19:51 rac/ocr1 -> /oradata/ocr1
lrwxrwxrwx 1 root root 14 Jun 15 19:51 rac/ocr2 -> /oradata/ocr2
lrwxrwxrwx 1 root root 14 Jun 15 19:51 rac/ocr3 -> /oradata/ocr3
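The listing shows where each link points, but not whether the target is actually reachable. A minimal sketch, assuming POSIX sh and coreutils readlink: the hypothetical helper check_links walks the links in a directory and flags any dangling one, for example a link created while /oradata was not mounted.

```shell
#!/bin/sh
# Sketch: report every symbolic link in a directory as OK or DANGLING.
# Returns 1 if any link points at a target that does not exist.
check_links() {
    dir=${1:-/dev/rac}
    rc=0
    for l in "$dir"/*; do
        # Only look at symbolic links; skip anything else.
        [ -L "$l" ] || continue
        # [ -e ] follows the link, so it fails for a dangling link.
        if [ -e "$l" ]; then
            echo "OK       $l -> $(readlink "$l")"
        else
            echo "DANGLING $l -> $(readlink "$l")"
            rc=1
        fi
    done
    return $rc
}

# Usage, on node 1 before starting CRS:
#   check_links /dev/rac
```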
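Started CRS again, and the cluster returned to normal.
Back to the original question: why were the shared-storage symbolic links lost when the server rebooted?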
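The cause is where the links were placed: they must not live under /dev. On Linux, /dev is an in-memory filesystem (devtmpfs) that the kernel and udev rebuild at every boot, so links created there by hand do not survive a reboot. In fact, the links were not needed in the first place.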
Testing confirmed that any file created manually under /dev, symbolic links included, is gone after the operating system reboots.
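The filesystem type behind /dev can be checked from the shell. A minimal sketch, assuming GNU coreutils stat: fs_type prints the filesystem type of a path and is_volatile_fstype classifies memory-backed types; both names are hypothetical. On a typical Linux host, /dev is a devtmpfs mount, which stat -f usually reports as tmpfs, while /oradata should report nfs.

```shell
#!/bin/sh
# Sketch (assumes GNU coreutils stat): check whether a path sits on a
# memory-backed filesystem whose contents are lost at reboot.

# Print the filesystem type of a path (default /dev). stat -f identifies the
# type by its magic number, so devtmpfs is usually reported as "tmpfs".
fs_type() {
    stat -f -c %T "${1:-/dev}"
}

# Return 0 if the given filesystem type name is memory-backed.
is_volatile_fstype() {
    case "$1" in
        tmpfs|devtmpfs|ramfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   t=$(fs_type /dev)
#   if is_volatile_fstype "$t"; then echo "/dev is $t: manual entries are lost on reboot"; fi
```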