How to Deal with Split Brain in Oracle 19c RAC
Overview: Oracle 19c RAC handles split-brain eviction the same way Oracle 12c RAC does. The node survival priority is Cohort Size > Weight > lowest numbered node.
1. How Oracle 19c RAC handles split-brain eviction in different scenarios:
Scenario 1) When cohort sizes differ, the larger cohort survives (weight is ignored).
Scenario 2) When cohort sizes are equal and weights are equal, the lowest numbered node survives (note: a node with a NULL cohort is also evicted).
Scenario 3) When cohort sizes are equal and weights differ, the cohort with the higher weight survives (lowest numbered node is ignored).
Summary: the node survival priority is Cohort Size > Weight > lowest numbered node. (See the appendix at the end of this article for short notes on cohort size and node number, and the sketch just below for the decision order.)
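To make the priority order concrete, here is a minimal toy sketch in bash (my own illustration, not Oracle's actual code; a cohort is reduced to a size, a cumulative weight, and a leader node number) of the decision between two cohorts A and B:

# Toy sketch (not Oracle's algorithm) of the survival decision between two
# cohorts A and B, following cohort size > weight > lowest numbered node.
# Arguments: sizeA weightA leaderA sizeB weightB leaderB
decide() {
  local size_a=$1 weight_a=$2 leader_a=$3 size_b=$4 weight_b=$5 leader_b=$6
  if (( size_a != size_b )); then                  # rule 1: larger cohort wins
    (( size_a > size_b )) && echo "A survives" || echo "B survives"
  elif (( weight_a != weight_b )); then            # rule 2: higher weight wins
    (( weight_a > weight_b )) && echo "A survives" || echo "B survives"
  else                                             # rule 3: lowest numbered node wins
    (( leader_a < leader_b )) && echo "A survives" || echo "B survives"
  fi
}
decide 2 0 1  1 1 2   # Scenario 1: A survives (larger cohort; weight ignored)
decide 1 0 1  1 0 2   # Scenario 2: A survives (equal size and weight; lowest node number)
decide 1 0 1  1 1 2   # Scenario 3: B survives (equal size; higher weight)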
# All tests in this article are based on Oracle 19.9 RU.
2. Why write this article?
1) When split brain occurs, Oracle Clusterware handles it automatically, so why learn how that automatic handling works?
1.1 When a split-brain failure occurs, you can locate the root cause faster and more accurately, because you know which node should have been the one evicted during split-brain handling.
1.2 You can make your most important workloads more stable and robust: run them on a designated node, and that node will be kept when split brain occurs.
2) Why did Oracle 12c introduce the concept of weight?
2.1 More control. Through the weight configuration, customers can decide which specific node survives when a RAC split brain is handled.
Weight can be assigned to specific hardware (a server with css_critical yes survives), to a specific database or service (the node hosting the css_critical yes database or service survives), or to a specific resource. A hedged configuration sketch follows.
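As a sketch of the three ways to assign weight (the database name orcl, service name svc1, and resource name my_resource below are placeholders; the srvctl/crsctl options shown are as documented for 12.1.0.2 and later, so verify them with -help on your release):

# 1) A specific server: mark it critical (used in experiment 3.3 below).
#    Run as root from GRID_HOME/bin; restart HAS for the value to take effect.
./crsctl set server css_critical yes
./crsctl stop has
./crsctl start has
./crsctl get server css_critical

# 2) A specific database or service (orcl and svc1 are placeholder names):
srvctl modify database -db orcl -css_critical yes
srvctl modify service -db orcl -service svc1 -css_critical yes

# 3) A specific resource (my_resource is a placeholder; CSS_CRITICAL is a
#    resource attribute):
./crsctl modify resource my_resource -attr "CSS_CRITICAL=yes"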
3. Proving the conclusions by experiment.
3.1 Experiment for Scenario 1): when cohort sizes differ, the larger cohort survives (weight is ignored).
Turning the conclusion into an experiment design:
A 3-node RAC (so the cohort sizes will differ), with server weight configured on node 2. Simulate a fault by taking down node 2's private network.
If the cohort of nodes 1 and 3 (the larger cohort) survives and node 2 (the higher weight) is evicted, then the claim holds: when cohort sizes differ, the larger cohort survives (weight is ignored).
If the cohort of nodes 1 and 3 (the larger cohort) is evicted and node 2 (the higher weight) survives, then the claim does not hold.
A 3-node RAC, with server weight configured on node 2:

[root@rac2 bin]# ./oifcfg getif
enp0s3 10.0.0.0 global public
enp0s8 192.168.56.0 global cluster_interconnect,asm
[root@rac2 bin]# ./olsnodes -s -n
rac1 1 Active
rac2 2 Active
rac3 3 Active
[root@rac2 bin]# ./crsctl get server css_critical
CRS-5092: Current value of the server attribute CSS_CRITICAL is yes.
[root@rac2 bin]# date
Tue Mar  1 05:09:33 EST 2022
[root@rac2 bin]# ifdown enp0s8
Connection 'enp0s8' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/13)

[grid@rac3:/u01/app/grid/diag/crs/rac3/crs/trace]$olsnodes -s -n
rac1 1 Active
rac2 2 Inactive
rac3 3 Active

# Observed result: the larger cohort (nodes 1 and 3) survived, and node 2, the node with the higher weight, was evicted.
## Analysis of ocssd.trc to back the conclusion: when cohort sizes differ, the larger cohort survives (weight is ignored).
# The evicted node's trace usually has the most analytical value.
Node 1 and node 3 have equal weight, which differs from node 2's (node 2 has server weight configured): see the clssnmrCheckNodeWeight lines.
Node 1 and node 3 form the larger cohort (cohort size = 2): Surviving cohort: 1,3
Node 2 forms the smaller cohort (cohort size = 1): My cohort: 2
# The ocssd.bin process learns the above through the voting disk.
The larger cohort of nodes 1 and 3 survives and the smaller cohort of node 2 is evicted: see the clssnmCheckDskInfo lines.

# ocssd.trc (node 2) -- node 2 was evicted, so we analyze its trace
2022-03-01 05:10:05.688 : CSSD:1632941824: clssnmrCheckNodeWeight: node(1) has weight stamp(541570462) pebbles (0) goldstars (0) flags (3) SpoolVersion (0)
# Node 2 has server weight configured, so its goldstars value went from 0 to 1:
2022-03-01 05:10:05.688 : CSSD:1632941824: clssnmrCheckNodeWeight: node(2) has weight stamp(541570462) pebbles (0) goldstars (1) flags (b) SpoolVersion (0)
2022-03-01 05:10:05.688 : CSSD:1632941824: clssnmrCheckNodeWeight: node(3) has weight stamp(541570462) pebbles (0) goldstars (0) flags (3) SpoolVersion (0)
2022-03-01 05:10:05.688 : CSSD:1632941824: clssnmrCheckNodeWeight: Server pool version not consistent
2022-03-01 05:10:05.688 : CSSD:1632941824: clssnmrCheckNodeWeight: stamp(541570462), completed(3/3)
2022-03-01 05:10:05.688 : CSSD:1632941824: [ INFO] clssnmCheckDskInfo: My cohort: 2 # cohort size = 1
2022-03-01 05:10:05.688 : CSSD:1632941824: [ INFO] clssnmCheckDskInfo: Surviving cohort: 1,3 # cohort size = 2
2022-03-01 05:10:05.688 : CSSD:1632941824: [ INFO] clssnmChangeState: oldstate 3 newstate 0 clssnmr.c 3075
2022-03-01 05:10:05.688 : CSSD:1632941824: (:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, rac2, loses to cohort of 2 nodes led by node 1, rac1, based on map type 2 since the cohort is larger
# "the cohort is larger": the cohort of 2 nodes led by node 1 stays up.

## Conclusion: when cohort sizes differ, the larger cohort survives (weight is ignored).
3.2 Experiment for Scenario 2): when cohort sizes are equal and weights are equal, the lowest numbered node survives (note: a node with a NULL cohort is also evicted).
Turning the conclusion into an experiment design:
A 2-node RAC (equal cohort sizes), with no weight configured on either node. Simulate a fault by taking down node 2's private network.
If node 1 survives, then the claim holds: with equal cohort sizes and equal weights, the lowest numbered node survives.
If node 1 is evicted, then the claim does not hold.
A 2-node RAC, with no server weight configured:

[root@rac2 bin]# ./oifcfg getif
enp0s3 10.0.0.0 global public
enp0s8 192.168.56.0 global cluster_interconnect,asm
[root@rac2 bin]# ./olsnodes -s -n
rac1 1 Active
rac2 2 Active
[root@rac2 bin]# ./crsctl get server css_critical
CRS-5092: Current value of the server attribute CSS_CRITICAL is no.
[root@rac2 bin]# ifdown enp0s8

[root@rac1 bin]# ./olsnodes -s -n
rac1 1 Active
rac2 2 Inactive

# Observed result: the lowest numbered node (node 1) survived.
## Analysis of ocssd.trc to back the conclusion: when cohort sizes are equal and weights are equal, the lowest numbered node survives (note: a NULL cohort is also evicted).
# The evicted node's trace usually has the most analytical value.
Because the first run produced cohort = NULL, raising doubts about the result, the experiment was run twice.
First run (2022-02-28 21:07:43): node 2 had already been removed from the cluster before the cohort check, hence "clssnmCheckDskInfo: My cohort: NULL" (the experiment was repeated to rule out a false conclusion). It also shows that a cohort can be NULL: a NULL cohort means the node had already been evicted before the cohort check.
Second run (2022-02-28 22:59:09): this run cleanly proved that with equal cohort sizes and equal weights, the lowest numbered node survives.

# ocssd.trc (node 2), first run (2022-02-28 21:07:43) -- node 2 was evicted, so we analyze its trace
2022-02-28 21:07:43.770 : CSSD:1114588928: [ INFO] clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 541548100, wrtcnt, 599339, LATS 524224, lastSeqNo 599336, uniqueness 1646100093, timestamp 1646100463/526874
2022-02-28 21:07:43.771 : CSSD:1103550208: clssnmrCheckNodeWeight: node(1) has weight stamp(541548099) pebbles (0) goldstars (0) flags (3) SpoolVersion (0)
2022-02-28 21:07:43.771 : CSSD:1103550208: clssnmrCheckNodeWeight: node(2) has weight stamp(541548099) pebbles (0) goldstars (0) flags (b) SpoolVersion (0)
2022-02-28 21:07:43.771 : CSSD:1103550208: clssnmrCheckNodeWeight: Server pool version not consistent
2022-02-28 21:07:43.771 : CSSD:1103550208: clssnmrCheckNodeWeight: stamp(541548099), completed(2/2)
# Why does "clssnmCheckDskInfo: My cohort: NULL" appear here -- how can the node number be NULL?
# Answer: NULL means the node had already been evicted from the cluster, so the ocssd process could no longer obtain its information from the voting disk. The closing message ("the local node is already evicted") also confirms that node 2 had indeed already been evicted.
# Background: each node's ocssd process records the node name and node number in the voting disk, as a queue, so the nodes can communicate with each other.
2022-02-28 21:07:43.771 : CSSD:1103550208: [ INFO] clssnmCheckDskInfo: My cohort: NULL
2022-02-28 21:07:43.771 : CSSD:1103550208: [ INFO] clssnmCheckDskInfo: Surviving cohort: 1
2022-02-28 21:07:43.771 : CSSD:1103550208: [ INFO] clssnmChangeState: oldstate 3 newstate 0 clssnmr.c 3075
2022-02-28 21:07:43.771 : CSSD:1103550208: [ ERROR] clssscWriteCAlogEvent: CALOG init not done
2022-02-28 21:07:43.771 : CSSD:1103550208: (:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 0 nodes with leader 65535, , loses to cohort of 1 nodes led by node 1, rac1, based on map type 2 since the local node is already evicted

# ocssd.trc (node 2), second run (2022-02-28 22:59:09) -- node 2 was evicted, so we analyze its trace
2022-02-28 22:59:09.244 : CSSD:3202848512: [ INFO] clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 541554728, wrtcnt, 618613, LATS 7209584, lastSeqNo 618610, uniqueness 1646106722, timestamp 1646107149/7212544
# The weight check shows node 1 and node 2 have equal weight, so the lowest numbered node survives: node 1 stays up (My cohort: 1 on node 1 beats My cohort: 2 on node 2).
2022-02-28 22:59:09.244 : CSSD:3196540672: clssnmrCheckNodeWeight: node(1) has weight stamp(541554727) pebbles (0) goldstars (0) flags (3) SpoolVersion (0)
2022-02-28 22:59:09.244 : CSSD:3196540672: clssnmrCheckNodeWeight: node(2) has weight stamp(541554727) pebbles (0) goldstars (0) flags (b) SpoolVersion (0)
2022-02-28 22:59:09.244 : CSSD:3196540672: clssnmrCheckNodeWeight: Server pool version not consistent
2022-02-28 22:59:09.244 : CSSD:3196540672: clssnmrCheckNodeWeight: stamp(541554727), completed(2/2)
2022-02-28 22:59:09.244 : CSSD:3196540672: [ INFO] clssnmCheckDskInfo: My cohort: 2
2022-02-28 22:59:09.244 : CSSD:3196540672: [ INFO] clssnmCheckDskInfo: Surviving cohort: 1
2022-02-28 22:59:09.244 : CSSD:3196540672: [ INFO] clssnmChangeState: oldstate 3 newstate 0 clssnmr.c 3075
2022-02-28 22:59:09.245 : CSSD:3196540672: (:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, rac2, loses to cohort of 1 nodes led by node 1, rac1, based on map type 2 since the cohort is the only one with public network access

## Scenario 2 conclusion: when cohort sizes are equal and weights are equal, the lowest numbered node survives (note: a NULL cohort is also evicted).
3.3 Experiment for Scenario 3): when cohort sizes are equal and weights differ, the higher weight survives (lowest numbered node is ignored).
Turning the conclusion into an experiment design:
A 2-node RAC (equal cohort sizes), with server weight configured on node 2. Simulate a fault by taking down node 2's private network.
If node 2 survives, then the claim holds: with equal cohort sizes and differing weights, the higher weight survives.
If node 2 is evicted, then the claim does not hold.
A 2-node RAC, with server weight configured on node 2:

[root@rac2 bin]# ./oifcfg getif
enp0s3 10.0.0.0 global public
enp0s8 192.168.56.0 global cluster_interconnect,asm
[root@rac2 bin]# ./olsnodes -s -n
rac1 1 Active
rac2 2 Active
[root@rac2 bin]# ./crsctl get server css_critical
CRS-5092: Current value of the server attribute CSS_CRITICAL is no.
[root@rac2 bin]# ./crsctl set server css_critical yes
CRS-4416: Server attribute 'CSS_CRITICAL' successfully changed. Restart Oracle High Availability Services for new value to take effect.
[root@rac2 bin]# ./crsctl stop has
[root@rac2 bin]# ./crsctl start has
[root@rac2 bin]# ./crsctl get server css_critical
CRS-5092: Current value of the server attribute CSS_CRITICAL is yes.
[root@rac2 bin]# date
Tue Mar  1 02:16:41 EST 2022
[root@rac2 bin]# ifdown enp0s8
Connection 'enp0s8' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/9)
[root@rac2 bin]# ./olsnodes -s -n
rac1 1 Inactive
rac2 2 Active

# Observed result: with equal cohort sizes, the cohort with the higher weight survived.
## Analysis of ocssd.trc to back the conclusion: when cohort sizes are equal and weights differ, the higher weight survives (lowest numbered node is ignored).
# The evicted node's trace usually has the most analytical value.
Because the first run produced cohort = NULL, raising doubts about the result, the experiment was run twice.
First run (2022-03-01 02:17:14): node 1 had already been removed from the cluster before the weight check, hence "clssnmCheckDskInfo: My cohort: NULL" (the experiment was repeated to rule out a false conclusion). It also shows that a cohort can be NULL: a NULL cohort means the node had already been evicted before the weight check.
Second run (2022-03-01 03:18:52): a 2-node RAC (equal cohort sizes) with server weight configured on node 2, so node 2's weight (goldstars) is higher; node 2 survived, proving that with equal cohort sizes and differing weights, the higher weight survives.
# Whether the cohort was NULL or a single node, RAC kept the node with the configured server weight alive.

# First run (2022-03-01 02:17:14)
# ocssd.trc (node 1) -- node 1 was evicted, so we analyze its trace
2022-03-01 02:17:14.359 : CSSD:345351936: [ INFO] clssnmvDHBValidateNCopy: node 2, rac2, has a disk HB, but no network HB, DHB has rcfg 541554732, wrtcnt, 633947, LATS 19097864, lastSeqNo 633944, uniqueness 1646118528, timestamp 1646119033/19094454
2022-03-01 02:17:14.359 : CSSD:132638464: clssnmrCheckNodeWeight: node(1) has weight stamp(541554731) pebbles (0) goldstars (0) flags (3) SpoolVersion (0)
2022-03-01 02:17:14.359 : CSSD:132638464: clssnmrCheckNodeWeight: node(2) has weight stamp(541554731) pebbles (0) goldstars (1) flags (b) SpoolVersion (0)
2022-03-01 02:17:14.359 : CSSD:132638464: clssnmrCheckNodeWeight: Server pool version not consistent
2022-03-01 02:17:14.359 : CSSD:132638464: clssnmrCheckNodeWeight: stamp(541554731), completed(2/2)
2022-03-01 02:17:14.360 : CSSD:132638464: [ INFO] clssnmCheckDskInfo: My cohort: NULL
2022-03-01 02:17:14.360 : CSSD:132638464: [ INFO] clssnmCheckDskInfo: Surviving cohort: 2
2022-03-01 02:17:14.360 : CSSD:132638464: [ INFO] clssnmChangeState: oldstate 3 newstate 0 clssnmr.c 3075
2022-03-01 02:17:14.360 : CSSD:132638464: [ ERROR] clssscWriteCAlogEvent: CALOG init not done
# Why the cohort is NULL is explained by the message below: by the time of the cohort check, node 1 had already been evicted.
2022-03-01 02:17:14.360 : CSSD:132638464: (:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 0 nodes with leader 65535, , loses to cohort of 1 nodes led by node 2, rac2, based on map type 2 since the local node is already evicted

# Second run (2022-03-01 03:18:52)
# ocssd.trc (node 1) -- node 1 was evicted, so we analyze its trace
2022-03-01 03:18:52.125 : CSSD:1275066112: [ INFO] clssnmHBInfo: This node has lost connectivity with all the other nodes in the cluster, therefore setting Network Timeout = 0
2022-03-01 03:18:52.127 : CSSD:1487120128: [ INFO] clssnmvDHBValidateNCopy: node 2, rac2, has a disk HB, but no network HB, DHB has rcfg 541570458, wrtcnt, 644669, LATS 22795554, lastSeqNo 644666, uniqueness 1646122452, timestamp 1646122731/22792384
2022-03-01 03:18:52.127 : CSSD:1273489152: clssnmrCheckNodeWeight: node(1) has weight stamp(541570457) pebbles (1) goldstars (0) flags (3) SpoolVersion (0)
2022-03-01 03:18:52.127 : CSSD:1273489152: clssnmrCheckNodeWeight: node(2) has weight stamp(541570457) pebbles (0) goldstars (1) flags (b) SpoolVersion (0)
2022-03-01 03:18:52.127 : CSSD:1273489152: clssnmrCheckNodeWeight: Server pool version not consistent
2022-03-01 03:18:52.127 : CSSD:1273489152: clssnmrCheckNodeWeight: stamp(541570457), completed(2/2)
2022-03-01 03:18:52.127 : CSSD:1273489152: [ INFO] clssnmCheckDskInfo: My cohort: 1
2022-03-01 03:18:52.127 : CSSD:1273489152: [ INFO] clssnmCheckDskInfo: Surviving cohort: 2
2022-03-01 03:18:52.127 : CSSD:1273489152: [ INFO] clssnmChangeState: oldstate 3 newstate 0 clssnmr.c 3075
# The cohort sizes are equal and node 2's weight is higher. Even though node 1's node number (1) is lower than node 2's (2), node 2 survives: when weight is configured, weight decides the survivor and the node number is ignored.
# This proves the Scenario 3 conclusion.
2022-03-01 03:18:52.127 : CSSD:1273489152: (:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 1, rac1, loses to cohort of 1 nodes led by node 2, rac2, based on map type 2 since the cohort has higher cumulative gold star weight

## Conclusion: Scenario 3) when cohort sizes are equal and weights differ, the higher weight survives (lowest numbered node is ignored).
4. Appendix
Key terms explained:

# lowest numbered node
The node number: each node's ocssd process records the node name and node number in the voting disk, as a queue, so the nodes can communicate with each other.
[root@rac1 bin]# ./olsnodes -n
rac1 1 <= node number
rac2 2 <= node number

# The following are excerpts from ocssd.trc:
clssnmCheckDskInfo: My cohort: 1 <= this '1' is the node number. It usually corresponds to node 1, but not always: in the case above, node 2 reported "My cohort: NULL", because node 2 had already been removed from the cluster at the time of the weight check, so its node name and node number could no longer be read from the voting disk.

# cohort size
Equal cohort sizes (excerpt from ocssd.trc on a 2-node cluster):
My cohort: 1 <= node number = 1; cohort size = 1
Surviving cohort: 2 <= node number = 2; cohort size = 1
Unequal cohort sizes (excerpt from ocssd.trc on a 4-node cluster):
My cohort: 1 <= node number = 1; cohort size = 1
Surviving cohort: 2,3,4 <= node numbers = 2,3,4; cohort size = 3 (nodes 2, 3 and 4 together)
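When analyzing a case of your own, here is one quick way (a hedged example; the trace path follows the 19c layout shown in experiment 3.1, so adjust the host name and paths for your environment) to pull the decision lines out of the evicted node's trace:

# Extract the weight check, cohort comparison and eviction verdict from ocssd.trc:
grep -E 'clssnmrCheckNodeWeight|clssnmCheckDskInfo|CSSNM00008' \
    /u01/app/grid/diag/crs/$(hostname -s)/crs/trace/ocssd.trc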
5. References
12C: Which Node Will Survive when Split Brain Takes Place (Doc ID 1951726.1)
Split Brain: What's new in Oracle Database 12.1.0.2c?