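Symptom:
Master: 192.168.1.101 6377, Slave0: 192.168.1.102 6377, Slave1: 192.168.1.103 6377
In a Redis Sentinel deployment, after the master was shut down, neither of the two slaves was automatically promoted to master.
The sentinel log showed: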
Next failover delay: I will not start a failover before ...
Full log:
26960:X 30 Oct 2025 21:59:13.162 # Next failover delay: I will not start a failover before Thu Oct 30 22:05:12 2025
26960:X 30 Oct 2025 22:01:37.514 * +reboot master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:01:37.569 # -sdown master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:01:37.569 # -odown master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:04:59.399 # +sdown master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:05:00.548 # +odown master cjc 192.168.1.101 6377 #quorum 3/2
26960:X 30 Oct 2025 22:05:12.271 # +new-epoch 7
26960:X 30 Oct 2025 22:05:12.273 # +vote-for-leader 6e925984391e935ba02b7c03b26fa5a15c77988e 7
26960:X 30 Oct 2025 22:05:12.312 # Next failover delay: I will not start a failover before Thu Oct 30 22:11:12 2025
26960:X 30 Oct 2025 22:06:38.098 * +reboot master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:06:38.199 # -sdown master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:06:38.199 # -odown master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:10:35.921 # +sdown master cjc 192.168.1.101 6377
26960:X 30 Oct 2025 22:10:36.012 # +odown master cjc 192.168.1.101 6377 #quorum 2/2
......
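The pattern in this log (the master repeatedly reaches +odown, a vote takes place, but no switch ever follows) can be spotted mechanically. The following is only a rough sketch: the function names summarize_sentinel_log and looks_stuck are made up for illustration, the regular expression assumes the log prefix format seen above, and the "stuck" rule is a simple heuristic, not something Sentinel itself reports.

```python
import re
from collections import Counter

# Assumed Sentinel log prefix: "<pid>:X <day> <Mon> <year> <HH:MM:SS.mmm> <level> <message>"
LINE_RE = re.compile(
    r"^(?P<pid>\d+):X\s+"
    r"(?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"[#*.\-]\s+(?P<msg>.*)$"
)


def summarize_sentinel_log(text):
    """Count Sentinel events by name, e.g. '+odown' or '+switch-master'."""
    counts = Counter()
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a Sentinel log line in the assumed format
        msg = m.group("msg")
        if msg.startswith("Next failover delay"):
            counts["failover-delayed"] += 1
        elif msg[:1] in ("+", "-"):
            # Event lines start with +name or -name, e.g. "+sdown master ..."
            counts[msg.split()[0]] += 1
    return counts


def looks_stuck(counts):
    """Heuristic only: master judged objectively down, yet no switch recorded."""
    return counts["+odown"] > 0 and counts["+switch-master"] == 0
```

Feeding the sentinel log above into summarize_sentinel_log and then into looks_stuck should flag this case, since +odown shows up while +switch-master never does.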
Analysis:
1. Check the priority parameter on both slaves
This had come up before: the priority of one slave was manually changed to 0 online:
config set slave-priority 0
A manual failover was then triggered:
sentinel failover cjc
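Afterwards, slave-priority in the redis.conf of the slave whose setting had been changed was also rewritten to 0.
If every slave has a priority of 0, none of them can be automatically promoted to master.
This case, however, had nothing to do with slave priority: slave-priority was 100 on both slaves.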
config get slave-priority
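The priority rule can be illustrated with a small sketch. It assumes the priorities have already been collected (for example with config get slave-priority on each slave) into a dictionary; promotable_replicas is a hypothetical helper, and it models only the priority part of Sentinel's choice (0 means never promote, a lower non-zero value is preferred), not the further tie-breakers Sentinel applies.

```python
def promotable_replicas(priorities):
    """Return replica addresses that may be promoted, best priority first.

    priorities maps "host:port" to the slave-priority value of that replica.
    A priority of 0 excludes the replica from promotion entirely.
    """
    eligible = {addr: prio for addr, prio in priorities.items() if prio > 0}
    return sorted(eligible, key=lambda addr: eligible[addr])


# Values from this case: both slaves at 100, so both remain candidates.
current = {"192.168.1.102:6377": 100, "192.168.1.103:6377": 100}
candidates = promotable_replicas(current)
```

An empty result would mean that no slave can be promoted, which is the situation described above when every slave has priority 0.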
2. Check the redis.conf of both slaves
A suspicious parameter turned up:
rename-command CONFIG cjc123456abcdefgqaz
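The original CONFIG command had been renamed. During an automatic failover, Sentinel runs CONFIG on the Redis nodes (CONFIG REWRITE) so that the changed REPLICAOF / SLAVEOF setting, i.e. the address of the new master, is persisted to redis.conf (Sentinel also updates its own sentinel.conf). Because the slaves no longer recognized the CONFIG command, the automatic failover failed.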
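A check like this can be scripted before relying on automatic failover. The sketch below scans the text of a redis.conf for rename-command directives; find_renamed_commands is a hypothetical helper, and the default list of watched commands (CONFIG, SLAVEOF, REPLICAOF) is an assumption based on this case, not an exhaustive list of what Sentinel needs.

```python
def find_renamed_commands(conf_text, watched=("CONFIG", "SLAVEOF", "REPLICAOF")):
    """Return {original_command: new_name} for watched commands that were renamed.

    An empty new name means the command was disabled (rename-command X "").
    """
    renamed = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == "rename-command":
            original = parts[1].upper()
            if original in watched:
                renamed[original] = parts[2].strip('"')
    return renamed


# Example with the directive found in this case.
conf = """
port 6377
rename-command CONFIG cjc123456abcdefgqaz
# rename-command SLAVEOF something
"""
problems = find_renamed_commands(conf)
```

A non-empty result on a slave is a hint that Sentinel may not be able to drive that node during a failover.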
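Solution:
Comment out the rename-command CONFIG line and restart the slaves.
After the master was stopped again, the failover happened automatically and 192.168.1.102:6377 was promoted to the new master.
New master: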
10489:S 30 Oct 2025 22:17:11.452 * MASTER <-> REPLICA sync started
10489:S 30 Oct 2025 22:17:11.452 # Error condition on socket for SYNC: Connection refused
10489:S 30 Oct 2025 22:17:12.457 * Connecting to MASTER 192.168.1.101:6377
10489:S 30 Oct 2025 22:17:12.457 * MASTER <-> REPLICA sync started
10489:S 30 Oct 2025 22:17:12.457 # Error condition on socket for SYNC: Connection refused
10489:M 30 Oct 2025 22:17:13.391 * Discarding previously cached master state.
10489:M 30 Oct 2025 22:17:13.391 # Setting secondary replication ID to aaaxxxsssfffgghhjjj, valid up to offset: 10382. New replication ID is ggeessllaajjsdsdggg
10489:M 30 Oct 2025 22:17:13.391 * MASTER MODE enabled (user request from 'id=7 addr=192.168.1.101:63198 laddr=192.168.1.102:6377 fd=11 name=sentinel-342a55c0-cmd age=46 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=188 qbuf-free=40766 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
10489:M 30 Oct 2025 22:17:13.394 # CONFIG REWRITE executed with success.
10489:M 30 Oct 2025 22:17:13.617 * Replica 192.168.1.103:6377 asks for synchronization
10489:M 30 Oct 2025 22:17:13.617 * Partial resynchronization request from 192.168.1.103:6377 accepted. Sending 161 bytes of backlog starting from offset 10382.
10489:M 30 Oct 2025 22:21:27.050 * 10 changes in 300 seconds. Saving...
10489:M 30 Oct 2025 22:21:27.050 * Background saving started by pid 10782
10782:C 30 Oct 2025 22:21:27.054 * DB saved on disk
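In the log above, the promotion is visible in the role letter after the pid: it changes from S (replica) to M (master) at 22:17:13.391. As a rough sketch, assuming the same log prefix format, the role changes can be pulled out of a node's log like this (role_transitions is a hypothetical helper; lines written by child processes, marked C, are skipped):

```python
import re

# Role letter after the pid in Redis logs: X sentinel, C child, S replica, M master.
ROLE_RE = re.compile(r"^\d+:([XCSM])\s")


def role_transitions(log_text):
    """Return the sequence of distinct consecutive roles seen in a node's log."""
    roles = []
    for line in log_text.splitlines():
        m = ROLE_RE.match(line.strip())
        if not m:
            continue
        role = m.group(1)
        if role == "C":
            continue  # RDB/AOF child process, not a role change of the node
        if not roles or roles[-1] != role:
            roles.append(role)
    return roles


sample = "\n".join([
    "10489:S 30 Oct 2025 22:17:12.457 * MASTER <-> REPLICA sync started",
    "10489:M 30 Oct 2025 22:17:13.391 * Discarding previously cached master state.",
    "10489:M 30 Oct 2025 22:17:13.394 # CONFIG REWRITE executed with success.",
    "10782:C 30 Oct 2025 22:21:27.054 * DB saved on disk",
])
transitions = role_transitions(sample)
```

For a node that was promoted, the result is expected to contain S followed by M; a slave that merely re-pointed to a new master stays at S throughout.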
Slave1 now points to the new master:
25637:S 30 Oct 2025 22:17:12.620 * MASTER <-> REPLICA sync started
25637:S 30 Oct 2025 22:17:12.620 # Error condition on socket for SYNC: Connection refused
25637:S 30 Oct 2025 22:17:13.619 * Connecting to MASTER 192.168.1.102:6377
25637:S 30 Oct 2025 22:17:13.619 * MASTER <-> REPLICA sync started
25637:S 30 Oct 2025 22:17:13.619 * REPLICAOF 192.168.1.102:6377 enabled (user request from 'id=6 addr=192.168.1.101:47275 laddr=192.168.1.103:6377 fd=10 name=sentinel-342a55c0-cmd age=49 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=338 qbuf-free=40616 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
25637:S 30 Oct 2025 22:17:13.622 # CONFIG REWRITE executed with success.
25637:S 30 Oct 2025 22:17:13.622 * Non blocking connect for SYNC fired the event.
25637:S 30 Oct 2025 22:17:13.622 * Master replied to PING, replication can continue...
25637:S 30 Oct 2025 22:17:13.623 * Trying a partial resynchronization (request aaaxxxsssfffgghhjjj:10382).
25637:S 30 Oct 2025 22:17:13.623 * Successful partial resynchronization with master.
25637:S 30 Oct 2025 22:17:13.623 # Master replication ID changed to ggeessllaajjsdsdggg
25637:S 30 Oct 2025 22:17:13.623 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
25637:S 30 Oct 2025 22:21:25.084 * 10 changes in 300 seconds. Saving...
25637:S 30 Oct 2025 22:21:25.084 * Background saving started by pid 26173
26173:C 30 Oct 2025 22:21:25.087 * DB saved on disk
26173:C 30 Oct 2025 22:21:25.088 * RDB: 0 MB of memory used by copy-on-write
25637:S 30 Oct 2025 22:21:25.185 * Background saving terminated with success
Feel free to follow my WeChat official account, 《IT小Chen》.