一 安装配置MySQL一主多从复制架构
1.1 操作系统基础配置(主从节点)
1.1.1 修改节点主机名
mgr(CentOS 7.4):
shell> hostnamectl status
shell> cat /etc/hostname
shell> hostnamectl set-hostname mgr
shell> su -
master:
shell> hostname
shell> vi /etc/sysconfig/network
shell> hostname master
shell> su -
slave1:
shell> hostname
shell> vi /etc/sysconfig/network
shell> hostname slave1
shell> su -
slave2:
shell> hostname
shell> vi /etc/sysconfig/network
shell> hostname slave2
shell> su -
1.1.2 关闭防火墙
CentOS 7.x关闭firewalld服务:
shell> systemctl stop firewalld
关闭操作系统自启动
shell> systemctl disable firewalld
检查关闭情况
shell> systemctl is-enabled firewalld
CentOS 6.x关闭iptables服务:
shell> service iptables status
shell> service iptables stop
关闭操作系统自启动
shell> chkconfig iptables off
检查关闭情况
shell> chkconfig iptables --list
关闭selinux:
编辑/etc/selinux/config 文件
shell> vi /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
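修改/etc/selinux/config后需重启操作系统才会生效。如果不便立即重启,可以先临时将SELinux切换为宽容模式(示意命令,需root权限执行):

```shell
# 临时切换为permissive模式,立即生效,重启后以配置文件为准
setenforce 0
# 查看当前模式,预期输出 Permissive 或 Disabled
getenforce
```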
1.1.3 配置本地yum源
shell> mount /dev/cdrom /mnt/    # 或者 mount /dev/sr0 /mnt/
shell> df -h
shell> cd /etc/yum.repos.d
shell> mkdir bak
shell> mv *.repo bak/
shell> vi yum.repo
[base]
name=base
baseurl=file:///mnt
enabled=1
gpgcheck=0
shell> yum clean all
shell> yum makecache
1.1.4 RPM包安装
MySQL软件安装所需的依赖包(通常情况下,可略过此步)
yum -y install gcc* gcc-c++ ncurses* ncurses-devel* cmake* make* perl* bison* libaio-devel* libgcrypt*
1.1.5 内核参数设置
编辑/etc/security/limits.conf 添加如下内容:
shell> vi /etc/security/limits.conf
mysql soft nproc 2047
mysql hard nproc 16384
mysql soft nofile 1024
mysql hard nofile 65536
mysql soft stack 10240
编辑/etc/pam.d/login 添加如下内容:
shell> vi /etc/pam.d/login
session required pam_limits.so
编辑/etc/profile 添加如下内容:
if [ $USER = "mysql" ]; then
  if [ $SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
  else
    ulimit -u 16384 -n 65536
  fi
  umask 022
fi
1.1.6 创建安装用户
创建mysql用户:
shell> groupadd -g 1001 mysql
shell> useradd -u 1001 -g mysql mysql
shell> echo 'mysql' | passwd --stdin mysql
1.1.7 创建安装目录结构
shell> mkdir -p /usr/local/mysql
shell> mkdir -p /var/lib/mysql/data
1.1.8 /etc/hosts写入
192.168.239.59 mgr
192.168.238.110 master
192.168.238.111 slave1
192.168.238.112 slave2
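四个节点写入完成后,可用一个简单的循环验证主机名解析与连通性(示意脚本,假设各节点网络已互通):

```shell
# 依次检查各节点主机名能否解析并ping通
for h in mgr master slave1 slave2; do
    ping -c 1 -W 2 "$h" > /dev/null 2>&1 && echo "$h ok" || echo "$h unreachable"
done
```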
1.1.9 时间和时区检查
检查操作系统时区和时间:
shell> date -R
1.1.10 内存和交换空间检查
检查内存和交换空间:
shell> free -m
交换空间最小要求:
If RAM between 2 GB and 8 GB then 2 times the size of RAM.
If RAM between 8 GB and 32 GB then 1.5 times the size of RAM.
If RAM greater than 32 GB then 32 GB.
一般来说,在大内存的情况下,交换空间也应尽可能多配置一些;对于内存在100G以上的主机,建议交换空间不小于64G。
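上述分段规则可以用一个小脚本来估算(示意脚本,仅演示换算逻辑;free输出格式以实际系统为准):

```shell
#!/bin/bash
# 按内存大小(MB)估算建议的交换空间(MB),对应上文的分段规则
suggest_swap() {
    local ram_mb=$1
    if [ "$ram_mb" -le 8192 ]; then
        echo $((ram_mb * 2))        # 2G~8G: 2倍内存
    elif [ "$ram_mb" -le 32768 ]; then
        echo $((ram_mb * 3 / 2))    # 8G~32G: 1.5倍内存
    else
        echo 32768                  # 大于32G: 固定32G
    fi
}

ram_mb=$(free -m | awk '/^Mem:/{print $2}')
echo "RAM=${ram_mb}MB, suggested swap=$(suggest_swap "$ram_mb")MB"
```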
1.2 MySQL软件安装部署(主从节点)
1.2.1 MySQL二进制安装包下载链接
下载地址:
https://downloads.mysql.com/archives/community/
1.2.2 二进制包方式安装MySQL数据库软件
shell> mkdir -p /usr/local/mysql
shell> mkdir -p /var/lib/mysql/data
shell> mkdir -p /var/lib/mysql/binlog
shell> mkdir -p /var/lib/mysql/relaylog
shell> cd /soft
shell> tar zxvf mysql-5.7.23-linux-glibc2.12-x86_64.tar.gz
shell> cd mysql-5.7.23-linux-glibc2.12-x86_64
shell> mv * /usr/local/mysql
shell> chown -R mysql:mysql /usr/local/mysql
shell> chmod -R 775 /usr/local/mysql
shell> chown -R mysql:mysql /var/lib/mysql
shell> chmod -R 775 /var/lib/mysql
shell> cd /usr/local/mysql
shell> ./bin/mysqld --initialize --basedir=/usr/local/mysql --datadir=/var/lib/mysql/data --user=mysql
注意:主从节点MySQL数据库初始化时,应及时保存数据库root账户的临时密码,后续数据库账户登录或改密都需要用到;若忘记保留临时密码,可用以下方式重置账户密码。
shell> mysqld_safe --defaults-file=/etc/my.cnf --skip-grant-tables &
shell> mysql
mysql> update mysql.user set authentication_string=password('mysql') where user='root' and host='localhost';
mysql> flush privileges;
mysql> alter user 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'mysql';
mysql> alter user 'root'@'localhost' IDENTIFIED BY 'mysql' PASSWORD EXPIRE NEVER;
shell> mysqladmin -uroot -pmysql shutdown
1.2.3 配置环境变量
shell> vi /etc/profile
export PATH=/usr/local/mysql/bin:$PATH
shell> source /etc/profile
1.2.4 配置MySQL服务自启动
shell> cp support-files/mysql.server /etc/rc.d/init.d/
shell> chmod +x /etc/rc.d/init.d/mysql.server
shell> chkconfig --add mysql.server
shell> chkconfig --list mysql.server
1.2.5 配置master节点my.cnf
shell> vi /etc/my.cnf
[client]
port = 3306
socket = /tmp/mysql.sock
#default-character-set = utf8

[mysqld]
#skip-grant-tables
port = 3306
socket = /tmp/mysql.sock
pid-file = /var/lib/mysql/data/mysql.pid
basedir = /usr/local/mysql
datadir = /var/lib/mysql/data
log_error = /var/lib/mysql/data/mysql.err
max_connections = 3000
character-set-server = utf8
#skip-name-resolve = 1
wait_timeout = 10800
interactive_timeout = 10800
server-id = 110
auto_increment_offset = 1
auto_increment_increment = 1
general_log = on
general_log_file = /var/lib/mysql/mysql-general.log
log-bin = /var/lib/mysql/binlog/mysql-bin
binlog-format = ROW
expire_logs_days = 7
sync_binlog = 1000
#log-slave-updates = 1
#binlog_rows_query_log_events = 1
max_binlog_size = 1024M
relay-log = /var/lib/mysql/relaylog/relay-bin
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 16M
innodb_log_file_size = 524288000
innodb_log_files_in_group = 3
innodb_buffer_pool_size = 4294967296
sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

[mysql]
#no-auto-rehash
#prompt="\\u@\\h:\\d \\r:\\m:\\s>"

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

[mysqldump]
quick
max_allowed_packet = 16M

[mysqld_safe]
#pid-file = /var/lib/mysql/data/mysql.pid
1.2.6 配置slave节点my.cnf
shell> vi /etc/my.cnf
[client]
port = 3306
socket = /tmp/mysql.sock
#default-character-set = utf8

[mysqld]
#skip-grant-tables
port = 3306
socket = /tmp/mysql.sock
pid-file = /var/lib/mysql/data/mysql.pid
basedir = /usr/local/mysql
datadir = /var/lib/mysql/data
log_error = /var/lib/mysql/data/mysql.err
max_connections = 3000
character-set-server = utf8
#skip-name-resolve = 1
wait_timeout = 10800
interactive_timeout = 10800
server-id = 111
auto_increment_offset = 1
auto_increment_increment = 1
general_log = on
general_log_file = /var/lib/mysql/mysql-general.log
log-bin = /var/lib/mysql/binlog/mysql-bin
binlog-format = ROW
expire_logs_days = 7
sync_binlog = 1000
#log-slave-updates = 1
#binlog_rows_query_log_events = 1
max_binlog_size = 1024M
relay-log = /var/lib/mysql/relaylog/relay-bin
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 16M
innodb_log_file_size = 524288000
innodb_log_files_in_group = 3
innodb_buffer_pool_size = 4294967296
sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

[mysql]
#no-auto-rehash
#prompt="\\u@\\h:\\d \\r:\\m:\\s>"

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

[mysqldump]
quick
max_allowed_packet = 16M

[mysqld_safe]
#pid-file = /var/lib/mysql/data/mysql.pid

注意:上述为slave1节点配置,slave2节点需将server-id改为112,保证各节点server-id全局唯一。
1.2.7 启动MySQL服务并修改root@localhost密码
shell> /etc/init.d/mysql.server start
shell> mysqladmin -uroot -p password 'mysql'
shell> cd /usr/local/mysql/
shell> ./bin/mysql_secure_installation
1.3 配置一主多从复制架构(主从节点)
1.3.1 Master节点创建repl复制用户
mysql> create user 'repl'@'192.168.238.%' identified by 'mysql';
mysql> grant replication slave on *.* to 'repl'@'192.168.238.%';
mysql> flush privileges;
mysql> show master status;
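repl用户创建后,可先在slave节点上验证该账户能否连通master,避免后续change master时才发现网络或授权问题(示意命令,主机IP与密码按实际环境替换):

```shell
# 在slave1/slave2上以repl账户远程连接master,能返回结果即说明授权和网络正常
mysql -urepl -pmysql -h192.168.238.110 -P3306 -e "select 1;"
```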
1.3.2 Slave1节点创建repl复制用户
mysql> create user 'repl'@'192.168.238.%' identified by 'mysql';
mysql> grant replication slave on *.* to 'repl'@'192.168.238.%';
mysql> flush privileges;
1.3.3 Slave2节点创建repl复制用户
mysql> create user 'repl'@'192.168.238.%' identified by 'mysql';
mysql> grant replication slave on *.* to 'repl'@'192.168.238.%';
mysql> flush privileges;
1.3.4 Slave1开启复制
mysql> change master to master_host='192.168.238.110',master_port=3306,master_user='repl',master_password='mysql',master_log_file='mysql-bin.000002',master_log_pos=407;
mysql> start slave;
mysql> show slave status\G
1.3.5 Slave2开启复制
mysql> change master to master_host='192.168.238.110',master_port=3306,master_user='repl',master_password='mysql',master_log_file='mysql-bin.000002',master_log_pos=407;
mysql> start slave;
mysql> show slave status\G
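start slave之后,重点确认IO/SQL两个复制线程的状态与延迟。可在两个slave节点分别执行以下检查(示意命令,账户密码按实际环境替换):

```shell
# 正常情况下 Slave_IO_Running 和 Slave_SQL_Running 均应为 Yes,
# Seconds_Behind_Master 应接近 0
mysql -uroot -pmysql -e "show slave status\G" | \
    egrep "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_Error"
```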
二 应用节点安装配置MySQL Router
2.1 MySQL Router二进制安装包下载链接
下载地址:
https://downloads.mysql.com/archives/router/
2.2 二进制包方式安装配置MySQL Router
shell> cd /soft/
shell> tar zxvf mysql-router-8.0.12-linux-glibc2.12-x86-64bit.tar.gz
shell> mkdir -p /usr/local/mysql-router-8.0.12
shell> cd mysql-router-8.0.12-linux-glibc2.12-x86-64bit
shell> mv * /usr/local/mysql-router-8.0.12/
shell> cp /usr/local/mysql-router-8.0.12/share/doc/mysqlrouter/sample_mysqlrouter.conf /usr/local/mysql-router-8.0.12/mysqlrouter.conf
shell> vi /usr/local/mysql-router-8.0.12/mysqlrouter.conf
[DEFAULT]
logging_folder = /usr/local/mysql-router-8.0.12/log
plugin_folder = /usr/local/mysql-router-8.0.12/lib/mysqlrouter
config_folder = /usr/local/mysql-router-8.0.12/etc
runtime_folder = /usr/local/mysql-router-8.0.12/run
data_folder = /usr/local/mysql-router-8.0.12/data
thread_stack_size = 128

[logger]
level = DEBUG

# bind_address和destinations请按实际环境替换为应用节点及后端数据库节点的IP
[routing:basic_failover]
bind_port = 7001
bind_address = 192.168.238.97
destinations = 192.168.238.74:3306,192.168.238.75:3306
routing_strategy = first-available
max_connections = 1000
max_connect_errors = 100000
client_connect_timeout = 9
connect_timeout = 3

[routing:ro_route]
bind_port = 7002
bind_address = 192.168.238.97
destinations = 192.168.238.74:3306,192.168.238.75:3306
routing_strategy = round-robin
max_connections = 1000
max_connect_errors = 100000
client_connect_timeout = 9
connect_timeout = 3
2.3 启动MySQL Router服务
shell> chown -R mysql:mysql /usr/local/mysql-router-8.0.12
shell> chmod -R 775 /usr/local/mysql-router-8.0.12
shell> /usr/local/mysql-router-8.0.12/bin/mysqlrouter --config /usr/local/mysql-router-8.0.12/mysqlrouter.conf &
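Router启动后,可分别连接7001(故障转移路由)和7002(轮询读路由)端口验证转发是否生效(示意命令,IP、端口与密码按实际配置替换):

```shell
# 通过Router连接后端,查看实际落在哪台数据库主机上
mysql -uroot -pmysql -h192.168.238.97 -P7001 -e "select @@hostname;"
# 多次执行7002端口的查询,round-robin策略下@@hostname应在各目标节点间轮换
mysql -uroot -pmysql -h192.168.238.97 -P7002 -e "select @@hostname;"
```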
三 安装配置MySQL MHA高可用架构
3.1 MHA rpm安装包下载链接
下载地址:https://code.google.com/archive/p/mysql-master-ha/downloads
或者尝试下载地址:
https://github.com/linyue515/mysql-master-ha
3.2 MHA rpm依赖包(perl)下载链接
下载地址:
https://opsx.alibaba.com/mirror/search?q=perl-Log-Dispatch&lang=zh-CN
el6:
el7:
3.3 安装MHA rpm依赖包
el6:
shell> yum install -y perl-Config-Tiny perl-Mail-Sender perl-Mail-Sendmail perl-Email-Date-Format perl-MIME-Types perl-MIME-Lite perl-Parallel-ForkManager perl-Params-Validate perl-Log-Dispatch
shell> rpm -qa|egrep "perl-Config-Tiny|perl-Mail-Sender|perl-Mail-Sendmail|perl-Email-Date-Format|perl-MIME-Types|perl-MIME-Lite|perl-Parallel-ForkManager|perl-Params-Validate|perl-Log-Dispatch"
shell> yum install -y perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
shell> yum install -y perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
shell> yum install -y perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
shell> yum install -y perl-Email-Date-Format-1.002-5.el6.noarch.rpm
shell> yum install -y perl-MIME-Types-1.28-2.el6.noarch.rpm
shell> yum install -y perl-MIME-Lite-3.027-2.el6.noarch.rpm
shell> yum install -y perl-Parallel-ForkManager-1.20-1.el6.noarch.rpm
shell> yum install -y perl-Params-Validate-0.92-3.el6.x86_64.rpm
shell> yum install -y perl-Log-Dispatch-2.27-1.el6.noarch.rpm
el7:
shell> yum install -y perl-Config-Tiny perl-Mail-Sender perl-Mail-Sendmail perl-Email-Date-Format perl-MIME-Types perl-MIME-Lite perl-Parallel-ForkManager perl-Params-Validate perl-Log-Dispatch
shell> rpm -qa|egrep "perl-Config-Tiny|perl-Mail-Sender|perl-Mail-Sendmail|perl-Email-Date-Format|perl-MIME-Types|perl-MIME-Lite|perl-Parallel-ForkManager|perl-Params-Validate|perl-Log-Dispatch"
shell> yum install -y perl-Config-Tiny-2.14-7.el7.noarch.rpm
shell> yum install -y perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
shell> yum install -y perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
shell> yum install -y perl-Email-Date-Format-1.002-15.el7.noarch.rpm
shell> yum install -y perl-MIME-Types-1.38-2.el7.noarch.rpm
shell> yum install -y perl-MIME-Lite-3.030-1.el7.noarch.rpm
shell> yum install -y perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
shell> yum install -y perl-Params-Validate-1.08-4.el7.x86_64.rpm
shell> yum install -y perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
3.4 安装MHA manager节点
shell> cd /soft/
shell> ls -rtl mha4mysql-*|awk '{print $NF}'
mha4mysql-manager-0.57-0.el7.noarch.rpm
mha4mysql-node-0.57-0.el7.noarch.rpm
shell> yum install -y mha4mysql-node-0.57-0.el7.noarch.rpm
shell> yum install -y mha4mysql-manager-0.57-0.el7.noarch.rpm
shell> rpm -qa|grep mha4
mha4mysql-node-0.57-0.el7.noarch
mha4mysql-manager-0.57-0.el7.noarch
shell> rpm -ql mha4mysql-node-0.57-0.el7.noarch
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs
/usr/share/man/man1/apply_diff_relay_logs.1.gz
/usr/share/man/man1/filter_mysqlbinlog.1.gz
/usr/share/man/man1/purge_relay_logs.1.gz
/usr/share/man/man1/save_binary_logs.1.gz
/usr/share/perl5/vendor_perl/MHA/BinlogHeaderParser.pm
/usr/share/perl5/vendor_perl/MHA/BinlogManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinder.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderElp.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderXid.pm
/usr/share/perl5/vendor_perl/MHA/NodeConst.pm
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
/usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm
shell> rpm -ql mha4mysql-manager-0.57-0.el7.noarch
/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
/usr/share/man/man1/masterha_check_repl.1.gz
/usr/share/man/man1/masterha_check_ssh.1.gz
/usr/share/man/man1/masterha_check_status.1.gz
/usr/share/man/man1/masterha_conf_host.1.gz
/usr/share/man/man1/masterha_manager.1.gz
/usr/share/man/man1/masterha_master_monitor.1.gz
/usr/share/man/man1/masterha_master_switch.1.gz
/usr/share/man/man1/masterha_secondary_check.1.gz
/usr/share/man/man1/masterha_stop.1.gz
/usr/share/perl5/vendor_perl/MHA/Config.pm
/usr/share/perl5/vendor_perl/MHA/DBHelper.pm
/usr/share/perl5/vendor_perl/MHA/FileStatus.pm
/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm
/usr/share/perl5/vendor_perl/MHA/ManagerConst.pm
/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm
/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm
/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
/usr/share/perl5/vendor_perl/MHA/Server.pm
/usr/share/perl5/vendor_perl/MHA/ServerManager.pm
3.5 安装MHA node节点
shell> cd /soft/
shell> ls -rtl mha4mysql-*|awk '{print $NF}'
mha4mysql-node-0.57-0.el7.noarch.rpm
shell> yum install -y mha4mysql-node-0.57-0.el7.noarch.rpm
shell> rpm -qa|grep mha4
mha4mysql-node-0.57-0.el7.noarch
shell> rpm -ql mha4mysql-node-0.57-0.el7.noarch
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs
/usr/share/man/man1/apply_diff_relay_logs.1.gz
/usr/share/man/man1/filter_mysqlbinlog.1.gz
/usr/share/man/man1/purge_relay_logs.1.gz
/usr/share/man/man1/save_binary_logs.1.gz
/usr/share/perl5/vendor_perl/MHA/BinlogHeaderParser.pm
/usr/share/perl5/vendor_perl/MHA/BinlogManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinder.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderElp.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderXid.pm
/usr/share/perl5/vendor_perl/MHA/NodeConst.pm
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
/usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm
3.6 配置MHA节点间SSH互信
创建SSH连接公钥和私钥
[root@mgr ~]# ssh-keygen -t rsa
[root@mgr .ssh]# cat *.pub > authorized_keys
[root@master ~]# ssh-keygen -t rsa
[root@slave1 ~]# ssh-keygen -t rsa
[root@slave2 ~]# ssh-keygen -t rsa
将SSH公钥共享
[root@mgr .ssh]# scp authorized_keys master:/root/.ssh/
[root@master .ssh]# cat *.pub >> authorized_keys
[root@master .ssh]# scp authorized_keys slave1:/root/.ssh/
[root@slave1 .ssh]# cat *.pub >> authorized_keys
[root@slave1 .ssh]# scp authorized_keys slave2:/root/.ssh/
[root@slave2 .ssh]# cat *.pub >> authorized_keys
[root@slave2 .ssh]# scp authorized_keys mgr:/root/.ssh/
[root@slave2 .ssh]# scp authorized_keys master:/root/.ssh/
[root@slave2 .ssh]# scp authorized_keys slave1:/root/.ssh/
生效SSH无密钥连接
[root@mgr ~]# ssh master
[root@mgr ~]# ssh slave1
[root@mgr ~]# ssh slave2
[root@master ~]# ssh mgr
[root@master ~]# ssh slave1
[root@master ~]# ssh slave2
[root@slave1 ~]# ssh mgr
[root@slave1 ~]# ssh master
[root@slave1 ~]# ssh slave2
[root@slave2 ~]# ssh mgr
[root@slave2 ~]# ssh master
[root@slave2 ~]# ssh slave1
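逐台手工验证较繁琐,也可以在每个节点用循环批量检查互信是否全部生效(示意脚本;BatchMode=yes保证遇到密码提示时直接失败而不是挂起):

```shell
# 在任一节点执行,检查到其余各节点的免密SSH是否可用
for h in mgr master slave1 slave2; do
    ssh -o BatchMode=yes -o ConnectTimeout=3 "$h" hostname > /dev/null 2>&1 \
        && echo "$h: ssh ok" || echo "$h: ssh failed"
done
```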
注意:各节点sshd不能禁止password方式登录,否则MHA检查时会出现错误。
3.7 一主多从节点复制注意事项
注意一:
各节点的binlog-do-db 和 replicate-ignore-db 设置必须一致。MHA在启动时会检测各节点的过滤规则,如果过滤规则不同,MHA将不启动监控和故障转移。
注意二:
设置从库对外提供只读服务。之所以不写进 /etc/my.cnf 配置文件,是因为slave节点随时可能提升为master节点。
[root@slave1 ~]# mysql -uroot -pmysql -e"set global read_only=on"
[root@slave2 ~]# mysql -uroot -pmysql -e"set global read_only=on"
注意三:
MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。
[root@slave1 ~]# mysql -uroot -pmysql -e"set global relay_log_purge=off"
[root@slave2 ~]# mysql -uroot -pmysql -e"set global relay_log_purge=off"
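注意set global方式设置的变量在MySQL重启后会失效,从库重启后需重新执行。设置完成后可在两个slave节点确认这两个变量已生效(示意命令):

```shell
# read_only 应为 ON,relay_log_purge 应为 OFF
mysql -uroot -pmysql -e "show global variables where variable_name in ('read_only','relay_log_purge');"
```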
3.8 配置Slave节点relaylog自动清理脚本
Slave1节点:
[root@slave1 ~]# vi /root/scripts/purge_relay_log.sh
#!/bin/bash
user=root
passwd=mysql
port=3306
log_dir='/var/log/masterha/relaylogs_purge'
work_dir='/var/tmp'
purge='/usr/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
    mkdir $log_dir -p
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

[root@slave1 ~]# crontab -e
30 23 * * * /bin/bash /root/scripts/purge_relay_log.sh

Slave2节点(脚本内容与Slave1节点完全相同):
[root@slave2 ~]# vi /root/scripts/purge_relay_log.sh
[root@slave2 ~]# crontab -e
30 23 * * * /bin/bash /root/scripts/purge_relay_log.sh
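配置完crontab后,建议先手工执行一次脚本,确认purge_relay_logs能正常工作并检查日志输出(示意命令):

```shell
# 手工执行一次清理脚本,并查看日志末尾确认执行结果
/bin/bash /root/scripts/purge_relay_log.sh
tail -n 5 /var/log/masterha/relaylogs_purge/purge_relay_logs.log
```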
日志输出参考:
shell> more /var/log/masterha/relaylogs_purge/purge_relay_logs.log
2019-01-10 18:22:16: purge_relay_logs script started.
 Found relay_log.info: /var/lib/mysql/data/relay-log.info
 Removing hard linked relay log files relay-bin* under /var/tmp.. done.
 Current relay log file: /var/lib/mysql/relaylog/relay-bin.000028
 Archiving unused relay log files (up to /var/lib/mysql/relaylog/relay-bin.000027) ...
 Creating hard link for /var/lib/mysql/relaylog/relay-bin.000001 under /var/tmp/relay-bin.000001 .. ok.
 (relay-bin.000002 ~ relay-bin.000026 的硬链接创建输出从略)
 Creating hard link for /var/lib/mysql/relaylog/relay-bin.000027 under /var/tmp/relay-bin.000027 .. ok.
 Creating hard links for unused relay log files completed.
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
 Removing hard linked relay log files relay-bin* under /var/tmp.. done.
2019-01-10 18:22:20: All relay log purging operations succeeded.
3.9 配置MySQL MHA
[root@mgr ~]# vi /etc/masterha/app1.cnf
[server default]
### set manager workdir
manager_workdir=/var/log/masterha/app1
### set manager logfile
manager_log=/var/log/masterha/app1/manager.log
### get master node binary log dir
master_binlog_dir=/var/lib/mysql/binlog
### set high availability auto failover switch script
master_ip_failover_script= /etc/masterha/master_ip_failover
### set high availability manual planned online switch script
master_ip_online_change_script= /etc/masterha/master_ip_online_change
### monitor user password
password=mysql
### monitor user
user=root
### ping interval for monitoring the master node, default 3 seconds;
### failover is triggered automatically after three failed pings
ping_interval=1
### remote workdir where binlogs are saved during a switch
remote_workdir=/tmp
### set repl user password
repl_password=mysql
### set repl user
repl_user=repl
### script that sends the alarm report when a switch occurs
report_script=/usr/bin/send_report
secondary_check_script= /usr/bin/masterha_secondary_check -s master -s slave1
### script that shuts down the failed host after a failure (empty = disabled)
shutdown_script=""
### set ssh login user
ssh_user=root

[server1]
hostname=master
port=3306

[server2]
hostname=slave1
port=3306
### Set as the candidate master. If this parameter is set, this slave will be
### promoted to the new master even if it is not the most up-to-date slave in the cluster.
candidate_master=1
### By default, if a slave is more than 100MB behind the master in relay logs, MHA will not
### choose it as the new master, because recovery would take a long time. Setting
### check_repl_delay=0 makes MHA ignore replication delay when choosing the new master,
### which is useful together with candidate_master=1, since that host must become the new master.
check_repl_delay=0

[server3]
hostname=slave2
port=3306
3.10 配置VIP及MHA切换脚本
Master节点添加VIP:
[root@master ~]# /sbin/ifconfig eth0:1 192.168.238.200/24
[root@master ~]# ip a|grep eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.238.110/24 brd 192.168.238.255 scope global eth0
    inet 192.168.238.200/24 brd 192.168.238.255 scope global secondary eth0:1

[root@mgr ~]# vi /etc/masterha/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.238.200/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

[root@mgr ~]# vi /etc/masterha/master_ip_online_change
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my $vip = '192.168.238.200/24';    # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;

my (
    $command,              $orig_master_is_new_slave, $orig_master_host,
    $orig_master_ip,       $orig_master_port,         $orig_master_user,
    $orig_master_password, $orig_master_ssh_user,     $new_master_host,
    $new_master_ip,        $new_master_port,          $new_master_user,
    $new_master_password,  $new_master_ssh_user,
);
GetOptions(
    'command=s'                => \$command,
    'orig_master_is_new_slave' => \$orig_master_is_new_slave,
    'orig_master_host=s'       => \$orig_master_host,
    'orig_master_ip=s'         => \$orig_master_ip,
    'orig_master_port=i'       => \$orig_master_port,
    'orig_master_user=s'       => \$orig_master_user,
    'orig_master_password=s'   => \$orig_master_password,
    'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
    'new_master_host=s'        => \$new_master_host,
    'new_master_ip=s'          => \$new_master_ip,
    'new_master_port=i'        => \$new_master_port,
    'new_master_user=s'        => \$new_master_user,
    'new_master_password=s'    => \$new_master_password,
    'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub main {
    #print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";
            print "***************************************************************\n\n\n\n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Enabling the VIP - $vip on new master: $new_master_host \n";
            print "***************************************************************\n\n\n\n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_online_change --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
3.11 检查SSH配置
通过masterha_check_ssh脚本检查MHA Manger到所有MHA Node的SSH连接状态:
[root@mgr ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sun Jan 13 07:45:07 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jan 13 07:45:07 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jan 13 07:45:07 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jan 13 07:45:07 2019 - [info] Starting SSH connection tests..
Sun Jan 13 07:45:10 2019 - [debug]
Sun Jan 13 07:45:07 2019 - [debug]  Connecting via SSH from root@master(192.168.238.110:22) to root@slave1(192.168.238.111:22)..
Sun Jan 13 07:45:08 2019 - [debug]   ok.
Sun Jan 13 07:45:08 2019 - [debug]  Connecting via SSH from root@master(192.168.238.110:22) to root@slave2(192.168.238.112:22)..
Sun Jan 13 07:45:10 2019 - [debug]   ok.
Sun Jan 13 07:45:11 2019 - [debug]
Sun Jan 13 07:45:08 2019 - [debug]  Connecting via SSH from root@slave1(192.168.238.111:22) to root@master(192.168.238.110:22)..
Sun Jan 13 07:45:09 2019 - [debug]   ok.
Sun Jan 13 07:45:09 2019 - [debug]  Connecting via SSH from root@slave1(192.168.238.111:22) to root@slave2(192.168.238.112:22)..
Sun Jan 13 07:45:10 2019 - [debug]   ok.
Sun Jan 13 07:45:11 2019 - [debug]
Sun Jan 13 07:45:08 2019 - [debug]  Connecting via SSH from root@slave2(192.168.238.112:22) to root@master(192.168.238.110:22)..
Sun Jan 13 07:45:09 2019 - [debug]   ok.
Sun Jan 13 07:45:09 2019 - [debug]  Connecting via SSH from root@slave2(192.168.238.112:22) to root@slave1(192.168.238.111:22)..
Sun Jan 13 07:45:10 2019 - [debug]   ok.
Sun Jan 13 07:45:11 2019 - [info] All SSH connection tests passed successfully.
3.12 检查复制集群状态
Slave节点添加mysqlbinlog、mysql工具软链:
[root@slave1 ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@slave1 ~]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
[root@slave2 ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@slave2 ~]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
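之所以需要这些软链,是因为MHA检查脚本会通过ssh的非登录shell在slave节点上直接调用mysql/mysqlbinlog,此时PATH通常不包含/usr/local/mysql/bin。创建软链后可在mgr节点验证(示意命令):

```shell
# 非登录shell下能找到两个工具即说明软链生效
ssh slave1 "which mysqlbinlog mysql"
ssh slave2 "which mysqlbinlog mysql"
```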
通过masterha_check_repl脚本检查整个复制集群的数据同步情况:
[root@mgr ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Jan 13 07:49:21 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jan 13 07:49:21 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jan 13 07:49:21 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jan 13 07:49:21 2019 - [info] MHA::MasterMonitor version 0.57.
Sun Jan 13 07:49:23 2019 - [info] GTID failover mode = 0
Sun Jan 13 07:49:23 2019 - [info] Dead Servers:
Sun Jan 13 07:49:23 2019 - [info] Alive Servers:
Sun Jan 13 07:49:23 2019 - [info]   master(192.168.238.110:3306)
Sun Jan 13 07:49:23 2019 - [info]   slave1(192.168.238.111:3306)
Sun Jan 13 07:49:23 2019 - [info]   slave2(192.168.238.112:3306)
Sun Jan 13 07:49:23 2019 - [info] Alive Slaves:
Sun Jan 13 07:49:23 2019 - [info]   slave1(192.168.238.111:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Sun Jan 13 07:49:23 2019 - [info]     Replicating from 192.168.238.110(192.168.238.110:3306)
Sun Jan 13 07:49:23 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jan 13 07:49:23 2019 - [info]   slave2(192.168.238.112:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Sun Jan 13 07:49:23 2019 - [info]     Replicating from 192.168.238.110(192.168.238.110:3306)
Sun Jan 13 07:49:23 2019 - [info] Current Alive Master: master(192.168.238.110:3306)
Sun Jan 13 07:49:23 2019 - [info] Checking slave configurations..
Sun Jan 13 07:49:23 2019 - [info] Checking replication filtering settings..
Sun Jan 13 07:49:23 2019 - [info]  binlog_do_db= , binlog_ignore_db=
Sun Jan 13 07:49:23 2019 - [info]  Replication filtering check ok.
Sun Jan 13 07:49:23 2019 - [info] GTID (with auto-pos) is not supported
Sun Jan 13 07:49:23 2019 - [info] Starting SSH connection tests..
Sun Jan 13 07:49:26 2019 - [info] All SSH connection tests passed successfully.
Sun Jan 13 07:49:26 2019 - [info] Checking MHA Node version..
Sun Jan 13 07:49:27 2019 - [info]  Version check ok.
Sun Jan 13 07:49:27 2019 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jan 13 07:49:28 2019 - [info] HealthCheck: SSH to master is reachable.
Sun Jan 13 07:49:29 2019 - [info] Master MHA Node version is 0.57.
Sun Jan 13 07:49:29 2019 - [info] Checking recovery script configurations on master(192.168.238.110:3306)..
Sun Jan 13 07:49:29 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --start_file=mysql-bin.000014
Sun Jan 13 07:49:29 2019 - [info]   Connecting to root@192.168.238.110(master:22)..
  Creating /tmp if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /var/lib/mysql/binlog, up to mysql-bin.000015
Sun Jan 13 07:49:29 2019 - [info] Binlog setting check done.
Sun Jan 13 07:49:29 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jan 13 07:49:29 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave1 --slave_ip=192.168.238.111 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx
Sun Jan 13 07:49:29 2019 - [info]   Connecting to root@192.168.238.111(slave1:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000032
    Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000032
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jan 13 07:49:32 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave2 --slave_ip=192.168.238.112 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx
Sun Jan 13 07:49:32 2019 - [info]   Connecting to root@192.168.238.112(slave2:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000032
    Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000032
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jan 13 07:49:33 2019 - [info] Slaves settings check done.
Sun Jan 13 07:49:33 2019 - [info]
master(192.168.238.110:3306) (current master)
 +--slave1(192.168.238.111:3306)
 +--slave2(192.168.238.112:3306)
Sun Jan 13 07:49:33 2019 - [info] Checking replication health on slave1..
Sun Jan 13 07:49:33 2019 - [info]  ok.
Sun Jan 13 07:49:33 2019 - [info] Checking replication health on slave2..
Sun Jan 13 07:49:33 2019 - [info]  ok.
Sun Jan 13 07:49:33 2019 - [info] Checking master_ip_failover_script status:
Sun Jan 13 07:49:33 2019 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.238.110 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24===

Checking the Status of the script.. OK
Sun Jan 13 07:49:33 2019 - [info]  OK.
Sun Jan 13 07:49:33 2019 - [warning] shutdown_script is not defined.
Sun Jan 13 07:49:33 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
3.13 检查MHA manager节点状态
通过masterha_check_status脚本检查Manager的状态:
[root@mgr ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
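masterha_check_status通过退出码反映Manager状态,上面输出中的0:PING_OK、2:NOT_RUNNING即分别对应退出码0和2。基于退出码可以写一个简单的巡检脚本雏形(仅为示意,其余退出码的含义请以MHA文档为准):

```shell
#!/bin/bash
# 根据 masterha_check_status 的退出码输出可读状态(示意脚本)
# 0:PING_OK 与 2:NOT_RUNNING 取自上文实际输出,其余退出码一律按异常处理
interpret_status() {
    case "$1" in
        0) echo "PING_OK" ;;        # manager 运行中,且能 ping 通 master
        2) echo "NOT_RUNNING" ;;    # manager 未运行
        *) echo "ABNORMAL($1)" ;;   # 其它退出码,需人工介入排查
    esac
}

check_mha() {
    # $1: MHA 配置文件路径,例如 /etc/masterha/app1.cnf
    masterha_check_status --conf="$1" >/dev/null 2>&1
    interpret_status $?
}
```

可将check_mha /etc/masterha/app1.cnf放入定时任务,在输出非PING_OK时触发告警。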
3.14 启动MHA manager监控
[root@mgr ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 22620
[root@mgr ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:22620) is running(0:PING_OK), master:master
注意:使用nohup命令以后台方式启动MHA监控服务时,关闭或退出SecureCRT终端会导致MHA监控服务异常宕机,疑似SecureCRT软件BUG,异常日志如下:
Sun Jan 13 08:05:08 2019 - [info] Got terminate signal. Exit.
以脚本方式启动MHA监控服务
[root@mgr ~]# vi /etc/masterha/mha_mgr_monitor.sh
#!/bin/bash
/usr/bin/nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[root@mgr masterha]# ./mha_mgr_monitor.sh
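上面的脚本仍依赖nohup,如果担心终端退出时SIGHUP信号误杀manager进程,可以参考下面的加固写法(仅为示意:用setsid让进程脱离当前登录会话,用pgrep防止重复启动;配置与日志路径沿用上文):

```shell
#!/bin/bash
# mha_mgr_monitor.sh 的加固版(示意)
CONF=${CONF:-/etc/masterha/app1.cnf}
LOG=${LOG:-/var/log/masterha/app1/manager.log}

start_mha_manager() {
    # 已有 manager 进程时不再重复启动
    if pgrep -f "masterha_manager --conf=$CONF" >/dev/null 2>&1; then
        echo "already_running"
        return 0
    fi
    # setsid 使进程脱离登录会话,终端关闭后不会再向其发送 SIGHUP
    setsid nohup masterha_manager --conf="$CONF" \
        --remove_dead_master_conf --ignore_last_failover \
        < /dev/null >> "$LOG" 2>&1 &
    echo "started"
}

if [ "${1:-}" = "start" ]; then
    start_mha_manager
fi
```

执行 ./mha_mgr_monitor.sh start 启动;重复执行只会输出already_running,不会拉起第二个manager进程。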
监控输出日志:
[root@mgr ~]# tail -f /var/log/masterha/app1/manager.log Sun Jan 13 07:59:00 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun Jan 13 07:59:00 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Sun Jan 13 07:59:00 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Sun Jan 13 07:59:00 2019 - [info] MHA::MasterMonitor version 0.57. Sun Jan 13 07:59:01 2019 - [info] GTID failover mode = 0 Sun Jan 13 07:59:01 2019 - [info] Dead Servers: Sun Jan 13 07:59:01 2019 - [info] Alive Servers: Sun Jan 13 07:59:01 2019 - [info] master(192.168.238.110:3306) Sun Jan 13 07:59:01 2019 - [info] slave1(192.168.238.111:3306) Sun Jan 13 07:59:01 2019 - [info] slave2(192.168.238.112:3306) Sun Jan 13 07:59:01 2019 - [info] Alive Slaves: Sun Jan 13 07:59:01 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Sun Jan 13 07:59:01 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Sun Jan 13 07:59:01 2019 - [info] Primary candidate for the new Master (candidate_master is set) Sun Jan 13 07:59:01 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Sun Jan 13 07:59:01 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Sun Jan 13 07:59:01 2019 - [info] Current Alive Master: master(192.168.238.110:3306) Sun Jan 13 07:59:01 2019 - [info] Checking slave configurations.. Sun Jan 13 07:59:01 2019 - [info] Checking replication filtering settings.. Sun Jan 13 07:59:01 2019 - [info] binlog_do_db= , binlog_ignore_db= Sun Jan 13 07:59:01 2019 - [info] Replication filtering check ok. Sun Jan 13 07:59:01 2019 - [info] GTID (with auto-pos) is not supported Sun Jan 13 07:59:01 2019 - [info] Starting SSH connection tests.. Sun Jan 13 07:59:05 2019 - [info] All SSH connection tests passed successfully. 
Sun Jan 13 07:59:05 2019 - [info] Checking MHA Node version.. Sun Jan 13 07:59:06 2019 - [info] Version check ok. Sun Jan 13 07:59:06 2019 - [info] Checking SSH publickey authentication settings on the current master.. Sun Jan 13 07:59:07 2019 - [info] HealthCheck: SSH to master is reachable. Sun Jan 13 07:59:07 2019 - [info] Master MHA Node version is 0.57. Sun Jan 13 07:59:07 2019 - [info] Checking recovery script configurations on master(192.168.238.110:3306).. Sun Jan 13 07:59:07 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --start_file=mysql-bin.000014 Sun Jan 13 07:59:07 2019 - [info] Connecting to root@192.168.238.110(master:22).. Creating /tmp if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /var/lib/mysql/binlog, up to mysql-bin.000015 Sun Jan 13 07:59:08 2019 - [info] Binlog setting check done. Sun Jan 13 07:59:08 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Sun Jan 13 07:59:08 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave1 --slave_ip=192.168.238.111 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx Sun Jan 13 07:59:08 2019 - [info] Connecting to root@192.168.238.111(slave1:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/data/relay-log.info ... ok. Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000032 Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000032 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. 
Sun Jan 13 07:59:11 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave2 --slave_ip=192.168.238.112 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx Sun Jan 13 07:59:11 2019 - [info] Connecting to root@192.168.238.112(slave2:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/data/relay-log.info ... ok. Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000032 Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000032 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sun Jan 13 07:59:12 2019 - [info] Slaves settings check done. Sun Jan 13 07:59:12 2019 - [info] master(192.168.238.110:3306) (current master) +--slave1(192.168.238.111:3306) +--slave2(192.168.238.112:3306) Sun Jan 13 07:59:12 2019 - [info] Checking master_ip_failover_script status: Sun Jan 13 07:59:12 2019 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.238.110 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24=== Checking the Status of the script.. OK Sun Jan 13 07:59:12 2019 - [info] OK. Sun Jan 13 07:59:12 2019 - [warning] shutdown_script is not defined. Sun Jan 13 07:59:12 2019 - [info] Set master ping interval 1 seconds. Sun Jan 13 07:59:12 2019 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s master -s slave1 Sun Jan 13 07:59:12 2019 - [info] Starting ping health check on master(192.168.238.110:3306).. Sun Jan 13 07:59:12 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
3.15 关闭MHA manager监控
[root@mgr ~]# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
[root@mgr ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
四 MHA高可用架构下master节点切换
4.1 Master节点自动故障切换
模拟Master节点MySQL服务异常宕机
[root@master ~]# mysqladmin -uroot -pmysql shutdown
[root@master ~]# ps -ef|grep mysqld|grep -v grep
Mgr节点切换日志输出:
Mon Jan 14 13:38:47 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) Mon Jan 14 13:38:47 2019 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s master -s slave1 --user=root --master_host=master --master_ip=192.168.238.110 --master_port=3306 --master_user=root --master_password=mysql --ping_type=SELECT Mon Jan 14 13:38:47 2019 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin Monitoring server master is reachable, Master is not reachable from master. OK. Mon Jan 14 13:38:48 2019 - [info] HealthCheck: SSH to master is reachable. Mon Jan 14 13:38:48 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.238.110' (111)) Mon Jan 14 13:38:48 2019 - [warning] Connection failed 2 time(s).. Monitoring server slave1 is reachable, Master is not reachable from slave1. OK. Mon Jan 14 13:38:48 2019 - [info] Master is not reachable from all other monitoring servers. Failover should start. Mon Jan 14 13:38:49 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.238.110' (111)) Mon Jan 14 13:38:49 2019 - [warning] Connection failed 3 time(s).. Mon Jan 14 13:38:50 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.238.110' (111)) Mon Jan 14 13:38:50 2019 - [warning] Connection failed 4 time(s).. Mon Jan 14 13:38:50 2019 - [warning] Master is not reachable from health checker! Mon Jan 14 13:38:50 2019 - [warning] Master master(192.168.238.110:3306) is not reachable! Mon Jan 14 13:38:50 2019 - [warning] SSH is reachable. Mon Jan 14 13:38:50 2019 - [info] Connecting to a master server failed. 
Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status.. Mon Jan 14 13:38:50 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Mon Jan 14 13:38:50 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Mon Jan 14 13:38:50 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Mon Jan 14 13:38:51 2019 - [info] GTID failover mode = 0 Mon Jan 14 13:38:51 2019 - [info] Dead Servers: Mon Jan 14 13:38:51 2019 - [info] master(192.168.238.110:3306) Mon Jan 14 13:38:51 2019 - [info] Alive Servers: Mon Jan 14 13:38:51 2019 - [info] slave1(192.168.238.111:3306) Mon Jan 14 13:38:51 2019 - [info] slave2(192.168.238.112:3306) Mon Jan 14 13:38:51 2019 - [info] Alive Slaves: Mon Jan 14 13:38:51 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:51 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:51 2019 - [info] Primary candidate for the new Master (candidate_master is set) Mon Jan 14 13:38:51 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:51 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:51 2019 - [info] Checking slave configurations.. Mon Jan 14 13:38:51 2019 - [info] Checking replication filtering settings.. Mon Jan 14 13:38:51 2019 - [info] Replication filtering check ok. Mon Jan 14 13:38:51 2019 - [info] Master is down! Mon Jan 14 13:38:51 2019 - [info] Terminating monitoring script. Mon Jan 14 13:38:51 2019 - [info] Got exit code 20 (Master dead). Mon Jan 14 13:38:51 2019 - [info] MHA::MasterFailover version 0.57. Mon Jan 14 13:38:51 2019 - [info] Starting master failover. 
Mon Jan 14 13:38:51 2019 - [info] Mon Jan 14 13:38:51 2019 - [info] * Phase 1: Configuration Check Phase.. Mon Jan 14 13:38:51 2019 - [info] Mon Jan 14 13:38:52 2019 - [info] GTID failover mode = 0 Mon Jan 14 13:38:52 2019 - [info] Dead Servers: Mon Jan 14 13:38:52 2019 - [info] master(192.168.238.110:3306) Mon Jan 14 13:38:52 2019 - [info] Checking master reachability via MySQL(double check)... Mon Jan 14 13:38:52 2019 - [info] ok. Mon Jan 14 13:38:52 2019 - [info] Alive Servers: Mon Jan 14 13:38:52 2019 - [info] slave1(192.168.238.111:3306) Mon Jan 14 13:38:52 2019 - [info] slave2(192.168.238.112:3306) Mon Jan 14 13:38:52 2019 - [info] Alive Slaves: Mon Jan 14 13:38:52 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:52 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:52 2019 - [info] Primary candidate for the new Master (candidate_master is set) Mon Jan 14 13:38:52 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:52 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:52 2019 - [info] Starting Non-GTID based failover. Mon Jan 14 13:38:52 2019 - [info] Mon Jan 14 13:38:52 2019 - [info] ** Phase 1: Configuration Check Phase completed. Mon Jan 14 13:38:52 2019 - [info] Mon Jan 14 13:38:52 2019 - [info] * Phase 2: Dead Master Shutdown Phase.. Mon Jan 14 13:38:52 2019 - [info] Mon Jan 14 13:38:52 2019 - [info] Forcing shutdown so that applications never connect to the current master.. 
Mon Jan 14 13:38:52 2019 - [info] Executing master IP deactivation script: Mon Jan 14 13:38:52 2019 - [info] /etc/masterha/master_ip_failover --orig_master_host=master --orig_master_ip=192.168.238.110 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24=== Disabling the VIP on old master: master Mon Jan 14 13:38:53 2019 - [info] done. Mon Jan 14 13:38:53 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Mon Jan 14 13:38:53 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed. Mon Jan 14 13:38:53 2019 - [info] Mon Jan 14 13:38:53 2019 - [info] * Phase 3: Master Recovery Phase.. Mon Jan 14 13:38:53 2019 - [info] Mon Jan 14 13:38:53 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Mon Jan 14 13:38:53 2019 - [info] Mon Jan 14 13:38:53 2019 - [info] The latest binary log file/position on all slaves is mysql-bin.000014:674 Mon Jan 14 13:38:53 2019 - [info] Latest slaves (Slaves that received relay log files to the latest): Mon Jan 14 13:38:53 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:53 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:53 2019 - [info] Primary candidate for the new Master (candidate_master is set) Mon Jan 14 13:38:53 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:53 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:53 2019 - [info] The oldest binary log file/position on all slaves is mysql-bin.000014:674 Mon Jan 14 13:38:53 2019 - [info] Oldest slaves: Mon Jan 14 13:38:53 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:53 2019 - [info] Replicating from 
192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:53 2019 - [info] Primary candidate for the new Master (candidate_master is set) Mon Jan 14 13:38:53 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:53 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:53 2019 - [info] Mon Jan 14 13:38:53 2019 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Mon Jan 14 13:38:53 2019 - [info] Mon Jan 14 13:38:53 2019 - [info] Fetching dead master's binary logs.. Mon Jan 14 13:38:53 2019 - [info] Executing command on the dead master master(192.168.238.110:3306): save_binary_logs --command=save --start_file=mysql-bin.000014 --start_pos=674 --binlog_dir=/var/lib/mysql/binlog --output_file=/tmp/saved_master_binlog_from_master_3306_20190114133851.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 Creating /tmp if not exists.. ok. Concat binary/relay logs from mysql-bin.000014 pos 674 to mysql-bin.000015 EOF into /tmp/saved_master_binlog_from_master_3306_20190114133851.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 154.. ok. No need to dump effective binlog data from /var/lib/mysql/binlog/mysql-bin.000014 (pos starts 674, filesize 674). Skipping. Dumping binlog head events (rotate events), skipping format description events from /var/lib/mysql/binlog/mysql-bin.000015.. Binlog Checksum enabled dumped up to pos 154. ok. Dumping effective binlog data from /var/lib/mysql/binlog/mysql-bin.000015 position 154 to tail(177).. ok. sh: mysqlbinlog: command not found Failed to save binary log: /tmp/saved_master_binlog_from_master_3306_20190114133851.binlog is broken! at /usr/bin/save_binary_logs line 176 Mon Jan 14 13:38:54 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln760] Failed to save binary log events from the orig master. 
Maybe disks on binary logs are not accessible or binary log itself is corrupt? Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] * Phase 3.3: Determining New Master Phase.. Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Mon Jan 14 13:38:54 2019 - [info] All slaves received relay logs to the same position. No need to resync each other. Mon Jan 14 13:38:54 2019 - [info] Searching new master from slaves.. Mon Jan 14 13:38:54 2019 - [info] Candidate masters from the configuration file: Mon Jan 14 13:38:54 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Jan 14 13:38:54 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Mon Jan 14 13:38:54 2019 - [info] Primary candidate for the new Master (candidate_master is set) Mon Jan 14 13:38:54 2019 - [info] Non-candidate masters: Mon Jan 14 13:38:54 2019 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Mon Jan 14 13:38:54 2019 - [info] New master is slave1(192.168.238.111:3306) Mon Jan 14 13:38:54 2019 - [info] Starting master failover.. Mon Jan 14 13:38:54 2019 - [info] From: master(192.168.238.110:3306) (current master) +--slave1(192.168.238.111:3306) +--slave2(192.168.238.112:3306) To: slave1(192.168.238.111:3306) (new master) +--slave2(192.168.238.112:3306) Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] * Phase 3.4: Master Log Apply Phase.. Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. 
Mon Jan 14 13:38:54 2019 - [info] Starting recovery on slave1(192.168.238.111:3306).. Mon Jan 14 13:38:54 2019 - [info] This server has all relay logs. Waiting all logs to be applied.. Mon Jan 14 13:38:54 2019 - [info] done. Mon Jan 14 13:38:54 2019 - [info] All relay logs were successfully applied. Mon Jan 14 13:38:54 2019 - [info] Getting new master's binlog name and position.. Mon Jan 14 13:38:54 2019 - [info] mysql-bin.000001:1031 Mon Jan 14 13:38:54 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='slave1 or 192.168.238.111', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=1031, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Mon Jan 14 13:38:54 2019 - [info] Executing master IP activate script: Mon Jan 14 13:38:54 2019 - [info] /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.238.110 --orig_master_port=3306 --new_master_host=slave1 --new_master_ip=192.168.238.111 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx Unknown option: new_master_user Unknown option: new_master_password IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24=== Enabling the VIP - 192.168.238.200/24 on the new master - slave1 Mon Jan 14 13:38:54 2019 - [info] OK. Mon Jan 14 13:38:54 2019 - [info] Setting read_only=0 on slave1(192.168.238.111:3306).. Mon Jan 14 13:38:54 2019 - [info] ok. Mon Jan 14 13:38:54 2019 - [info] ** Finished master recovery successfully. Mon Jan 14 13:38:54 2019 - [info] * Phase 3: Master Recovery Phase completed. Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] * Phase 4: Slaves Recovery Phase.. Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. 
Mon Jan 14 13:38:54 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] -- Slave diff file generation on host slave2(192.168.238.112:3306) started, pid: 3394. Check tmp log /var/log/masterha/app1.log/slave2_3306_20190114133851.log if it takes time.. Mon Jan 14 13:38:55 2019 - [info] Mon Jan 14 13:38:55 2019 - [info] Log messages from slave2 ... Mon Jan 14 13:38:56 2019 - [info] Mon Jan 14 13:38:54 2019 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Jan 14 13:38:56 2019 - [info] End of log messages from slave2. Mon Jan 14 13:38:56 2019 - [info] -- slave2(192.168.238.112:3306) has the latest relay log events. Mon Jan 14 13:38:56 2019 - [info] Generating relay diff files from the latest slave succeeded. Mon Jan 14 13:38:56 2019 - [info] Mon Jan 14 13:38:56 2019 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Mon Jan 14 13:38:56 2019 - [info] Mon Jan 14 13:38:56 2019 - [info] -- Slave recovery on host slave2(192.168.238.112:3306) started, pid: 3396. Check tmp log /var/log/masterha/app1.log/slave2_3306_20190114133851.log if it takes time.. Mon Jan 14 13:38:57 2019 - [info] Mon Jan 14 13:38:57 2019 - [info] Log messages from slave2 ... Mon Jan 14 13:38:57 2019 - [info] Mon Jan 14 13:38:56 2019 - [info] Starting recovery on slave2(192.168.238.112:3306).. Mon Jan 14 13:38:56 2019 - [info] This server has all relay logs. Waiting all logs to be applied.. Mon Jan 14 13:38:56 2019 - [info] done. Mon Jan 14 13:38:56 2019 - [info] All relay logs were successfully applied. Mon Jan 14 13:38:56 2019 - [info] Resetting slave slave2(192.168.238.112:3306) and starting replication from the new master slave1(192.168.238.111:3306).. Mon Jan 14 13:38:56 2019 - [info] Executed CHANGE MASTER. Mon Jan 14 13:38:56 2019 - [info] Slave started. Mon Jan 14 13:38:57 2019 - [info] End of log messages from slave2. Mon Jan 14 13:38:57 2019 - [info] -- Slave recovery on host slave2(192.168.238.112:3306) succeeded. 
Mon Jan 14 13:38:57 2019 - [info] All new slave servers recovered successfully. Mon Jan 14 13:38:57 2019 - [info] Mon Jan 14 13:38:57 2019 - [info] * Phase 5: New master cleanup phase.. Mon Jan 14 13:38:57 2019 - [info] Mon Jan 14 13:38:57 2019 - [info] Resetting slave info on the new master.. Mon Jan 14 13:38:57 2019 - [info] slave1: Resetting slave info succeeded. Mon Jan 14 13:38:57 2019 - [info] Master failover to slave1(192.168.238.111:3306) completed successfully. Mon Jan 14 13:38:57 2019 - [info] Deleted server1 entry from /etc/masterha/app1.cnf . Mon Jan 14 13:38:57 2019 - [info] ----- Failover Report ----- app1: MySQL Master failover master(192.168.238.110:3306) to slave1(192.168.238.111:3306) succeeded Master master(192.168.238.110:3306) is down! Check MHA Manager logs at mgr:/var/log/masterha/app1/manager.log for details. Started automated(non-interactive) failover. Invalidated master IP address on master(192.168.238.110:3306) The latest slave slave1(192.168.238.111:3306) has all relay logs for recovery. Selected slave1(192.168.238.111:3306) as a new master. slave1(192.168.238.111:3306): OK: Applying all logs succeeded. slave1(192.168.238.111:3306): OK: Activated master IP address. slave2(192.168.238.112:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. slave2(192.168.238.112:3306): OK: Applying all logs succeeded. Slave started, replicating from slave1(192.168.238.111:3306) slave1(192.168.238.111:3306): Resetting slave info succeeded. Master failover to slave1(192.168.238.111:3306) completed successfully. Mon Jan 14 13:38:57 2019 - [info] Sending mail.. sh: /usr/bin/send_report: No such file or directory Mon Jan 14 13:38:57 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2066] Failed to send mail with return code 127:0 注意:当看到以上日志输出,说明候选master节点已经正式被切换成master节点。当master节点宕机并发生切换之后,MHA监控服务会随之被关闭掉。
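切换完成后,除了人工查看上面的Failover Report,也可以用脚本从manager日志中提取切换结果,便于自动化巡检(仅为示意,日志格式依据上文实际输出):

```shell
#!/bin/bash
# 从 MHA manager 日志中提取最近一次成功切换后的新 master 主机名(示意)
failover_result() {
    # $1: manager 日志文件,例如 /var/log/masterha/app1/manager.log
    sed -n 's/.*Master failover to \([^(]*\)(.*completed successfully.*/\1/p' "$1" | tail -1
}
```

按上文的切换日志,failover_result /var/log/masterha/app1/manager.log 应输出slave1;输出为空说明日志中没有成功切换的记录。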
[root@mgr ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[root@master ~]# ip a|grep eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.238.110/24 brd 192.168.238.255 scope global eth0
[root@slave1 ~]# ip a|grep eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.238.111/24 brd 192.168.238.255 scope global eth0
    inet 192.168.238.200/24 brd 192.168.238.255 scope global secondary eth0:1
不难看出,VIP已经漂移到候选Master节点slave1上。
4.2 修复原Master并加入复制集群(回切)
将原master节点MySQL服务正常启动
[root@master ~]# /etc/init.d/mysqld start
Starting MySQL... [ OK ]
注意:当原Master恢复后,不会自动转成Slave,需要手动处理,并将原Master加入到复制集群。
需要在MHA manager管理节点获取切换时记录的新Master二进制日志文件及事件位置:
[root@mgr ~]# cat /var/log/masterha/app1/manager.log|grep -i "All other slaves should start replication from here"
Mon Jan 14 13:38:54 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='slave1 or 192.168.238.111', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=1031, MASTER_USER='repl', MASTER_PASSWORD='xxx';
将原Master节点加入到复制集群
[root@master ~]# mysql -uroot -pmysql
mysql> CHANGE MASTER TO MASTER_HOST='192.168.238.111', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=1031, MASTER_USER='repl', MASTER_PASSWORD='mysql';
mysql> start slave;
mysql> show slave status\G
…
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
…
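如果希望脚本化地确认复制线程状态,可以用类似下面的过滤函数解析show slave status\G的输出(仅为示意):

```shell
#!/bin/bash
# 从 "show slave status\G" 的输出判断复制线程是否正常(示意)
slave_health() {
    awk -F': ' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { if (io == "Yes" && sql == "Yes") print "OK"; else print "FAIL" }
    '
}

# 用法(口令沿用上文环境):
# mysql -uroot -pmysql -e "show slave status\G" | slave_health
```

只有IO线程和SQL线程均为Yes时输出OK,否则输出FAIL,便于在巡检任务中判断。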
由于原Master节点已被降级为新的Slave节点,需要将其设置为只读。之所以不把read_only写进/etc/my.cnf配置文件,是因为Slave节点随时可能再次被提升为Master节点。
[root@master ~]# mysql -uroot -pmysql -e "set global read_only=on"
[root@master ~]# mysql -uroot -pmysql -e "set global relay_log_purge=off"
MHA在发生切换时,从库的恢复依赖于relay log的相关信息,所以需要将relay log的自动清除设置为OFF,改用手动方式定期清除relay log。
原Master(即新Slave)节点部署relay log自动清理脚本
[root@master ~]# vi /root/scripts/purge_relay_log.sh
#!/bin/bash
user=root
passwd=mysql
port=3306
log_dir='/var/log/masterha/relaylogs_purge'
work_dir='/var/tmp'
purge='/usr/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
    mkdir $log_dir -p
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

[root@master ~]# crontab -e
30 23 * * * /bin/bash /root/scripts/purge_relay_log.sh
[root@master ~]# chmod +x /root/scripts/purge_relay_log.sh
4.3 Master节点手动在线切换
[root@mgr ~]# cat /etc/masterha/mha_mgr_monitor.sh
#!/bin/bash
/usr/bin/nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
注意:由于在MHA服务启动脚本里添加了--remove_dead_master_conf选项,切换后会自动将原Master节点信息从/etc/masterha/app1.cnf中移除,如果要回切,需要重新配置MHA。
--remove_dead_master_conf:该参数代表当发生自动故障切换后,原Master相关信息将会从 /etc/masterha/app1.cnf 配置文件中移除。
--ignore_last_failover:在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。MHA发生切换后,会在manager_workdir目录(本文为/var/log/masterha/app1)下生成app1.failover.complete文件;下次切换时如果发现该文件存在,将不允许触发切换,除非在第一次切换后手动删除该文件。为了方便,这里加上了--ignore_last_failover参数,表示忽略上次切换产生的该标记文件。
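如果没有使用--ignore_last_failover,又希望手动允许再次切换,可以删除该标记文件(仅为示意;标记文件位于manager_workdir目录下,文件名为"应用名.failover.complete"):

```shell
#!/bin/bash
# 清理上次 failover 留下的标记文件,使 MHA 允许再次触发切换(示意)
clean_failover_flag() {
    # $1: manager_workdir 目录  $2: 应用名,如 app1
    rm -f "$1/$2.failover.complete"
}

# 按上文配置的用法:
# clean_failover_flag /var/log/masterha/app1 app1
```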
重新配置MHA
[root@mgr ~]# vi /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql/binlog
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change
password=mysql
ping_interval=1
remote_workdir=/tmp
repl_password=mysql
repl_user=repl
secondary_check_script=/usr/bin/masterha_secondary_check -s slave1 -s master
report_script=""
shutdown_script=""
ssh_user=root
user=root

[server1]
hostname=slave1
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=master
port=3306

[server3]
hostname=slave2
port=3306
检查SSH配置:
[root@mgr ~]# /usr/bin/masterha_check_ssh --conf=/etc/masterha/app1.cnf Tue Jan 15 13:13:29 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 13:13:29 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 13:13:29 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 13:13:29 2019 - [info] Starting SSH connection tests.. Tue Jan 15 13:13:32 2019 - [debug] Tue Jan 15 13:13:30 2019 - [debug] Connecting via SSH from root@slave2(192.168.238.112:22) to root@slave1(192.168.238.111:22).. Tue Jan 15 13:13:31 2019 - [debug] ok. Tue Jan 15 13:13:31 2019 - [debug] Connecting via SSH from root@slave2(192.168.238.112:22) to root@master(192.168.238.110:22).. Tue Jan 15 13:13:32 2019 - [debug] ok. Tue Jan 15 13:13:32 2019 - [debug] Tue Jan 15 13:13:29 2019 - [debug] Connecting via SSH from root@slave1(192.168.238.111:22) to root@master(192.168.238.110:22).. Tue Jan 15 13:13:30 2019 - [debug] ok. Tue Jan 15 13:13:30 2019 - [debug] Connecting via SSH from root@slave1(192.168.238.111:22) to root@slave2(192.168.238.112:22).. Tue Jan 15 13:13:32 2019 - [debug] ok. Tue Jan 15 13:13:32 2019 - [debug] Tue Jan 15 13:13:29 2019 - [debug] Connecting via SSH from root@master(192.168.238.110:22) to root@slave1(192.168.238.111:22).. Tue Jan 15 13:13:31 2019 - [debug] ok. Tue Jan 15 13:13:31 2019 - [debug] Connecting via SSH from root@master(192.168.238.110:22) to root@slave2(192.168.238.112:22).. Tue Jan 15 13:13:32 2019 - [debug] ok. Tue Jan 15 13:13:32 2019 - [info] All SSH connection tests passed successfully.
检查复制集群状态:
[root@mgr ~]# /usr/bin/masterha_check_repl --conf=/etc/masterha/app1.cnf …… Tue Jan 15 13:14:19 2019 - [info] Connecting to root@192.168.238.110(master:22).. Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106. mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options at /usr/bin/apply_diff_relay_logs line 493 Testing mysql connection and privileges..sh: mysql: command not found mysql command failed with rc 127:0!
检查时报mysqlbinlog、mysql程序不存在,这是因为MHA通过SSH远程执行命令时未加载MySQL的环境变量,需在各数据库节点手动创建软链接:
[root@master ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@master ~]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
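上述软链接也可以用如下示意脚本在各数据库节点批量创建(假设MySQL安装目录为/usr/local/mysql,函数名为笔者假设):

```shell
#!/bin/bash
# 为MHA需要调用的客户端程序创建软链接
# 参数$1: MySQL的bin目录  参数$2: 目标目录(需在MHA可见的PATH中,如/usr/local/bin)
link_mysql_clients() {
    local src=$1 dst=$2 prog
    for prog in mysql mysqlbinlog; do
        # 已存在则跳过,避免重复创建报错
        [ -e "$dst/$prog" ] || ln -s "$src/$prog" "$dst/$prog"
    done
}

# 实际环境:
# link_mysql_clients /usr/local/mysql/bin /usr/local/bin
```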
继续检查:
[root@mgr ~]# /usr/bin/masterha_check_repl --conf=/etc/masterha/app1.cnf Tue Jan 15 13:28:02 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 13:28:02 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 13:28:02 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 13:28:02 2019 - [info] MHA::MasterMonitor version 0.57. Tue Jan 15 13:28:03 2019 - [info] GTID failover mode = 0 Tue Jan 15 13:28:03 2019 - [info] Dead Servers: Tue Jan 15 13:28:03 2019 - [info] Alive Servers: Tue Jan 15 13:28:03 2019 - [info] slave1(192.168.238.111:3306) Tue Jan 15 13:28:03 2019 - [info] master(192.168.238.110:3306) Tue Jan 15 13:28:03 2019 - [info] slave2(192.168.238.112:3306) Tue Jan 15 13:28:03 2019 - [info] Alive Slaves: Tue Jan 15 13:28:03 2019 - [info] master(192.168.238.110:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 13:28:03 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) Tue Jan 15 13:28:03 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 13:28:03 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 13:28:03 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) Tue Jan 15 13:28:03 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306) Tue Jan 15 13:28:03 2019 - [info] Checking slave configurations.. Tue Jan 15 13:28:03 2019 - [info] Checking replication filtering settings.. Tue Jan 15 13:28:03 2019 - [info] binlog_do_db= , binlog_ignore_db= Tue Jan 15 13:28:03 2019 - [info] Replication filtering check ok. Tue Jan 15 13:28:03 2019 - [info] GTID (with auto-pos) is not supported Tue Jan 15 13:28:03 2019 - [info] Starting SSH connection tests.. Tue Jan 15 13:28:07 2019 - [info] All SSH connection tests passed successfully. 
Tue Jan 15 13:28:07 2019 - [info] Checking MHA Node version.. Tue Jan 15 13:28:08 2019 - [info] Version check ok. Tue Jan 15 13:28:08 2019 - [info] Checking SSH publickey authentication settings on the current master.. Tue Jan 15 13:28:08 2019 - [info] HealthCheck: SSH to slave1 is reachable. Tue Jan 15 13:28:09 2019 - [info] Master MHA Node version is 0.57. Tue Jan 15 13:28:09 2019 - [info] Checking recovery script configurations on slave1(192.168.238.111:3306).. Tue Jan 15 13:28:09 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --start_file=mysql-bin.000041 Tue Jan 15 13:28:09 2019 - [info] Connecting to root@192.168.238.111(slave1:22).. Creating /tmp if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /var/lib/mysql/binlog, up to mysql-bin.000041 Tue Jan 15 13:28:09 2019 - [info] Binlog setting check done. Tue Jan 15 13:28:09 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Tue Jan 15 13:28:09 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=master --slave_ip=192.168.238.110 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx Tue Jan 15 13:28:09 2019 - [info] Connecting to root@192.168.238.110(master:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/data/relay-log.info ... ok. Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000083 Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000083 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. 
Tue Jan 15 13:28:10 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave2 --slave_ip=192.168.238.112 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx Tue Jan 15 13:28:10 2019 - [info] Connecting to root@192.168.238.112(slave2:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/data/relay-log.info ... ok. Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000086 Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000086 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Tue Jan 15 13:28:10 2019 - [info] Slaves settings check done. Tue Jan 15 13:28:10 2019 - [info] slave1(192.168.238.111:3306) (current master) +--master(192.168.238.110:3306) +--slave2(192.168.238.112:3306) Tue Jan 15 13:28:10 2019 - [info] Checking replication health on master.. Tue Jan 15 13:28:10 2019 - [info] ok. Tue Jan 15 13:28:10 2019 - [info] Checking replication health on slave2.. Tue Jan 15 13:28:10 2019 - [info] ok. Tue Jan 15 13:28:10 2019 - [info] Checking master_ip_failover_script status: Tue Jan 15 13:28:10 2019 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=slave1 --orig_master_ip=192.168.238.111 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24=== Checking the Status of the script.. OK Tue Jan 15 13:28:10 2019 - [info] OK. Tue Jan 15 13:28:10 2019 - [warning] shutdown_script is not defined. Tue Jan 15 13:28:10 2019 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.
检查自动切换后MHA监控服务状态:
[root@mgr ~]# /usr/bin/masterha_check_status --conf=/etc/masterha/app1.cnf app1 is stopped(2:NOT_RUNNING).
注意:如果MHA监控服务处于正常启动状态,则需要先关闭。
[root@mgr ~]# masterha_stop --conf=/etc/masterha/app1.cnf MHA Manager is not running on app1(2:NOT_RUNNING).
如果未先停止MHA监控服务就执行在线切换,会有如下报错:
[root@mgr ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.238.110 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000 Tue Jan 15 14:18:24 2019 - [info] MHA::MasterRotate version 0.57. Tue Jan 15 14:18:24 2019 - [info] Starting online master switch.. Tue Jan 15 14:18:24 2019 - [info] Tue Jan 15 14:18:24 2019 - [info] * Phase 1: Configuration Check Phase.. Tue Jan 15 14:18:24 2019 - [info] Tue Jan 15 14:18:24 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 14:18:24 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 14:18:24 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 14:18:25 2019 - [info] GTID failover mode = 0 Tue Jan 15 14:18:25 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306) Tue Jan 15 14:18:25 2019 - [info] Alive Slaves: Tue Jan 15 14:18:25 2019 - [info] master(192.168.238.110:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 14:18:25 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) Tue Jan 15 14:18:25 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 14:18:25 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 14:18:25 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on slave1(192.168.238.111:3306)? (YES/no): YES Tue Jan 15 14:18:27 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Tue Jan 15 14:18:27 2019 - [info] ok. Tue Jan 15 14:18:27 2019 - [info] Checking MHA is not monitoring or doing failover.. 
Tue Jan 15 14:18:27 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again. Tue Jan 15 14:18:27 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53.
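为避免上述报错,可在发起切换前先根据masterha_check_status的输出判断MHA监控是否仍在运行,再决定是否执行masterha_stop。判断逻辑可封装为如下示意函数(函数名为笔者假设):

```shell
#!/bin/bash
# 根据masterha_check_status的输出判断MHA监控是否在运行
# 运行中(输出含PING_OK)返回0,已停止返回非0
mha_running() {
    printf '%s' "$1" | grep -q 'PING_OK'
}

# 实际环境中的用法:
# status=$(masterha_check_status --conf=/etc/masterha/app1.cnf)
# mha_running "$status" && masterha_stop --conf=/etc/masterha/app1.cnf
```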
检查候选Master主从复制延时
[root@master ~]# mysql -uroot -pmysql
mysql> show slave status\G
......
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
Seconds_Behind_Master: 0
......
--orig_master_is_new_slave:切换时加上此选项,会将原Master变更为新Master的Slave节点;如果不加此选项,原Master将脱离复制架构,不再参与复制。
--running_updates_limit=10000:切换时如果候选Master存在复制延迟,MHA默认不允许切换;加上此参数表示延迟在该时间范围内(单位为秒)仍可切换,但切换耗时由recover阶段需要应用的relay log大小决定。
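上文检查的Seconds_Behind_Master与--running_updates_limit的关系可以用如下示意函数表达(函数名为笔者假设):从show slave status\G的输出中提取延迟秒数,判断是否在允许范围内:

```shell
#!/bin/bash
# 参数$1: show slave status\G 的输出文本
# 参数$2: 允许的最大延迟秒数(对应--running_updates_limit)
check_delay() {
    local delay
    delay=$(printf '%s\n' "$1" | awk -F': *' '/Seconds_Behind_Master/ {print $2}')
    # 延迟值存在且不超过阈值时返回0
    [ -n "$delay" ] && [ "$delay" -le "$2" ]
}

# 实际环境中的用法:
# check_delay "$(mysql -uroot -pmysql -e 'show slave status\G')" 10000
```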
执行手动在线切换(switchover)
[root@mgr ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.238.110 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000 Tue Jan 15 14:40:58 2019 - [info] MHA::MasterRotate version 0.57. Tue Jan 15 14:40:58 2019 - [info] Starting online master switch.. Tue Jan 15 14:40:58 2019 - [info] Tue Jan 15 14:40:58 2019 - [info] * Phase 1: Configuration Check Phase.. Tue Jan 15 14:40:58 2019 - [info] Tue Jan 15 14:40:58 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 14:40:58 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 14:40:58 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 14:40:59 2019 - [info] GTID failover mode = 0 Tue Jan 15 14:40:59 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306) Tue Jan 15 14:40:59 2019 - [info] Alive Slaves: Tue Jan 15 14:40:59 2019 - [info] master(192.168.238.110:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 14:40:59 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) Tue Jan 15 14:40:59 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 14:40:59 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 14:40:59 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on slave1(192.168.238.111:3306)? (YES/no): YES Tue Jan 15 14:41:02 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Tue Jan 15 14:41:02 2019 - [info] ok. Tue Jan 15 14:41:02 2019 - [info] Checking MHA is not monitoring or doing failover.. Tue Jan 15 14:41:02 2019 - [info] Checking replication health on master.. 
Tue Jan 15 14:41:02 2019 - [info] ok. Tue Jan 15 14:41:02 2019 - [info] Checking replication health on slave2.. Tue Jan 15 14:41:02 2019 - [info] ok. Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1218] 192.168.238.110 is not alive! Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln232] Failed to get new master! Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53. [root@mgr ~]# cat /etc/masterha/app1.cnf [server default] manager_log=/var/log/masterha/app1/manager.log manager_workdir=/var/log/masterha/app1 master_binlog_dir=/var/lib/mysql/binlog master_ip_failover_script=/etc/masterha/master_ip_failover master_ip_online_change_script=/etc/masterha/master_ip_online_change password=mysql ping_interval=1 remote_workdir=/tmp repl_password=mysql repl_user=repl secondary_check_script=/usr/bin/masterha_secondary_check -s slave1 -s master report_script="" shutdown_script="" ssh_user=root user=root [server1] hostname=slave1 port=3306 [server2] candidate_master=1 check_repl_delay=0 hostname=master port=3306 [server3] hostname=slave2 port=3306
注意:手动在线切换时,要区分--new_master_host参数使用的是主机名还是IP地址。
报错:Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1218] 192.168.238.110 is not alive!
出现该报错一般有两种情况:
1、MHA配置文件/etc/masterha/app1.cnf中为该候选节点配置了no_master=1选项,限制该节点成为新Master;
2、按官方文档描述,--new_master_host参数应使用主机名(hostname)而非IP地址。若MHA配置文件/etc/masterha/app1.cnf中配置的是IP地址,则切换时使用IP地址也是可行的。
[root@mgr ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=master --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000 Tue Jan 15 14:44:15 2019 - [info] MHA::MasterRotate version 0.57. Tue Jan 15 14:44:15 2019 - [info] Starting online master switch.. Tue Jan 15 14:44:15 2019 - [info] Tue Jan 15 14:44:15 2019 - [info] * Phase 1: Configuration Check Phase.. Tue Jan 15 14:44:15 2019 - [info] Tue Jan 15 14:44:15 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 14:44:15 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 14:44:15 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 14:44:16 2019 - [info] GTID failover mode = 0 Tue Jan 15 14:44:16 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306) Tue Jan 15 14:44:16 2019 - [info] Alive Slaves: Tue Jan 15 14:44:16 2019 - [info] master(192.168.238.110:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 14:44:16 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) Tue Jan 15 14:44:16 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 14:44:16 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 14:44:16 2019 - [info] Replicating from 192.168.238.111(192.168.238.111:3306) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on slave1(192.168.238.111:3306)? (YES/no): YES Tue Jan 15 14:44:18 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Tue Jan 15 14:44:18 2019 - [info] ok. Tue Jan 15 14:44:18 2019 - [info] Checking MHA is not monitoring or doing failover.. Tue Jan 15 14:44:18 2019 - [info] Checking replication health on master.. 
Tue Jan 15 14:44:18 2019 - [info] ok. Tue Jan 15 14:44:18 2019 - [info] Checking replication health on slave2.. Tue Jan 15 14:44:18 2019 - [info] ok. Tue Jan 15 14:44:18 2019 - [info] master can be new master. Tue Jan 15 14:44:18 2019 - [info] From: slave1(192.168.238.111:3306) (current master) +--master(192.168.238.110:3306) +--slave2(192.168.238.112:3306) To: master(192.168.238.110:3306) (new master) +--slave2(192.168.238.112:3306) +--slave1(192.168.238.111:3306) Starting master switch from slave1(192.168.238.111:3306) to master(192.168.238.110:3306)? (yes/NO): yes Tue Jan 15 14:44:44 2019 - [info] Checking whether master(192.168.238.110:3306) is ok for the new master.. Tue Jan 15 14:44:44 2019 - [info] ok. Tue Jan 15 14:44:44 2019 - [info] slave1(192.168.238.111:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Tue Jan 15 14:44:44 2019 - [info] slave1(192.168.238.111:3306): Resetting slave pointing to the dummy host. Tue Jan 15 14:44:44 2019 - [info] ** Phase 1: Configuration Check Phase completed. Tue Jan 15 14:44:44 2019 - [info] Tue Jan 15 14:44:44 2019 - [info] * Phase 2: Rejecting updates Phase.. 
Tue Jan 15 14:44:44 2019 - [info] Tue Jan 15 14:44:44 2019 - [info] Executing master ip online change script to disable write on the current master: Tue Jan 15 14:44:44 2019 - [info] /etc/masterha/master_ip_online_change --command=stop --orig_master_host=slave1 --orig_master_ip=192.168.238.111 --orig_master_port=3306 --orig_master_user='root' --new_master_host=master --new_master_ip=192.168.238.110 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx *************************************************************** Disabling the VIP - 192.168.238.200/24 on old master: slave1 *************************************************************** Tue Jan 15 14:44:45 2019 - [info] ok. Tue Jan 15 14:44:45 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Tue Jan 15 14:44:45 2019 - [info] Executing FLUSH TABLES WITH READ LOCK.. Tue Jan 15 14:44:45 2019 - [info] ok. Tue Jan 15 14:44:45 2019 - [info] Orig master binlog:pos is mysql-bin.000041:636. Tue Jan 15 14:44:45 2019 - [info] Waiting to execute all relay logs on master(192.168.238.110:3306).. Tue Jan 15 14:44:45 2019 - [info] master_pos_wait(mysql-bin.000041:636) completed on master(192.168.238.110:3306). Executed 0 events. Tue Jan 15 14:44:45 2019 - [info] done. Tue Jan 15 14:44:45 2019 - [info] Getting new master's binlog name and position.. Tue Jan 15 14:44:45 2019 - [info] mysql-bin.000016:154 Tue Jan 15 14:44:45 2019 - [info] All other slaves should start replication from here. 
Statement should be: CHANGE MASTER TO MASTER_HOST='master or 192.168.238.110', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000016', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Jan 15 14:44:45 2019 - [info] Executing master ip online change script to allow write on the new master: Tue Jan 15 14:44:45 2019 - [info] /etc/masterha/master_ip_online_change --command=start --orig_master_host=slave1 --orig_master_ip=192.168.238.111 --orig_master_port=3306 --orig_master_user='root' --new_master_host=master --new_master_ip=192.168.238.110 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx *************************************************************** Enabling the VIP - 192.168.238.200/24 on new master: master *************************************************************** Tue Jan 15 14:44:45 2019 - [info] ok. Tue Jan 15 14:44:45 2019 - [info] Setting read_only=0 on master(192.168.238.110:3306).. Tue Jan 15 14:44:45 2019 - [info] ok. Tue Jan 15 14:44:45 2019 - [info] Tue Jan 15 14:44:45 2019 - [info] * Switching slaves in parallel.. Tue Jan 15 14:44:45 2019 - [info] Tue Jan 15 14:44:45 2019 - [info] -- Slave switch on host slave2(192.168.238.112:3306) started, pid: 19046 Tue Jan 15 14:44:45 2019 - [info] Tue Jan 15 14:44:46 2019 - [info] Log messages from slave2 ... Tue Jan 15 14:44:46 2019 - [info] Tue Jan 15 14:44:45 2019 - [info] Waiting to execute all relay logs on slave2(192.168.238.112:3306).. Tue Jan 15 14:44:45 2019 - [info] master_pos_wait(mysql-bin.000041:636) completed on slave2(192.168.238.112:3306). Executed 0 events. Tue Jan 15 14:44:45 2019 - [info] done. Tue Jan 15 14:44:45 2019 - [info] Resetting slave slave2(192.168.238.112:3306) and starting replication from the new master master(192.168.238.110:3306).. Tue Jan 15 14:44:46 2019 - [info] Executed CHANGE MASTER. 
Tue Jan 15 14:44:46 2019 - [info] Slave started. Tue Jan 15 14:44:46 2019 - [info] End of log messages from slave2 ... Tue Jan 15 14:44:46 2019 - [info] Tue Jan 15 14:44:46 2019 - [info] -- Slave switch on host slave2(192.168.238.112:3306) succeeded. Tue Jan 15 14:44:46 2019 - [info] Unlocking all tables on the orig master: Tue Jan 15 14:44:46 2019 - [info] Executing UNLOCK TABLES.. Tue Jan 15 14:44:46 2019 - [info] ok. Tue Jan 15 14:44:46 2019 - [info] Starting orig master as a new slave.. Tue Jan 15 14:44:46 2019 - [info] Resetting slave slave1(192.168.238.111:3306) and starting replication from the new master master(192.168.238.110:3306).. Tue Jan 15 14:44:47 2019 - [info] Executed CHANGE MASTER. Tue Jan 15 14:44:47 2019 - [info] Slave started. Tue Jan 15 14:44:47 2019 - [info] All new slave servers switched successfully. Tue Jan 15 14:44:47 2019 - [info] Tue Jan 15 14:44:47 2019 - [info] * Phase 5: New master cleanup phase.. Tue Jan 15 14:44:47 2019 - [info] Tue Jan 15 14:44:47 2019 - [info] master: Resetting slave info succeeded. Tue Jan 15 14:44:47 2019 - [info] Switching master to master(192.168.238.110:3306) completed successfully.
已经成功完成在线手动切换。
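切换完成后,建议在新的Slave节点(slave1、slave2)上确认复制线程状态。可用如下示意函数(函数名为笔者假设)检查show slave status\G输出中IO/SQL线程是否均为Yes:

```shell
#!/bin/bash
# 参数$1: show slave status\G 的输出文本
# IO线程与SQL线程均为Yes时返回0
repl_ok() {
    printf '%s\n' "$1" | awk -F': *' '
        /Slave_IO_Running:/  {io=$2}
        /Slave_SQL_Running:/ {sql=$2}
        END {exit !(io=="Yes" && sql=="Yes")}'
}

# 实际环境中的用法:
# repl_ok "$(mysql -uroot -pmysql -e 'show slave status\G')" && echo "replication OK"
```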
4.4 Master节点手动故障切换
修改MHA配置文件
[root@mgr ~]# vi /etc/masterha/app1.cnf [server default] manager_log=/var/log/masterha/app1/manager.log manager_workdir=/var/log/masterha/app1 master_binlog_dir=/var/lib/mysql/binlog master_ip_failover_script=/etc/masterha/master_ip_failover master_ip_online_change_script=/etc/masterha/master_ip_online_change password=mysql ping_interval=1 remote_workdir=/tmp repl_password=mysql repl_user=repl secondary_check_script=/usr/bin/masterha_secondary_check -s master -s slave1 report_script="" shutdown_script="" ssh_user=root user=root [server1] hostname=master port=3306 [server2] candidate_master=1 check_repl_delay=0 hostname=slave1 port=3306 [server3] hostname=slave2 port=3306 [root@mgr ~]# /usr/bin/masterha_check_ssh --conf=/etc/masterha/app1.cnf Tue Jan 15 15:20:04 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 15:20:04 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 15:20:04 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 15:20:04 2019 - [info] Starting SSH connection tests.. Tue Jan 15 15:20:06 2019 - [debug] Tue Jan 15 15:20:04 2019 - [debug] Connecting via SSH from root@master(192.168.238.110:22) to root@slave1(192.168.238.111:22).. Tue Jan 15 15:20:05 2019 - [debug] ok. Tue Jan 15 15:20:05 2019 - [debug] Connecting via SSH from root@master(192.168.238.110:22) to root@slave2(192.168.238.112:22).. Tue Jan 15 15:20:06 2019 - [debug] ok. Tue Jan 15 15:20:06 2019 - [debug] Tue Jan 15 15:20:04 2019 - [debug] Connecting via SSH from root@slave1(192.168.238.111:22) to root@master(192.168.238.110:22).. Tue Jan 15 15:20:05 2019 - [debug] ok. Tue Jan 15 15:20:05 2019 - [debug] Connecting via SSH from root@slave1(192.168.238.111:22) to root@slave2(192.168.238.112:22).. Tue Jan 15 15:20:06 2019 - [debug] ok. 
Tue Jan 15 15:20:07 2019 - [debug] Tue Jan 15 15:20:05 2019 - [debug] Connecting via SSH from root@slave2(192.168.238.112:22) to root@master(192.168.238.110:22).. Tue Jan 15 15:20:06 2019 - [debug] ok. Tue Jan 15 15:20:06 2019 - [debug] Connecting via SSH from root@slave2(192.168.238.112:22) to root@slave1(192.168.238.111:22).. Tue Jan 15 15:20:06 2019 - [debug] ok. Tue Jan 15 15:20:07 2019 - [info] All SSH connection tests passed successfully. [root@mgr ~]# /usr/bin/masterha_check_repl --conf=/etc/masterha/app1.cnf Tue Jan 15 15:20:13 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 15:20:13 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 15:20:13 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 15:20:13 2019 - [info] MHA::MasterMonitor version 0.57. Tue Jan 15 15:20:14 2019 - [info] GTID failover mode = 0 Tue Jan 15 15:20:14 2019 - [info] Dead Servers: Tue Jan 15 15:20:14 2019 - [info] Alive Servers: Tue Jan 15 15:20:14 2019 - [info] master(192.168.238.110:3306) Tue Jan 15 15:20:14 2019 - [info] slave1(192.168.238.111:3306) Tue Jan 15 15:20:14 2019 - [info] slave2(192.168.238.112:3306) Tue Jan 15 15:20:14 2019 - [info] Alive Slaves: Tue Jan 15 15:20:14 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 15:20:14 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Tue Jan 15 15:20:14 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 15:20:14 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 15:20:14 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Tue Jan 15 15:20:14 2019 - [info] Current Alive Master: master(192.168.238.110:3306) Tue Jan 15 15:20:14 2019 - [info] Checking slave configurations.. 
Tue Jan 15 15:20:14 2019 - [info] read_only=1 is not set on slave slave1(192.168.238.111:3306). Tue Jan 15 15:20:14 2019 - [info] Checking replication filtering settings.. Tue Jan 15 15:20:14 2019 - [info] binlog_do_db= , binlog_ignore_db= Tue Jan 15 15:20:14 2019 - [info] Replication filtering check ok. Tue Jan 15 15:20:14 2019 - [info] GTID (with auto-pos) is not supported Tue Jan 15 15:20:14 2019 - [info] Starting SSH connection tests.. Tue Jan 15 15:20:18 2019 - [info] All SSH connection tests passed successfully. Tue Jan 15 15:20:18 2019 - [info] Checking MHA Node version.. Tue Jan 15 15:20:19 2019 - [info] Version check ok. Tue Jan 15 15:20:19 2019 - [info] Checking SSH publickey authentication settings on the current master.. Tue Jan 15 15:20:19 2019 - [info] HealthCheck: SSH to master is reachable. Tue Jan 15 15:20:20 2019 - [info] Master MHA Node version is 0.57. Tue Jan 15 15:20:20 2019 - [info] Checking recovery script configurations on master(192.168.238.110:3306).. Tue Jan 15 15:20:20 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --start_file=mysql-bin.000016 Tue Jan 15 15:20:20 2019 - [info] Connecting to root@192.168.238.110(master:22).. Creating /tmp if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /var/lib/mysql/binlog, up to mysql-bin.000016 Tue Jan 15 15:20:20 2019 - [info] Binlog setting check done. Tue Jan 15 15:20:20 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. 
Tue Jan 15 15:20:20 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave1 --slave_ip=192.168.238.111 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx Tue Jan 15 15:20:20 2019 - [info] Connecting to root@192.168.238.111(slave1:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/data/relay-log.info ... ok. Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000002 Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000002 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Tue Jan 15 15:20:21 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave2 --slave_ip=192.168.238.112 --slave_port=3306 --workdir=/tmp --target_version=5.7.23-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/data/relay-log.info --relay_dir=/var/lib/mysql/data/ --slave_pass=xxx Tue Jan 15 15:20:21 2019 - [info] Connecting to root@192.168.238.112(slave2:22).. Checking slave recovery environment settings.. Opening /var/lib/mysql/data/relay-log.info ... ok. Relay log found at /var/lib/mysql/relaylog, up to relay-bin.000002 Temporary relay log file is /var/lib/mysql/relaylog/relay-bin.000002 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Tue Jan 15 15:20:24 2019 - [info] Slaves settings check done. Tue Jan 15 15:20:24 2019 - [info] master(192.168.238.110:3306) (current master) +--slave1(192.168.238.111:3306) +--slave2(192.168.238.112:3306) Tue Jan 15 15:20:24 2019 - [info] Checking replication health on slave1.. 
Tue Jan 15 15:20:24 2019 - [info] ok. Tue Jan 15 15:20:24 2019 - [info] Checking replication health on slave2.. Tue Jan 15 15:20:24 2019 - [info] ok. Tue Jan 15 15:20:24 2019 - [info] Checking master_ip_failover_script status: Tue Jan 15 15:20:24 2019 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.238.110 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24=== Checking the Status of the script.. OK Tue Jan 15 15:20:24 2019 - [info] OK. Tue Jan 15 15:20:24 2019 - [warning] shutdown_script is not defined. Tue Jan 15 15:20:24 2019 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK. [root@mgr ~]# [root@mgr ~]# /usr/bin/masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:19706) is running(0:PING_OK), master:master
注意:手动故障切换前,检查下复制集群相关情况及状态。并且要先关闭MHA服务,再模拟master节点服务宕机,否则必然会触发自动故障切换。
[root@mgr ~]# masterha_stop --conf=/etc/masterha/app1.cnf Stopped app1 successfully.
开始手动故障切换:
[root@mgr ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=master --dead_master_port=3306 --new_master_host=slave1 --new_master_port=3306 --ignore_last_failover
--dead_master_ip is not set. Using 192.168.238.110. Tue Jan 15 16:23:09 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 15 16:23:09 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Jan 15 16:23:09 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Jan 15 16:23:09 2019 - [info] MHA::MasterFailover version 0.57. Tue Jan 15 16:23:09 2019 - [info] Starting master failover. Tue Jan 15 16:23:09 2019 - [info] Tue Jan 15 16:23:09 2019 - [info] * Phase 1: Configuration Check Phase.. Tue Jan 15 16:23:09 2019 - [info] Tue Jan 15 16:23:10 2019 - [info] GTID failover mode = 0 Tue Jan 15 16:23:10 2019 - [info] Dead Servers: Tue Jan 15 16:23:10 2019 - [info] master(192.168.238.110:3306) Tue Jan 15 16:23:10 2019 - [info] Checking master reachability via MySQL(double check)... Tue Jan 15 16:23:10 2019 - [info] ok. Tue Jan 15 16:23:10 2019 - [info] Alive Servers: Tue Jan 15 16:23:10 2019 - [info] slave1(192.168.238.111:3306) Tue Jan 15 16:23:10 2019 - [info] slave2(192.168.238.112:3306) Tue Jan 15 16:23:10 2019 - [info] Alive Slaves: Tue Jan 15 16:23:10 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 16:23:10 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Tue Jan 15 16:23:10 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 16:23:10 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 16:23:10 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Master master(192.168.238.110:3306) is dead. Proceed? 
(yes/NO): yes Tue Jan 15 16:23:53 2019 - [info] Starting Non-GTID based failover. Tue Jan 15 16:23:53 2019 - [info] Tue Jan 15 16:23:53 2019 - [info] ** Phase 1: Configuration Check Phase completed. Tue Jan 15 16:23:53 2019 - [info] Tue Jan 15 16:23:53 2019 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Jan 15 16:23:53 2019 - [info] Tue Jan 15 16:23:54 2019 - [info] HealthCheck: SSH to master is reachable. Tue Jan 15 16:23:54 2019 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Jan 15 16:23:54 2019 - [info] Executing master IP deactivation script: Tue Jan 15 16:23:54 2019 - [info] /etc/masterha/master_ip_failover --orig_master_host=master --orig_master_ip=192.168.238.110 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24=== Disabling the VIP on old master: master Tue Jan 15 16:23:55 2019 - [info] done. Tue Jan 15 16:23:55 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Tue Jan 15 16:23:55 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Jan 15 16:23:55 2019 - [info] Tue Jan 15 16:23:55 2019 - [info] * Phase 3: Master Recovery Phase.. Tue Jan 15 16:23:55 2019 - [info] Tue Jan 15 16:23:55 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase.. 
Tue Jan 15 16:23:55 2019 - [info] Tue Jan 15 16:23:55 2019 - [info] The latest binary log file/position on all slaves is mysql-bin.000017:636 Tue Jan 15 16:23:55 2019 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue Jan 15 16:23:55 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 16:23:55 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Tue Jan 15 16:23:55 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 16:23:55 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 16:23:55 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Tue Jan 15 16:23:55 2019 - [info] The oldest binary log file/position on all slaves is mysql-bin.000017:636 Tue Jan 15 16:23:55 2019 - [info] Oldest slaves: Tue Jan 15 16:23:55 2019 - [info] slave1(192.168.238.111:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 16:23:55 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Tue Jan 15 16:23:55 2019 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 15 16:23:55 2019 - [info] slave2(192.168.238.112:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Tue Jan 15 16:23:55 2019 - [info] Replicating from 192.168.238.110(192.168.238.110:3306) Tue Jan 15 16:23:55 2019 - [info] Tue Jan 15 16:23:55 2019 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Tue Jan 15 16:23:55 2019 - [info] Tue Jan 15 16:23:55 2019 - [info] Fetching dead master's binary logs.. 
Tue Jan 15 16:23:55 2019 - [info] Executing command on the dead master master(192.168.238.110:3306): save_binary_logs --command=save --start_file=mysql-bin.000017 --start_pos=636 --binlog_dir=/var/lib/mysql/binlog --output_file=/tmp/saved_master_binlog_from_master_3306_20190115162309.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 Creating /tmp if not exists.. ok. Concat binary/relay logs from mysql-bin.000017 pos 636 to mysql-bin.000017 EOF into /tmp/saved_master_binlog_from_master_3306_20190115162309.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 154.. ok. Dumping effective binlog data from /var/lib/mysql/binlog/mysql-bin.000017 position 636 to tail(659).. ok. Binlog Checksum enabled Concat succeeded. saved_master_binlog_from_master_3306_20190115162309.binlog 100% 177 171.5KB/s 00:00 Tue Jan 15 16:23:57 2019 - [info] scp from root@192.168.238.110:/tmp/saved_master_binlog_from_master_3306_20190115162309.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_master_3306_20190115162309.binlog succeeded. Tue Jan 15 16:23:57 2019 - [info] HealthCheck: SSH to slave1 is reachable. Tue Jan 15 16:23:59 2019 - [info] HealthCheck: SSH to slave2 is reachable. Tue Jan 15 16:23:59 2019 - [info] Tue Jan 15 16:23:59 2019 - [info] * Phase 3.3: Determining New Master Phase.. Tue Jan 15 16:23:59 2019 - [info] Tue Jan 15 16:23:59 2019 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Tue Jan 15 16:23:59 2019 - [info] All slaves received relay logs to the same position. No need to resync each other. Tue Jan 15 16:23:59 2019 - [info] slave1 can be new master. Tue Jan 15 16:23:59 2019 - [info] New master is slave1(192.168.238.111:3306) Tue Jan 15 16:23:59 2019 - [info] Starting master failover.. 
Tue Jan 15 16:23:59 2019 - [info] From: master(192.168.238.110:3306) (current master) +--slave1(192.168.238.111:3306) +--slave2(192.168.238.112:3306) To: slave1(192.168.238.111:3306) (new master) +--slave2(192.168.238.112:3306) Starting master switch from master(192.168.238.110:3306) to slave1(192.168.238.111:3306)? (yes/NO): yes Tue Jan 15 16:24:07 2019 - [info] New master decided manually is slave1(192.168.238.111:3306) Tue Jan 15 16:24:07 2019 - [info] Tue Jan 15 16:24:07 2019 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Tue Jan 15 16:24:07 2019 - [info] Tue Jan 15 16:24:07 2019 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Tue Jan 15 16:24:07 2019 - [info] Sending binlog.. saved_master_binlog_from_master_3306_20190115162309.binlog 100% 177 231.4KB/s 00:00 Tue Jan 15 16:24:08 2019 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_master_3306_20190115162309.binlog to root@slave1:/tmp/saved_master_binlog_from_master_3306_20190115162309.binlog succeeded. Tue Jan 15 16:24:08 2019 - [info] Tue Jan 15 16:24:08 2019 - [info] * Phase 3.4: Master Log Apply Phase.. Tue Jan 15 16:24:08 2019 - [info] Tue Jan 15 16:24:08 2019 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Tue Jan 15 16:24:08 2019 - [info] Starting recovery on slave1(192.168.238.111:3306).. Tue Jan 15 16:24:08 2019 - [info] Generating diffs succeeded. Tue Jan 15 16:24:08 2019 - [info] Waiting until all relay logs are applied. Tue Jan 15 16:24:08 2019 - [info] done. Tue Jan 15 16:24:08 2019 - [info] Getting slave status.. Tue Jan 15 16:24:08 2019 - [info] This slave(slave1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000017:636). No need to recover from Exec_Master_Log_Pos. Tue Jan 15 16:24:08 2019 - [info] Connecting to the target slave host slave1, running recover script.. 
Tue Jan 15 16:24:08 2019 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=slave1 --slave_ip=192.168.238.111 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_master_3306_20190115162309.binlog --workdir=/tmp --target_version=5.7.23-log --timestamp=20190115162309 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx Tue Jan 15 16:24:09 2019 - [info] MySQL client version is 5.7.23. Using --binary-mode. Applying differential binary/relay log files /tmp/saved_master_binlog_from_master_3306_20190115162309.binlog on slave1:3306. This may take long time... Applying log files succeeded. Tue Jan 15 16:24:09 2019 - [info] All relay logs were successfully applied. Tue Jan 15 16:24:09 2019 - [info] Getting new master's binlog name and position.. Tue Jan 15 16:24:09 2019 - [info] mysql-bin.000052:1118 Tue Jan 15 16:24:09 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='slave1 or 192.168.238.111', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000052', MASTER_LOG_POS=1118, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Jan 15 16:24:09 2019 - [info] Executing master IP activate script: Tue Jan 15 16:24:09 2019 - [info] /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.238.110 --orig_master_port=3306 --new_master_host=slave1 --new_master_ip=192.168.238.111 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx Unknown option: new_master_user Unknown option: new_master_password IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24=== Enabling the VIP - 192.168.238.200/24 on the new master - slave1 Tue Jan 15 16:24:09 2019 - [info] OK. Tue Jan 15 16:24:09 2019 - [info] ** Finished master recovery successfully. Tue Jan 15 16:24:09 2019 - [info] * Phase 3: Master Recovery Phase completed. 
Tue Jan 15 16:24:09 2019 - [info] Tue Jan 15 16:24:09 2019 - [info] * Phase 4: Slaves Recovery Phase.. Tue Jan 15 16:24:09 2019 - [info] Tue Jan 15 16:24:09 2019 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Tue Jan 15 16:24:09 2019 - [info] Tue Jan 15 16:24:09 2019 - [info] -- Slave diff file generation on host slave2(192.168.238.112:3306) started, pid: 21551. Check tmp log /var/log/masterha/app1/slave2_3306_20190115162309.log if it takes time.. Tue Jan 15 16:24:10 2019 - [info] Tue Jan 15 16:24:10 2019 - [info] Log messages from slave2 ... Tue Jan 15 16:24:10 2019 - [info] Tue Jan 15 16:24:09 2019 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Tue Jan 15 16:24:10 2019 - [info] End of log messages from slave2. Tue Jan 15 16:24:10 2019 - [info] -- slave2(192.168.238.112:3306) has the latest relay log events. Tue Jan 15 16:24:10 2019 - [info] Generating relay diff files from the latest slave succeeded. Tue Jan 15 16:24:10 2019 - [info] Tue Jan 15 16:24:10 2019 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Tue Jan 15 16:24:10 2019 - [info] Tue Jan 15 16:24:10 2019 - [info] -- Slave recovery on host slave2(192.168.238.112:3306) started, pid: 21553. Check tmp log /var/log/masterha/app1/slave2_3306_20190115162309.log if it takes time.. saved_master_binlog_from_master_3306_20190115162309.binlog 100% 177 249.8KB/s 00:00 Tue Jan 15 16:24:13 2019 - [info] Tue Jan 15 16:24:13 2019 - [info] Log messages from slave2 ... Tue Jan 15 16:24:13 2019 - [info] Tue Jan 15 16:24:10 2019 - [info] Sending binlog.. Tue Jan 15 16:24:11 2019 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_master_3306_20190115162309.binlog to root@slave2:/tmp/saved_master_binlog_from_master_3306_20190115162309.binlog succeeded. Tue Jan 15 16:24:11 2019 - [info] Starting recovery on slave2(192.168.238.112:3306).. Tue Jan 15 16:24:11 2019 - [info] Generating diffs succeeded. 
Tue Jan 15 16:24:11 2019 - [info] Waiting until all relay logs are applied. Tue Jan 15 16:24:11 2019 - [info] done. Tue Jan 15 16:24:11 2019 - [info] Getting slave status.. Tue Jan 15 16:24:11 2019 - [info] This slave(slave2)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000017:636). No need to recover from Exec_Master_Log_Pos. Tue Jan 15 16:24:11 2019 - [info] Connecting to the target slave host slave2, running recover script.. Tue Jan 15 16:24:11 2019 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=slave2 --slave_ip=192.168.238.112 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_master_3306_20190115162309.binlog --workdir=/tmp --target_version=5.7.23-log --timestamp=20190115162309 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx Tue Jan 15 16:24:12 2019 - [info] MySQL client version is 5.7.23. Using --binary-mode. Applying differential binary/relay log files /tmp/saved_master_binlog_from_master_3306_20190115162309.binlog on slave2:3306. This may take long time... Applying log files succeeded. Tue Jan 15 16:24:12 2019 - [info] All relay logs were successfully applied. Tue Jan 15 16:24:12 2019 - [info] Resetting slave slave2(192.168.238.112:3306) and starting replication from the new master slave1(192.168.238.111:3306).. Tue Jan 15 16:24:12 2019 - [info] Executed CHANGE MASTER. Tue Jan 15 16:24:12 2019 - [info] Slave started. Tue Jan 15 16:24:13 2019 - [info] End of log messages from slave2. Tue Jan 15 16:24:13 2019 - [info] -- Slave recovery on host slave2(192.168.238.112:3306) succeeded. Tue Jan 15 16:24:13 2019 - [info] All new slave servers recovered successfully. Tue Jan 15 16:24:13 2019 - [info] Tue Jan 15 16:24:13 2019 - [info] * Phase 5: New master cleanup phase.. Tue Jan 15 16:24:13 2019 - [info] Tue Jan 15 16:24:13 2019 - [info] Resetting slave info on the new master.. Tue Jan 15 16:24:13 2019 - [info] slave1: Resetting slave info succeeded. 
Tue Jan 15 16:24:13 2019 - [info] Master failover to slave1(192.168.238.111:3306) completed successfully. Tue Jan 15 16:24:13 2019 - [info] ----- Failover Report ----- app1: MySQL Master failover master(192.168.238.110:3306) to slave1(192.168.238.111:3306) succeeded Master master(192.168.238.110:3306) is down! Check MHA Manager logs at mgr for details. Started manual(interactive) failover. Invalidated master IP address on master(192.168.238.110:3306) The latest slave slave1(192.168.238.111:3306) has all relay logs for recovery. Selected slave1(192.168.238.111:3306) as a new master. slave1(192.168.238.111:3306): OK: Applying all logs succeeded. slave1(192.168.238.111:3306): OK: Activated master IP address. slave2(192.168.238.112:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. slave2(192.168.238.112:3306): OK: Applying all logs succeeded. Slave started, replicating from slave1(192.168.238.111:3306) slave1(192.168.238.111:3306): Resetting slave info succeeded. Master failover to slave1(192.168.238.111:3306) completed successfully.
Restoring the original master and rejoining it to the replication cluster
Tue Jan 15 16:24:09 2019 - [info] All other slaves should start replication from here. Statement should be:
CHANGE MASTER TO MASTER_HOST='slave1 or 192.168.238.111', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000052', MASTER_LOG_POS=1118, MASTER_USER='repl', MASTER_PASSWORD='xxx';
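Based on the CHANGE MASTER statement that MHA printed above, the recovered old master can be brought back as a slave of slave1. A minimal sketch, assuming MySQL on the old master is healthy again; 'xxx' is the placeholder repl password from the log, and the manager restart command is illustrative:

```shell
# On the recovered old master (192.168.238.110): start MySQL, then point
# replication at the new master slave1 using the coordinates from the MHA log.
systemctl start mysqld

mysql -uroot -p -e "CHANGE MASTER TO
  MASTER_HOST='192.168.238.111',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='mysql-bin.000052',
  MASTER_LOG_POS=1118,
  MASTER_USER='repl',
  MASTER_PASSWORD='xxx';
START SLAVE;"

# Verify both replication threads are running.
mysql -uroot -p -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running'

# Finally, re-add the old master's entry to /etc/masterha/app1.cnf if MHA
# removed it, and restart the manager on the mgr node, e.g.:
# nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover &
```

These commands require a live replication cluster, so they are shown as a sketch rather than a verified transcript.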
五 References
5.1 MySQL MHA installation and configuration
5.2 keepalived configuration file
六 Installing and configuring ProxySQL
6.1 Downloading the ProxySQL package
Download link:
https://github.com/sysown/proxysql/releases
6.2 Installing the ProxySQL software
shell> yum install -y proxysql-2.0.1-1-centos67.x86_64.rpm
shell> rpm -ql proxysql-2.0.1-1.x86_64
/etc/init.d/proxysql
/etc/proxysql.cnf
/usr/bin/proxysql
/usr/share/proxysql/tools/proxysql_galera_checker.sh
/usr/share/proxysql/tools/proxysql_galera_writer.pl
6.3 Starting the ProxySQL service
Start the ProxySQL service:
shell> /etc/init.d/proxysql start
Starting ProxySQL: 2019-02-12 14:12:14 [INFO] Using config file /etc/proxysql.cnf
2019-02-12 14:12:14 [INFO] No SSL keys/certificates found in datadir (/var/lib/proxysql). Generating new keys/certificates.
DONE!
shell> ps -ef|grep proxysql|grep -v grep|awk '{print $1,$2,$3,$8,$9,$10,$11,$12}'|column -t
proxysql  9350  1     proxysql  -c  /etc/proxysql.cnf  -D  /var/lib/proxysql
proxysql  9351  9350  proxysql  -c  /etc/proxysql.cnf  -D  /var/lib/proxysql
Note: the configuration file /etc/proxysql.cnf is read only on the very first startup. All later configuration changes operate on ProxySQL's embedded SQLite database and are never written back to proxysql.cnf. Almost all ProxySQL settings can be changed online; the configuration is persisted in /var/lib/proxysql/proxysql.db. Online configuration is covered step by step below.
6.4 Creating users on the MySQL master node
Create the monitoring user monitor:
create user 'monitor'@'192.168.238.110' identified by 'monitor';
GRANT SUPER, REPLICATION CLIENT ON *.* TO 'monitor'@'192.168.238.110';
create user 'monitor'@'192.168.238.111' identified by 'monitor';
GRANT SUPER, REPLICATION CLIENT ON *.* TO 'monitor'@'192.168.238.111';
Create the application user proxysql:
create user 'proxysql'@'192.168.238.110' identified by 'proxysql';
GRANT ALL ON *.* TO 'proxysql'@'192.168.238.110';
create user 'proxysql'@'192.168.238.111' identified by 'proxysql';
GRANT ALL ON *.* TO 'proxysql'@'192.168.238.111';
Confirm on both master and slave nodes:
select user,host,authentication_string from mysql.user;
6.5 Configuring the mysql_servers table in ProxySQL
Log in to the ProxySQL admin interface through port 6032:
shell> export MYSQL_PS1="(\u@\h:\p) [\d]> "
shell> mysql -uadmin -padmin -h127.0.0.1 -P6032
Insert the MySQL replication topology into mysql_servers: the master node (writes) goes into hostgroup 100, the slave nodes (reads) into hostgroup 1000.
insert into mysql_servers(hostgroup_id,hostname,port,weight,max_connections,max_replication_lag,comment) values(100,'192.168.238.111',3306,1,1000,10,'welcome to proxysql');
insert into mysql_servers(hostgroup_id,hostname,port,weight,max_connections,max_replication_lag,comment) values(1000,'192.168.238.110',3306,1,1000,10,'welcome to proxysql');
insert into mysql_servers(hostgroup_id,hostname,port,weight,max_connections,max_replication_lag,comment) values(1000,'192.168.238.112',3306,1,1000,10,'welcome to proxysql');
Query to confirm:
(admin@127.0.0.1:6032) [(none)]> select hostgroup_id,hostname,port,status,weight from mysql_servers;
+--------------+-----------------+------+--------+--------+
| hostgroup_id | hostname        | port | status | weight |
+--------------+-----------------+------+--------+--------+
| 100          | 192.168.238.111 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.110 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.112 | 3306 | ONLINE | 1      |
+--------------+-----------------+------+--------+--------+
6.6 Configuring the mysql_users table in ProxySQL
Log in to the ProxySQL admin interface through port 6032:
shell> export MYSQL_PS1="(\u@\h:\p) [\d]> "
shell> mysql -uadmin -padmin -h127.0.0.1 -P6032
Create the user proxysql inside ProxySQL. When an application connects to the proxy as this user, ProxySQL passes the credentials through to the MySQL backends.
insert into mysql_users(username,password,active,default_hostgroup,transaction_persistent) values('proxysql','proxysql',1,100,1);
Query to confirm:
select username,password,active,default_hostgroup,transaction_persistent from mysql_users;
+----------+----------+--------+-------------------+------------------------+
| username | password | active | default_hostgroup | transaction_persistent |
+----------+----------+--------+-------------------+------------------------+
| proxysql | proxysql | 1      | 100               | 1                      |
+----------+----------+--------+-------------------+------------------------+
6.7 Configuring the ProxySQL monitoring user
Default monitoring-user settings in ProxySQL:
show variables where Variable_name in('mysql-monitor_username','mysql-monitor_password');
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| mysql-monitor_password | monitor |
| mysql-monitor_username | monitor |
+------------------------+---------+
If the monitoring user differs from the defaults, set the variables manually:
set mysql-monitor_username='monitor';
set mysql-monitor_password='monitor';
6.8 Loading and persisting the ProxySQL configuration
Changes to the ProxySQL configuration are applied to the MEMORY layer only. The following commands push them to RUNTIME and persist them to disk:
load mysql servers to runtime;
load mysql users to runtime;
load mysql variables to runtime;
save mysql servers to disk;
save mysql users to disk;
save mysql variables to disk;
6.9 Testing and verifying from a MySQL client
slave1 is currently the master node. Connect to it through the application port 6033 and run a few statements:
shell> mysql -uproxysql -pproxysql -h192.168.238.111 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | slave1 |
+---------------+--------+
shell> mysql -uproxysql -pproxysql -h192.168.238.111 -P6033
mysql> show databases;
mysql> use hzmc1;
mysql> show tables;
mysql> desc cons7;
mysql> delete from cons7 where id=1;
mysql> delete from cons7 where id=2;
mysql> delete from cons7 where id=3;
mysql> select count(*) from cons7;
Check the stats_mysql_query_digest audit table in ProxySQL: every read and every write ran against hostgroup 100. Hostgroup 100 is the writer group, so reads and writes are not yet split. To split them, custom routing rules must be configured.
(admin@127.0.0.1:6032) [(none)]> select hostgroup,schemaname,username,digest_text,count_star from stats_mysql_query_digest;
+-----------+--------------------+----------+----------------------------------+------------+
| hostgroup | schemaname         | username | digest_text                      | count_star |
+-----------+--------------------+----------+----------------------------------+------------+
| 100       | hzmc1              | proxysql | select count(*) from cons7       | 2          |
| 100       | hzmc1              | proxysql | delete from cons7 where id=?     | 3          |
| 100       | hzmc1              | proxysql | show databases                   | 1          |
| 100       | hzmc1              | proxysql | show tables                      | 2          |
| 100       | information_schema | proxysql | SELECT DATABASE()                | 1          |
| 100       | information_schema | proxysql | show databases                   | 1          |
| 100       | information_schema | proxysql | select USER()                    | 1          |
| 100       | hzmc1              | proxysql | desc cons7                       | 1          |
| 100       | information_schema | proxysql | show global variables like ?     | 19         |
| 100       | information_schema | proxysql | select @@version_comment limit ? | 20         |
+-----------+--------------------+----------+----------------------------------+------------+
Check the hostgroups:
(admin@127.0.0.1:6032) [(none)]> select hostgroup_id,hostname,port,status,weight from runtime_mysql_servers;
+--------------+-----------------+------+--------+--------+
| hostgroup_id | hostname        | port | status | weight |
+--------------+-----------------+------+--------+--------+
| 100          | 192.168.238.111 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.112 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.110 | 3306 | ONLINE | 1      |
+--------------+-----------------+------+--------+--------+
6.10 Defining custom routing rules in ProxySQL
Insert the routing rules into the mysql_query_rules table:
INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,'^SELECT.*FOR UPDATE$',100,1);
INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,'^SELECT',1000,1);
Load and persist the configuration:
load mysql query rules to runtime;
save mysql query rules to disk;
Confirm:
select rule_id,active,match_pattern,destination_hostgroup,apply from runtime_mysql_query_rules;
+---------+--------+----------------------+-----------------------+-------+
| rule_id | active | match_pattern        | destination_hostgroup | apply |
+---------+--------+----------------------+-----------------------+-------+
| 1       | 1      | ^SELECT.*FOR UPDATE$ | 100                   | 1     |
| 2       | 1      | ^SELECT              | 1000                  | 1     |
+---------+--------+----------------------+-----------------------+-------+
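These two rules send locking reads (SELECT ... FOR UPDATE) to writer hostgroup 100, all other SELECTs to reader hostgroup 1000, and everything that matches neither pattern to the user's default_hostgroup (100). The patterns themselves can be sanity-checked outside ProxySQL with grep -E; the `route` helper below is hypothetical, and grep stands in for ProxySQL's own regex engine, which by default matches case-insensitively (the rule's re_modifiers defaults to CASELESS), so queries are uppercased first:

```shell
# route: print the hostgroup the two mysql_query_rules patterns would pick.
route() {
  q=$(echo "$1" | tr '[:lower:]' '[:upper:]')   # emulate caseless matching
  if echo "$q" | grep -Eq '^SELECT.*FOR UPDATE$'; then
    echo 100     # locking reads must go to the writer
  elif echo "$q" | grep -Eq '^SELECT'; then
    echo 1000    # plain reads go to the reader hostgroup
  else
    echo 100     # no rule matched: default_hostgroup of user proxysql
  fi
}

route 'select count(*) from cons7'          # reader group
route 'SELECT * FROM cons7 FOR UPDATE'      # writer group
route 'delete from cons7 where id=1'        # writer group, via default_hostgroup
```

Note that rule order matters: the more specific FOR UPDATE pattern must come first, because with apply=1 the first matching rule wins.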
6.11 Continuing the client tests
Before testing further, reset the ProxySQL digest statistics:
(admin@127.0.0.1:6032) [(none)]> select * from stats_mysql_query_digest_reset;
(admin@127.0.0.1:6032) [(none)]> select * from stats_mysql_query_digest;
shell> mysql -uproxysql -pproxysql -h192.168.238.111 -P6033
mysql> delete from cons7 where id=7;
mysql> delete from cons7 where id=8;
mysql> delete from cons7 where id=9;
mysql> select count(*) from cons7;
mysql> select count(*) from cons7;
mysql> select count(*) from cons7;
Check the stats_mysql_query_digest audit table again: the DELETE writes ran against hostgroup 100 and the SELECT reads against hostgroup 1000, so read/write splitting is now in effect.
(admin@127.0.0.1:6032) [(none)]> select hostgroup,schemaname,username,digest_text,count_star from stats_mysql_query_digest;
+-----------+--------------------+----------+----------------------------------+------------+
| hostgroup | schemaname         | username | digest_text                      | count_star |
+-----------+--------------------+----------+----------------------------------+------------+
| 1000      | hzmc1              | proxysql | select count(*) from cons7       | 3          |
| 100       | hzmc1              | proxysql | show tables                      | 1          |
| 100       | hzmc1              | proxysql | delete from cons7 where id=?     | 3          |
| 100       | hzmc1              | proxysql | show databases                   | 1          |
| 100       | information_schema | proxysql | select USER()                    | 1          |
| 1000      | information_schema | proxysql | SELECT DATABASE()                | 1          |
| 100       | information_schema | proxysql | select @@version_comment limit ? | 1          |
+-----------+--------------------+----------+----------------------------------+------------+
Check the hostgroups:
(admin@127.0.0.1:6032) [(none)]> select hostgroup_id,hostname,port,status,weight from runtime_mysql_servers;
+--------------+-----------------+------+--------+--------+
| hostgroup_id | hostname        | port | status | weight |
+--------------+-----------------+------+--------+--------+
| 100          | 192.168.238.111 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.112 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.110 | 3306 | ONLINE | 1      |
+--------------+-----------------+------+--------+--------+
6.12 Using MHA together with ProxySQL
Under MHA, the global read_only variable is set to ON on every slave node. ProxySQL provides the mysql_replication_hostgroups table, whose job is to watch the read_only value of every server in the configured hostgroups and assign each server to the writer or reader hostgroup accordingly, thereby defining the master/slave relationship between hostgroups. The ProxySQL monitor module polls read_only on all backend servers; if it sees a slave's read_only flip to 0 and the old master's flip to 1, it concludes that the roles have been swapped and rewrites the hostgroup assignments in mysql_servers, which yields automatic failover at the proxy layer.
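What the monitor module does can be approximated by hand: poll @@global.read_only on each backend and classify it. A minimal sketch, assuming the monitor account created in section 6.4 and live backends (so it is illustrative rather than a verified transcript):

```shell
# Poll @@global.read_only on every backend -- the same signal ProxySQL's
# monitor uses to decide writer (read_only=0) vs reader (read_only=1).
for h in 192.168.238.110 192.168.238.111 192.168.238.112; do
  ro=$(mysql -umonitor -pmonitor -h"$h" -N -B -e 'SELECT @@global.read_only')
  if [ "$ro" = "0" ]; then
    echo "$h -> writer hostgroup 100"
  else
    echo "$h -> reader hostgroup 1000"
  fi
done
```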
Define the writer and reader hostgroups, then load and persist:
insert into mysql_replication_hostgroups(writer_hostgroup,reader_hostgroup,comment) values(100,1000,'实现读写分离自动切换高可用');
load mysql servers to runtime;
save mysql servers to disk;
Confirm:
(admin@127.0.0.1:6032) [(none)]> select * from runtime_mysql_replication_hostgroups;
+------------------+------------------+------------+-----------------------------------------+
| writer_hostgroup | reader_hostgroup | check_type | comment                                 |
+------------------+------------------+------------+-----------------------------------------+
| 100              | 1000             | read_only  | 实现读写分离自动切换高可用              |
+------------------+------------------+------------+-----------------------------------------+
Hostgroup 1000 now also contains the master node, which means a share of the reads will be routed to the master as well. This follows from ProxySQL's default behavior of keeping the writer in the reader hostgroup too (governed by the mysql-monitor_writer_is_also_reader variable, true by default), combined with the routing rules spreading reads across all members of hostgroup 1000.
(admin@127.0.0.1:6032) [(none)]> select hostgroup_id,hostname,port,status,weight from runtime_mysql_servers;
+--------------+-----------------+------+--------+--------+
| hostgroup_id | hostname        | port | status | weight |
+--------------+-----------------+------+--------+--------+
| 100          | 192.168.238.111 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.111 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.112 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.110 | 3306 | ONLINE | 1      |
+--------------+-----------------+------+--------+--------+
七 Installing and configuring MySQL Sniffer
7.1 Downloading MySQL Sniffer
MySQL Sniffer is a packet-capture tool for the MySQL protocol. It captures requests on the MySQL server or client side in real time and prints them in a formatted way, including timestamp, user, source IP, database, execution time, rows returned, and the statement itself. It can capture several ports at once, run as a daemon, and rotate its logs, making it convenient to operate with friendly output.
It can also capture requests to Atlas, a MySQL-protocol data middle-layer project open-sourced by Qihoo (project address: ).
Similar tools include vc-mysql-sniffer, as well as tshark with the -e mysql.query field to decode the MySQL protocol.
Download address:
https://www.cnblogs.com/zhoujinyi/p/6497231.html
7.2 Installing the MySQL Sniffer software
Building on CentOS 6.2 or later and running as root is recommended.
Install the dependencies:
glib2-devel(2.28.8)、libpcap-devel(1.4.0)、libnet-devel(1.1.6)
Run:
yum install -y git
yum install -y cmake
yum install -y gcc*
yum install -y glib2-devel
yum install -y /soft/libpcap-devel-1.4.0-4.20130826git2dbcaa1.el6.x86_64.rpm
yum install -y /soft/libnet-1.1.6-7.el6.x86_64.rpm
yum install -y /soft/libnet-devel-1.1.6-7.el6.x86_64.rpm
Some of these packages are not on the system ISO image and must be downloaded separately from an open-source repository; download address:
()(64bit)
Build and install:
git clone git@github.com:Qihoo360/mysql-sniffer.git
cd mysql-sniffer
mkdir proj
cd proj
cmake ../
make
cd bin/
7.3 Monitoring operations on the MySQL server
Show the mysql-sniffer usage:
shell> ./mysql-sniffer -h
Usage ./mysql-sniffer [-d] -i eth0 -p 3306,3307,3308 -l /var/log/mysql-sniffer/ -e stderr
      [-d] -i eth0 -r 3000-4000
-d daemon mode.
-s how often to split the log file(minute, eg. 1440). if less than 0, split log everyday
-i interface. Default to eth0
-p port, default to 3306. Multiple ports should be splited by ','. eg. 3306,3307 this option has no effect when -f is set.
-r port range, Don't use -r and -p at the same time
-l query log DIRECTORY. Make sure that the directory is accessible. Default to stdout.
-e error log FILENAME or 'stderr'. if set to /dev/null, runtime error will not be recorded
-f filename. use pcap file instead capturing the network interface
-w white list. dont capture the port. Multiple ports should be splited by ','.
-t truncation length. truncate long query if it's longer than specified length. Less than 0 means no truncation
-n keeping tcp stream count, if not set, default is 65536. if active tcp count is larger than the specified count, mysql-sniffer will remove the oldest one
Monitor MySQL port 3306:
shell> ./mysql-sniffer -i eth0 -p 3306 2019-02-12 16:38:40 monitor 192.168.238.111 NULL 0ms 1 SHOW SLAVE STATUS 2019-02-12 16:38:40 monitor 192.168.238.111 NULL 0ms 1 SHOW SLAVE STATUS 2019-02-12 16:38:40 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:40 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:40 monitor 192.168.238.110 NULL 0ms 0 SHOW SLAVE STATUS 2019-02-12 16:38:40 monitor 192.168.238.110 NULL 0ms 0 SHOW SLAVE STATUS 2019-02-12 16:38:40 monitor 192.168.238.110 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:41 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:41 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:41 monitor 192.168.238.110 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:43 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:43 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:43 monitor 192.168.238.110 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:44 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:44 monitor 192.168.238.111 NULL 0ms 1 SELECT @@global.read_only 2019-02-12 16:38:44 monitor 192.168.238.110 NULL 0ms 1 SELECT @@global.read_only
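For continuous auditing, the options from the usage text above can be combined into a daemonized run; the log paths below are illustrative, not mandated by the tool:

```shell
# Run mysql-sniffer in daemon mode (-d), writing per-port query logs under
# /var/log/mysql-sniffer/ (-l), rotating the log once a day (-s 1440, in
# minutes), and sending runtime errors to a dedicated file (-e).
mkdir -p /var/log/mysql-sniffer
./mysql-sniffer -d -i eth0 -p 3306 \
  -l /var/log/mysql-sniffer/ \
  -s 1440 \
  -e /var/log/mysql-sniffer/error.log
```

This needs root privileges and a live capture interface, so it is shown as a sketch.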
八 Creating a VIP with Keepalived
8.1 Installing Keepalived from the RPM package
The CentOS installation media ships the keepalived RPM, which can be installed directly (if dependencies are required, configure a local YUM repository first):
shell> cd /mnt/Packages/ shell> ls -l keepalived* shell> rpm -ivh keepalived-1.2.13-5.el6_6.x86_64.rpm Preparing... ########################################### [100%] 1:keepalived ########################################### [100%] shell> rpm -qa|grep -i keepalived keepalived-1.2.13-5.el6_6.x86_64 shell> rpm -ql keepalived-1.2.13-5.el6_6.x86_64 /etc/keepalived /etc/keepalived/keepalived.conf /etc/rc.d/init.d/keepalived /etc/sysconfig/keepalived /usr/bin/genhash /usr/libexec/keepalived /usr/sbin/keepalived /usr/share/doc/keepalived-1.2.13 /usr/share/doc/keepalived-1.2.13/AUTHOR /usr/share/doc/keepalived-1.2.13/CONTRIBUTORS /usr/share/doc/keepalived-1.2.13/COPYING /usr/share/doc/keepalived-1.2.13/ChangeLog /usr/share/doc/keepalived-1.2.13/NOTE_vrrp_vmac.txt /usr/share/doc/keepalived-1.2.13/README /usr/share/doc/keepalived-1.2.13/TODO /usr/share/doc/keepalived-1.2.13/VERSION /usr/share/doc/keepalived-1.2.13/keepalived.conf.SYNOPSIS /usr/share/doc/keepalived-1.2.13/samples /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.HTTP_GET.port /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.IPv6 /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.SMTP_CHECK /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.SSL_GET /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.fwmark /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.inhibit /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.misc_check /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.misc_check_arg /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.quorum /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.sample /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.status_code /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.track_interface /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.virtual_server_group /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.virtualhost /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.vrrp 
/usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.vrrp.localcheck /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.vrrp.lvs_syncd /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.vrrp.routes /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.vrrp.scripts /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.vrrp.static_ipaddress /usr/share/doc/keepalived-1.2.13/samples/keepalived.conf.vrrp.sync /usr/share/doc/keepalived-1.2.13/samples/sample.misccheck.smbcheck.sh /usr/share/man/man1/genhash.1.gz /usr/share/man/man5/keepalived.conf.5.gz /usr/share/man/man8/keepalived.8.gz /usr/share/snmp/mibs/KEEPALIVED-MIB.txt
8.2 Building Keepalived from source
Install the Keepalived build dependencies:
shell> yum -y install openssl openssl-devel
Build and install Keepalived from source:
shell> mkdir -p /usr/local/keepalived
shell> wget
shell> tar -zxvf keepalived-1.3.5.tar.gz
shell> cd /soft/keepalived-1.3.5/
shell> ./configure --prefix=/usr/local/keepalived
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
......
Keepalived configuration
------------------------
Keepalived version       : 1.3.5
Compiler                 : gcc
Preprocessor flags       :
Compiler flags           : -Wall -Wunused -Wstrict-prototypes -Wextra -g -O2
Linker flags             :
Extra Lib                : -lcrypto -lssl -lnl
Use IPVS Framework       : Yes
IPVS use libnl           : Yes
IPVS syncd attributes    : No
IPVS 64 bit stats        : No
fwmark socket support    : Yes
Use VRRP Framework       : Yes
Use VRRP VMAC            : Yes
Use VRRP authentication  : Yes
With ip rules/routes     : Yes
SNMP vrrp support        : No
SNMP checker support     : No
SNMP RFCv2 support       : No
SNMP RFCv3 support       : No
DBUS support             : No
SHA1 support             : No
Use Debug flags          : No
Stacktrace support       : No
Memory alloc check       : No
libnl version            : 1
Use IPv4 devconf         : No
Use libiptc              : No
Use libipset             : No
init type                : systemd
Build genhash            : Yes
Build documentation      : No
shell> make && make install
Making all in lib
make[1]: Entering directory `/soft/keepalived-1.3.5/lib'
make all-am
make[2]: Entering directory `/soft/keepalived-1.3.5/lib'
......
make[2]: Leaving directory `/soft/keepalived-1.3.5/doc'
make[1]: Leaving directory `/soft/keepalived-1.3.5/doc'
Making install in genhash
make[1]: Entering directory `/soft/keepalived-1.3.5/genhash'
make[2]: Entering directory `/soft/keepalived-1.3.5/genhash'
/usr/bin/mkdir -p '/usr/local/keepalived/bin'
/usr/bin/install -c genhash '/usr/local/keepalived/bin'
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/soft/keepalived-1.3.5/genhash'
make[1]: Leaving directory `/soft/keepalived-1.3.5/genhash'
Making install in bin_install
make[1]: Entering directory `/soft/keepalived-1.3.5/bin_install'
make[2]: Entering directory `/soft/keepalived-1.3.5/bin_install'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/soft/keepalived-1.3.5/bin_install'
make[1]: Leaving directory `/soft/keepalived-1.3.5/bin_install'
make[1]: Entering directory `/soft/keepalived-1.3.5'
make[2]: Entering directory `/soft/keepalived-1.3.5'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/soft/keepalived-1.3.5'
make[1]: Leaving directory `/soft/keepalived-1.3.5'
CentOS 6.x: copy the installed files into place
shell> cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
shell> cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
shell> cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
shell> mkdir -p /etc/keepalived
CentOS 7.x: copy the installed files into place, then fix the PIDFile path in the systemd unit
shell> cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
shell> cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
shell> cp /soft/keepalived-1.3.5/keepalived/etc/init.d/keepalived /etc/init.d/
shell> mkdir -p /etc/keepalived
shell> more /lib/systemd/system/keepalived.service|grep -i pidfile
shell> vi /lib/systemd/system/keepalived.service
...
# change the PIDFile line from:
#   PIDFile=/usr/local/keepalived/var/run/keepalived.pid
# to:
PIDFile=/var/run/keepalived.pid
...
shell> systemctl daemon-reload
shell> vi /etc/keepalived/keepalived.conf    # edit the Keepalived configuration for load balancing and failover
shell> systemctl start keepalived
Note: Keepalived source tarball download address
8.3 Configure keepalived.conf on the master node
shell> mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
shell> vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     1173740146@qq.com
   }
   notification_email_from 1173740146@qq.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id ProxySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.238.200
    }
}
8.4 Configure keepalived.conf on the slave1 node
shell> mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
shell> vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     1173740146@qq.com
   }
   notification_email_from 1173740146@qq.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id ProxySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 150
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.238.200
    }
}
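Both instances above run as state BACKUP with nopreempt and only track interface state, so the VIP will not move if the ProxySQL process dies while the NIC stays up. A common hardening step, shown here as a sketch that is not part of the original configuration (the check command and weight are assumptions to adapt), is to add a vrrp_script and track it:

```
# Hypothetical addition: make Keepalived track ProxySQL itself,
# so the VIP moves when the proxy process disappears.
vrrp_script chk_proxysql {
    script "/usr/bin/killall -0 proxysql"   # exits 0 while a proxysql process exists
    interval 2                              # run the check every 2 seconds
    weight -60                              # on failure, drop priority below the peer's
}

vrrp_instance VI_1 {
    # ... existing settings as above ...
    track_script {
        chk_proxysql
    }
}
```

With weight -60, a failed check lowers this node's effective priority below the other node's, which triggers a VIP failover even though the interface is still up.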
8.5 Start the Keepalived service and observe
Master node:
[root@master keepalived]# /etc/init.d/keepalived start
Starting keepalived: [ OK ]
[root@master keepalived]# ps -ef|grep -i keepalived|grep -v grep|awk '{print $1,$2,$3,$8,$9}'|column -t
root 11040 1     /usr/sbin/keepalived -D
root 11042 11040 /usr/sbin/keepalived -D
root 11043 11040 /usr/sbin/keepalived -D
shell> tail -f /var/log/messages
Feb 12 17:10:12 master Keepalived[11039]: Starting Keepalived v1.2.13 (03/19,2015)
Feb 12 17:10:12 master Keepalived[11040]: Starting Healthcheck child process, pid=11042
Feb 12 17:10:12 master Keepalived[11040]: Starting VRRP child process, pid=11043
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Netlink reflector reports IP 192.168.238.110 added
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Netlink reflector reports IP fe80::4cb4:a3ff:feea:e3cf added
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Registering Kernel netlink reflector
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Registering Kernel netlink command channel
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Registering gratuitous ARP shared channel
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Configuration is using : 62886 Bytes
Feb 12 17:10:12 master Keepalived_vrrp[11043]: Using LinkWatch kernel netlink reflector...
Feb 12 17:10:12 master Keepalived_vrrp[11043]: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 12 17:10:12 master Keepalived_vrrp[11043]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Feb 12 17:10:12 master kernel: IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
Feb 12 17:10:12 master kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Feb 12 17:10:12 master kernel: IPVS: ipvs loaded.
Feb 12 17:10:12 master Keepalived_healthcheckers[11042]: Netlink reflector reports IP 192.168.238.110 added
Feb 12 17:10:12 master Keepalived_healthcheckers[11042]: Netlink reflector reports IP fe80::4cb4:a3ff:feea:e3cf added
Feb 12 17:10:12 master Keepalived_healthcheckers[11042]: Registering Kernel netlink reflector
Feb 12 17:10:12 master Keepalived_healthcheckers[11042]: Registering Kernel netlink command channel
Feb 12 17:10:12 master Keepalived_healthcheckers[11042]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 12 17:10:12 master Keepalived_healthcheckers[11042]: Configuration is using : 7429 Bytes
Feb 12 17:10:12 master Keepalived_healthcheckers[11042]: Using LinkWatch kernel netlink reflector...
Monitor VRRP (Virtual Router Redundancy Protocol) advertisement traffic (sent to the multicast address 224.0.0.18):
[root@master keepalived]# tcpdump -i eth0 vrrp -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:30:07.492625 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:08.492971 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:09.493195 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:10.493729 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:11.494140 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
Slave1 node:
[root@slave1 keepalived]# /etc/init.d/keepalived start
Starting keepalived: [ OK ]
[root@slave1 ~]# ps -ef|grep -i keepalived|grep -v grep|awk '{print $1,$2,$3,$8,$9}'|column -t
root 10429 1     /usr/sbin/keepalived -D
root 10431 10429 /usr/sbin/keepalived -D
root 10432 10429 /usr/sbin/keepalived -D
shell> tail -f /var/log/messages
Feb 12 17:09:41 slave1 Keepalived[10428]: Starting Keepalived v1.2.13 (03/19,2015)
Feb 12 17:09:41 slave1 Keepalived[10429]: Starting Healthcheck child process, pid=10431
Feb 12 17:09:41 slave1 Keepalived[10429]: Starting VRRP child process, pid=10432
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Netlink reflector reports IP 192.168.238.111 added
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Netlink reflector reports IP fe80::38cf:7ff:fe78:4485 added
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Registering Kernel netlink reflector
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Registering Kernel netlink command channel
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Registering gratuitous ARP shared channel
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Configuration is using : 62886 Bytes
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: Using LinkWatch kernel netlink reflector...
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 12 17:09:41 slave1 Keepalived_vrrp[10432]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Feb 12 17:09:41 slave1 kernel: IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
Feb 12 17:09:41 slave1 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Feb 12 17:09:41 slave1 kernel: IPVS: ipvs loaded.
Feb 12 17:09:41 slave1 Keepalived_healthcheckers[10431]: Netlink reflector reports IP 192.168.238.111 added
Feb 12 17:09:41 slave1 Keepalived_healthcheckers[10431]: Netlink reflector reports IP fe80::38cf:7ff:fe78:4485 added
Feb 12 17:09:41 slave1 Keepalived_healthcheckers[10431]: Registering Kernel netlink reflector
Feb 12 17:09:41 slave1 Keepalived_healthcheckers[10431]: Registering Kernel netlink command channel
Feb 12 17:09:41 slave1 Keepalived_healthcheckers[10431]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 12 17:09:41 slave1 Keepalived_healthcheckers[10431]: Configuration is using : 7429 Bytes
Feb 12 17:09:41 slave1 Keepalived_healthcheckers[10431]: Using LinkWatch kernel netlink reflector...
Feb 12 17:09:44 slave1 Keepalived_vrrp[10432]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 12 17:09:45 slave1 Keepalived_vrrp[10432]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 12 17:09:45 slave1 Keepalived_vrrp[10432]: VRRP_Instance(VI_1) setting protocol VIPs.
Feb 12 17:09:45 slave1 Keepalived_vrrp[10432]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.238.200
Feb 12 17:09:45 slave1 Keepalived_healthcheckers[10431]: Netlink reflector reports IP 192.168.238.200 added
Feb 12 17:09:50 slave1 Keepalived_vrrp[10432]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.238.200
Confirm that the VIP is now bound to eth0 on slave1:
[root@slave1 keepalived]# ip a|grep eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.238.111/24 brd 192.168.238.255 scope global eth0
    inet 192.168.238.200/32 scope global eth0
Monitor VRRP advertisement traffic on slave1:
[root@slave1 keepalived]# tcpdump -i eth0 vrrp -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:30:55.296202 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:56.296560 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:57.296849 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:58.297197 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
17:30:59.297526 IP 192.168.238.111 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 150, authtype simple, intvl 1s, length 20
8.6 Test the MySQL service through the VIP
[root@node2 ~]# mysql -uproxysql -pproxysql -h192.168.238.200 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | slave1 |
+---------------+--------+
[root@node2 ~]# mysql -uproxysql -pproxysql -h192.168.238.200 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | slave1 |
+---------------+--------+
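To watch the VIP follow a failover in real time, a small polling loop helps. This is a sketch, not part of the original procedure; the proxysql credentials and port 6033 are taken from the sections above, and MYSQL_CMD is overridable so the probe can be dry-run without a live cluster.

```shell
#!/bin/sh
# Poll the backend hostname through the VIP; during a Keepalived failover
# you should briefly see "unreachable", then the hostname change.
VIP="${VIP:-192.168.238.200}"
MYSQL_CMD="${MYSQL_CMD:-mysql -uproxysql -pproxysql -h$VIP -P6033 -N -e}"
i=0
while [ "$i" -lt 5 ]; do
    host=$($MYSQL_CMD "select @@hostname" 2>/dev/null) || host="unreachable"
    echo "probe $i: backend=$host"
    i=$((i + 1))
    sleep 1
done
```

Run it in one terminal while stopping keepalived on the active node in another; the probe output shows exactly when the VIP moves.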
9 MHA Manual Switchover
9.1 Manual online switchover
[root@mgr ~]# cat /etc/masterha/mha_mgr_monitor.sh
#!/bin/bash
/usr/bin/nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
Note: because the MHA startup script passes --remove_dead_master_conf, the old master's entry is removed from /etc/masterha/app1.cnf after a switchover. To switch back, MHA has to be reconfigured first.
--remove_dead_master_conf: after an automatic failover, the dead master's section is removed from the /etc/masterha/app1.cnf configuration file.
--ignore_last_failover: by default, if MHA detects two outages less than 8 hours apart, it refuses to fail over again; this guard exists to avoid ping-pong failovers. After every failover MHA writes an app1.failover.complete file into the manager work directory (manager_workdir, set above to /var/log/masterha/app1). If that file still exists, the next failover is refused until the file is deleted. Passing --ignore_last_failover tells MHA to ignore this marker file; it is set here for convenience.
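Since --ignore_last_failover is only a workaround, an alternative is to clear the stale marker before restarting the manager. A minimal sketch (this helper script is not part of MHA itself; WORKDIR defaults to the manager_workdir configured above):

```shell
#!/bin/sh
# Remove the app1.failover.complete marker left by a previous failover,
# so masterha_manager can be restarted without --ignore_last_failover.
WORKDIR="${WORKDIR:-/var/log/masterha/app1}"
MARKER="$WORKDIR/app1.failover.complete"
if [ -f "$MARKER" ]; then
    rm -f "$MARKER" && echo "removed $MARKER"
else
    echo "no failover marker in $WORKDIR"
fi
```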
Reconfigure MHA:
[root@mgr ~]# vi /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql/binlog
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change
password=mysql
ping_interval=1
remote_workdir=/tmp
repl_password=mysql
repl_user=repl
secondary_check_script=/usr/bin/masterha_secondary_check -s slave1 -s master
report_script=""
shutdown_script=""
ssh_user=root
user=root

[server1]
hostname=slave1
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=master
port=3306

[server3]
hostname=slave2
port=3306
Check the SSH configuration:
[root@mgr ~]# /usr/bin/masterha_check_ssh --conf=/etc/masterha/app1.cnf
Tue Feb 12 17:55:33 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Feb 12 17:55:33 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Feb 12 17:55:33 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Feb 12 17:55:33 2019 - [info] Starting SSH connection tests..
Tue Feb 12 17:55:35 2019 - [debug]
Tue Feb 12 17:55:33 2019 - [debug] Connecting via SSH from root@master(192.168.238.110:22) to root@slave1(192.168.238.111:22)..
Tue Feb 12 17:55:34 2019 - [debug] ok.
Tue Feb 12 17:55:34 2019 - [debug] Connecting via SSH from root@master(192.168.238.110:22) to root@slave2(192.168.238.112:22)..
Tue Feb 12 17:55:35 2019 - [debug] ok.
Tue Feb 12 17:55:36 2019 - [debug]
Tue Feb 12 17:55:34 2019 - [debug] Connecting via SSH from root@slave1(192.168.238.111:22) to root@master(192.168.238.110:22)..
Tue Feb 12 17:55:35 2019 - [debug] ok.
Tue Feb 12 17:55:35 2019 - [debug] Connecting via SSH from root@slave1(192.168.238.111:22) to root@slave2(192.168.238.112:22)..
Tue Feb 12 17:55:35 2019 - [debug] ok.
Tue Feb 12 17:55:36 2019 - [debug]
Tue Feb 12 17:55:34 2019 - [debug] Connecting via SSH from root@slave2(192.168.238.112:22) to root@master(192.168.238.110:22)..
Tue Feb 12 17:55:35 2019 - [debug] ok.
Tue Feb 12 17:55:35 2019 - [debug] Connecting via SSH from root@slave2(192.168.238.112:22) to root@slave1(192.168.238.111:22)..
Tue Feb 12 17:55:35 2019 - [debug] ok.
Tue Feb 12 17:55:36 2019 - [info] All SSH connection tests passed successfully.
Check the replication cluster status:
[root@mgr ~]# /usr/bin/masterha_check_repl --conf=/etc/masterha/app1.cnf
……
Tue Jan 15 13:14:19 2019 - [info] Connecting to root@192.168.238.110(master:22)..
Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options at /usr/bin/apply_diff_relay_logs line 493
Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
The check fails because the mysqlbinlog and mysql binaries cannot be found: MHA does not pick up the login shell's environment variables. Create the symlinks manually:
[root@master ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@master ~]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
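The same PATH problem exists on every node MHA manages, so the links are worth scripting. A sketch, assuming MySQL was installed under /usr/local/mysql as in the earlier sections (BASEDIR and BINDIR are overridable assumptions):

```shell
#!/bin/sh
# Link the MySQL client tools into a directory that is always on PATH,
# so MHA's remote checks can find them on every node.
BASEDIR="${BASEDIR:-/usr/local/mysql}"
BINDIR="${BINDIR:-/usr/local/bin}"
for tool in mysql mysqlbinlog; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool already resolvable"
    else
        ln -sf "$BASEDIR/bin/$tool" "$BINDIR/$tool" \
            && echo "linked $BINDIR/$tool" \
            || echo "could not link $tool (need root?)"
    fi
done
```

Run it once on master, slave1, and slave2 (for example over ssh) before re-running masterha_check_repl.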
Run the check again:
[root@mgr ~]# /usr/bin/masterha_check_repl --conf=/etc/masterha/app1.cnf
Tue Feb 12 17:58:30 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Feb 12 17:58:30 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Feb 12 17:58:30 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Feb 12 17:58:30 2019 - [info] MHA::MasterMonitor version 0.57.
Tue Feb 12 17:58:31 2019 - [info] GTID failover mode = 1
Tue Feb 12 17:58:31 2019 - [info] Dead Servers:
Tue Feb 12 17:58:31 2019 - [info] Alive Servers:
Tue Feb 12 17:58:31 2019 - [info]   master(192.168.238.110:3306)
Tue Feb 12 17:58:31 2019 - [info]   slave1(192.168.238.111:3306)
Tue Feb 12 17:58:31 2019 - [info]   slave2(192.168.238.112:3306)
Tue Feb 12 17:58:31 2019 - [info] Alive Slaves:
Tue Feb 12 17:58:31 2019 - [info]   master(192.168.238.110:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Feb 12 17:58:31 2019 - [info]     GTID ON
Tue Feb 12 17:58:31 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
Tue Feb 12 17:58:31 2019 - [info]   slave2(192.168.238.112:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Feb 12 17:58:31 2019 - [info]     GTID ON
Tue Feb 12 17:58:31 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
Tue Feb 12 17:58:31 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306)
Tue Feb 12 17:58:31 2019 - [info] Checking slave configurations..
Tue Feb 12 17:58:31 2019 - [info] Checking replication filtering settings..
Tue Feb 12 17:58:31 2019 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Feb 12 17:58:31 2019 - [info]  Replication filtering check ok.
Tue Feb 12 17:58:31 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Feb 12 17:58:31 2019 - [info] Checking SSH publickey authentication settings on the current master..
Tue Feb 12 17:58:31 2019 - [info] HealthCheck: SSH to slave1 is reachable.
Tue Feb 12 17:58:31 2019 - [info]
slave1(192.168.238.111:3306) (current master)
 +--master(192.168.238.110:3306)
 +--slave2(192.168.238.112:3306)
Tue Feb 12 17:58:31 2019 - [info] Checking replication health on master..
Tue Feb 12 17:58:31 2019 - [info]  ok.
Tue Feb 12 17:58:31 2019 - [info] Checking replication health on slave2..
Tue Feb 12 17:58:31 2019 - [info]  ok.
Tue Feb 12 17:58:31 2019 - [info] Checking master_ip_failover_script status:
Tue Feb 12 17:58:31 2019 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=slave1 --orig_master_ip=192.168.238.111 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.238.200/24===
Checking the Status of the script.. OK
Tue Feb 12 17:58:31 2019 - [info]  OK.
Tue Feb 12 17:58:31 2019 - [warning] shutdown_script is not defined.
Tue Feb 12 17:58:31 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
Check the status of the MHA monitoring service after the automatic failover:
[root@mgr ~]# /usr/bin/masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
Note: if the MHA monitoring service is still running, stop it first.
[root@mgr ~]# masterha_stop --conf=/etc/masterha/app1.cnf
MHA Manager is not running on app1(2:NOT_RUNNING).
Otherwise the switchover fails with the following error:
[root@mgr ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.238.110 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Tue Jan 15 14:18:24 2019 - [info] MHA::MasterRotate version 0.57.
Tue Jan 15 14:18:24 2019 - [info] Starting online master switch..
Tue Jan 15 14:18:24 2019 - [info]
Tue Jan 15 14:18:24 2019 - [info] * Phase 1: Configuration Check Phase..
Tue Jan 15 14:18:24 2019 - [info]
Tue Jan 15 14:18:24 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jan 15 14:18:24 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Jan 15 14:18:24 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Jan 15 14:18:25 2019 - [info] GTID failover mode = 0
Tue Jan 15 14:18:25 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306)
Tue Jan 15 14:18:25 2019 - [info] Alive Slaves:
Tue Jan 15 14:18:25 2019 - [info]   master(192.168.238.110:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Jan 15 14:18:25 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
Tue Jan 15 14:18:25 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jan 15 14:18:25 2019 - [info]   slave2(192.168.238.112:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Jan 15 14:18:25 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on slave1(192.168.238.111:3306)? (YES/no): YES
Tue Jan 15 14:18:27 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Jan 15 14:18:27 2019 - [info]  ok.
Tue Jan 15 14:18:27 2019 - [info] Checking MHA is not monitoring or doing failover..
Tue Jan 15 14:18:27 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Tue Jan 15 14:18:27 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.
Check the replication delay on the candidate master:
[root@master ~]# mysql -uroot -pmysql
mysql> show slave status\G
......
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
Seconds_Behind_Master: 0
......
--orig_master_is_new_slave: with this option the original master is reconfigured as a slave of the new master after the switch; without it, the original master is left out of the replication topology.
--running_updates_limit=10000: if the candidate master is lagging, the switchover normally fails; with this option the switch is still allowed as long as the replication delay is within the given number of seconds. How long the switch itself takes still depends on the amount of relay log that must be applied during recovery.
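Before the switchover it is worth confirming that the candidate really is inside the --running_updates_limit window. A sketch that reads Seconds_Behind_Master with the root credentials used elsewhere in this document (host and password are assumptions to adapt):

```shell
#!/bin/sh
# Read the candidate master's replication delay and compare it with the
# window we intend to pass as --running_updates_limit (seconds).
LIMIT="${LIMIT:-10000}"
DELAY=$(mysql -uroot -pmysql -h192.168.238.110 \
            -e 'show slave status\G' 2>/dev/null |
        awk '/Seconds_Behind_Master/ {print $2}')
if [ -z "$DELAY" ] || [ "$DELAY" = "NULL" ]; then
    echo "could not read a numeric delay; check replication first"
elif [ "$DELAY" -le "$LIMIT" ]; then
    echo "delay ${DELAY}s is within the ${LIMIT}s limit"
else
    echo "delay ${DELAY}s exceeds the ${LIMIT}s limit"
fi
```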
Perform the manual online switchover:
[root@mgr ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.238.110 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Tue Jan 15 14:40:58 2019 - [info] MHA::MasterRotate version 0.57.
Tue Jan 15 14:40:58 2019 - [info] Starting online master switch..
Tue Jan 15 14:40:58 2019 - [info]
Tue Jan 15 14:40:58 2019 - [info] * Phase 1: Configuration Check Phase..
Tue Jan 15 14:40:58 2019 - [info]
Tue Jan 15 14:40:58 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jan 15 14:40:58 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Jan 15 14:40:58 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Jan 15 14:40:59 2019 - [info] GTID failover mode = 0
Tue Jan 15 14:40:59 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306)
Tue Jan 15 14:40:59 2019 - [info] Alive Slaves:
Tue Jan 15 14:40:59 2019 - [info]   master(192.168.238.110:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Jan 15 14:40:59 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
Tue Jan 15 14:40:59 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Jan 15 14:40:59 2019 - [info]   slave2(192.168.238.112:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Jan 15 14:40:59 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on slave1(192.168.238.111:3306)? (YES/no): YES
Tue Jan 15 14:41:02 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Jan 15 14:41:02 2019 - [info]  ok.
Tue Jan 15 14:41:02 2019 - [info] Checking MHA is not monitoring or doing failover..
Tue Jan 15 14:41:02 2019 - [info] Checking replication health on master..
Tue Jan 15 14:41:02 2019 - [info]  ok.
Tue Jan 15 14:41:02 2019 - [info] Checking replication health on slave2..
Tue Jan 15 14:41:02 2019 - [info]  ok.
Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1218] 192.168.238.110 is not alive!
Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln232] Failed to get new master!
Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.
[root@mgr ~]# cat /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql/binlog
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change
password=mysql
ping_interval=1
remote_workdir=/tmp
repl_password=mysql
repl_user=repl
secondary_check_script=/usr/bin/masterha_secondary_check -s slave1 -s master
report_script=""
shutdown_script=""
ssh_user=root
user=root

[server1]
hostname=slave1
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=master
port=3306

[server3]
hostname=slave2
port=3306
Note: when performing a manual online switchover, be careful to distinguish host names from IP addresses.
Error: Tue Jan 15 14:41:02 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1218] 192.168.238.110 is not alive!
This error usually has one of two causes:
1. The candidate node has no_master=1 set in /etc/masterha/app1.cnf, which prevents it from ever becoming the new master.
2. According to the official documentation, --new_master_host expects a hostname, not an IP address. (If /etc/masterha/app1.cnf itself lists the nodes by IP address, then passing an IP address at switchover time works as well.)
[root@mgr ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=master --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Tue Feb 12 18:06:04 2019 - [info] MHA::MasterRotate version 0.57.
Tue Feb 12 18:06:04 2019 - [info] Starting online master switch..
Tue Feb 12 18:06:04 2019 - [info]
Tue Feb 12 18:06:04 2019 - [info] * Phase 1: Configuration Check Phase..
Tue Feb 12 18:06:04 2019 - [info]
Tue Feb 12 18:06:04 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Feb 12 18:06:04 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Feb 12 18:06:04 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Feb 12 18:06:05 2019 - [info] GTID failover mode = 1
Tue Feb 12 18:06:05 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306)
Tue Feb 12 18:06:05 2019 - [info] Alive Slaves:
Tue Feb 12 18:06:05 2019 - [info]   master(192.168.238.110:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Feb 12 18:06:05 2019 - [info]     GTID ON
Tue Feb 12 18:06:05 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
Tue Feb 12 18:06:05 2019 - [info]   slave2(192.168.238.112:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Feb 12 18:06:05 2019 - [info]     GTID ON
Tue Feb 12 18:06:05 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on slave1(192.168.238.111:3306)? (YES/no): YES
Tue Feb 12 18:06:19 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Feb 12 18:06:19 2019 - [info]  ok.
Tue Feb 12 18:06:19 2019 - [info] Checking MHA is not monitoring or doing failover..
Tue Feb 12 18:06:19 2019 - [info] Checking replication health on master..
Tue Feb 12 18:06:19 2019 - [info]  ok.
Tue Feb 12 18:06:19 2019 - [info] Checking replication health on slave2..
Tue Feb 12 18:06:19 2019 - [info]  ok.
Tue Feb 12 18:06:19 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1218] 192.168.238.110 is not alive!
Tue Feb 12 18:06:19 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln232] Failed to get new master!
Tue Feb 12 18:06:19 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.
[root@mgr masterha]#
[root@mgr masterha]#
[root@mgr masterha]#
[root@mgr masterha]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=master --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Tue Feb 12 18:06:52 2019 - [info] MHA::MasterRotate version 0.57.
Tue Feb 12 18:06:52 2019 - [info] Starting online master switch..
Tue Feb 12 18:06:52 2019 - [info]
Tue Feb 12 18:06:52 2019 - [info] * Phase 1: Configuration Check Phase..
Tue Feb 12 18:06:52 2019 - [info]
Tue Feb 12 18:06:52 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Feb 12 18:06:52 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Feb 12 18:06:52 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Feb 12 18:06:53 2019 - [info] GTID failover mode = 1
Tue Feb 12 18:06:53 2019 - [info] Current Alive Master: slave1(192.168.238.111:3306)
Tue Feb 12 18:06:53 2019 - [info] Alive Slaves:
Tue Feb 12 18:06:53 2019 - [info]   master(192.168.238.110:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Feb 12 18:06:53 2019 - [info]     GTID ON
Tue Feb 12 18:06:53 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
Tue Feb 12 18:06:53 2019 - [info]   slave2(192.168.238.112:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Feb 12 18:06:53 2019 - [info]     GTID ON
Tue Feb 12 18:06:53 2019 - [info]     Replicating from 192.168.238.111(192.168.238.111:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on slave1(192.168.238.111:3306)? (YES/no): YES
Tue Feb 12 18:06:55 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Feb 12 18:06:55 2019 - [info]  ok.
Tue Feb 12 18:06:55 2019 - [info] Checking MHA is not monitoring or doing failover..
Tue Feb 12 18:06:55 2019 - [info] Checking replication health on master..
Tue Feb 12 18:06:55 2019 - [info]  ok.
Tue Feb 12 18:06:55 2019 - [info] Checking replication health on slave2..
Tue Feb 12 18:06:55 2019 - [info]  ok.
Tue Feb 12 18:06:55 2019 - [info] master can be new master.
Tue Feb 12 18:06:55 2019 - [info]
From:
slave1(192.168.238.111:3306) (current master)
 +--master(192.168.238.110:3306)
 +--slave2(192.168.238.112:3306)

To:
master(192.168.238.110:3306) (new master)
 +--slave2(192.168.238.112:3306)
 +--slave1(192.168.238.111:3306)

Starting master switch from slave1(192.168.238.111:3306) to master(192.168.238.110:3306)? (yes/NO): yes
Tue Feb 12 18:07:02 2019 - [info] Checking whether master(192.168.238.110:3306) is ok for the new master..
Tue Feb 12 18:07:02 2019 - [info]  ok.
Tue Feb 12 18:07:02 2019 - [info] slave1(192.168.238.111:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Feb 12 18:07:04 2019 - [info] slave1(192.168.238.111:3306): Resetting slave pointing to the dummy host.
Tue Feb 12 18:07:04 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Feb 12 18:07:04 2019 - [info]
Tue Feb 12 18:07:04 2019 - [info] * Phase 2: Rejecting updates Phase..
Tue Feb 12 18:07:04 2019 - [info]
Tue Feb 12 18:07:04 2019 - [info] Executing master ip online change script to disable write on the current master:
Tue Feb 12 18:07:04 2019 - [info]   /etc/masterha/master_ip_online_change --command=stop --orig_master_host=slave1 --orig_master_ip=192.168.238.111 --orig_master_port=3306 --orig_master_user='root' --new_master_host=master --new_master_ip=192.168.238.110 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
***************************************************************
Disabling the VIP - 192.168.238.200/24 on old master: slave1
***************************************************************
Tue Feb 12 18:07:05 2019 - [info]  ok.
Tue Feb 12 18:07:05 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Feb 12 18:07:05 2019 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Feb 12 18:07:05 2019 - [info]  ok.
Tue Feb 12 18:07:05 2019 - [info] Orig master binlog:pos is mysql-bin.000054:27165.
Tue Feb 12 18:07:05 2019 - [info] Waiting to execute all relay logs on master(192.168.238.110:3306)..
Tue Feb 12 18:07:05 2019 - [info] master_pos_wait(mysql-bin.000054:27165) completed on master(192.168.238.110:3306). Executed 0 events.
Tue Feb 12 18:07:05 2019 - [info] done.
Tue Feb 12 18:07:05 2019 - [info] Getting new master's binlog name and position..
Tue Feb 12 18:07:05 2019 - [info] mysql-bin.000020:26445
Tue Feb 12 18:07:05 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='master or 192.168.238.110', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Feb 12 18:07:05 2019 - [info] Executing master ip online change script to allow write on the new master:
Tue Feb 12 18:07:05 2019 - [info]   /etc/masterha/master_ip_online_change --command=start --orig_master_host=slave1 --orig_master_ip=192.168.238.111 --orig_master_port=3306 --orig_master_user='root' --new_master_host=master --new_master_ip=192.168.238.110 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
***************************************************************
Enabling the VIP - 192.168.238.200/24 on new master: master
***************************************************************
Tue Feb 12 18:07:05 2019 - [info]  ok.
Tue Feb 12 18:07:05 2019 - [info] Setting read_only=0 on master(192.168.238.110:3306)..
Tue Feb 12 18:07:05 2019 - [info]  ok.
Tue Feb 12 18:07:05 2019 - [info]
Tue Feb 12 18:07:05 2019 - [info] * Switching slaves in parallel..
Tue Feb 12 18:07:05 2019 - [info]
Tue Feb 12 18:07:05 2019 - [info] -- Slave switch on host slave2(192.168.238.112:3306) started, pid: 6814
Tue Feb 12 18:07:05 2019 - [info]
Tue Feb 12 18:07:06 2019 - [info] Log messages from slave2 ...
Tue Feb 12 18:07:06 2019 - [info]
Tue Feb 12 18:07:05 2019 - [info] Waiting to execute all relay logs on slave2(192.168.238.112:3306)..
Tue Feb 12 18:07:05 2019 - [info] master_pos_wait(mysql-bin.000054:27165) completed on slave2(192.168.238.112:3306). Executed 0 events.
Tue Feb 12 18:07:05 2019 - [info] done.
Tue Feb 12 18:07:05 2019 - [info] Resetting slave slave2(192.168.238.112:3306) and starting replication from the new master master(192.168.238.110:3306)..
Tue Feb 12 18:07:06 2019 - [info] Executed CHANGE MASTER.
Tue Feb 12 18:07:06 2019 - [info] Slave started.
Tue Feb 12 18:07:06 2019 - [info] End of log messages from slave2 ...
Tue Feb 12 18:07:06 2019 - [info]
Tue Feb 12 18:07:06 2019 - [info] -- Slave switch on host slave2(192.168.238.112:3306) succeeded.
Tue Feb 12 18:07:06 2019 - [info] Unlocking all tables on the orig master:
Tue Feb 12 18:07:06 2019 - [info] Executing UNLOCK TABLES..
Tue Feb 12 18:07:06 2019 - [info]  ok.
Tue Feb 12 18:07:06 2019 - [info] Starting orig master as a new slave..
Tue Feb 12 18:07:06 2019 - [info] Resetting slave slave1(192.168.238.111:3306) and starting replication from the new master master(192.168.238.110:3306)..
Tue Feb 12 18:07:06 2019 - [info]  Executed CHANGE MASTER.
Tue Feb 12 18:07:06 2019 - [info]  Slave started.
Tue Feb 12 18:07:06 2019 - [info] All new slave servers switched successfully.
Tue Feb 12 18:07:06 2019 - [info]
Tue Feb 12 18:07:06 2019 - [info] * Phase 5: New master cleanup phase..
Tue Feb 12 18:07:06 2019 - [info]
Tue Feb 12 18:07:06 2019 - [info]  master: Resetting slave info succeeded.
Tue Feb 12 18:07:06 2019 - [info] Switching master to master(192.168.238.110:3306) completed successfully.
The manual online switchover has completed successfully.
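After a switchover, it is worth confirming on each slave that replication now points at the new master and both threads are running. A minimal sketch: the `check_slave` helper below is hypothetical (not part of MHA), and the root password and IPs are assumed from this document's environment.

```shell
# check_slave: read "SHOW SLAVE STATUS\G" output on stdin and succeed only
# if the slave replicates from the expected master host ($1) with both
# the IO and SQL threads running.
check_slave() {
  local expect="$1" out
  out=$(cat)
  echo "$out" | grep -q "Master_Host: $expect" &&
  echo "$out" | grep -q "Slave_IO_Running: Yes" &&
  echo "$out" | grep -q "Slave_SQL_Running: Yes"
}

# Usage against a live slave (credentials from this document):
#   mysql -uroot -pxxx -h192.168.238.111 -P3306 -e 'SHOW SLAVE STATUS\G' \
#     | check_slave 192.168.238.110 && echo "slave1 OK"
```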
9.2 Checking hostgroup status in ProxySQL
Log in to the ProxySQL admin interface on port 6032:
shell> export MYSQL_PS1="(\u@\h:\p) [\d]> "
shell> mysql -uadmin -padmin -h127.0.0.1 -P6032
Inspect the hostgroups: the master node (192.168.238.110) is now in the writer hostgroup (HG 100).
mysql> select hostgroup_id,hostname,port,status,weight from runtime_mysql_servers;
+--------------+-----------------+------+--------+--------+
| hostgroup_id | hostname        | port | status | weight |
+--------------+-----------------+------+--------+--------+
| 100          | 192.168.238.110 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.111 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.112 | 3306 | ONLINE | 1      |
| 1000         | 192.168.238.110 | 3306 | ONLINE | 1      |
+--------------+-----------------+------+--------+--------+
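The same check can be scripted so it is easy to assert how many ONLINE servers each hostgroup holds. A hedged sketch: the `count_hg` helper is hypothetical, and it assumes the tab-separated output that `mysql -N` produces for the query above.

```shell
# count_hg: count ONLINE rows for a given hostgroup id in the tab-separated
# output of: SELECT hostgroup_id,hostname,port,status FROM runtime_mysql_servers
count_hg() {
  awk -v hg="$1" '$1 == hg && $4 == "ONLINE" { n++ } END { print n + 0 }'
}

# Usage (admin credentials and port from this document):
#   mysql -uadmin -padmin -h127.0.0.1 -P6032 -N -e \
#     "SELECT hostgroup_id,hostname,port,status FROM runtime_mysql_servers" \
#     | count_hg 100    # expect exactly 1 writer
```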
Connection tests succeed:
[root@node1 ~]# mysql -uproxysql -pproxysql -h192.168.238.200 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | master |
+---------------+--------+
[root@node1 ~]# mysql -uproxysql -pproxysql -h192.168.238.200 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | master |
+---------------+--------+
[root@node1 ~]# mysql -uproxysql -pproxysql -h192.168.238.200 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | master |
+---------------+--------+
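Repeating the check by hand works, but it can also be looped and asserted automatically. A minimal sketch: the `all_same` helper is hypothetical, and the proxysql credentials and VIP are taken from this document.

```shell
# all_same: succeed if every non-empty line on stdin is identical - useful
# for asserting that repeated queries through the VIP all land on one host.
all_same() {
  [ "$(sort -u | grep -c .)" -eq 1 ]
}

# Usage: run the routing check a few times and assert a single hostname:
#   for i in 1 2 3; do
#     mysql -uproxysql -pproxysql -h192.168.238.200 -P6033 -N \
#       -e "show global variables like 'hostname'"
#   done | all_same && echo "routing stable"
```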
9.3 Simulating a slave1 node failure
Simulate the failure by stopping the keepalived service on slave1:
[root@slave1 ~]# /etc/init.d/keepalived stop
Stopping keepalived: [ OK ]
[root@slave1 ~]# tail -f /var/log/messages
Feb 12 18:16:36 slave1 Keepalived[10800]: Stopping Keepalived v1.2.13 (03/19,2015)
Feb 12 18:16:36 slave1 Keepalived_vrrp[10803]: VRRP_Instance(VI_1) sending 0 priority
Feb 12 18:16:36 slave1 Keepalived_vrrp[10803]: VRRP_Instance(VI_1) removing protocol VIPs.
Feb 12 18:16:36 slave1 Keepalived_healthcheckers[10802]: Netlink reflector reports IP 192.168.238.201 removed
[root@master ~]# tail -f /var/log/messages
Feb 12 18:16:36 master Keepalived_vrrp[11347]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 12 18:16:37 master Keepalived_vrrp[11347]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 12 18:16:37 master Keepalived_vrrp[11347]: VRRP_Instance(VI_1) setting protocol VIPs.
Feb 12 18:16:37 master Keepalived_vrrp[11347]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.238.201
Feb 12 18:16:37 master Keepalived_healthcheckers[11346]: Netlink reflector reports IP 192.168.238.201 added
Feb 12 18:16:42 master Keepalived_vrrp[11347]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.238.201
[root@master ~]# ip a|grep eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.238.110/24 brd 192.168.238.255 scope global eth0
    inet 192.168.238.201/32 scope global eth0
    inet 192.168.238.200/24 brd 192.168.238.255 scope global secondary eth0:1
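Which node currently holds a VIP can be checked the same way on any node. A small sketch: the `has_vip` helper is hypothetical, and the VIP address and interface name are assumed from this environment.

```shell
# has_vip: succeed if the given address (e.g. 192.168.238.201/32) appears
# in `ip addr` output read from stdin.
has_vip() {
  grep -qF "inet $1"
}

# Usage on the node expected to have taken over the VIP:
#   ip addr show dev eth0 | has_vip 192.168.238.201/32 && echo "VIP held here"
```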
Connection tests through the failed-over VIP remain normal:
[root@node1 ~]# mysql -uproxysql -pproxysql -h192.168.238.201 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | master |
+---------------+--------+
[root@node1 ~]# mysql -uproxysql -pproxysql -h192.168.238.201 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | master |
+---------------+--------+
[root@node1 ~]# mysql -uproxysql -pproxysql -h192.168.238.201 -P6033 -e"show global variables like 'hostname'"
Warning: Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| hostname      | master |
+---------------+--------+
At this point, the MySQL high-availability architecture based on MHA + Keepalived + ProxySQL has been deployed and tested successfully.