最近在学习PostgreSQL数据库高可用架构repmgr高可用集群,发现网上完整的参考资料不是很多,比较零碎,于是经过在虚拟机尝试测试,分享下个人完整的repmgr集群的搭建过程,步骤包含简单的系统参数配置,pg数据库安装,repmgr集群安装配置以及keepalived安装配置,因为搭建步骤个人理解和查阅资料,可能存在些许错误,请指正。
搭建过程主要分为主要以下几步:
1. 系统参数配置/环境准备
2. PG源代码安装配置
3. repmgr源码安装配置
repmgr基本集群配置
repmgrd守护进程配置
witness监控配置
4. keepalived源码安装配置
keepalived软件安装
5. 自动切换脚本部署
环境准备
操作系统:
-
Red Hat Enterprise Linux Server release 7.8 (Maipo)
地址分配:
-
192.168.56.201 pg1 192.168.56.202 pg2 192.168.56.203 pg3 192.168.56.200 pg4
软件版本:
-
postgresql 10.14 repmgr 5.1.0 keepalived 2.0.20
1. 系统环境准备
--所有节点配置
1.1 添加/etc/hosts
192.168.56.201 pg1 192.168.56.202 pg2 192.168.56.203 pg3 192.168.56.200 pg4
1.2 关闭并禁用防火墙
systemctl stop firewalld systemctl disabled firewalld
1.3 禁用/etc/selinux/config
SELINUX=disabled set enforce 0
1.4 创建postgres用户
useradd postgres echo postgres|passwrd postgres --stdin
1.5 创建postgres互信
--sshUserSetup.sh脚本为Oracle数据库软件提供,可以网上下载,或者采用传统方式创建互信
./sshUserSetup.sh -user postgres -hosts "pg1 pg2 pg3 pg4" -confirmno -noPromptPassphrase -advanced
1.6 软件依赖包安装
--需要配置yum,系统ISO镜像配置本地yum源即可
yum install -y gcc systemd-devel readline-devel zlib-devel flex autoconfautomake libtool libnl3-devel openssl-devel
1.7 创建软件安装目录
pvcreate /dev/sdb vgcreate vg_db /dev/sdb lvcreate -L +10g -n lv_pg vg_db mkfs.xfs /dev/mapper/vg_db-lv_pg mkdir /PostgreSQL mount /dev/mapper/vg_db-lv_pg /PostgreSQL --添加/etc/fstab mkdir /PostgreSQL/data chmod 700 /PostgreSQL/data chown -R postgres:postgres /PostgreSQL
2. postgresql数据库配置
--所有节点配置
2.1 数据库软件安装
su - root tar -zxvf postgresql-10.14.tar.gz cd postgresql-10.14 ./configure --prefix=/PostgreSQL --with-systemd make world -j8 make install-world -j8
2.2 配置postgres用户环境变量~/.bash_profile
su - postgres export PGUSER=postgres export PGPORT=5432 export PGDATA=/PostgreSQL/data export PGHOME=/PostgreSQL export PATH=$PGHOME/bin:$PATH export PGPASSFILE=~postgres/.pgpass # repmgr password file
2.3 配置访问控制pg_hba.conf
host all all 192.168.56.1/24 trust local replication repmgr trust host replication repmgr 127.0.0.1/32 trust host replication repmgr 192.168.56.0/24 trust local repmgr repmgr trust host repmgr repmgr 127.0.0.1/32 trust host repmgr repmgr 192.168.56.0/24 trust
2.4 修改配置文件postgresql.conf
listen_addresses = '*' port = 5432 max_connections = 100 #最大链接数 wal_level = replica #wal日志级别决定了有多少信息写入wal,或者logical wal_log_hints = on #预写日志命中,日志修改的时候预写日志进行记录,同步必须开on 维护系统的命令也记录 archive_mode = on #开启归档模式 流复制必须开 archive_command = '/bin/true' #默认填写 9.5以后淘汰了 max_wal_senders = 10 #发送流复制的进程数 max_replication_slots = 10 hot_standby = on #开启热备 shared_preload_libraries = 'repmgr' #安装的插件 log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' log_truncate_on_rotation = 'on' log_rotation_age = '1d' log_rotation_size = 0
2.5 启动数据库
su - postgres pg_ctl start -D $PGDATA -l /tmp/logfile
2.6 创建repmgr用户和数据库
su - postgres createuser -s repmgr createdb -O repmgr repmgr psql -c 'ALTER USER repmgr SET search_path TO repmgr,"$user",public;'
2.7用户连接测试
su - postgres psql 'host=pg1 user=repmgr dbname=repmgr connect_timeout=2' psql 'host=pg2 user=repmgr dbname=repmgr connect_timeout=2' psql 'host=pg3 user=repmgr dbname=repmgr connect_timeout=2' psql 'host=pg4 user=repmgr dbname=repmgr connect_timeout=2'
3. repmgr集群配置
--所有节点配置
3.1 repmgr软件安装
su - root tar -zxvf repmgr-5.1.0.tar.gz export PATH=/PostgreSQL/bin:$PATH cd repmgr-5.1.0 ./configure make && make install
3.2 修改repmgr.conf
cp repmgr-5.1.0/repmgr.conf.sample /etc/repmgr.conf vi /etc/repmgr.conf # node_id / node_name / conninfo 每个节点不一样 # 本地节点连接配置 node_id=1 node_name='pg1' conninfo='host=pg1 dbname=repmgr user=repmgr connect_timeout=2' data_directory='/PostgreSQL/data' replication_user='repmgr' # 使用复制槽,防止日志覆写或丢失 use_replication_slots=yes # 启用witness使用 witness_sync_interval=15 primary_visibility_consensus=true # repmgr日志设置 log_level='INFO' log_file='/PostgreSQL/data/repmgrd.log' log_status_interval=30 # 执行文件路径 pg_bindir='/PostgreSQL/bin' repmgr_bindir='/PostgreSQL/bin' # 密码文件位置,前面配置了PGPASSFILE环境变量 passfile='/home/postgres/.pgpass' ssh_options='-q -o ConnectTimeout=10' # repmgrd自动监控配置 monitoring_history=yes monitor_interval_secs=2 # 自动切换配置 repmgrd_service_start_command = 'repmgrd --daemonize=true' repmgrd_service_stop_command = 'kill `cat /PostgreSQL/repmgrd.pid`' repmgrd_pid_file='/PostgreSQL/repmgrd.pid' failover='automatic' reconnect_attempts=4 reconnect_interval=10 promote_command='/PostgreSQL/data/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file' follow_command='/PostgreSQL/data/bin/repmgr standby follow -f /etc/repmgr.conf --upstream-node-id=%n --log-to-file' service_start_command = '/PostgreSQL/bin/pg_ctl -w -D /PostgreSQL/data start' service_stop_command = '/PostgreSQL/bin/pg_ctl -D /PostgreSQL/data -W -m fast stop' service_restart_command = '/PostgreSQL/bin/pg_ctl -w -D /PostgreSQL/data restart' service_reload_command = '/PostgreSQL/bin/pg_ctl -w -D /PostgreSQL/data reload' service_promote_command = '/PostgreSQL/bin/pg_ctl -w -D /PostgreSQL/data promote
3.3 注册主节点
--节点一操作,本处定义pg1为primary [postgres@pg1 ~]$ repmgr primary register [postgres@pg1 ~]$ repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------ 1 | pg1 | primary | * running | | default | 100 | 1 | host=pg1 user=repmgr dbname=repmgr connect_timeout=2
3.4 克隆并注册备用节点
--预注册 [postgres@pg2 ~]$ repmgr -h pg1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run .......... INFO:all prerequisites for "standby clone" are met --注册 [postgres@pg2 ~]$ repmgr -h pg1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone NOTICE:you can now start your PostgreSQL server HINT:for example: pg_ctl -D /PostgreSQL/data start HINT:after starting the server, you need to register this standby with "repmgr standby register" [postgres@pg2 ~]$ pg_ctl -D /PostgreSQL/data start -l /tmp/logfile [postgres@pg2 ~]$ repmgr standby register -- pg3 重复以上操作 --查看集群状态 [postgres@pg1 ~]$ repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------ 1 | pg1 | primary | * running | | default | 100 | 1 | host=pg1 user=repmgr dbname=repmgr connect_timeout=2 2 | pg2 | standby | running | pg1 | default | 100 | 1 | host=pg2 user=repmgr dbname=repmgr connect_timeout=2 3 | pg3 | standby | running | pg1 | default | 100 | 1 | host=pg3 user=repmgr dbname=repmgr connect_timeout=2
3.5 克隆并注册witness节点
--postgresql.conf,pg_hba.conf,repmgr.conf文件已经配置好,数据库前面已经启动,直接注册witness节点即可 [postgres@pg4 ~]$ repmgr witness register -h pg1 -U repmgr INFO:connecting to witness node "pg4" (ID: 4) INFO:connecting to primary node NOTICE:attempting to install extension "repmgr" NOTICE:"repmgr" extension successfully installed INFO: witness registration complete NOTICE:witness node "pg4" (ID: 4) successfully registered --查看集群状态 [postgres@pg1 ~]$ repmgr service status --detail ID | Name | Role | Status | Upstream | Location | Priority | repmgrd | PID | Paused? | Upstream last seen ----+------+---------+-----------+----------+----------+----------+-------------+-------+---------+-------------------- 1 | pg1 | primary | * running | | default | 100 | not running | 11451 | no | n/a 2 | pg2 | standby | running | pg1 | default | 100 | not running | 2546 | no | 1 second(s) ago 3 | pg3 | standby | running | pg1 | default | 100 | not running | 16018 | no | 0 second(s) ago 4 | pg4 | witness | * running | pg1 | default | 0 | not running | 26061 | no | 0 second(s) ago
3.6 启动repmgrd守护节点
--因为前面已经配置了repmgrd_service_start_command / repmgrd_service_stop_command参数 [postgres@pg1 ~]$ repmgr daemon start [postgres@pg2 ~]$ repmgr daemon start [postgres@pg3 ~]$ repmgr daemon start [postgres@pg4 ~]$ repmgr daemon start --查看集群状态 [postgres@pg1 ~]$repmgr service status --detail ID | Name | Role | Status | Upstream | Location | Priority | repmgrd | PID | Paused? | Upstream last seen ----+------+---------+-----------+----------+----------+----------+---------+-------+---------+-------------------- 1 | pg1 | primary | * running | | default | 100 | running | 11451 | no | n/a 2 | pg2 | standby | running | pg1 | default | 100 | running | 2546 | no | 1 second(s) ago 3 | pg3 | standby | running | pg1 | default | 100 | running | 16018 | no | 0 second(s) ago 4 | pg4 | witness | * running | pg1 | default | 0 | running | 26061 | no | 0 second(s) ago --节点间连通性检测 [postgres@pg1 ~]$ repmgr cluster crosscheck INFO: connecting to database Name | Id | 1 | 2 | 3 | 4 -----+----+----+----+----+---- pg1 | 1 | * | * | * | * pg2 | 2 | * | * | * | * pg3 | 3 | * | * | * | * pg4 | 4 | * | * | * | *
4 keepalived配置
4.1 keepalived软件安装(pg1/pg2/pg3)
su - root tar -zxvf keepalived-2.0.20.tar.gz cd keepalived-2.0.20 ./autogen.sh ./configure make -j4 make install
4.2 修改keepalived.conf(pg1/pg2/pg3)
su - root mkdir /etc/keepalived cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf vi /etc/keepalived/keepalived.conf ! Configuration File for keepalived global_defs { notification_email { acassen@firewall.loc failover@firewall.loc sysadmin@firewall.loc } notification_email_from Alexandre.Cassen@firewall.loc smtp_server 127.0.0.1 --根据实际修改 smtp_connect_timeout 30 router_id LVS_DEVEL vrrp_skip_check_adv_addr vrrp_strict --注释掉,可能导致ip不能ping通 vrrp_garp_interval 0.1 --防止日志告警,随便修改 vrrp_gna_interval 0.1 --防止日志告警,随便修改 } vrrp_instance VI_1 { state MASTER interface enp0s3 --绑定虚拟ip的网卡 virtual_router_id 51 priority 100 advert_int 1 authentication { auth_type PASS auth_pass 1111 } virtual_ipaddress { 192.168.56.204 --需要绑定的ip地址 192.168.56.205 --需要绑定的ip地址 192.168.56.206 --需要绑定的ip地址 } }
4.3 启动keepalived服务(pg1/pg2/pg3)
su - root systemctl start keepalived systemctl enable keepalived systemctl is-enabled keepalived --查看ip挂载状态 [postgres@pg1 tmp]$ifconfig -a ...... enp0s3:1: flags=4163mtu 1500 inet 192.168.56.204 netmask 255.255.255.0 broadcast 0.0.0.0 ether 08:00:27:4f:a5:a9 txqueuelen 1000 (Ethernet)
5 配置keepalived切换脚本
5.1 systemctl调用权限设置(pg1/pg2/pg3)
[postgres@pg1 ~]$visudo ## Allow root to run any commands anywhere root ALL=(ALL) ALL postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl
5.2 创建切换脚本守护程序(pg1/pg2/pg3)
--通过计划任务(后面添加)每分钟运行一次,确认keepalived_repmgr.sh切换脚本在运行 --脚本前面定义了1个变量,定义了脚本存放位置了日志存放位置 cd ~postgres vi keepalived_daemon.sh #!/bin/sh ####################################################################################### # Author: sync_tb # Created: 2020/09/01 # ToDo: # Daemon for monitoring keepalived failover scripts ####################################################################################### SCR_DIR='/home/postgres' KEEP_MGR=`ps -ef|grep 'keepalived_repmgr.sh'|grep -v grep|wc -l` if [[ ${KEEP_MGR} -lt 1 ]]; then nohup /bin/sh ${SCR_DIR}/keepalived_repmgr.sh >> ${SCR_DIR}/keepalived_repmgr.log & fi
5.3 创建切换脚本(pg1/pg2/pg3)
--通过扫描repmgr集群节点确认primary节点,然后通过修改keepalived权重(priority)参数控制,利用IP抢占模式 把IP漂移到repmgr primary节点上面,整个切换过程需要一定时间,大概10s左右,可以根据实际进行调整,不宜太短, 时间太短可能导致IP漂移没有完成的情况下,开始下一次检测然后就会使切换脚本处于一个循环修改priority的情况中, 从而导致VIP在三个节点间来回切换。 --脚本前面定义了6个变量,可以通过NODES参数控制keepalived只在某几个节点上面漂移,此时需要确报repmgr primary 也要在这几个节点上面。 cd ~postgres vi keepalived_repmgr.sh #!/bin/sh ####################################################################################### # Author: sync_tb # Created: 2020/09/01 # ToDo: # Created for monitoring keepalived for PostgreSQL repmgr cluster # When production shutdown with errors,it needs about 10s to failover. ####################################################################################### NODES='pg1 pg2 pg3' # 集群节点 VIP='192.168.56.204' # VIP,多个请用管道隔开(|) PGHOME=/PostgreSQL # $PGHOME REPMGR_CONF='/etc/repmgr.conf' # repmgr配置文件 KEEP_CONF='/etc/keepalived/keepalived.conf' # keepalived配置文件 # state time function to write logfiles function log_time(){ date '+%Y-%m-%d %H:%M:%S ' } echo "$(log_time) Repmgr and KeepAlived auto-failover monitoring start..." while true do for NODE in ${NODES} do KEEP_STA=`ssh $NODE "/usr/sbin/ip a|grep -w ${VIP}|wc -l"` if [[ ${KEEP_STA} -eq 1 ]]; then KEEP_MASTER=$NODE echo "$(log_time) KeepAlived Master is ${KEEP_MASTER}." fi done for NODE in ${NODES} do MGR_STA=`ssh $NODE "$PGHOME/bin/pg_controldata|grep 'in production'|wc -l"` if [[ ${MGR_STA} -ne 1 ]]; then continue else MGR_PRIMARY=$NODE echo "$(log_time) Repmgr Cluster Primary is ${MGR_PRIMARY}." break fi done sleep 1s if [[ ${KEEP_MASTER} == ${MGR_PRIMARY} ]]; then echo "$(log_time) High Available Architecture running normally." else echo "$(log_time) KeepAlived Master is ${KEEP_MASTER}, but Repmgr Primary is ${MGR_PRIMARY}." for NODE in ${NODES} do case $NODE in ${MGR_PRIMARY} ) ssh ${NODE} "sed -i 's/priority .*/priority 100/g' ${KEEP_CONF}" ssh ${NODE} "sudo systemctl restart keepalived" echo "$(log_time) Modify KeepAlived priority for ${NODE} to 100." ;; ${KEEP_MASTER} ) ssh ${NODE} "sed -i 's/priority .*/priority 60/g' ${KEEP_CONF}" ssh ${NODE} "sudo systemctl restart keepalived" echo "$(log_time) Modify KeepAlived priority for ${NODE} to 60." ;; * ) ssh ${NODE} "sed -i 's/priority .*/priority 80/g' ${KEEP_CONF}" ssh ${NODE} "sudo systemctl restart keepalived" echo "$(log_time) Modify KeepAlived priority for ${NODE} to 80." ;; esac done SECS=10 # 切换等待时长,单位秒 while [[ ${SECS} -ge 0 ]] do echo "$(log_time) Waiting for failover to target: ${SECS}s" let SECS-- if [[ ${SECS} -eq 0 ]]; then break fi sleep 1s done echo "$(log_time) Failover KeepAlived to ${MGR_PRIMARY} completed." fi sleep 2s done
5.4 添加计划任务(pg1/pg2/pg3)
[postgres@pg1 ~]$crontab -l * * * * * sh /home/postgres/keepalived_daemon.sh --日志预览 [postgres@pg1 ~]$tail -f ~/keepalived_repmgr.log 2020-09-02 04:34:50 High Available Architecture running normally. 2020-09-02 04:34:52 KeepAlived Master is pg1. 2020-09-02 04:34:52 Repmgr Cluster Primary is pg1. 2020-09-02 04:34:53 High Available Architecture running normally. 2020-09-01 22:34:36 KeepAlived Master is pg3. 2020-09-01 22:34:36 Repmgr Cluster Primary is pg2. 2020-09-01 22:34:37 KeepAlived Master is pg3, but Repmgr Primary is pg2. 2020-09-01 22:34:38 Modify KeepAlived priority for pg1 to 80. 2020-09-01 22:34:40 Modify KeepAlived priority for pg2 to 100. 2020-09-01 22:34:41 Modify KeepAlived priority for pg3 to 60. 2020-09-01 22:34:41 Waiting for failover to target: 5s 2020-09-01 22:34:42 Waiting for failover to target: 4s 2020-09-01 22:34:43 Waiting for failover to target: 3s 2020-09-01 22:34:44 Waiting for failover to target: 2s 2020-09-01 22:34:45 Waiting for failover to target: 1s 2020-09-01 22:34:45 Failover KeepAlived to pg2 completed. 2020-09-01 22:34:48 KeepAlived Master is pg2. 2020-09-01 22:34:48 Repmgr Cluster Primary is pg2. 2020-09-01 22:34:49 High Available Architecture running normally.
结束