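I. Overview of high-availability options for the orchestrator service itself:

Option 1: orchestrator cluster in shared backend mode

Approach 1: Galera/XtraDB Cluster/InnoDB Cluster runs in single-writer mode. Multiple orchestrator nodes talk to the writer node, possibly through a proxy. If the writer fails, the backend cluster promotes another database to writer, and your proxy is responsible for detecting this and routing orchestrator traffic to the new writer. (Provide a proxy or VIP; all three orc nodes connect to the database at the same address.)

Approach 2: Galera/XtraDB Cluster/InnoDB Cluster runs in multi-writer mode. A good setup is to deploy each orchestrator node on the same server as a database server. Because replication is synchronous, there is no split brain. Only one orchestrator node can be the leader, and that leader only ever works with a consensus of the database nodes.

Heartbeat mechanism of this option:

In a shared backend orchestrator cluster, only the leader orc node sends heartbeats to the nodes of the MySQL clusters it manages.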

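Option 2: use raft to make orc highly available

With orc's raft support you deploy several orc nodes, each with its own dedicated MySQL backend database. The backend database does not need to be highly available, because orc itself is. The backends also do not need to be kept consistent with one another: every orc node probes the MySQL clusters it manages and records the probe results in its own backend MySQL database.

Heartbeat mechanism of this option:

As shown in the figure above, three orchestrator nodes form a raft cluster, and each orchestrator node uses its own dedicated database (MySQL or SQLite).

1) The orchestrator nodes communicate with each other.

2) Only one orchestrator node becomes the leader.

3) All orchestrator nodes probe every member of the MySQL clusters. Each MySQL server is probed by every raft member.

orchestrator's own high availability is provided by its raft implementation, and only the leader can perform a MySQL cluster failover. All orc nodes run discovery (probing) and self-analysis; the underlying principles are covered in later chapters.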
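The leader rule above can be checked from the command line. The following is a minimal sketch, not part of the original setup: it assumes the three node addresses used later in this article, orchestrator's default HTTP port 3000 (the article does not state the port), and that HTTP authentication is not yet enabled. It relies on the /api/leader-check endpoint, which answers HTTP 200 only on the leader, and also shows the majority arithmetic behind raft (3 nodes need 2 healthy ones, so one node may fail).

```shell
#!/bin/bash
# Assumptions: node list taken from the architecture table in this article;
# port 3000 is orchestrator's default listen port and may differ in your config.
ORC_NODES="10.1.92.189 10.1.92.190 10.1.92.191"
ORC_PORT=3000

# Number of healthy nodes a raft cluster of size $1 needs to keep a leader.
raft_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# URL of the leader-check endpoint on node $1
# (HTTP 200 on the leader, a non-200 status on the other nodes).
leader_check_url() {
    echo "http://$1:${ORC_PORT}/api/leader-check"
}

# Print the first node that reports itself as leader; return 1 if none does.
# With basic authentication enabled, add -u user:password to the curl call.
find_leader() {
    local node code
    for node in $ORC_NODES; do
        code=$(curl -s -o /dev/null -m 3 -w '%{http_code}' "$(leader_check_url "$node")")
        if [ "$code" = "200" ]; then
            echo "$node"
            return 0
        fi
    done
    return 1
}
```

Calling find_leader once the cluster is running prints the IP of the current leader; raft_quorum 3 shows why a three-node cluster survives the loss of a single node.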

II. Setting up an orc high-availability environment with raft:

Architecture:

IP address	Deployed services
10.1.92.189	mysql+orc
10.1.92.190	mysql+orc
10.1.92.191	mysql+orc

1. Download the installation package

https://github.com/openark/orchestrator/releases

2. Install orc on each cluster node:

[root@middleware-1 /data/orchestrator]# yum localinstall orchestrator-3.2.6-1.x86_64.rpm

3. Create an account on each backend MySQL instance for the orchestrator service:

1) Create the account on the MySQL instance at 10.1.92.189

CREATE USER 'orchestrator'@'10.1.92.189' IDENTIFIED BY 'Orchliu#123';

GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator'@'10.1.92.189';

2) Create the account on the MySQL instance at 10.1.92.190

CREATE USER 'orchestrator'@'10.1.92.190' IDENTIFIED BY 'Orchliu#123';

GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator'@'10.1.92.190';

3) Create the account on the MySQL instance at 10.1.92.191

CREATE USER 'orchestrator'@'10.1.92.191' IDENTIFIED BY 'Orchliu#123';

GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator'@'10.1.92.191';
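Because the account host has to match the node's own IP in both statements, it is easy to mistype one of them. As a small sketch (the function name backend_user_sql is made up for illustration, and the password is simply the one used in this article), the statements can be generated per node instead of typed by hand:

```shell
#!/bin/bash
# Print the account statements for one orchestrator node.
# $1 = node IP, $2 = password for the orchestrator account.
backend_user_sql() {
    local ip="$1" password="$2"
    cat <<EOF
CREATE USER 'orchestrator'@'${ip}' IDENTIFIED BY '${password}';
GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator'@'${ip}';
EOF
}

# Emit one pair of statements per node of the architecture above.
for ip in 10.1.92.189 10.1.92.190 10.1.92.191; do
    echo "-- run on the MySQL instance at ${ip}"
    backend_user_sql "$ip" 'Orchliu#123'
done
```

The output of each iteration can be pasted into the mysql client on the corresponding node.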

4. Configure orchestrator (on all three nodes)

1) Create the directories orc needs:

cd /data/orchestrator/orc

mkdir -p {log,conf,raftdata}

2) Copy the default configuration file that ships with orc to the target directory:

[root@middleware-1 /data/orchestrator/orc]# cd /usr/local/orchestrator/

[root@middleware-1 /usr/local/orchestrator]# cp orchestrator-sample.conf.json /data/orchestrator/orc/conf/orchestrator.conf.json

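3) Modify the orc configuration

The orchestrator-sample.conf.json file installed by yum does not contain all the parameters!

Add the following parameters (set RaftBind to the IP of the node being configured; JSON does not allow inline comments):

"RaftEnabled": true,

"RaftDataDir": "/data/orchestrator/orc/raftdata",

"RaftBind": "10.1.92.189",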

"DefaultRaftPort": 10008,

"RaftNodes": [ "10.1.92.189", "10.1.92.190", "10.1.92.191" ]

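Modify the following parameters:

MySQLTopologyUser: account on the managed MySQL instances;

MySQLTopologyPassword: password of that account;

MySQLOrchestratorHost: address of the orc backend database;

MySQLOrchestratorPort: port of the orc backend database;

MySQLOrchestratorDatabase: name of the orc backend database;

MySQLOrchestratorUser: user of the orc backend database;

MySQLOrchestratorPassword: password of the orc backend database;

AuthenticationMethod: "basic" (enabling authentication is recommended; once enabled, the web page, the command line, and the API all require a username and password);

HTTPAuthUser: authentication user;

HTTPAuthPassword: authentication password;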
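Since the sample file lacks some of these parameters, a quick check before starting the service can catch a forgotten key or a RaftBind copied unchanged from another node. This is only a sketch: the helper names missing_keys and raft_bind_of are invented here, and the check is a plain text match, not a JSON parser.

```shell
#!/bin/bash
# Keys discussed in this section that should appear in the config file.
REQUIRED_KEYS="RaftEnabled RaftDataDir RaftBind DefaultRaftPort RaftNodes \
MySQLTopologyUser MySQLTopologyPassword MySQLOrchestratorHost \
MySQLOrchestratorPort MySQLOrchestratorDatabase MySQLOrchestratorUser \
MySQLOrchestratorPassword"

# Print every required key that does not occur in config file $1.
missing_keys() {
    local conf="$1" key
    for key in $REQUIRED_KEYS; do
        grep -q "\"${key}\"[[:space:]]*:" "$conf" || echo "$key"
    done
}

# Print the RaftBind value found in config file $1.
raft_bind_of() {
    sed -n 's/.*"RaftBind"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Example (paths as used in this article):
# missing_keys /data/orchestrator/orc/conf/orchestrator.conf.json
# raft_bind_of /data/orchestrator/orc/conf/orchestrator.conf.json
```

An empty result from missing_keys means all listed keys are present; the value printed by raft_bind_of should equal the IP of the node the file lives on.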

4) Start the orc service (it automatically creates the orchestrator database and initializes the backend tables)

cd /usr/local/orchestrator

nohup ./orchestrator -config /data/orchestrator/orc/conf/orchestrator.conf.json http >> /data/orchestrator/orc/log/orchestrator.log 2>&1 &

Put the startup command into an init script:

[root@middleware-1 /data/orchestrator/orc/log]# cat /etc/init.d/orchestrator

#!/bin/bash
# chkconfig: 2345 10 90
# description: orchestrator ....

case "$1" in
start)
    # cd first: orchestrator looks for its web resources relative to the working directory
    cd /usr/local/orchestrator || exit 1
    nohup ./orchestrator -config /data/orchestrator/orc/conf/orchestrator.conf.json http >> /data/orchestrator/orc/log/orchestrator.log 2>&1 &
    echo "orchestrator started"
    ;;
stop)
    # kill every process whose command line references the config file
    ps -ef | grep orchestrator.conf.json | grep -v grep | awk '{print "kill -9 "$2}' | sh
    echo "orchestrator stopped"
    ;;
*)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
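The start branch returns immediately, before orchestrator has finished initializing its backend tables. A hedged sketch of waiting for the service to come up is shown below; it polls the /api/health endpoint and assumes the default HTTP port 3000 on the local node, which the article does not state. The helper names retry and orc_healthy are illustrative only.

```shell
#!/bin/bash
# Retry a command until it succeeds.
# $1 = maximum attempts, $2 = delay in seconds, remaining args = command.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Probe the health endpoint of the local orchestrator (port 3000 is assumed).
# With basic authentication enabled, add -u user:password to the curl call.
orc_healthy() {
    curl -sf -m 3 -o /dev/null "http://127.0.0.1:3000/api/health"
}

# Example: wait up to 10 x 2 seconds after "service orchestrator start".
# retry 10 2 orc_healthy && echo "orchestrator is up"
```

Keeping retry generic means the same helper can wrap any other readiness probe, for example a check against the backend database.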

5) Create an account in the MySQL cluster that orc will manage:

grant super, process, replication slave, reload on *.* to 'orc_monitor'@'10.1.92.%' identified by 'Orc#monitor123';

grant select on mysql.slave_master_info to 'orc_monitor'@'10.1.92.%';

6) Add the managed cluster to orchestrator:
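One way to register a cluster without the web UI is the HTTP API's discover endpoint: pointing it at any one instance lets orchestrator crawl the rest of the replication topology. The sketch below makes several assumptions that are not in the original: port 3000 for the orchestrator HTTP listener, port 3306 for the managed MySQL instance, and the environment variables ORC_USER and ORC_PASSWORD holding the HTTPAuthUser and HTTPAuthPassword values. In a raft setup the request should reach the leader node.

```shell
#!/bin/bash
# Assumed API base: an orchestrator node (ideally the raft leader), default port 3000.
ORC_API="http://10.1.92.189:3000/api"

# Build the discover URL for a MySQL instance. $1 = host, $2 = port.
discover_url() {
    echo "${ORC_API}/discover/$1/$2"
}

# Ask orchestrator to discover the instance; it then follows replication
# links to find the other members of the same cluster.
# ORC_USER / ORC_PASSWORD are hypothetical variables for the basic-auth credentials.
discover_instance() {
    curl -s -u "${ORC_USER}:${ORC_PASSWORD}" "$(discover_url "$1" "$2")"
}

# Example (host and port of one managed instance, assumed here):
# discover_instance 10.1.92.189 3306
```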

7) Problems encountered:

1. Error when opening the web UI:

html/template: "templates/layout" is undefined

Cause and solution:

orchestrator was started directly by its absolute path, without first changing into the Go project directory:

/usr/local/orchestrator/orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http &

Correct way to start it:

cd /usr/local/orchestrator && ./orchestrator --config=./orchestrator.conf.json http &
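To keep this mistake from recurring in cron jobs or scripts that use absolute paths, the working directory can be derived from the binary path itself. A minimal sketch, using the paths from this troubleshooting example; the function names are illustrative:

```shell
#!/bin/bash
ORC_BIN=/usr/local/orchestrator/orchestrator
ORC_CONF=/usr/local/orchestrator/orchestrator.conf.json

# Directory that holds the binary (and, per the fix above, its web templates).
orc_workdir() {
    dirname "$1"
}

# Change into the binary's directory before launching, so relative
# resource paths resolve no matter where the caller was.
start_orchestrator() {
    cd "$(orc_workdir "$ORC_BIN")" || return 1
    nohup "$ORC_BIN" --config="$ORC_CONF" http >/dev/null 2>&1 &
}

# Example:
# start_orchestrator
```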

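2. The web UI shows the following warning:

The replica has more GTIDs than the master, which means statements were executed directly on the replica. If you have confirmed that the data is currently consistent, you can handle the problem as shown below; you can also simply choose to ignore it.

1. On the master, run show master status to see all GTIDs the master has executed;

2. On the replica, run the following SQL to reset the replica's gtid_purged list so that it equals the master's executed set; replication then continues from after that GTID set.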

root@mis-2-218-213 22:20: [(none)]> reset master;

root@mis-2-218-213 22:20: [(none)]> set @@global.gtid_purged='151094de-62a6-11ef-8623-e4434b32c3b8:1-9';

root@mis-2-218-213 22:20: [(none)]> start slave;
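Before resetting anything, it can help to see exactly which GTIDs exist only on the replica. MySQL's GTID_SUBTRACT function returns the part of the first set that is not in the second. The sketch below only builds the query text; errant_gtid_sql is a made-up helper name, and the two arguments are the Executed_Gtid_Set values you read from the replica and the master.

```shell
#!/bin/bash
# Build a query that lists GTIDs present in $1 (replica's executed set)
# but absent from $2 (master's executed set).
# An empty result means the replica has no extra transactions.
errant_gtid_sql() {
    echo "SELECT GTID_SUBTRACT('$1', '$2') AS errant_gtids;"
}

# Example: paste both executed sets, then run the printed statement in mysql.
# errant_gtid_sql '<replica Executed_Gtid_Set>' '151094de-62a6-11ef-8623-e4434b32c3b8:1-9'
```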

After refreshing the page, the problem is resolved:

MySQL high-availability solutions: building a highly available orchestrator cluster