canal is Alibaba's component for incremental subscription to, and consumption of, the MySQL binlog.
Use cases
As a client of Canal, canal-python shares Canal's use cases. These are already outlined in the Canal introduction section. Some practical examples:
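1. Monitor database changes without polling the database, which saves the database resources that polling consumes.
2. Update a search engine in real time from database changes. For example, in e-commerce, when product information changes it is synchronized in real time to a product search engine such as Elasticsearch or Solr.
3. Update a cache in real time from database changes. For example, in e-commerce, when a product's price or stock level changes it is synchronized in real time to Redis.
4. Off-site database backup and data synchronization.
5. Trigger business logic from database changes. For example, in e-commerce, an order that stays unpaid for more than xx time after creation is cancelled automatically; once we receive the status change for that order row, we can push a message to the user.
6. Convert database changes into your own data format and send them to a message queue such as Kafka, for the queue's consumers to process.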
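As a rough illustration of use case 3, the sketch below applies change events to a cache. It assumes events arrive as dictionaries with the db, table, event_type and data keys that the client script later in this post prints. The plain dict standing in for Redis, the key layout, the sample shop.goods data and the function name apply_change_to_cache are all hypothetical.

```python
# Hypothetical sketch: keep a cache in sync with row changes.
# A plain dict stands in for Redis; the "db.table:key" layout is an assumption.
INSERT, UPDATE, DELETE = 1, 2, 3  # event_type values seen in this post's output


def apply_change_to_cache(cache, change, key_column):
    """Apply one change event (dict with db/table/event_type/data) to cache."""
    prefix = f"{change['db']}.{change['table']}"
    event_type = change['event_type']
    data = change['data']
    if event_type == INSERT:
        cache[f"{prefix}:{data[key_column]}"] = dict(data)
    elif event_type == UPDATE:
        # The key column itself may change, so drop the old key first.
        cache.pop(f"{prefix}:{data['before'][key_column]}", None)
        cache[f"{prefix}:{data['after'][key_column]}"] = dict(data['after'])
    elif event_type == DELETE:
        cache.pop(f"{prefix}:{data[key_column]}", None)
    return cache


# Made-up sample events, for illustration only.
cache = {}
apply_change_to_cache(cache, {'db': 'shop', 'table': 'goods', 'event_type': 1,
                              'data': {'id': '7', 'price': '10'}}, 'id')
apply_change_to_cache(cache, {'db': 'shop', 'table': 'goods', 'event_type': 2,
                              'data': {'before': {'id': '7', 'price': '10'},
                                       'after': {'id': '7', 'price': '12'}}}, 'id')
print(cache)
```

With a real Redis the same branching would apply; only the dict operations would be replaced by the corresponding client calls.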
Installing Canal
tar -zxvf canal.deployer-1.1.4.tar.gz
[root@mdb01 canal]# ll
total 4
drwxr-xr-x 2 root root 93 Jul 19 15:18 bin
drwxr-xr-x 5 root root 123 Jul 19 14:25 conf
drwxr-xr-x 2 root root 4096 Jul 19 14:25 lib
drwxrwxrwx 4 root root 34 Jul 19 14:29 logs
The configuration file is conf/example/instance.properties
[root@mdb01 example]# ll
total 176
-rw-r--r-- 1 root root 172032 Jul 19 15:19 h2.mv.db
-rwxrwxrwx 1 root root 2041 Jul 19 14:34 instance.properties
-rw-r--r-- 1 root root 342 Jul 19 21:11 meta.dat
[root@mdb01 example]# more instance.properties |grep -v '^#'
canal.instance.gtidon=false
canal.instance.master.address=192.168.61.16:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=
canal.instance.tsdb.enable=true
canal.instance.dbUsername=canal
canal.instance.dbPassword=oracle
canal.instance.connectionCharset = UTF-8
canal.instance.enableDruid=false
canal.instance.filter.regex=.*\\..*
canal.instance.filter.black.regex=
canal.mq.topic=example
canal.mq.partition=0
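The canal.instance.filter.regex value above is written with a doubled backslash because of properties-file escaping; the startup log later in this post prints the effective pattern as ^.*\..*$, which covers every schema.table name. The sketch below uses Python's re module to approximate how such a filter selects tables. It assumes the filter is a comma-separated list of patterns, each matched against the whole schema.table name. Canal evaluates filters with its own regex engine, so treat this only as an approximation; the function name table_matches is hypothetical.

```python
import re


def table_matches(filter_value, schema, table):
    """Return True if schema.table fully matches any comma-separated pattern."""
    full_name = f"{schema}.{table}"
    patterns = [p.strip() for p in filter_value.split(',') if p.strip()]
    return any(re.fullmatch(p, full_name) for p in patterns)


# Effective pattern after properties-file unescaping of .*\\..*
print(table_matches(r'.*\..*', 'ming', 't1'))
# A narrower filter that only selects tables in the ming schema.
print(table_matches(r'ming\..*', 'ming', 't1'))
print(table_matches(r'ming\..*', 'other', 't1'))
```

Trying a pattern this way before editing instance.properties can catch an over-broad or over-narrow filter early.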
Create the database user
CREATE USER canal IDENTIFIED BY 'oracle';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
The start and stop scripts are in the bin directory; to restart, run sh restart.sh
[root@mdb01 canal]# cd bin
[root@mdb01 bin]# ll
total 20
-rw-r--r-- 1 root root 7 Jul 19 15:18 canal.pid
-rwxr-xr-x 1 root root 58 Sep 2 2019 restart.sh
-rwxr-xr-x 1 root root 1181 Sep 2 2019 startup.bat
-rwxr-xr-x 1 root root 3167 Sep 2 2019 startup.sh
-rwxr-xr-x 1 root root 1356 Sep 2 2019 stop.sh
Logs are written to the logs/example directory.
[root@mdb01 example]# pwd
/u01/canal/logs/example
[root@mdb01 example]# ll
total 28
-rw-r--r-- 1 root root 21582 Jul 19 21:11 example.log
-rw-r--r-- 1 root root 2090 Jul 19 21:11 meta.log
Log output after a successful start:
2020-07-19 15:18:18.596 [main] INFO c.a.otter.canal.instance.spring.CanalInstanceWithSpring - start CannalInstance for 1-example
2020-07-19 15:18:18.605 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
2020-07-19 15:18:18.606 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter :
2020-07-19 15:18:18.616 [main] INFO c.a.otter.canal.instance.core.AbstractCanalInstance - start successful....
2020-07-19 15:18:18.842 [destination = example , address = /192.168.61.16:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2020-07-19 15:18:18.842 [destination = example , address = /192.168.61.16:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
2020-07-19 15:18:37.804 [destination = example , address = /192.168.61.16:3306 , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-bin.000019,position=4,serverId=1573854809,gtid=,timestamp=1595139828000] cost : 18915ms , the next step is binlog dump
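The last log entry records the binlog file and offset Canal starts reading from. As a small aid for reading such entries, the sketch below pulls the key=value pairs out of the EntryPosition[...] text and converts the timestamp, which appears to be milliseconds since the Unix epoch, into a UTC datetime. The function name parse_entry_position is hypothetical, and the parsing assumes that no value contains a comma.

```python
import re
from datetime import datetime, timezone


def parse_entry_position(log_text):
    """Extract the key=value pairs inside EntryPosition[...] as a dict of strings."""
    match = re.search(r'EntryPosition\[(.*?)\]', log_text)
    if match is None:
        return None
    fields = {}
    # Assumption: values contain no commas (true for the entry shown here).
    for pair in match.group(1).split(','):
        key, _, value = pair.partition('=')
        fields[key] = value
    return fields


line = ('find start position successfully, EntryPosition[included=false,'
        'journalName=mysql-bin.000019,position=4,serverId=1573854809,gtid=,'
        'timestamp=1595139828000] cost : 18915ms , the next step is binlog dump')
position = parse_entry_position(line)
# Interpret the timestamp as milliseconds since the Unix epoch.
started_at = datetime.fromtimestamp(int(position['timestamp']) / 1000, tz=timezone.utc)
print(position['journalName'], position['position'], started_at.isoformat())
```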
Canal Python client
canal-python is the Python client for Canal. It communicates with Canal over a socket using TCP, and the interaction protocol is Google Protocol Buffer 3.0. GitHub address:
https://github.com/haozi3156666/canal-python
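To make the transport a little more concrete, the sketch below shows length-prefixed framing, a common way to carry serialized protobuf messages over a TCP stream. It assumes each packet is preceded by a 4-byte big-endian length; that layout is an assumption for illustration, not a statement of Canal's wire format. An in-memory io.BytesIO stands in for the socket, and write_frame, read_exact and read_frame are hypothetical helper names.

```python
import io
import struct

HEADER = struct.Struct('>i')  # assumed: 4-byte big-endian signed length


def write_frame(stream, payload):
    """Write payload prefixed with its length."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_exact(stream, size):
    """Read exactly size bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError('stream closed before a full frame arrived')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)


def read_frame(stream):
    """Read one length-prefixed frame and return its payload bytes."""
    (length,) = HEADER.unpack(read_exact(stream, HEADER.size))
    return read_exact(stream, length)


buffer = io.BytesIO()
write_frame(buffer, b'serialized-protobuf-bytes')  # placeholder payload
write_frame(buffer, b'')
buffer.seek(0)
print(read_frame(buffer), read_frame(buffer))
```

The read loop matters on a real socket, where a single read may return fewer bytes than requested.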
The example given on GitHub is incorrect: it does not show the before-values of an update correctly. Here is a corrected version:
import time

from canal.client import Client
from canal.protocol import EntryProtocol_pb2
from canal.protocol import CanalProtocol_pb2

client = Client()
client.connect(host='127.0.0.1', port=11111)
client.check_valid(username=b'root', password=b'oracle')
client.subscribe(client_id=b'1001', destination=b'example', filter=b'.*\\..*')

try:
    while True:
        # Fetch up to 100 entries per call.
        message = client.get(100)
        entries = message['entries']
        for entry in entries:
            entry_type = entry.entryType
            # Skip transaction boundary markers; only row changes carry data.
            if entry_type in [EntryProtocol_pb2.EntryType.TRANSACTIONBEGIN, EntryProtocol_pb2.EntryType.TRANSACTIONEND]:
                continue
            row_change = EntryProtocol_pb2.RowChange()
            row_change.MergeFromString(entry.storeValue)
            header = entry.header
            database = header.schemaName
            table = header.tableName
            event_type = header.eventType
            for row in row_change.rowDatas:
                format_data = dict()
                if event_type == EntryProtocol_pb2.EventType.DELETE:
                    # A deleted row only has before-values. Add each column to
                    # the dict instead of replacing the dict on every iteration.
                    for column in row.beforeColumns:
                        format_data[column.name] = column.value
                elif event_type == EntryProtocol_pb2.EventType.INSERT:
                    # An inserted row only has after-values.
                    for column in row.afterColumns:
                        format_data[column.name] = column.value
                else:
                    # Use two separate dicts. The chained assignment
                    # format_data['before'] = format_data['after'] = dict()
                    # binds both keys to the same dict, so the after-values
                    # overwrite the before-values.
                    format_data['before'] = dict()
                    format_data['after'] = dict()
                    for column in row.beforeColumns:
                        format_data['before'][column.name] = column.value
                    for column in row.afterColumns:
                        format_data['after'][column.name] = column.value
                data = dict(
                    db=database,
                    table=table,
                    event_type=event_type,
                    data=format_data,
                )
                print(data)
        time.sleep(1)
finally:
    # Runs when the loop is interrupted, for example with Ctrl+C.
    client.disconnect()
Run some statements against the database
mysql> insert into t1 select 1;
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> update t1 set a=2 where a=1;
Query OK, 1 row affected (0.17 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> delete from t1 where a=2;
Query OK, 1 row affected (0.02 sec)
Output:
connected to 127.0.0.1:11111
Auth succed
Subscribe succed
{'db': 'ming', 'table': 't1', 'event_type': 1, 'data': {'a': '1'}}
{'db': 'ming', 'table': 't1', 'event_type': 2, 'data': {'before': {'a': '1'}, 'after': {'a': '2'}}}
{'db': 'ming', 'table': 't1', 'event_type': 3, 'data': {'a': '2'}}
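The event_type values in this output are numeric. As a small follow-up sketch, the code below maps the three values seen here to readable names and reduces an update event to the columns that actually changed. The EventType enum only covers the values that appear in the output above, and changed_columns is a hypothetical helper name.

```python
from enum import IntEnum


class EventType(IntEnum):
    # Values as they appear in the output above; other event types are omitted.
    INSERT = 1
    UPDATE = 2
    DELETE = 3


def changed_columns(before, after):
    """Return {column: (old, new)} for columns whose value differs."""
    return {name: (before.get(name), new_value)
            for name, new_value in after.items()
            if before.get(name) != new_value}


event = {'db': 'ming', 'table': 't1', 'event_type': 2,
         'data': {'before': {'a': '1'}, 'after': {'a': '2'}}}
print(EventType(event['event_type']).name)
print(changed_columns(event['data']['before'], event['data']['after']))
```

For the update event shown, this prints UPDATE followed by {'a': ('1', '2')}.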