I have recently been studying ClickHouse in preparation for operating a core ClickHouse production database later on. In this post I share some of what I have learned about ClickHouse backup and recovery. The content is still fairly basic; after further study I will share more details.
Part 1: Backup Methods and Tools
ClickHouse provides several backup and recovery mechanisms, which are introduced one by one below.
1. Duplicating Source Data Elsewhere
For example, when data is delivered through message middleware such as Kafka, the same data stream can be passed through the persistent queue to additional subscribers; those subscribers read the stream that is being written to ClickHouse and store it in cold storage.
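As an illustration only, a minimal sketch of this pattern (the database cold, the table names and the Kafka broker/topic below are hypothetical, not from this setup) uses a Kafka engine table plus a materialized view that writes the same stream into a MergeTree table serving as the cold copy:
## Hypothetical sketch: a second consumer group keeps a cold copy of the same Kafka stream
clickhouse-client --user default --password clickhouse --multiquery <<'SQL'
CREATE DATABASE IF NOT EXISTS cold;

-- reads the same topic as the main ingestion pipeline, but with its own consumer group
CREATE TABLE cold.events_queue (event_time DateTime, payload String)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'cold_backup_consumer',
         kafka_format      = 'JSONEachRow';

-- the cold copy itself (could just as well be external object storage)
CREATE TABLE cold.events (event_time DateTime, payload String)
ENGINE = MergeTree ORDER BY event_time;

-- continuously moves rows from the queue table into the cold copy
CREATE MATERIALIZED VIEW cold.events_mv TO cold.events AS
SELECT event_time, payload FROM cold.events_queue;
SQL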
2. Filesystem Snapshots
Some local filesystems provide snapshot functionality, but they may not be the best choice for serving live queries. One approach is to create additional replicas on such a filesystem and exclude them from the Distributed tables used for SELECT queries; snapshots on those replicas are then out of reach of any queries that modify data. The drawback is that extra disks are required, so the cost is relatively high.
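A rough sketch of the snapshot idea with LVM, purely illustrative (the volume group vg_data, logical volume lv_clickhouse and the sizes are hypothetical; the actual commands depend on the storage stack in use):
## Hypothetical sketch: snapshot the ClickHouse data volume with LVM, then copy it off
lvcreate --snapshot --size 50G --name ch_snap /dev/vg_data/lv_clickhouse
mkdir -p /mnt/ch_snap
mount -o ro /dev/vg_data/ch_snap /mnt/ch_snap
## rsync the snapshot contents to cold storage here, then clean up
umount /mnt/ch_snap
lvremove -y /dev/vg_data/ch_snap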
3. clickhouse-copier
clickhouse-copier is a versatile tool that was originally created to re-shard petabyte-sized tables. Because it can reliably copy data between ClickHouse tables and clusters, it can also be used for backing up and restoring data.
For smaller volumes of data, a simple INSERT INTO ... SELECT ... into a remote table also works, as sketched below.
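A hedged sketch of that approach, run on the target server (the source host, credentials and table are placeholders; the target table must already exist with the same schema):
## Hypothetical sketch: pull a small table from another server with INSERT ... SELECT
clickhouse-client --user default --password clickhouse --query "
    INSERT INTO ckdb.uk_price_paid
    SELECT * FROM remote('source-host:9000', 'ckdb', 'uk_price_paid', 'default', 'clickhouse')"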
4. Manipulations with Parts
ClickHouse supports the ALTER TABLE ... FREEZE PARTITION ... query to create a local copy of table partitions. It is implemented with hard links into the /var/lib/clickhouse/shadow/ directory, so it usually consumes no extra disk space for old data. The created file copies are not handled by the ClickHouse server afterwards.
Execute ALTER TABLE ... FREEZE PARTITION ... manually. The advantage of this approach is that it does not interfere with normal ClickHouse operation, and the data is backed up under a dedicated path. To restore, use ALTER TABLE ... ATTACH PARTITION ... to bring the data back from the backup; a sketch of the full flow follows.
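A minimal sketch of this manual flow, assuming a hypothetical table db1.events partitioned by month (toYYYYMM); the partition value, paths and increment number are illustrative, and the shadow layout differs for Atomic databases:
## Hypothetical sketch: freeze one partition, then restore it via ATTACH
clickhouse-client --user default --password clickhouse --query \
    "ALTER TABLE db1.events FREEZE PARTITION 202405"
## frozen parts appear as hard links under /var/lib/clickhouse/shadow/<increment>/
## (see /var/lib/clickhouse/shadow/increment.txt; for Atomic databases the layout is UUID-based under store/)
cp -r /var/lib/clickhouse/shadow/1/data/db1/events/* \
      /var/lib/clickhouse/data/db1/events/detached/
chown -R clickhouse:clickhouse /var/lib/clickhouse/data/db1/events/detached
clickhouse-client --user default --password clickhouse --query \
    "ALTER TABLE db1.events ATTACH PARTITION 202405"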
5. clickhouse-backup
All of the tools above are manual; clickhouse-backup makes the process automatable.
Part 2: Introduction to clickhouse-backup
clickhouse-backup is an open-source community tool for backing up ClickHouse that can also be used for data migration. It works by first creating a backup and then restoring data from that backup, which makes it suitable for routine off-site cold-backup schemes.
It can be installed either from a binary tarball or from an RPM package.
1. Limitations
ClickHouse version 1.1.54390 or newer
Only MergeTree-family table engines are supported
Backups of tiered storage / storage_policy are not supported
Maximum backup size on cloud storage is 5 TB
Maximum number of parts on AWS S3 is 10,000
The query sketched after this list can help find tables that fall outside the engine restriction.
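This is just a sketch, not part of the tool itself: a query against system.tables that lists tables whose engine is outside the MergeTree family and would therefore be skipped.
## Hypothetical sketch: list tables whose engine is outside the MergeTree family
clickhouse-client --user default --password clickhouse --query "
    SELECT database, name, engine
    FROM system.tables
    WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')
      AND engine NOT LIKE '%MergeTree'"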
2. Installation and Deployment
Here clickhouse-backup is deployed from the binary package.
1) Download clickhouse-backup
wget https://github.com/Altinity/clickhouse-backup/releases/download/v2.5.9/clickhouse-backup-linux-amd64.tar.gz
2) Extract the archive
tar xf clickhouse-backup-linux-amd64.tar.gz -C /root
3) Copy the clickhouse-backup binary into the system path
cp /root/clickhouse/build/linux/amd64/clickhouse-backup /usr/local/bin/
4) Create the clickhouse-backup configuration directory
mkdir -p /etc/clickhouse-backup
5) Create and edit the configuration file in the /etc/clickhouse-backup directory
general:
  remote_storage: none
  backups_to_keep_local: 7
  backups_to_keep_remote: 31
clickhouse:
  username: default
  password: "clickhouse"
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - _temporary_and_external_tables.*
  timeout: 5m
  freeze_by_part: false
  freeze_by_part_where: ""
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  secure: false
  skip_verify: false
  sync_replicated_tables: false
  log_sql_queries: true
  restart_command: systemctl restart clickhouse-server.service
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: false
  tls_key: ""
  tls_cert: ""
  tls_ca: ""
  debug: false
## Check the clickhouse-backup version
(base) [root@dsmart ~]# clickhouse-backup -v
Version: 2.5.9
Git Commit: 0387d55d340085e4f9f38cee6df4648b84e972ae
Build Date: 2024-05-20
3. Viewing the Default Configuration
The default configuration options of clickhouse-backup can be displayed as follows.
(base) [root@dsmart ~]# clickhouse-backup default-config
general:
  remote_storage: none
  max_file_size: 0
  backups_to_keep_local: 0
  backups_to_keep_remote: 0
  log_level: info
  allow_empty_backups: false
  download_concurrency: 16
  upload_concurrency: 4
  upload_max_bytes_per_second: 0
  download_max_bytes_per_second: 0
  object_disk_server_side_copy_concurrency: 32
  use_resumable_state: true
  restore_schema_on_cluster: ""
  upload_by_part: true
  download_by_part: true
  restore_database_mapping: {}
  retries_on_failure: 3
  retries_pause: 30s
  watch_interval: 1h
  full_interval: 24h
  watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}
  sharded_operation_mode: ""
  cpu_nice_priority: 15
  io_nice_priority: idle
  rbac_backup_always: true
  rbac_conflict_resolution: recreate
  retriesduration: 100ms
  watchduration: 1h0m0s
  fullduration: 24h0m0s
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - _temporary_and_external_tables.*
  skip_table_engines: []
  timeout: 30m
  freeze_by_part: false
  freeze_by_part_where: ""
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  embedded_backup_threads: 0
  embedded_restore_threads: 0
  backup_mutations: true
  restore_as_attach: false
  check_parts_columns: true
  secure: false
  skip_verify: false
  sync_replicated_tables: false
  log_sql_queries: true
  config_dir: /etc/clickhouse-server/
  restart_command: exec:systemctl restart clickhouse-server
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: true
  tls_key: ""
  tls_cert: ""
  tls_ca: ""
  max_connections: 16
  debug: false
s3:
  access_key: ""
  secret_key: ""
  bucket: ""
  endpoint: ""
  region: us-east-1
  acl: private
  assume_role_arn: ""
  force_path_style: false
  path: ""
  object_disk_path: ""
  disable_ssl: false
  compression_level: 1
  compression_format: tar
  sse: ""
  sse_kms_key_id: ""
  sse_customer_algorithm: ""
  sse_customer_key: ""
  sse_customer_key_md5: ""
  sse_kms_encryption_context: ""
  disable_cert_verification: false
  use_custom_storage_class: false
  storage_class: STANDARD
  custom_storage_class_map: {}
  concurrency: 17
  part_size: 0
  max_parts_count: 4000
  allow_multipart_download: false
  object_labels: {}
  request_payer: ""
  check_sum_algorithm: ""
  debug: false
gcs:
  credentials_file: ""
  credentials_json: ""
  credentials_json_encoded: ""
  embedded_access_key: ""
  embedded_secret_key: ""
  skip_credentials: false
  bucket: ""
  path: ""
  object_disk_path: ""
  compression_level: 1
  compression_format: tar
  debug: false
  force_http: false
  endpoint: ""
  storage_class: STANDARD
  object_labels: {}
  custom_storage_class_map: {}
  client_pool_size: 48
  chunk_size: 0
cos:
  url: ""
  timeout: 2m
  secret_id: ""
  secret_key: ""
  path: ""
  compression_format: tar
  compression_level: 1
  debug: false
api:
  listen: localhost:7171
  enable_metrics: true
  enable_pprof: false
  username: ""
  password: ""
  secure: false
  certificate_file: ""
  private_key_file: ""
  ca_cert_file: ""
  ca_key_file: ""
  create_integration_tables: false
  integration_tables_host: ""
  allow_parallel: false
  complete_resumable_after_restart: true
  watch_is_main_process: false
ftp:
  address: ""
  timeout: 2m
  username: ""
  password: ""
  tls: false
  skip_tls_verify: false
  path: ""
  object_disk_path: ""
  compression_format: tar
  compression_level: 1
  concurrency: 48
  debug: false
sftp:
  address: ""
  port: 22
  username: ""
  password: ""
  key: ""
  path: ""
  object_disk_path: ""
  compression_format: tar
  compression_level: 1
  concurrency: 48
  debug: false
azblob:
  endpoint_schema: https
  endpoint_suffix: core.windows.net
  account_name: ""
  account_key: ""
  sas: ""
  use_managed_identity: false
  container: ""
  path: ""
  object_disk_path: ""
  compression_level: 1
  compression_format: tar
  sse_key: ""
  buffer_size: 0
  buffer_count: 3
  max_parts_count: 256
  timeout: 4h
  debug: false
custom:
  upload_command: ""
  download_command: ""
  list_command: ""
  delete_command: ""
  command_timeout: 4h
  commandtimeoutduration: 4h0m0s
4. Listing Tables Eligible for Backup
Run clickhouse-backup tables to see which tables in the current instance can be backed up.
## clickhouse-backup tables
(base) [root@dsmart ~]# clickhouse-backup tables |grep -v info
ckdb.uk_price_paid 245.43MiB default full
default.uk_price_paid 0B default full
Part 3: Backup Testing
For this test, some sample datasets from the ClickHouse website were imported into a local test instance for verification.
1. Importing Test Data
1) Create the tables
(base) [root@dsmart clickhouse]# clickhouse-client --user default --password clickhouse -d ckdb --multiquery < /root/clickhouse/uk_price_paid.sql
2) Import the test data
(base) [root@dsmart clickhouse]# clickhouse-client --user default --password clickhouse -d ckdb --multiquery < /root/clickhouse/insert_uk_price_paid.sql
3) Download and extract a second test dataset (the NYPL menus data); the corresponding tables live in database db1
wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz
(base) [root@dsmart ~]# tar -zxvf 2021_08_01_07_01_17_data.tgz
Dish.csv
Menu.csv
MenuItem.csv
MenuPage.csv
(base) [root@dsmart clickhouse]# clickhouse-client --user default --password clickhouse -d db1 -m
ClickHouse client version 22.3.20.29 (official build).
Connecting to database db1 at localhost:9000 as user default.
Connected to ClickHouse server version 22.3.20 revision 54455.
dsmart :) show tables;
SHOW TABLES
Query id: 5376e39b-8ebd-48f1-8f36-92bca1e104f0
┌─name──────┐
│ dish │
│ menu │
│ menu_item │
│ menu_page │
└───────────┘
4 rows in set. Elapsed: 0.002 sec.
## The CSV data can then be imported into the database tables as follows
clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO db1.dish FORMAT CSVWithNames" < /root/Dish.csv
clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO db1.menu FORMAT CSVWithNames" < /root/Menu.csv
clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO db1.menu_page FORMAT CSVWithNames" < MenuPage.csv
clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO db1.menu_item FORMAT CSVWithNames" < /root/MenuItem.csv
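A quick sanity check after the import (not part of the original steps) is to compare row counts per table:
## Optional check: row counts of the freshly imported db1 tables
clickhouse-client --user default --password clickhouse --query "
    SELECT table, sum(rows) AS rows
    FROM system.parts
    WHERE database = 'db1' AND active
    GROUP BY table"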
## List existing local and remote backups
(base) [root@dsmart clickhouse-backup]# clickhouse-backup --config=/etc/clickhouse-backup/config.yml list
ckdb.uk_price_paid_0523 245.44MiB 23/05/2024 03:34:54 local regular
default.uk_price_paid_0523 771B 23/05/2024 06:29:03 local regular
db1 403.41MiB 23/05/2024 07:19:58 local regular
2024-05-23T07-30-16 1.80KiB 23/05/2024 07:30:17 local regular
2024-05-23T13-18-40 50.22MiB 23/05/2024 13:18:40 local regular
replica2_dwd_sal_sales_attendance_bak20230516_56 7.33MiB 16/05/2023 15:36:00 remote tar, regular
replica2_mictable 2.41GiB 16/05/2023 16:01:59 remote tar, regular
replica1_fosp_dim_fo_bak_20230518 4.36KiB 18/05/2023 10:23:13 remote tar, regular
replica1_fosp_kafka_bak_20230518 2.11MiB 18/05/2023 10:40:54 remote tar, regular
replica1_fosp_kafka_bak_20230518_2 2.08MiB 18/05/2023 11:02:19 remote tar, regular
replica2_local_samples2_back20230907 801.29KiB 07/09/2023 15:13:37 remote tar, regular
20240229-10-110-3-56 31.44GiB 29/02/2024 14:32:14 remote tar, regular
2. Backing Up a Database
After creating the backup, use the upload command to push the backup files to remote storage.
(base) [root@dsmart ~]# clickhouse-backup --config=/etc/clickhouse-backup/config.yml create -t db1.* db1_0524
(base) [root@dsmart ~]# clickhouse-backup --config=/etc/clickhouse-backup/config.yml upload db1_0524
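The two steps can also be combined with the create_remote command (it creates a backup and uploads it in one go, provided remote_storage in config.yml is not none), which makes automation via cron straightforward; the schedule and backup-name pattern below are only an illustrative sketch:
## Hypothetical sketch: nightly automated remote backup via cron
cat >/etc/cron.d/clickhouse-backup <<'EOF'
30 2 * * * root /usr/local/bin/clickhouse-backup --config=/etc/clickhouse-backup/config.yml create_remote db1_$(date +\%Y\%m\%d) >>/var/log/clickhouse-backup.log 2>&1
EOF
## retention is then handled by backups_to_keep_local / backups_to_keep_remote in config.yml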
3. Restoring a Database
[root@ivorysqldb ~]# clickhouse-backup --config /etc/clickhouse-backup/config.yml download db1_0524
[root@ivorysqldb ~]# clickhouse-backup --config /etc/clickhouse-backup/config.yml restore db1_0524
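If only the schema or only the data is needed, restore accepts extra flags in this version, and restore_remote combines download and restore into one step; verify the exact flags with clickhouse-backup restore --help on your build. A sketch:
## Hypothetical sketch: restore variants
clickhouse-backup --config /etc/clickhouse-backup/config.yml restore --schema db1_0524     # schema only
clickhouse-backup --config /etc/clickhouse-backup/config.yml restore --data db1_0524       # data only
clickhouse-backup --config /etc/clickhouse-backup/config.yml restore_remote db1_0524       # download + restore in one step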
4. Deleting a Local Backup
(base) [root@dsmart backup]# clickhouse-backup --config /etc/clickhouse-backup/config.yml delete db1_0524
2024/05/24 10:08:43.838173 error Backup name must be defined
NAME:
clickhouse-backup delete - Delete specific backup
USAGE:
clickhouse-backup delete <local|remote> <backup_name>
OPTIONS:
--config value, -c value Config 'FILE' name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
--environment-override value, --env value override any environment variable via CLI parameter
(base) [root@dsmart backup]# clickhouse-backup --config /etc/clickhouse-backup/config.yml delete local db1_0524
5. Deleting a Remote Backup
(base) [root@dsmart backup]# clickhouse-backup --config /etc/clickhouse-backup/config.yml delete remote db1_0524
6. The config.yml Configuration File
clickhouse-backup needs a config.yml configuration file, which can be created in the /etc/clickhouse-backup directory. The file used for this test is as follows:
general:
  remote_storage: sftp                 # REMOTE_STORAGE, if `none` then `upload` and `download` command will fail
  max_file_size: 1099511627776         # MAX_FILE_SIZE, 1TB by default, useless when upload_by_part is true, used to split data parts files by archives
  disable_progress_bar: false          # DISABLE_PROGRESS_BAR, show progress bar during upload and download, only has an effect when `upload_concurrency` and `download_concurrency` equal 1
  backups_to_keep_local: 0             # BACKUPS_TO_KEEP_LOCAL, how many of the newest local backups to keep, 0 means all created backups are kept on local disk
                                       # you need to run `clickhouse-backup delete local <backup_name>` to avoid useless disk space allocations
  backups_to_keep_remote: 0            # BACKUPS_TO_KEEP_REMOTE, how many of the newest backups to keep on remote storage, 0 means all uploaded backups are kept on remote storage.
                                       # if an old backup is required for a newer incremental backup, it will not be deleted. Be careful with long incremental backup sequences.
  log_level: error                     # LOG_LEVEL
  allow_empty_backups: false           # ALLOW_EMPTY_BACKUPS
  download_concurrency: 1              # DOWNLOAD_CONCURRENCY, max 255
  upload_concurrency: 1                # UPLOAD_CONCURRENCY, max 255
  upload_by_part: true                 # UPLOAD_BY_PART
  download_by_part: true               # DOWNLOAD_BY_PART
  restore_database_mapping: {}         # RESTORE_DATABASE_MAPPING, restore rules from backup databases to target databases, useful when changing the destination database; all atomic tables will be created with new UUIDs.
  retries_on_failure: 3                # RETRIES_ON_FAILURE, retry if failure during upload or download
  retries_pause: 100ms                 # RETRIES_PAUSE, time duration to pause after each download or upload failure
clickhouse:
  username: default                    # CLICKHOUSE_USERNAME
  password: "clickhouse"               # CLICKHOUSE_PASSWORD
  host: localhost                      # CLICKHOUSE_HOST
  port: 9000                           # CLICKHOUSE_PORT, don't use 8123, clickhouse-backup doesn't support the HTTP protocol
  disk_mapping: {}                     # CLICKHOUSE_DISK_MAPPING, use it if system.disks on the restored server is not the same as system.disks on the server where the backup was created
  skip_tables:                         # CLICKHOUSE_SKIP_TABLES
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
  timeout: 5m                          # CLICKHOUSE_TIMEOUT
  freeze_by_part: false                # CLICKHOUSE_FREEZE_BY_PART, allows freezing part by part instead of freezing the whole table
  freeze_by_part_where: ""             # CLICKHOUSE_FREEZE_BY_PART_WHERE, allows filtering parts during freeze when freeze_by_part: true
  secure: false                        # CLICKHOUSE_SECURE, use SSL encryption for the connection
  skip_verify: false                   # CLICKHOUSE_SKIP_VERIFY
  sync_replicated_tables: true         # CLICKHOUSE_SYNC_REPLICATED_TABLES
  tls_key: ""                          # CLICKHOUSE_TLS_KEY, filename with TLS key file
  tls_cert: ""                         # CLICKHOUSE_TLS_CERT, filename with TLS certificate file
  tls_ca: ""                           # CLICKHOUSE_TLS_CA, filename with TLS custom authority file
  log_sql_queries: true                # CLICKHOUSE_LOG_SQL_QUERIES, log clickhouse-backup SQL queries to the `system.query_log` table inside clickhouse-server
  debug: false                         # CLICKHOUSE_DEBUG
  config_dir: "/etc/clickhouse-server" # CLICKHOUSE_CONFIG_DIR
  restart_command: "systemctl restart clickhouse-server"  # CLICKHOUSE_RESTART_COMMAND, used when restoring with the --rbac or --configs options
  ignore_not_exists_error_during_freeze: true  # CLICKHOUSE_IGNORE_NOT_EXISTS_ERROR_DURING_FREEZE, helps avoid backup failures when you often CREATE / DROP tables and databases during backup creation; clickhouse-backup will ignore `code: 60` and `code: 81` errors during `ALTER TABLE ... FREEZE`
  check_replicas_before_attach: true   # CLICKHOUSE_CHECK_REPLICAS_BEFORE_ATTACH, helps avoid concurrent ATTACH PART execution when restoring ReplicatedMergeTree tables
sftp:
  address: "xxx.xxx.xxx.xxx"           # SFTP_ADDRESS
  port: 60002                          # SFTP_PORT
  username: "root"                     # SFTP_USERNAME
  password: "root"                     # SFTP_PASSWORD
  key: ""                              # SFTP_KEY
  path: "/data/clickhouse_backup/"     # SFTP_PATH
  concurrency: 1                       # SFTP_CONCURRENCY
  compression_format: tar              # SFTP_COMPRESSION_FORMAT
  compression_level: 1                 # SFTP_COMPRESSION_LEVEL
  debug: false                         # SFTP_DEBUG