
前言
通常,我们都使用initdb --wal-segsize=<8|16 ….>来指定WAL段文件的大小,以MB为单位。默认是16MB。那么initdb完之后,有没有办法再次更改WAL文件大小呢?有人提议使用pg_resetwal来搞定。
如:
pg_resetwal resets the PostgreSQL write-ahead log.
Usage:
pg_resetwal [OPTION]... DATADIR
Options:
-c, --commit-timestamp-ids=XID,XID
set oldest and newest transactions bearing
commit timestamp (zero means no change)
[-D, --pgdata=]DATADIR data directory
-e, --epoch=XIDEPOCH set next transaction ID epoch
-f, --force force update to be done
-l, --next-wal-file=WALFILE set minimum starting location for new WAL
-m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID
-n, --dry-run no update, just show what would be done
-o, --next-oid=OID set next OID
-O, --multixact-offset=OFFSET set next multitransaction offset
-u, --oldest-transaction-id=XID set oldest transaction ID
-V, --version output version information, then exit
-x, --next-transaction-id=XID set next transaction ID
--wal-segsize=SIZE size of WAL segments, in megabytes
-?, --help show this help, then exit
其中--wal-segsize=SIZE 就可以重置WAL段大小。
我们来看看使用它,可行性以及需要注意些什么。在试验之前,我本能的意识到,在重置WAL段大小之前后,基于PITR的恢复,可能会受到冲击,以至于不好使。切换前后,最好的做法是做一次基础备份。并且要划分好两次段大小的起始点。
下面我们看看实验过程。
实验分析
原库以16MB为WAL段大小,开始归档模式。
archive_mode = on
archive_command = 'cp %p /iihero/archive/data/%f'
1、全备
默认16MB下的库,做一次全备 (全备点1)
[10:04:51-postgres@centos1:/var/lib/pgsql/14/backups]$ pg_basebackup -c fast -D data.bak/ -Fp -v -P -p 5555 -U postgres
2、原库做一些CUD操作
为后续PITR恢复做准备
create table t(id int, col2 varchar(32), col3 timestamp);
insert into t select n, 'test' || n, clock_timestamp() from generate_series(1, 1000) as n;
select pg_current_xact_id(), pg_current_wal_lsn(), pg_switch_wal(); checkpoint; select clock_timestamp();
pg_current_xact_id | pg_current_wal_lsn | pg_switch_wal
--------------------+--------------------+---------------
740 | 0/3031268 | 0/3031280
(1 row)
CHECKPOINT
clock_timestamp
-------------------------------
2023-12-23 22:45:09.614875+08
(1 row)
delete from t;
select pg_current_xact_id(), pg_current_wal_lsn(), pg_switch_wal(); checkpoint; select clock_timestamp();
pg_current_xact_id | pg_current_wal_lsn | pg_switch_wal
--------------------+--------------------+---------------
742 | 0/401A940 | 0/401A958
(1 row)
CHECKPOINT
clock_timestamp
-------------------------------
2023-12-23 22:45:25.831306+08
(1 row)
3、重设WAL段大小
重设WAL段大小
pg_ctl stop
pg_resetwal --wal-segsize=32 -D $PGDATA
Write-ahead log reset
4、重启PG,验证大小
pg_ctl start
insert into t select n, 'test' || n , clock_timestamp() from generate_series(5001, 100000) as n;
select pg_current_xact_id(), pg_current_wal_lsn(), pg_switch_wal(); checkpoint; select clock_timestamp();
pg_current_xact_id | pg_current_wal_lsn | pg_switch_wal
--------------------+--------------------+---------------
744 | 0/A745BB0 | 0/A745BC8
(1 row)
CHECKPOINT
clock_timestamp
-------------------------------
2023-12-23 22:46:05.194161+08
(1 row)
\! ls -la 14/data/pg_wal
total 65544
drwx------ 3 postgres postgres 140 Dec 23 22:46 .
drwx------ 20 postgres users 4096 Dec 23 22:45 ..
-rw------- 1 postgres postgres 338 Dec 23 22:44 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 33554432 Dec 23 22:46 000000010000000000000006
-rw------- 1 postgres postgres 33554432 Dec 23 22:46 000000010000000000000007
drwx------ 2 postgres postgres 59 Dec 23 22:46 archive_status
我们可以看到上边的文件大小,确实都变成了32兆。
再看看归档目录:
postgres=# \! ls -la /iihero/archive/data
total 98308
drwxr-xr-x. 2 postgres postgres 214 Dec 23 22:45 .
drwxr-xr-x. 15 postgres postgres 195 Oct 22 10:01 ..
-rw------- 1 postgres postgres 16777216 Dec 23 22:44 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Dec 23 22:44 000000010000000000000002
-rw------- 1 postgres postgres 338 Dec 23 22:44 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Dec 23 22:45 000000010000000000000003
-rw------- 1 postgres postgres 16777216 Dec 23 22:45 000000010000000000000004
-rw------- 1 postgres postgres 33554432 Dec 23 22:46 000000010000000000000005
既有16M的老的归档文件,也有新的32M大小的新的归档文件。
我们再做一些操作:
delete from t where id > 5002;
DELETE 94998
select * from t;
id | col2 | col3
------+----------+----------------------------
5001 | test5001 | 2023-12-23 22:45:59.388643
5002 | test5002 | 2023-12-23 22:45:59.388826
(2 rows)
select pg_current_xact_id(), pg_current_wal_lsn(), pg_switch_wal(); checkpoint; select clock_timestamp();
pg_current_xact_id | pg_current_wal_lsn | pg_switch_wal
--------------------+--------------------+---------------
747 | 0/CA1BF88 | 0/CA1BFA0
(1 row)
CHECKPOINT
clock_timestamp
-------------------------------
2023-12-23 22:47:38.724118+08
(1 row)
5、验证两个时间段的恢复
(1)取段大小未改变前的某一个时间点:xid: 740, 时间为:2023-12-23 22:45:09.614875+08
pg_ctl stop
waiting for server to shut down.... done
/var/lib/pgsql/14/backups/data.bak]$ touch recovery.signal
vi postgresql.conf
restore_command='cp /iihero/archive/data/%f %p'
recovery_target_time = '2023-12-23 22:45:09.614875+08'
[22:51:48-postgres@centos2:/var/lib/pgsql/14/backups/data.bak]$ pg_ctl start -D ./
waiting for server to start....2023-12-23 22:51:56.137 CST [2954] LOG: redirecting log output to logging collector process
2023-12-23 22:51:56.137 CST [2954] HINT: Future log output will appear in directory "log".
done
server started
[22:51:56-postgres@centos2:/var/lib/pgsql/14/backups/data.bak]$ psql
psql (14.8)
Type "help" for help.
postgres=# select count(*) from t;
count
-------
1000
(1 row)
postgres=# select * from t order by col3 desc limit 1;
id | col2 | col3
------+----------+----------------------------
1000 | test1000 | 2023-12-23 22:45:04.333741
(1 row)
在这个时间点上基于原来的基础备份,是可以恢复的。这个时间点位于段大小改变之前。
(2) 取段大小改变后的某一个时间点:xid: 747, 时间为:2023-12-23 22:47:38.724118+08
/var/lib/pgsql/14/backups/data.bak]$ pg_ctl stop -D ./
vi postgresql.conf
restore_command='cp /iihero/archive/data/%f %p'
recovery_target_time = '2023-12-23 22:47:38.724118+08'
-- 再次启动
[22:54:51-postgres@centos2:/var/lib/pgsql/14/backups/data.bak]$ pg_ctl start -D ./
waiting for server to start....2023-12-23 22:55:18.904 CST [2996] LOG: redirecting log output to logging collector process
2023-12-23 22:55:18.904 CST [2996] HINT: Future log output will appear in directory "log".
stopped waiting
pg_ctl: could not start server
Examine the log output.
我们发现,启动的时候报错,日志里头显示 :
2023-12-23 22:55:18.911 CST [2998] LOG: starting point-in-time recovery to 2023-12-23 22:47:38.724118+08
2023-12-23 22:55:18.932 CST [2998] LOG: restored log file "000000010000000000000004" from archive
2023-12-23 22:55:18.947 CST [2998] LOG: redo starts at 0/4000050
2023-12-23 22:55:18.948 CST [2998] LOG: consistent recovery state reached at 0/401A918
2023-12-23 22:55:18.949 CST [2996] LOG: database system is ready to accept read-only connections
2023-12-23 22:55:18.986 CST [2998] FATAL: archive file "000000010000000000000005" has wrong size: 33554432 instead of 16777216
2023-12-23 22:55:18.987 CST [2996] LOG: startup process (PID 2998) exited with exit code 1
2023-12-23 22:55:18.987 CST [2996] LOG: terminating any other active server processes
2023-12-23 22:55:18.988 CST [2996] LOG: shutting down due to startup process failure
2023-12-23 22:55:18.989 CST [2996] LOG: database system is shut down
有这么一句:FATAL: archive file "000000010000000000000005" has wrong size: 33554432 instead of 16777216
也就是说,它还是希望是16M的大小去解析。所以报这个错。
最后
如果我们在WAL段大小改变后,重新做一次基础备份,再看看基于这个新的基础备份以后的数据恢复,归档目录的内容全都不变。看看情况如何?
1. 启动原来改变了WAL段大小的实例:
[22:59:45-postgres@centos2:/var/lib/pgsql/14/data]$ pg_ctl start
waiting for server to start....2023-12-23 22:59:48.626 CST [3058] LOG: redirecting log output to logging collector process
2023-12-23 22:59:48.626 CST [3058] HINT: Future log output will appear in directory "log".
done
server started
2. 再做一次基础备份 并设置 信号文件(recovery.signal)
[23:00:26-postgres@centos2:/var/lib/pgsql/14/backups]$ pg_basebackup -c fast -D data.bak2/ -Fp -v -P -p 5555 -U postgres
[23:00:50-postgres@centos2:/var/lib/pgsql/14/backups/data.bak2]$ touch recovery.signal
3. 对原库做些CUD操作
postgres=# select count(*) from t;
count
-------
2
(1 row)
postgres=# insert into t values (-99, 'abc', clock_timestamp());
INSERT 0 1
postgres=# select pg_current_xact_id(), pg_current_wal_lsn(), pg_switch_wal(); checkpoint; select clock_timestamp();
pg_current_xact_id | pg_current_wal_lsn | pg_switch_wal
--------------------+--------------------+---------------
751 | 0/1401CBF8 | 0/1401CC10
(1 row)
CHECKPOINT
clock_timestamp
-------------------------------
2023-12-23 23:02:36.542212+08
(1 row)
postgres=# delete from t;
DELETE 3
postgres=# select pg_current_xact_id(), pg_current_wal_lsn(), pg_switch_wal(); checkpoint; select clock_timestamp();
pg_current_xact_id | pg_current_wal_lsn | pg_switch_wal
--------------------+--------------------+---------------
753 | 0/160003F8 | 0/16000410
(1 row)
CHECKPOINT
clock_timestamp
-------------------------------
2023-12-23 23:02:57.638854+08
(1 row)
后边我们就基于备份2,做恢复到XID:751的操作
pg_ctl stop
进到备份目录:var/lib/pgsql/14/backups/data.bak2
recovery_target_xid = '751'
restore_command='cp /pgccc/archive/data/%f %p'
recovery_target_action = 'pause'
然后启动备份的实例:
[23:05:14-postgres@centos2:/var/lib/pgsql/14/backups/data.bak2]$ pg_ctl start -D ./
waiting for server to start....2023-12-23 23:05:17.494 CST [3152] LOG: redirecting log output to logging collector process
2023-12-23 23:05:17.494 CST [3152] HINT: Future log output will appear in directory "log".
done
server started
我们发现它可以成功启动,日志显示 :
2023-12-23 23:05:17.509 CST [3154] LOG: starting point-in-time recovery to XID 751
2023-12-23 23:05:17.825 CST [3154] LOG: restored log file "000000010000000000000009" from archive
2023-12-23 23:05:17.851 CST [3154] LOG: redo starts at 0/12000028
2023-12-23 23:05:17.852 CST [3154] LOG: consistent recovery state reached at 0/12000138
2023-12-23 23:05:17.852 CST [3152] LOG: database system is ready to accept read-only connections
2023-12-23 23:05:17.902 CST [3154] LOG: restored log file "00000001000000000000000A" from archive
2023-12-23 23:05:18.250 CST [3154] LOG: restored log file "00000001000000000000000B" from archive
2023-12-23 23:05:18.272 CST [3154] LOG: recovery stopping after commit of transaction 751, time 2023-12-23 23:02:35.919079+08
2023-12-23 23:05:18.272 CST [3154] LOG: pausing at the end of recovery
2023-12-23 23:05:18.272 CST [3154] HINT: Execute pg_wal_replay_resume() to promote.
主要是因为redo starts at 0/12000028, 它这个起点直接定位到WAL段大小改变后的段文件了。所以不会有错。
总结:
pg_resetwal可以用于重新设定WAL段大小。只是在设定以后,启动完实例,需要立即进行一次全新的基础备份。在重设大小之前的基础备份无法用于更改WAL段大小以后的时间点的数据恢复。相当地旧段大小适用于旧的基础备份,新的WAL段大小适用于新的基础备份。
