引言
对于index-only-scan,熟悉PostgreSQL的应该都知道,又称为覆盖索引,也即"仅索引扫描",是一种"常见"的优化方法。
Index-only scans are a major performance feature added to Postgres 9.2. They allow certain types of queries to be satisfied just by retrieving data from indexes, and not from tables. This can result in a significant reduction in the amount of I/O necessary to satisfy queries.
大意就是查询SQL可以从索引中获取到想要的数据,而无需回表,有效减少随机IO以提升性能。在PostgreSQL中,所有的索引都是secondary indexes,即二级索引,也就是说索引和数据(Heap)是分开存储的,意味着查询通过索引找到了满足的键值,还要回表去获取满足条件的值。这一点和Innodb是不同的,虽然PostgreSQL也有cluster进行聚集的语法,但不要混淆,在Innodb里面,聚集索引的叶子节点leaf page存放索引和数据,非聚集索引(二级索引)存放主键的值。
All indexes in PostgreSQL are secondary indexes, meaning that each index is stored separately from the table's main data area (which is called the table's heap in PostgreSQL terminology). This means that in an ordinary index scan, each row retrieval requires fetching data from both the index and the heap.
何为index only scan
如上所述,index-only-scan不需要回表,减少IO,但在执行计划中看到了index-only-scan,PostgreSQL就真的不会回表了吗?以往经验告诉我(经验不足),凡事无绝对。
首先,要走index-only-scan,需要满足两个条件:
查询只获取索引中包含的列。
对应堆页上的所有元组都应该可见。因为索引不会维护可见性信息,因此,为了仅从索引中获取数据,需要避免检查可见性。
第一点很好理解,索引是 (Key + CTID) 的组合,只有部分列

第二点,是通过 vm(visibility map) 文件来完成可见性信息判断的。
postgres=# create table test(id int,info text);
CREATE TABLE
postgres=# insert into test select n,'test' from generate_series(1,10000) as n;
INSERT 0 10000
postgres=# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13578/162418
(1 row)
postgres=# \q
[postgres@xiongcc ~]$ ls -l pgdata/base/13578/ | grep 162418
-rw------- 1 postgres postgres 450560 Jul 20 10:28 162418 ---数据文件
-rw------- 1 postgres postgres 24576 Jul 20 10:28 162418_fsm---空闲空间映射文件
-rw------- 1 postgres postgres 8192 Jul 20 10:29 162418_vm ---可见性映射文件

vm文件长这样,用一个比特位标识某一个数据块是否没有死元组,是否对于事务全部可见了,如下第一个block,没有dead tuple,就设置为"1",另外配合vm文件,还可以加速vacuum清理动作,这也是为什么第一次vacuum会稍慢,后续的vacuum会快很多。

我们可以通过pg_visibility扩展,来获取数据块的可见性信息,
postgres=# select * from pg_visibility_map('test'::regclass) limit 10;
blkno | all_visible | all_frozen
-------+-------------+------------
0 | t | f
1 | t | f
2 | t | f
3 | t | f
4 | t | f
5 | t | f
6 | t | f
7 | t | f
8 | t | f
9 | t | f
(10 rows)
第二个all_frozen是在v9.6中引入的,当数据页中的所有记录已经是frozen的状态时,在发起vacuum freeze时会跳过这个页的扫描,从而大幅提升静态数据的freeze操作,减少IO扫描,对于以前还是会挨个块去扫描

另外值得注意的是,vm文件会在vacuum的时候进行生成,analyze的时候不会
postgres=# create table test1(id int,info text);
CREATE TABLE
postgres=# alter table test1 set (autovacuum_enabled = off);
---为了演示,关闭该表的autovacuum
ALTER TABLE
postgres=# insert into test1 select n,'test' from generate_series(1,10000) as n;
INSERT 0 10000
postgres=# select pg_relation_filepath('test1');
pg_relation_filepath
----------------------
base/13578/162430
(1 row)
postgres=# analyze test1;
ANALYZE
postgres=# \q
[postgres@xiongcc ~]$ ll pgdata/base/13578/ | grep 162430
-rw------- 1 postgres postgres 450560 Jul 20 10:41 162430
-rw------- 1 postgres postgres 24576 Jul 20 10:41 162430_fsm
[postgres@xiongcc ~]$ psql
psql (13.2)
Type "help" for help.
postgres=# vacuum test1;
VACUUM
postgres=# \q
[postgres@xiongcc ~]$ ll pgdata/base/13578/ | grep 162430
-rw------- 1 postgres postgres 450560 Jul 20 10:41 162430
-rw------- 1 postgres postgres 24576 Jul 20 10:41 162430_fsm
-rw------- 1 postgres postgres 8192 Jul 20 10:41 162430_vm ---生成了vm文件
走了index only scan就不会回表了吗
但是,执行计划显示index-only-scan,就真的不会回表了吗?试想一下这么几个问题:
PostgreSQL是会话级缓存执行计划的,非Oracle位于shared pool的全局Cursor,假如单个会话内,SQL最开始是走的index-only-scan,然后表内删除更新了部分数据,vm文件会发生变化吗?原先的SQL会继续延用index-only-scan吗?可见性怎么处理?
index-only-scan的速度就比index scan的速度快吗
带着这两个问题,一探究竟。可以看到,最初因为没有索引只能顺序扫描,建了索引之后,选择使用了index-only-scan
postgres=# \d+ test
Table "public.test"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
Access method: heap
postgres=# explain select id from test where id = 99;
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..180.00 rows=1 width=4)
Filter: (id = 99)
(2 rows)
postgres=# create index t_idx on test using btree (id);
CREATE INDEX
postgres=# explain select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..4.30 rows=1 width=4)
Index Cond: (id = 99)
(2 rows)
假如索引不能cover查询的列,就会转为index scan,这一点是毋庸置疑的
postgres=# explain select id,info from test where id = 99;
QUERY PLAN
------------------------------------------------------------------
Index Scan using t_idx on test (cost=0.29..8.30 rows=1 width=9)
Index Cond: (id = 99)
(2 rows)
为此,我们当然可以建一个复合索引,直接cover所需的列
postgres=# create index t_idx2 on test using btree(id,info);
CREATE INDEX
postgres=# explain select id,info from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------
Index Only Scan using t_idx2 on test (cost=0.29..4.30 rows=1 width=9)
Index Cond: (id = 99)
(2 rows)
不过,在PostgreSQL里面,还可以使用 include 的语法,这个是v11引入的特性,试想这么一个场景,如果有一个唯一索引,缺少了一些列用于某些查询的覆盖索引,我们不能简单地将列添加到索引中,因为这会破坏索引最初目的:保证唯一性。那我们就可以使用include的特性,允许包含不影响唯一性且不能用于搜索谓词的非键值列,但仍然可以提供index only scan,见如下示例:
postgres=# create index t_idx3 on test using btree (id) include (info);
CREATE INDEX
postgres=# drop index t_idx2;
DROP INDEX
postgres=# explain select id,info from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------
Index Only Scan using t_idx3 on test (cost=0.29..4.30 rows=1 width=9)
Index Cond: (id = 99)
(2 rows)
实际跑一下效果看看:
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx3 on test (cost=0.29..4.30 rows=1 width=4) (actual time=0.122..0.124 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 0
Planning Time: 0.122 ms
Execution Time: 0.164 ms
(5 rows)
postgres=# select ctid from test where id = 99;
ctid
--------
(0,99)
(1 row)
postgres=# select * from pg_visibility_map('test'::regclass,0) limit 10;
---确实可见, all_visible
all_visible | all_frozen
-------------+------------
t | f
(1 row)
Heap Fetch表明了在index-only scan期间,需要回表查找的次数,此处为0,代表不需要回表。
The number of rows Postgres had to look up in the table, rather than the index, during an index-only scan.Postgres still needs to be sure that the row is visible before it can return it, and that information is on the heap, not in the index. It can get around going to the heap for a particular row by checking the visibility map, which records whether or not each page has been changed recently.If the page has changed, then that means a trip to the heap, the same as if we were doing a normal index scan — in fact it’s slightly worse than an index scan, because we’ve added in an extra check of the visibility map.Heap fetches can sometimes be reduced by vacuum, or adjusting autovacuum settings to keep the visibility map more up to date.
When all the information needed is contained in the index, the scan can read all the data from it, without referring to the table itself.This is very fast because everything can be looked up from memory.If a block has changed recently, Postgres needs to check whether the row is visible to the query’s transaction before it can return it, and that information is on the heap, not in the index.If you notice that heap fetches are a high proportion of rows processed (including any filtered), it could be a sign that the visibility map is out of date.
直接Google翻译一下:
在仅索引扫描期间,Postgres 必须在表中而不是索引中查找的行数。Postgres 仍然需要确保该行在返回之前可见,并且该信息在堆上,不在索引中。它可以通过检查可见性映射来绕过特定行的堆,该可见性映射记录了每个页面最近是否已更改。如果页面已更改,则意味着访问堆,就像我们一样正在做一个正常的索引扫描 — 实际上它比索引扫描稍差,因为我们添加了对可见性图的额外检查。有时可以通过vacuum减少堆提取,或调整autovacuum设置以保持可见性图更多最新。
当索引中包含所有需要的信息时,扫描可以从中读取所有数据,而无需参考表本身。这非常快,因为可以从内存中查找所有内容。如果一个块最近发生了变化,Postgres 需要 在查询的事务返回之前检查该行是否对查询的事务可见,并且该信息在堆上,而不是在索引中。如果您注意到 heap fetches 处理的行(包括任何过滤的)的比例很高,这可能表明可见性映射很久没更新了。
新开一个会话,更新一行,看看第0块会如何变化,可以看到,立马变成了not_all_visible
postgres=# select ctid from test where id = 30;
ctid
--------
(0,30)
(1 row)
postgres=# select * from pg_visibility_map('test'::regclass,0) limit 10;
all_visible | all_frozen
-------------+------------
t | f
(1 row)
postgres=# delete from test where id = 30;
DELETE 1
postgres=# select * from pg_visibility_map('test'::regclass,0) limit 10;
---可以看到,立马变成了not_all_visible
all_visible | all_frozen
-------------+------------
f | f
(1 row)
再看看第一个会话的SQL,可以看到还是选择走index only scan
postgres=# explain select id from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------
Index Only Scan using t_idx3 on test (cost=0.29..4.30 rows=1 width=4)
Index Cond: (id = 99)
(2 rows)
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
31174
(1 row)
第二个会话里也进行查看,排除缓存执行计划的影响,可以看到,也是走了index only scan
postgres=# explain select id from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------
Index Only Scan using t_idx3 on test (cost=0.29..4.30 rows=1 width=4)
Index Cond: (id = 99)
(2 rows)
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
31284
(1 row)
第一个会话实际跑一下,可以看到此处Heap Fetches变成了1,意味着此处需要回表查询数据块去检验可见性了。
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx3 on test (cost=0.29..4.30 rows=1 width=4) (actual time=0.073..0.075 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 1
Planning Time: 0.104 ms
Execution Time: 0.106 ms
(5 rows)
---和第一次跑的时候做比较
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx3 on test (cost=0.29..4.30 rows=1 width=4) (actual time=0.122..0.124 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 0
Planning Time: 0.122 ms
Execution Time: 0.164 ms
(5 rows)
清除环境,重新看一下效果
postgres=# drop table test;
DROP TABLE
postgres=# create table test(id int,info text);
CREATE TABLE
postgres=# alter table test set (autovacuum_enabled = off);
ALTER TABLE
postgres=# insert into test select n,'test' from generate_series(1,100000) as n;
INSERT 0 100000
postgres=# create index t_idx on test using btree (id);
CREATE INDEX
postgres=# analyze;
ANALYZE
postgres=# explain select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 99)
(2 rows)
postgres=# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13578/162496
(1 row)
postgres=# \q
[postgres@xiongcc ~]$ ls -l pgdata/base/13578/ | grep 162496
---此时还没有生成vm文件
-rw------- 1 postgres postgres 4431872 Jul 20 13:43 162496
-rw------- 1 postgres postgres 24576 Jul 20 13:43 162496_fsm
可以看到,此时还没有生成vm文件,同样Heap Fetches变成了1
postgres=# explain (analyze,buffers,verbose) select id from test where id = 99;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on public.test (cost=0.29..8.31 rows=1 width=4) (actual time=0.118..0.120 rows=1 loops=1)
Output: id
Index Cond: (test.id = 99)
Heap Fetches: 1
Buffers: shared hit=1 read=2
I/O Timings: read=0.023
Planning Time: 0.098 ms
Execution Time: 0.171 ms
(8 rows)
postgres=# select * from bt_metap('t_idx');
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples | allequalimage
--------+---------+------+-------+----------+-----------+-------------+-------------------------+---------------
340322 | 4 | 3 | 1 | 3 | 1 | 0 | -1 | t
(1 row)
另外可以看到,索引的高度level是1,代表只有1层(排除了root根节点),那么上面的数据块的流程如下:
Buffers: shared hit=1 read=2
通过根节点,再到下一层叶节点,最后根据CTID回表,总计3个数据块,其中在shared_buffers里面命中了一个数据块。另外,需要知道的是,vm文件本身是足够小的,可以足够放在内存中,In most situations the visibility map remains cached in memory all the time.
[postgres@xiongcc ~]$ ls -lth pgdata/base/13578/ | grep 162540
-rw------- 1 postgres postgres 8.0K Jul 20 14:20 162540_vm
-rw------- 1 postgres postgres 4.3M Jul 20 14:19 162540
-rw------- 1 postgres postgres 24K Jul 20 14:19 162540_fsm
还有pg_class中的relallvisible字段,用于统计全部可见的数据块有多少,vacuum analyze时会更新这些统计信息,让PostgreSQL可以对页面的情况更加了解。
postgres=# drop table test;
DROP TABLE
postgres=# create table test(id int,info text);
CREATE TABLE
postgres=# alter table test set (autovacuum_enabled = off);
ALTER TABLE
postgres=# insert into test select n,'test' from generate_series(1,100000) as n;
INSERT 0 100000
postgres=# create index t_idx on test using btree (id);
CREATE INDEX
postgres=# analyze test;
ANALYZE
postgres=# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13578/162589
(1 row)
postgres=# explain select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 99)
(2 rows)
postgres=# select relallvisible,relname from pg_class where relname = 'test';
relallvisible | relname
---------------+---------
0 | test
(1 row)
postgres=# vacuum test;
VACUUM
postgres=# select relallvisible,relname from pg_class where relname = 'test';
relallvisible | relname
---------------+---------
541 | test
(1 row)
postgres=# delete from test where id = 99; ---未得到更新
DELETE 1
postgres=# select relallvisible,relname from pg_class where relname = 'test';
relallvisible | relname
---------------+---------
541 | test
(1 row)
postgres=# delete from test where id = 9999; ---未得到更新
DELETE 1
postgres=# select relallvisible,relname from pg_class where relname = 'test';
relallvisible | relname
---------------+---------
541 | test
(1 row)
postgres=# analyze ;
ANALYZE
postgres=# select relallvisible,relname from pg_class where relname = 'test';
relallvisible | relname
---------------+---------
539 | test
(1 row)
哪些索引可以用于index-only-scan
Btree
上述都只是针对Btree索引进行讨论,因为前提是索引可以覆盖查询列,比如Hash,就不支持复合索引,我们可以通过如下SQL查询索引的属性:
postgres=# select a.amname, p.name, pg_indexam_has_property(a.oid,p.name)
from pg_am a,
unnest(array['can_order','can_unique','can_multi_col','can_exclude','can_include']) p(name)
where a.amname = 'btree';
amname | name | pg_indexam_has_property
--------+---------------+-------------------------
btree | can_order | t
btree | can_unique | t
btree | can_multi_col | t
btree | can_exclude | t
btree | can_include | t
(5 rows)
can_order:我们能够在创建索引时指定值的排序顺序(到目前为止仅适用于btree)。
can_unique:支持唯一约束和主键(仅适用于btree)。
can_multi_col:可以在多个列上创建索引。
can_exclude:支持排除约束exclude。
can_include:是否支持include。
postgres=# select p.name, pg_index_has_property('t_idx'::regclass,p.name)
from unnest(array[
'clusterable','index_scan','bitmap_scan','backward_scan'
]) p(name);
name | pg_index_has_property
---------------+-----------------------
clusterable | t
index_scan | t
bitmap_scan | t
backward_scan | t
(4 rows)
clusterable:是否可能根据索引对数据行重新排序(同名命令是cluster)
index_scan:支持索引扫描。虽然这个属性看起来很奇怪,但并不是所有的索引都可以一个一个地返回TID,有些是一次性全部返回,并且只支持位图扫描
bitmap_scan:支持位图扫描
backward_scan:返回结果的顺序与创建索引时指定的顺序相反。
哈希索引
哈希索引不支持复合索引,和include索引。其他索引也是类似,依葫芦画瓢即可。
postgres=# select a.amname, p.name, pg_indexam_has_property(a.oid,p.name)
from pg_am a,
unnest(array['can_order','can_unique','can_multi_col','can_exclude','can_include']) p(name)
where a.amname = 'hash';
amname | name | pg_indexam_has_property
--------+---------------+-------------------------
hash | can_order | f
hash | can_unique | f
hash | can_multi_col | f
hash | can_exclude | t
hash | can_include | f
(5 rows)
postgres=# create index t_idx2 on test using hash (id) include(info);
ERROR: access method "hash" does not support included columns
postgres=# create index t_idx2 on test using hash (id,info);
ERROR: access method "hash" does not support multicolumn indexes
HOT实战
另外一个有趣的case是,HOT,我们知道,HOT是为了减少不必要的索引IO。PostgreSQL目前默认的存储引擎在更新记录时,会在堆内产生一条新版本,旧版本在不需要使用后vacuum回收,回收旧版本前,需要先回收所有关联这个版本的所有索引point item。
v8.3以前,每个tuple版本都有对应的索引point item,因此更新的放大比较大。
v8.3开始,引入了HOT的概念,当更新记录时,如果能满足两个条件时,通过heap page内部link来串起所有tuple版本,因此索引不变。
来看看实际例子,清除一下环境,重新操作一遍。更新了一条之后, Heap Fetches变成了2,如下:
postgres=# select ctid from test where id = 99;
ctid
--------
(0,99)
(1 row)
postgres=# vacuum test;
VACUUM
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..4.31 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 0
Planning Time: 0.148 ms
Execution Time: 0.046 ms
(5 rows)
postgres=# update test set info = 'hello world' where id = 99;
UPDATE 1
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..4.31 rows=1 width=4) (actual time=0.071..0.074 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 2
Planning Time: 0.083 ms
Execution Time: 0.104 ms
(5 rows)
postgres=# select ctid from test where id = 99;
ctid
-----------
(540,101)
(1 row)
再执行一次,又变成了1
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..4.31 rows=1 width=4) (actual time=0.033..0.035 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 1
Planning Time: 0.107 ms
Execution Time: 0.069 ms
(5 rows)
这是因为HOT的缘故,最开始更新之后,采用了HOT技术,流程如下:
找到了满足的行指针item pointer (0,99)
然后通过行指针item pointer 定位到CTID=(0,99)的数据块
通过CTID的重定向,找到CTID=(540,101)的数据块

所以最开始,读了两个block,也就是Heap Fetches: 2,为什么又执行一次变成了1呢?这是因为剪枝pruning的原因,因为原先的老行是死元组,可能会被vacuum清理掉,这样就找不到正确的数据了,于是PostgreSQL会择机进行剪枝,对行指针item pointer进行重定向,重定向到最新的行指针,剪枝可能在select、update、insert、delete的时候执行,当然其他时机也有可能,具体可以参照Readme。如下:
The pruning processing will be executed, if possible, when a SQL command is executed such as SELECT, UPDATE, INSERT and DELETE. The exact execution timing is not described in this chapter because it is very complicated. The details are described in the README.HOT file.

可以看到,剪枝之后,流程变成了:
找到了满足的行指针item pointer (0,99)
然后通过行指针item pointer重定向到最新的行指针item pointer (540,101)
定位到CTID=(540,101)的数据块
所以这样只需要一次数据块读取了。可以这么验证:
postgres=# vacuum test;
VACUUM
postgres=# select ctid from test where id = 99;
ctid
--------
(0,99)
(1 row)
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..4.31 rows=1 width=4) (actual time=0.036..0.038 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 0
Planning Time: 0.093 ms
Execution Time: 0.067 ms
(5 rows)
postgres=# update test set info = 'hello world' where id = 99;
UPDATE 1
postgres=# select ctid from test where id = 99;
---此处先进程查询,进行了剪枝
ctid
-----------
(540,101)
(1 row)
postgres=# explain analyze select id from test where id = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on test (cost=0.29..4.31 rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
Index Cond: (id = 99)
Heap Fetches: 1
Planning Time: 0.093 ms
Execution Time: 0.057 ms
(5 rows)
注意观察最后一步,Heap Fetches变成了1,因为上一步查询的时候进行了剪枝,只需要一次数据块读取了。
当然,HOT不是万能的,也有限制:
当更新的元组和老元组不在同一个page中时,新旧元组链是不能跨越页面的,指向该元组的索引元组也会被添加到索引页面中。
当索引的key值更新时,原有索引记录中的key无法再定位到正确元组,此时会在索引页面中插入一条新的索引元组。

同时值得注意的是,HOT在流复制场景下,会导致冲突:
One way to reduce the need for VACUUM is to use HOT updates. Then any query on the primary that accesses a page with dead heap-only tuples and can get an exclusive lock on it will prune the HOT chains. PostgreSQL always holds such page locks for a short time, so there is no conflict with processing on the primary. There are other causes for page locks, but this is perhaps the most frequent one.When the standby server should replay such an exclusive page lock and a query is using the page (“has the page pinned” in PostgreSQL jargon), you get a buffer pin replication conflict. Pages can be pinned for a while, for example during a sequential scan of a table on the outer side of a nested loop join.HOT chain pruning can of course also lead to snapshot replication conflicts.
减少对VACUUM需求的一种方法便是使用HOT更新。然后,任何在主数据库上访问包含死的heap-only元组的页面并可以在该页面上获取exclusive锁的查询都将修剪HOT链。PostgreSQL总是在很短的时间内持有这种页面锁,因此与主数据库上的处理没有冲突。页面锁定还有其他原因,但这可能是最常见的原因。当备用服务器回放此类的排它页面锁并且有查询正在使用该页面时 (在PostgreSQL中的术语为"已将页面固定"),则会出现buffer pin复制冲突。页面可以被固定一段时间,例如在嵌套循环连接外侧的表的顺序扫描期间。当然,HOT链剪枝也可能导致快照复制冲突。
postgres=# select * from pg_stat_database_conflicts ;
datid | datname | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
-------+-----------+------------------+------------+----------------+-----------------+----------------
13578 | postgres | 0 | 0 | 0 | 0 | 0
1 | template1 | 0 | 0 | 0 | 0 | 0
13577 | template0 | 0 | 0 | 0 | 0 | 0
79871 | mydb | 0 | 0 | 0 | 0 | 0
(4 rows)
小结
在以上的例子中,可以看到,即时还没有生成vm文件,PostgreSQL也会认为走index only scan更高效,因为vm文件很小,足够缓存在内存中,同时可见性映射的扫描比扫描堆表的代价要小四个数量级。所以这个也是类似于MySQL的优化大法:禁止select *,一方面会增加网络、磁盘IO,另一方面需要回表,增加IO,PostgreSQL也是一样,可以使用Index only scan优化,减少随机IO。
Even in the successful case, this approach trades visibility map accesses for heap accesses; but since the visibility map is four orders of magnitude smaller than the heap it describes, far less physical I/O is needed to access it. In most situations the visibility map remains cached in memory all the time.
所以我们可以得出如下几个结论:
PostgreSQL会优先选择index only scan,因为扫描vm的代价比扫描堆表要小得多,差了四个数量级
即使选择了index only scan,并不代表不会回表,取决于vm文件是否存在,是否是最新的,这也从侧面体现出了autovacuum的重要性,其中之一目的便是更新vm文件
vm文件会在vacuum的时候更新,标记为all_visible,但是一旦发生了更新删除,所在的数据块会立即变成not_all_visible,这个时候就会回表,去判断可见性,对应的是Heap Fetch
假如存在vm文件过旧,大量的数据库已经不是all_visible了,这个时候index only scan相较于普通的index scan会更慢,因为多了一步扫描vm文件的代价。
跑count(*)的时候,PostgreSQL经常被诟病为什么这么慢?MySQL为什么这么快?因为PostgreSQL得去挨个统计行的可见性,所以试着跑一下vacuum analyze,有了vm文件后,相信速度会提升很多
使用覆盖索引来满足index-only-scan的时候,还要考虑对于HOT的影响,因为列越多,更新了键值就不能使用HOT了,导致使用HOT的机会也就越少,造成不必要的IO。
在v11中,还支持了parallel index only scan。所以,假如发生了index only scan跑的慢的时候,该知道如何分析了吧。
参考
https://www.postgresql.org/docs/13/indexes-index-only-scans.html
https://www.pgmustard.com/blog/2019/03/04/index-only-scans-in-postgres
https://blog.dbi-services.com/an-index-only-scan-in-postgresql-is-not-always-index-only/
https://wiki.postgresql.org/wiki/Index-only_scans
https://www.interdb.jp/pg/pgsql07.html