PostgreSQL支持Hash索引,本节简单介绍Hash索引的结构等相关信息.

逻辑结构
可以把Hash Index理解为一个Hash Table,每个Hash bucket存储根据Hash Function计算得到的对应的索引条目,为了节省空间,Hash索引条目只存储Hash Code(即Hash Value) + TID而不存储Hash Key(即索引键值),扫描索引后还必须读取相应的数据表行,因此Index Only Scan不适用于Hash Index.


testdb=# drop table if exists t_idx1;
DROP TABLE
testdb=# create table t_idx1(id int,c1 varchar(20));
CREATE TABLE
testdb=# create index idx_t_idx1_id on t_idx1 using hash(id);
CREATE INDEX
testdb=# insert into t_idx1 select generate_series(1,100000);
INSERT 0 100000
testdb=# analyze t_idx1;
ANALYZE
testdb=# explain verbose select * from t_idx1 where id = 1;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Index Scan using idx_t_idx1_id on public.t_idx1  (cost=0.00..8.02 rows=1 width=62)
   Output: id, c1
   Index Cond: (t_idx1.id = 1)
(3 rows)
testdb=# -- 不能实现Index Only Scan
testdb=# explain verbose select id from t_idx1 where id = 100;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Index Scan using idx_t_idx1_id on public.t_idx1  (cost=0.00..8.02 rows=1 width=4)
   Output: id
   Index Cond: (t_idx1.id = 100)
(3 rows)

而普通的B-Tree索引是可以Index Only Scan的:


testdb=# create table t_idx2(id int,c1 varchar(20));
CREATE TABLE
testdb=# insert into t_idx2 select generate_series(1,100000);
INSERT 0 100000
testdb=# create index idx_t_idx2_id on t_idx2 using btree(id);
CREATE INDEX
testdb=# analyze t_idx2;
ANALYZE
testdb=# explain verbose select id from t_idx2 where id = 100;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Index Only Scan using idx_t_idx2_id on public.t_idx2  (cost=0.29..8.31 rows=1 width=4)
   Output: id
   Index Cond: (t_idx2.id = 100)
(3 rows)

物理结构
下图是Hash Index的物理存储结构:

有四种页面,分别是Meta page,Bucket Page,Overflow page和Bitmap page.

页面类型	说明
Meta page	page number zero, which contains information on what is inside the index.
Bucket pages	main pages of the index, which store data as «hash code — TID» pairs.
Overflow pages	structured the same way as bucket pages and used when one page is insufficient for a bucket
Bitmap pages	which keep track of overflow pages that are currently clear and can be reused for other buckets

使用pageinspect插件可查看index中的相关信息


testdb=# select hash_page_type(get_raw_page('idx_t_idx1_id',0));
 hash_page_type 
----------------
 metapage
(1 row)
testdb=# select hash_page_type(get_raw_page('idx_t_idx1_id',1));
 hash_page_type 
----------------
 bucket
(1 row)
testdb=# \x
Expanded display is on.
testdb=# select * from hash_page_stats(get_raw_page('idx_t_idx1_id',1));
-[ RECORD 1 ]---+-----------
live_items      | 189
dead_items      | 0
page_size       | 8192
free_size       | 4368
hasho_prevblkno | 256
hasho_nextblkno | 4294967295
hasho_bucket    | 0
hasho_flag      | 2
hasho_page_id   | 65408
testdb=# select * from hash_page_stats(get_raw_page('idx_t_idx1_id',2));
-[ RECORD 1 ]---+-----------
live_items      | 201
dead_items      | 0
page_size       | 8192
free_size       | 4128
hasho_prevblkno | 257
hasho_nextblkno | 4294967295
hasho_bucket    | 1
hasho_flag      | 2
hasho_page_id   | 65408

参考资料
Indexes in PostgreSQL — 3 (Hash)
Hash Indexing README
hashfunc.c

PostgreSQL DBA(43) - Index(Hash)