本文简单介绍了PostgreSQL中不同数据类型对查询性能的影响,包括numeric、float、smallint、int和boolean。
容量
数据列占用空间大小
[local]:5432 pg12@testdb=# SELECT pg_column_size(SMALLINT '1'),pg_column_size(INT4 '1'), pg_column_size(NUMERIC(6,0) '1'),pg_column_size(FLOAT '1');
pg_column_size | pg_column_size | pg_column_size | pg_column_size
----------------+----------------+----------------+----------------
2 | 4 | 8 | 8
创建数据表,0和1的数据值各插入100w行,查看数据表的占用空间大小。
numeric
[local]:5432 pg12@testdb=# create table t_numeric(id numeric);
CREATE TABLE
[local]:5432 pg12@testdb=#
[local]:5432 pg12@testdb=# insert into t_numeric select 0 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=# insert into t_numeric select 1 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=# select pg_size_pretty(pg_relation_size('t_numeric'));
pg_size_pretty
----------------
69 MB
(1 row)
float
[local]:5432 pg12@testdb=# create table t_float(id int);
CREATE TABLE
[local]:5432 pg12@testdb=#
[local]:5432 pg12@testdb=# insert into t_float select 0 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=# insert into t_float select 1 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=#
[local]:5432 pg12@testdb=# select pg_size_pretty(pg_relation_size('t_float'));
pg_size_pretty
----------------
69 MB
(1 row)
[local]:5432 pg12@testdb=#
int
[local]:5432 pg12@testdb=# create table t_int(id int);
CREATE TABLE
[local]:5432 pg12@testdb=#
[local]:5432 pg12@testdb=# insert into t_int select 0 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=# insert into t_int select 1 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=# select pg_size_pretty(pg_relation_size('t_int'));
pg_size_pretty
----------------
69 MB
(1 row)
smallint
[local]:5432 pg12@testdb=# create table t_smallint(id smallint);
CREATE TABLE
[local]:5432 pg12@testdb=#
[local]:5432 pg12@testdb=# insert into t_smallint select 0 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=# insert into t_smallint select 1 from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=#
[local]:5432 pg12@testdb=# select pg_size_pretty(pg_relation_size('t_smallint'));
pg_size_pretty
----------------
69 MB
(1 row)
boolean
[local]:5432 pg12@testdb=# create table t_bool(id boolean);
CREATE TABLE
[local]:5432 pg12@testdb=# insert into t_bool select 0::boolean from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=# insert into t_bool select 1::boolean from generate_series(1,1000000);
INSERT 0 1000000
[local]:5432 pg12@testdb=#
[local]:5432 pg12@testdb=# select pg_size_pretty(pg_relation_size('t_bool'));
pg_size_pretty
----------------
69 MB
(1 row)
可以看到,四种数据类型占用的空间都是69 MB。
查询性能
不加条件,全表扫描
-- 禁用并行
[local]:5432 pg12@testdb=# SET max_parallel_workers_per_gather = 0;
SET
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_numeric;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=33850.00..33850.01 rows=1 width=8) (actual time=478.196..478.196 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_numeric (cost=0.00..28850.00 rows=2000000 width=0) (actual time=0.053..255.949 rows=2000000 loops=1)
Output: id
Buffers: shared hit=8850
Planning Time: 0.716 ms
Execution Time: 478.280 ms
(8 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_float;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=33850.00..33850.01 rows=1 width=8) (actual time=421.919..421.919 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_float (cost=0.00..28850.00 rows=2000000 width=0) (actual time=0.010..222.624 rows=2000000 loops=1)
Output: id
Buffers: shared hit=8850
Planning Time: 0.231 ms
Execution Time: 421.948 ms
(8 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_int;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=33850.00..33850.01 rows=1 width=8) (actual time=440.328..440.328 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_int (cost=0.00..28850.00 rows=2000000 width=0) (actual time=0.011..236.078 rows=2000000 loops=1)
Output: id
Buffers: shared hit=8850
Planning Time: 0.208 ms
Execution Time: 440.359 ms
(8 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_smallint;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=33850.00..33850.01 rows=1 width=8) (actual time=439.007..439.007 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_smallint (cost=0.00..28850.00 rows=2000000 width=0) (actual time=0.043..232.069 rows=2000000 loops=1)
Output: id
Buffers: shared hit=8850
Planning Time: 0.553 ms
Execution Time: 439.081 ms
(8 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_bool;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=33850.00..33850.01 rows=1 width=8) (actual time=430.800..430.800 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_bool (cost=0.00..28850.00 rows=2000000 width=0) (actual time=0.010..230.333 rows=2000000 loops=1)
Output: id
Buffers: shared hit=8850
Planning Time: 0.224 ms
Execution Time: 430.831 ms
(8 rows)
[local]:5432 pg12@testdb=#
不带条件全表扫描,时间相差不大,执行时长最大的是numeric类型。
添加查询条件,全表扫描
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_numeric where id = '0'::numeric;
lain (analyze,verbose,buffers) select count(*) from t_bool where id = 0::boolean;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36358.67..36358.68 rows=1 width=8) (actual time=723.356..723.357 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_numeric (cost=0.00..33850.00 rows=1003467 width=0) (actual time=0.057..610.907 rows=1000000 loops=1)
Output: id
Filter: (t_numeric.id = '0'::numeric)
Rows Removed by Filter: 1000000
Buffers: shared hit=8850
Planning Time: 1.901 ms
Execution Time: 723.449 ms
(10 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_float where id = '0'::numeric;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=38875.00..38875.01 rows=1 width=8) (actual time=827.686..827.687 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_float (cost=0.00..38850.00 rows=10000 width=0) (actual time=0.015..725.737 rows=1000000 loops=1)
Output: id
Filter: ((t_float.id)::numeric = '0'::numeric)
Rows Removed by Filter: 1000000
Buffers: shared hit=8850
Planning Time: 0.234 ms
Execution Time: 827.720 ms
(10 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_int where id = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36329.50..36329.51 rows=1 width=8) (actual time=434.067..434.067 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_int (cost=0.00..33850.00 rows=991800 width=0) (actual time=0.014..333.883 rows=1000000 loops=1)
Output: id
Filter: (t_int.id = 0)
Rows Removed by Filter: 1000000
Buffers: shared hit=8850
Planning Time: 0.295 ms
Execution Time: 434.101 ms
(10 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_smallint where id = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36354.50..36354.51 rows=1 width=8) (actual time=486.466..486.466 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_smallint (cost=0.00..33850.00 rows=1001800 width=0) (actual time=0.053..368.184 rows=1000000 loops=1)
Output: id
Filter: (t_smallint.id = 0)
Rows Removed by Filter: 1000000
Buffers: shared hit=8850
Planning Time: 1.396 ms
Execution Time: 486.554 ms
(10 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_bool where id = 0::boolean;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=31356.67..31356.68 rows=1 width=8) (actual time=416.510..416.510 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=8850
-> Seq Scan on public.t_bool (cost=0.00..28850.00 rows=1002667 width=0) (actual time=0.014..316.188 rows=1000000 loops=1)
Output: id
Filter: (NOT t_bool.id)
Rows Removed by Filter: 1000000
Buffers: shared hit=8850
Planning Time: 0.261 ms
Execution Time: 416.551 ms
(10 rows)
[local]:5432 pg12@testdb=#
存在查询条件的情况下,由于解析表达式的代价不同(bool < int < numeric < float),因此时间相差较大,时长最大的是float类型,时间接近bool类型的2倍。
创建索引,全索引扫描
禁用全表扫描,使用全索引扫描
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_numeric where id = '0'::numeric;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=35541.77..35541.78 rows=1 width=8) (actual time=594.984..594.984 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=7160
-> Index Only Scan using idx_t_numeric_id on public.t_numeric (cost=0.43..33033.10 rows=1003467 width=0) (actual time=0.269..482.525 rows=1000000 loops=1)
Output: id
Index Cond: (t_numeric.id = '0'::numeric)
Heap Fetches: 1000000
Buffers: shared hit=7160
Planning Time: 1.392 ms
Execution Time: 595.253 ms
(10 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_float where id = '0'::numeric;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=70854.43..70854.44 rows=1 width=8) (actual time=1337.093..1337.094 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=14317
-> Index Only Scan using idx_t_float_id on public.t_float (cost=0.43..70829.43 rows=10000 width=0) (actual time=0.037..1233.730 rows=1000000 loops=1)
Output: id
Filter: ((t_float.id)::numeric = '0'::numeric)
Rows Removed by Filter: 1000000
Heap Fetches: 2000000
Buffers: shared hit=14317
Planning Time: 0.293 ms
Execution Time: 1337.168 ms
(11 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_int where id = 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=35128.43..35128.44 rows=1 width=8) (actual time=526.942..526.943 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=7160
-> Index Only Scan using idx_t_int_id on public.t_int (cost=0.43..32648.93 rows=991800 width=0) (actual time=0.035..414.797 rows=1000000 loops=1)
Output: id
Index Cond: (t_int.id = 0)
Heap Fetches: 1000000
Buffers: shared hit=7160
Planning Time: 0.245 ms
Execution Time: 526.979 ms
(10 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_smallint where id = 0;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=35480.43..35480.44 rows=1 width=8) (actual time=551.394..551.394 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=4428 read=2735
-> Index Only Scan using idx_t_smallint_id on public.t_smallint (cost=0.43..32975.93 rows=1001800 width=0) (actual time=0.459..438.992 rows=1000000 loops=1)
Output: id
Index Cond: (t_smallint.id = 0)
Heap Fetches: 1000000
Buffers: shared hit=4428 read=2735
Planning Time: 1.889 ms
Execution Time: 551.499 ms
(10 rows)
[local]:5432 pg12@testdb=# explain (analyze,verbose,buffers) select count(*) from t_bool where id = 0::boolean;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=35513.77..35513.78 rows=1 width=8) (actual time=497.886..497.886 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=7160
-> Index Only Scan using idx_t_bool_id on public.t_bool (cost=0.43..33007.10 rows=1002667 width=0) (actual time=0.035..393.653 rows=1000000 loops=1)
Output: id
Index Cond: (t_bool.id = false)
Heap Fetches: 1000000
Buffers: shared hit=7160
Planning Time: 0.250 ms
Execution Time: 497.922 ms
(10 rows)
[local]:5432 pg12@testdb=#
走全索引扫描,执行时长最长的仍是float类型,其他三种类型则相差不大,numeric的性能相较全表扫描有明显提升(595ms vs 723ms)。
压力测试
使用pgbench进行压力测试,numeric/float/int三种类型,各插入100w数据
drop table t_big_numeric;
create table t_big_numeric(id numeric);
insert into t_big_numeric select 0 from generate_series(1,1000000);
drop table t_big_float;
create table t_big_float(id int);
insert into t_big_float select 0 from generate_series(1,1000000);
drop table t_big_int;
create table t_big_int(id int);
insert into t_big_int select 0 from generate_series(1,1000000);
测试结果
[pg12@localhost test]$ pgbench -C -f ./select_numeric.sql --time=120 --client=8 --jobs=2 -d testdb
...
transaction type: ./select_numeric.sql
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 2
duration: 120 s
number of transactions actually processed: 1254
latency average = 768.659 ms
tps = 10.407739 (including connections establishing)
tps = 10.906626 (excluding connections establishing)
[pg12@localhost test]$
[pg12@localhost test]$ pgbench -C -f ./select_float.sql --time=120 --client=8 --jobs=2 -d testdb
...
transaction type: ./select_float.sql
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 2
duration: 120 s
number of transactions actually processed: 2167
latency average = 444.006 ms
tps = 18.017778 (including connections establishing)
tps = 19.461350 (excluding connections establishing)
[pg12@localhost test]$ cat select_float.sql
\set id random(1,1000000)
select * from t_big_float where id = :id;
[pg12@localhost test]$
[pg12@localhost test]$ pgbench -C -f ./select_int.sql --time=120 --client=8 --jobs=2 -d testdb
...
transaction type: ./select_int.sql
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 2
duration: 120 s
number of transactions actually processed: 2184
latency average = 440.271 ms
tps = 18.170626 (including connections establishing)
tps = 19.658996 (excluding connections establishing)
[pg12@localhost test]$
小结
numeric的性能相对于int和boolean性能要低,原因是它被存储为二进制编码的十进制(任意的精度值),而float则是浮点数运算,比起int/bool类型的运算都要慢且耗费更多的CPU资源。
Numeric is straight up slower than integers because if the internal operations needed for how to represent and compare it. Use it only if your need and an int won’t do it. Floats are imprecise. Never use a float to store monetary values or anything where you’re not ok with the value returned being approximately correct versus the value stored. Use it for real world sensor data and probably that’s it.
bigint is fixed at 8 bytes.
decimal is an alias to numeric
numeric is variable-width which means it may store to the same size, but it may not be as compact in memory and overflow-expansion logic is likely to be slower.
double precision is 8 bytes too, but it’s float. However, keep in mind floats do interger math up until MAX_INT. So at best, it’s not any worse, but if you don’t need it you’re better off using bigint.
参考资料
Numeric vs Integer for a column - size and performance
Data type performance