性能突破极限!在 openEuler22上使用 pg_duckdb + PostgreSQL 17 执行速度提升不止1000倍

pg_duckdb简介

https://github.com/duckdb/pg_duckdb

https://docs.pgduckdb.com/quickstart

https://hub.docker.com/r/pgduckdb/pgduckdb

pg_duckdb 是一个扩展模块,旨在将 PostgreSQL 和 DuckDB 结合使用,提供了一种在 PostgreSQL 中访问 DuckDB 功能的途径。DuckDB 是一种面向分析工作负载优化的嵌入式列存储数据库,其特点是高性能、开箱即用且支持多种分析功能。

pg_duckdb 是一种强大的扩展,特别适合数据分析、外部数据集成以及需要高性能查询的场景。通过它,用户可以在 PostgreSQL 的生态中轻松引入 DuckDB 的能力,进一步扩展数据库的应用范围。

核心功能

  • 嵌入式集成:pg_duckdb 在 PostgreSQL 中嵌入了 DuckDB 的功能,使用户可以利用 DuckDB 的列存分析引擎处理复杂的分析查询。

  • 无缝访问:通过 pg_duckdb,用户可以在 PostgreSQL 中直接查询 DuckDB 表,而无需离开 PostgreSQL 环境。

  • 高性能分析:DuckDB 优化了分析场景中的列式存储和向量化计算,pg_duckdb 可以充分利用这些特性。

  • 多格式支持:DuckDB 支持查询多种外部数据格式(如 Parquet、CSV 等),通过 pg_duckdb,用户可以直接从这些格式中读取数据。

优势

  • 提升查询性能:在分析场景中,利用 DuckDB 的列存优势,可以显著提升查询性能。

  • 节省存储资源:DuckDB 的列存设计在处理高基数列时具有更高的压缩效率。

  • 数据格式兼容性:支持直接加载和查询 Parquet、Arrow 等流行的数据格式,无需额外的转换。

使用场景

  • 数据分析和聚合: pg_duckdb 适用于处理大量的分析查询,特别是涉及复杂聚合和多表联接的场景。

  • 外部文件查询: 通过 pg_duckdb 可以直接查询 CSV、Parquet 等文件,而不需要将文件数据加载到 PostgreSQL 中。

  • 加速 BI 工具查询: 对于与 BI 工具集成的 PostgreSQL 数据库,pg_duckdb 可以显著提高查询响应速度。

示例查询

1-- Remote Parquet scans:
2SELECT count(*) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet'AS (o_orderkey int);
3SELECT avg(c_acctbal) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet'AS (c_acctbal float);
4SELECT count(*)::int as aws_service_cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-iam-data/main/data/parquet/aws_services.parquet'AS (service_id int);
5SELECT code, city, state FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-edge-locations/main/data/aws-edge-locations.parquet'AS (code text, city text, state text);
6SELECT cloud_provider, sum(ip_address_cnt)::int as cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.parquet'AS (cloud_provider text, ip_address_cnt intGROUP BY cloud_provider;
7
8-- Remote CSV scan
9SELECT * FROM read_csv('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv'AS (cloud_provider text, cidr_block text, ip_address textlimit 30;

pg_duckdb安装

docker安装使用

 1docker pull pgduckdb/pgduckdb:15-main
2docker pull pgduckdb/pgduckdb:16-main
3docker pull pgduckdb/pgduckdb:17-main
4
5docker run -d --name pgduckdb17 -h pgduckdb15 -p 5632:5432  -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:15-main
6docker run -d --name pgduckdb17 -h pgduckdb16 -p 5633:5432  -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:16-main
7docker run -d --name pgduckdb17 -h pgduckdb17 -p 5634:5432  -e POSTGRES_PASSWORD=lhr pgduckdb/pgduckdb:17-main
8
9
10psql postgres://postgres:lhr@127.0.0.1:5634/postgres
11
12
13# Or if using docker compose
14git clone https://github.com/duckdb/pg_duckdb && cd pg_duckdb && docker compose up -d
15
16docker compose exec db psql

编译安装

To build pg_duckdb, you need:

  • Postgres 15-17

  • Ubuntu 22.04-24.04 or MacOS

  • Standard set of build tools for building Postgres extensions

  • Build tools that are required to build DuckDB :

 1sudo yum install -y git g++ cmake ninja-build openssl-devel
2
3--  CMake 3.5...3.29 or higher is required
4
5git clone https://github.com/duckdb/pg_duckdb
6cd pg_duckdb
7make install -j 16
8
9
10-- 主要文件
11/pg17/pg17/lib/postgresql/pg_duckdb.so
12/pg17/pg17/lib/postgresql/libduckdb.so
13/pg17/pg17/share/postgresql/extension/pg_duckdb--0.1.0--0.2.0.sql
14/pg17/pg17/share/postgresql/extension/pg_duckdb--0.1.0.sql
15/pg17/pg17/share/postgresql/extension/pg_duckdb.control
16
17
18alter system set shared_preload_libraries = 'pg_duckdb';
19CREATE EXTENSION pg_duckdb;
20SET duckdb.force_execution = true;
21
22
23-- Ubuntu 安装
24curl -sfL https://install.pgx.sh | sh -
25pgxman install pg_duckdb
26shared_preload_libraries = 'pg_duckdb'
27CREATE EXTENSION pg_duckdb;
28SET duckdb.force_execution = true;

大表测试性能

 11. 安装duckdb、生成数据、导出数据
2wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip
3unzip duckdb_cli-linux-amd64.zip
4./duckdb
5install tpcds;
6load tpcds;
7-- 生成规模因子为1的测试数据,即1GB左右数据
8call dsdgen(sf=1);
9-- 导出数据为csv格式,并且用“|”分隔
10export database 'public' (format csv, delimiter '|');
11.quit
12
13
142. 导入数据到PG数据库中
15sed -i 's/COPY/\\copy/' public/load.sql
16createdb testduckdb
17psql -d testduckdb -f public/schema.sql
18psql -d testduckdb -f public/load.sql
19
20
21
22testduckdb=# \dt+
23                                              List of relations
24 Schema |          Name          | Type  |  Owner   | Persistence | Access method |    Size    | Description 
25--------+------------------------+-------+----------+-------------+---------------+------------+-------------
26 public | call_center            | table | postgres | permanent   | heap          | 16 kB      | 
27 public | catalog_page           | table | postgres | permanent   | heap          | 1976 kB    | 
28 public | catalog_returns        | table | postgres | permanent   | heap          | 23 MB      | 
29 public | catalog_sales          | table | postgres | permanent   | heap          | 293 MB     | 
30 public | customer               | table | postgres | permanent   | heap          | 15 MB      | 
31 public | customer_address       | table | postgres | permanent   | heap          | 7400 kB    | 
32 public | customer_demographics  | table | postgres | permanent   | heap          | 139 MB     | 
33 public | date_dim               | table | postgres | permanent   | heap          | 11 MB      | 
34 public | household_demographics | table | postgres | permanent   | heap          | 440 kB     | 
35 public | income_band            | table | postgres | permanent   | heap          | 8192 bytes | 
36 public | inventory              | table | postgres | permanent   | heap          | 496 MB     | 
37 public | item                   | table | postgres | permanent   | heap          | 5800 kB    | 
38 public | promotion              | table | postgres | permanent   | heap          | 80 kB      | 
39 public | reason                 | table | postgres | permanent   | heap          | 16 kB      | 
40 public | ship_mode              | table | postgres | permanent   | heap          | 16 kB      | 
41 public | store                  | table | postgres | permanent   | heap          | 16 kB      | 
42 public | store_returns          | table | postgres | permanent   | heap          | 38 MB      | 
43 public | store_sales            | table | postgres | permanent   | heap          | 405 MB     | 
44 public | time_dim               | table | postgres | permanent   | heap          | 8424 kB    | 
45 public | warehouse              | table | postgres | permanent   | heap          | 16 kB      | 
46 public | web_page               | table | postgres | permanent   | heap          | 16 kB      | 
47 public | web_returns            | table | postgres | permanent   | heap          | 11 MB      | 
48 public | web_sales              | table | postgres | permanent   | heap          | 147 MB     | 
49 public | web_site               | table | postgres | permanent   | heap          | 48 kB      | 
50(24 rows)

做查询操作,测试性能:

不使用pg_duckdb插件的情况下:

 1[pg17@lhrpgalloe soft]$ psql -d testduckdb
2psql (17.1)
3Type "help" for help.
4
5testduckdb=# \timing on
6Timing is on.
7testduckdb=# \o /dev/null
8testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/01.sql
9Time: 145934.025 ms (02:25.934)
10testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/02.sql
11Time: 494.014 ms
12testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/03.sql
13Time: 134.083 ms
14testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql
15
16
17^CCancel request sent
18psql:pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql:119: ERROR:  canceling statement due to user request
19Time: 2062108.995 ms (34:22.109)
20
21-- ... 00:25:15 未出结果
22testduckdb=# 
23testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/05.sql
24Time: 575.717 ms
25testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/06.sql
26Time: 50091.984 ms (00:50.092)
27testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
28Time: 372.611 ms
29testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/08.sql
30Time: 199.187 ms
31testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/09.sql
32Time: 1467.471 ms (00:01.467)
33testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/10.sql
34Time: 5629.187 ms (00:05.629)
35testduckdb=# 
36
37
38testduckdb=# explain WITH customer_total_return AS
39testduckdb-#   (SELECT sr_customer_sk AS ctr_customer_sk,
40testduckdb(#           sr_store_sk AS ctr_store_sk,
41testduckdb(#           sum(sr_return_amt) AS ctr_total_return
42testduckdb(#    FROM store_returns,
43testduckdb(#         date_dim
44testduckdb(#    WHERE sr_returned_date_sk = d_date_sk
45testduckdb(#      AND d_year = 2000
46testduckdb(#    GROUP BY sr_customer_sk,
47testduckdb(#             sr_store_sk)
48testduckdb-# SELECT c_customer_id
49testduckdb-# FROM customer_total_return ctr1,
50testduckdb-#      store,
51testduckdb-#      customer
52testduckdb-# WHERE ctr1.ctr_total_return >
53testduckdb-#     (SELECT avg(ctr_total_return)*1.2
54testduckdb(#      FROM customer_total_return ctr2
55testduckdb(#      WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
56testduckdb-#   AND s_store_sk = ctr1.ctr_store_sk
57testduckdb-#   AND s_state = 'TN'
58testduckdb-#   AND ctr1.ctr_customer_sk = c_customer_sk
59testduckdb-# ORDER BY c_customer_id
60testduckdb-# LIMIT 100;
61                                                       QUERY PLAN                                                       
62------------------------------------------------------------------------------------------------------------------------
63 Limit  (cost=55885.77..55885.78 rows=2 width=17)
64   CTE customer_total_return
65     ->  Finalize GroupAggregate  (cost=9458.80..9633.24 rows=1381 width=40)
66           Group Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
67           ->  Gather Merge  (cost=9458.80..9604.48 rows=1150 width=40)
68                 Workers Planned: 2
69                 ->  Partial GroupAggregate  (cost=8458.78..8471.71 rows=575 width=40)
70                       Group Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
71                       ->  Sort  (cost=8458.78..8460.21 rows=575 width=14)
72                             Sort Key: store_returns.sr_customer_sk, store_returns.sr_store_sk
73                             ->  Parallel Hash Join  (cost=1963.80..8432.42 rows=575 width=14)
74                                   Hash Cond: (store_returns.sr_returned_date_sk = date_dim.d_date_sk)
75                                   ->  Parallel Seq Scan on store_returns  (cost=0.00..6015.45 rows=119945 width=18)
76                                   ->  Parallel Hash  (cost=1961.12..1961.12 rows=214 width=4)
77                                         ->  Parallel Seq Scan on date_dim  (cost=0.00..1961.12 rows=214 width=4)
78                                               Filter: (d_year = 2000)
79   ->  Sort  (cost=46252.54..46252.54 rows=2 width=17)
80         Sort Key: customer.c_customer_id
81         ->  Hash Join  (cost=43005.51..46252.53 rows=2 width=17)
82               Hash Cond: (customer.c_customer_sk = ctr1.ctr_customer_sk)
83               ->  Seq Scan on customer  (cost=0.00..2872.00 rows=100000 width=21)
84               ->  Hash  (cost=43005.48..43005.48 rows=2 width=4)
85                     ->  Nested Loop  (cost=0.00..43005.48 rows=2 width=4)
86                           Join Filter: (ctr1.ctr_store_sk = store.s_store_sk)
87                           ->  CTE Scan on customer_total_return ctr1  (cost=0.00..42987.08 rows=460 width=8)
88                                 Filter: (ctr_total_return > (SubPlan 2))
89                                 SubPlan 2
90                                   ->  Aggregate  (cost=31.09..31.11 rows=1 width=32)
91                                         ->  CTE Scan on customer_total_return ctr2  (cost=0.00..31.07 rows=7 width=32)
92                                               Filter: (ctr1.ctr_store_sk = ctr_store_sk)
93                           ->  Materialize  (cost=0.00..11.51 rows=1 width=4)
94                                 ->  Seq Scan on store  (cost=0.00..11.50 rows=1 width=4)
95                                       Filter: ((s_state)::text = 'TN'::text)
96(33 rows)
97
98Time: 26.596 ms
99

使用pg_duckdb插件的情况下:

  1[pg17@lhrpgalloe soft]$ psql -d testduckdb 
2psql (17.1)
3Type "help" for help.
4
5testduckdb=# \timing on
6Timing is on.
7testduckdb=#  \o /dev/null
8testduckdb=# SET duckdb.force_execution = true;
9Time: 0.922 ms
10testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/01.sql
11Time: 217.649 ms
12testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/02.sql
13Time: 824.615 ms
14testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/03.sql
15Time: 808.953 ms
16testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/04.sql
17Time: 2618.012 ms (00:02.618)
18testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/05.sql
19Time: 1918.740 ms (00:01.919)
20testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/06.sql
21Time: 450.428 ms
22testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
23Time: 840.963 ms
24testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/07.sql
25Time: 760.951 ms
26testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/08.sql
27Time: 428.280 ms
28testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/09.sql
29Time: 9708.039 ms (00:09.708)
30testduckdb=# \i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/10.sql
31Time: 1495.543 ms (00:01.496)
32testduckdb=# 
33
34testduckdb=# explain WITH customer_total_return AS
35testduckdb-#   (SELECT sr_customer_sk AS ctr_customer_sk,
36testduckdb(#           sr_store_sk AS ctr_store_sk,
37testduckdb(#           sum(sr_return_amt) AS ctr_total_return
38testduckdb(#    FROM store_returns,
39testduckdb(#         date_dim
40testduckdb(#    WHERE sr_returned_date_sk = d_date_sk
41testduckdb(#      AND d_year = 2000
42testduckdb(#    GROUP BY sr_customer_sk,
43testduckdb(#             sr_store_sk)
44testduckdb-# SELECT c_customer_id
45testduckdb-# FROM customer_total_return ctr1,
46testduckdb-#      store,
47testduckdb-#      customer
48testduckdb-# WHERE ctr1.ctr_total_return >
49testduckdb-#     (SELECT avg(ctr_total_return)*1.2
50testduckdb(#      FROM customer_total_return ctr2
51testduckdb(#      WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
52testduckdb-#   AND s_store_sk = ctr1.ctr_store_sk
53testduckdb-#   AND s_state = 'TN'
54testduckdb-#   AND ctr1.ctr_customer_sk = c_customer_sk
55testduckdb-# ORDER BY c_customer_id
56testduckdb-# LIMIT 100;
57                                                                                                                QUERY PLAN                                                                                                                
58------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
59 Custom Scan (DuckDBScan)  (cost=0.00..0.00 rows=0 width=0)
60   DuckDB Execution Plan: 
61
62 ┌───────────────────────────┐
63 │           TOP_N           │
64 │    ────────────────────   │
65 │          Top: 100         │
66 │                           │
67 │         Order By:         │
68 │ customer.c_customer_id ASC│
69 └─────────────┬─────────────┘
70 ┌─────────────┴─────────────┐
71 │         PROJECTION        │
72 │    ────────────────────   │
73 │       c_customer_id       │
74 │                           │
75 │        ~28785 Rows        │
76 └─────────────┬─────────────┘
77 ┌─────────────┴─────────────┐
78 │            CTE            │
79 │    ────────────────────   │
80 │         CTE Name:         │
81 │   customer_total_return   ├───────────────────────────────────────────┐
82 │                           │                                           │
83 │       Table Index: 0      │                                           │
84 └─────────────┬─────────────┘                                           │
85 ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
86 │       HASH_GROUP_BY       │                             │           FILTER          │
87 │    ────────────────────   │                             │    ────────────────────   │
88 │          Groups:          │                             │ (CAST(ctr_total_return AS │
89 │             #0            │                             │     DOUBLE) > SUBQUERY)   │
90 │             #1            │                             │                           │
91 │                           │                             │                           │
92 │    Aggregates: sum(#2)    │                             │                           │
93 │                           │                             │                           │
94 │        ~28785 Rows        │                             │         ~5757 Rows        │
95 └─────────────┬─────────────┘                             └─────────────┬─────────────┘
96 ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
97 │         PROJECTION        │                             │      RIGHT_DELIM_JOIN     │
98 │    ────────────────────   │                             │    ────────────────────   │
99 │       sr_customer_sk      │                             │      Join Type: RIGHT     │
100 │        sr_store_sk        │                             │                           │
101 │       sr_return_amt       │                             │        Conditions:        ├────────────────────────────────────────────────────────────────────────┐
102 │                           │                             │    ctr_store_sk IS NOT    │                                                                        │
103 │                           │                             │ DISTINCT FROM ctr_store_sk│                                                                        │
104 │                           │                             │                           │                                                                        │
105 │        ~57570 Rows        │                             │         ~5757 Rows        │                                                                        │
106 └─────────────┬─────────────┘                             └─────────────┬─────────────┘                                                                        │
107 ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐                                                          ┌─────────────┴─────────────┐
108 │         HASH_JOIN         │                             │         HASH_JOIN         │                                                          │         HASH_JOIN         │
109 │    ────────────────────   │                             │    ────────────────────   │                                                          │    ────────────────────   │
110 │      Join Type: INNER     │                             │      Join Type: INNER     │                                                          │      Join Type: RIGHT     │
111 │                           │                             │                           │                                                          │                           │
112 │        Conditions:        ├──────────────┐              │        Conditions:        ├──────────────┐                                           │        Conditions:        ├───────────────────────────────────────────┐
113 │   sr_returned_date_sk =   │              │              │      c_customer_sk =      │              │                                           │    ctr_store_sk IS NOT    │                                           │
114 │          d_date_sk        │              │              │       ctr_customer_sk     │              │                                           │ DISTINCT FROM ctr_store_sk│                                           │
115 │                           │              │              │                           │              │                                           │                           │                                           │
116 │        ~57570 Rows        │              │              │         ~5757 Rows        │              │                                           │         ~5757 Rows        │                                           │
117 └─────────────┬─────────────┘              │              └─────────────┬─────────────┘              │                                           └─────────────┬─────────────┘                                           │
118 ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
119 │     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││         HASH_JOIN         │                             │         PROJECTION        │                             │         DUMMY_SCAN        │
120 │    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │                             │    ────────────────────   │                             │                           │
121 │         Function:         ││         Function:         ││         Function:         ││      Join Type: INNER     │                             │ (avg(ctr_total_return) * 1│                             │                           │
122 │     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││     POSTGRES_SEQ_SCAN     ││                           │                             │            .2)            │                             │                           │
123 │                           ││                           ││                           ││        Conditions:        │                             │        ctr_store_sk       │                             │                           │
124 │        Projections:       ││        Projections:       ││                           ││ ctr_store_sk = s_store_sk │                             │                           │                             │                           │
125 │    sr_returned_date_sk    ││         d_date_sk         ││                           ││                           ├──────────────┐              │                           │                             │                           │
126 │       sr_customer_sk      ││                           ││                           ││                           │              │              │                           │                             │                           │
127 │        sr_store_sk        ││          Filters:         ││                           ││                           │              │              │                           │                             │                           │
128 │       sr_return_amt       ││ d_year=2000 AND d_year IS ││                           ││                           │              │              │                           │                             │                           │
129 │                           ││          NOT NULL         ││                           ││                           │              │              │                           │                             │                           │
130 │                           ││                           ││                           ││                           │              │              │                           │                             │                           │
131 │        ~287867 Rows       ││        ~14609 Rows        ││        ~100000 Rows       ││         ~5757 Rows        │              │              │         ~143 Rows         │                             │                           │
132 └───────────────────────────┘└───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘              │              └─────────────┬─────────────┘                             └───────────────────────────┘
133                                                                                        ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
134                                                                                        │          CTE_SCAN         ││     POSTGRES_SEQ_SCAN     ││       HASH_GROUP_BY       │
135                                                                                        │    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
136                                                                                        │        CTE Index: 0       ││         Function:         ││         Groups: #0        │
137                                                                                        │                           ││     POSTGRES_SEQ_SCAN     ││    Aggregates: avg(#1)    │
138                                                                                        │                           ││                           ││                           │
139                                                                                        │                           ││          Filters:         ││                           │
140                                                                                        │                           ││  s_state='TN' AND s_state ││                           │
141                                                                                        │                           ││         IS NOT NULL       ││                           │
142                                                                                        │                           ││                           ││                           │
143                                                                                        │        ~28785 Rows        ││          ~24 Rows         ││         ~143 Rows         │
144                                                                                        └───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘
145                                                                                                                                                  ┌─────────────┴─────────────┐
146                                                                                                                                                  │         PROJECTION        │
147                                                                                                                                                  │    ────────────────────   │
148                                                                                                                                                  │        ctr_store_sk       │
149                                                                                                                                                  │      ctr_total_return     │
150                                                                                                                                                  │                           │
151                                                                                                                                                  │         ~287 Rows         │
152                                                                                                                                                  └─────────────┬─────────────┘
153                                                                                                                                                  ┌─────────────┴─────────────┐
154                                                                                                                                                  │         HASH_JOIN         │
155                                                                                                                                                  │    ────────────────────   │
156                                                                                                                                                  │      Join Type: INNER     │
157                                                                                                                                                  │                           │
158                                                                                                                                                  │        Conditions:        ├──────────────┐
159                                                                                                                                                  │ctr_store_sk = ctr_store_sk│              │
160                                                                                                                                                  │                           │              │
161                                                                                                                                                  │         ~287 Rows         │              │
162                                                                                                                                                  └─────────────┬─────────────┘              │
163                                                                                                                                                  ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
164                                                                                                                                                  │          CTE_SCAN         ││         DELIM_SCAN        │
165                                                                                                                                                  │    ────────────────────   ││    ────────────────────   │
166                                                                                                                                                  │        CTE Index: 0       ││       Delim Index: 1      │
167                                                                                                                                                  │                           ││                           │
168                                                                                                                                                  │        ~28785 Rows        ││         ~2878 Rows        │
169                                                                                                                                                  └───────────────────────────┘└───────────────────────────┘
170
171
172(113 rows)
173

测试结果

震惊,在未使用pg_duckdb插件的情况下,sql1的时间为145秒,使用了pg_duckdb插件后变为了0.2秒;sql4的时间为33分钟+  未出结果秒,使用了pg_duckdb插件后变为了2.6秒。

脚本 未使用pg_duckdb 使用了pg_duckdb
01.sql 145934.025 ms 217.649 ms
02sql 494.014 ms 824.615 ms
03.sql 134.083 ms 808.953 ms
04.sql 33分钟+  未出结果 2618.012 ms
05.sql 575.717 ms 1918.740 ms
06.sql 50091.984 ms 450.428 ms
07.sql 372.611 ms 760.951 ms
08.sql 199.187 ms 428.280 ms
09.sql 1467.471 ms 9708.039 ms
10.sql 5629.187 ms 1495.543 ms

04.sql未跑出结果:

总结

1、目前仅支持Postgres 15-17、Ubuntu 22.04-24.04 or MacOS,或使用docker安装使用较快

2、在openEuler 22.03也可以编译成功(本文使用openEuler 22.03)

3、pg_duckdb编译需要CMake 3.5…3.29 or higher (cmake --version)

4、可以在会话级别配置当前SQL是否使用pg_duckdb插件:

 1\timing on
2\o  /dev/null
3
4
5SET duckdb.force_execution = 0;
6\i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/22.sql
7
8
9SET duckdb.force_execution = 1;
10\i ./pg_duckdb/third_party/duckdb/extension/tpcds/dsdgen/queries/22.sql

5、在实际使用场景中,可以优先考虑PG的索引只扫描+表并行;若性能依然差,则可以考虑安装使用pg_duckdb

参考

https://mp.weixin.qq.com/s/ljT4Ubt8eEgNaV84Jfv9Dg

https://pigsty.io/zh/blog/pg/pg-duckdb/

https://pgxman.com/x/pg_duckdb

https://cloud.tencent.com/developer/article/2439084



请使用浏览器的分享功能分享到微信等