PostgreSQL: working around INSERT INTO ... SELECT not using parallel query

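This article is based on PostgreSQL 13.1.

Parallel query has been supported since PostgreSQL 9.6. Since PostgreSQL 11, the queries behind CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW can also run in parallel.

The conclusion first:

Use CREATE TABLE AS or SELECT INTO instead, or go through an export and import.

First, look at the execution plan of the following query: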

select count(*) from test t1,test1 t2 where t1.id = t2.id ;
postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
  -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
           Hash Cond: (t1.id = t2.id)
           -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
           -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
              Buckets: 131072 Batches: 16 Memory Usage: 3520kB
              -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
 Planning Time: 0.420 ms
 Execution Time: 715.447 ms
(13 rows)

The plan uses two parallel workers.
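The article does not show how the tables were created. As a rough sketch for reproducing the plan above, the setup below assumes that test and test1 each hold a single integer id column with one million rows (consistent with width=4 and the row counts in the plan) and that va has one integer column. The column name cnt and all three definitions are assumptions, not the author's schema.

```sql
-- Hypothetical schema: inferred from the plans, not taken from the article.
create table test  (id int);
create table test1 (id int);
create table va    (cnt int);   -- column name is made up for illustration

-- One million rows per table, matching the row counts in the plan.
insert into test  select generate_series(1, 1000000);
insert into test1 select generate_series(1, 1000000);

-- Refresh planner statistics so the row estimates are accurate.
analyze test;
analyze test1;

-- The default value of 2 would match "Workers Planned: 2" in the plan.
show max_parallel_workers_per_gather;
```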

Next, look at INSERT INTO ... SELECT:

postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;     
                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
  -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
     -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
        -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
           Hash Cond: (t1.id = t2.id)
           -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
           -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
              Buckets: 131072 Batches: 16 Memory Usage: 3227kB
              -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
 Planning Time: 0.511 ms
 Execution Time: 3745.633 ms
(11 rows)

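The plan contains no "Workers Planned" or "Workers Launched" lines, so parallel query was not used.

Even with forced parallel mode turned on, the statement still does not run in parallel.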

postgres=# set force_parallel_mode =on;
SET
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
  -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
     -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
        -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
           Hash Cond: (t1.id = t2.id)
           -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
           -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
              Buckets: 131072 Batches: 16 Memory Usage: 3227kB
              -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
 Planning Time: 0.577 ms
 Execution Time: 3825.923 ms
(11 rows)
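Since the plain SELECT already ran with two workers, configuration is unlikely to be the cause, but it can be ruled out by listing the parallel-related settings. A minimal check, assuming PostgreSQL 13 and the standard pg_settings view:

```sql
-- List the settings that most directly control whether parallel plans are chosen.
select name, setting
from pg_settings
where name in ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather',
               'force_parallel_mode')
order by name;
```

If these values are the same in the session that ran the parallel SELECT, the serial plan for the INSERT has to come from the statement type rather than from the settings.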

The reason is stated in the official documentation, which lists this among the cases where no parallel plan is generated:

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.

There are three workarounds:

1. SELECT INTO

postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
  -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
           Hash Cond: (t1.id = t2.id)
           -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
           -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
              Buckets: 131072 Batches: 16 Memory Usage: 3520kB
              -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
 Planning Time: 0.319 ms
 Execution Time: 783.300 ms
(13 rows)

2. CREATE TABLE AS

postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
  -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
           Hash Cond: (t1.id = t2.id)
           -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
           -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
              Buckets: 131072 Batches: 16 Memory Usage: 3520kB
              -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
 Planning Time: 0.189 ms
 Execution Time: 565.448 ms
(13 rows)

3. Or go through an export and import, for example:

psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F ","
psql -h localhost -d postgres -U postgres -c "copy va from 'result.csv' with (format csv, delimiter ',', header false, encoding 'windows-1252')"

In some scenarios this is also faster than the non-parallel INSERT.
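SELECT INTO and CREATE TABLE AS both create a new table, whereas the original statement inserted into the existing table va. One way to combine the two, sketched here under the assumption that a short-lived staging table is acceptable, is to let CREATE TABLE AS run the expensive join in parallel and then copy the small result into va. The name va_stage is hypothetical.

```sql
begin;

-- The query behind CREATE TABLE AS may use a parallel plan.
create table va_stage as
select count(*) from test t1, test1 t2 where t1.id = t2.id;

-- This INSERT is still planned serially, but it only reads the staged result.
insert into va select * from va_stage;

-- The staging table (hypothetical name) is no longer needed.
drop table va_stage;

commit;
```

The gain depends on the result being much cheaper to copy than to compute. For a result about as large as its inputs, the serial INSERT would dominate again.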
