1. 程式人生 > 其它 >Connect By 兩種寫法的效能差異

Connect By 兩種寫法的效能差異

對於遞迴查詢,KINGBASE使用者可以選擇使用connect by ,或者使用 with recursive 。下面,我們以例子來看下二者的差別。

一、構造資料

create table test_recursive(id integer,pid integer,name varchar,description text);
insert into test_recursive(id,name,description) select generate_series(1,100000),'a'||generate_series(1,100000),repeat('desc',500);

update test_recursive set pid=1 where id between 2 and 10;
update test_recursive set pid=mod(id,9)+2 where id between 11 and 100;
update test_recursive set pid=mod(id,90)+11 where id between 101 and 1000;
update test_recursive set pid=mod(id,900)+101 where id between 1001 and 10000;
update test_recursive set pid=mod(id,9000)+1001 where id between 10001 and 100000;

create table test_recursive_random(id integer,pid integer,name varchar,description text);
insert into test_recursive_random select * from test_recursive order by random;

create index ind_test_recursive_random_id on test_recursive_random(id);
create index ind_test_recursive_random_pid on test_recursive_random(pid);
vacuum full test_recursive_random;
analyze test_recursive_random;

create index ind_test_recursive_id on test_recursive(id);
create index ind_test_recursive_pid on test_recursive(pid);
vacuum full test_recursive;
analyze test_recursive;

本例子構造了5層的資料,有排序與非排序兩種資料。

二、使用connect by

connect by的查詢效能:用時 746ms

test=# explain analyze select id,pid,name from test_recursive start with id=1 connect by prior id = pid ;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Recursive Union  (cost=0.29..422.37 rows=101 width=14) (actual time=0.038..728.281 rows=100000 loops=1)
   ->  Index Scan using ind_test_recursive_id on test_recursive  (cost=0.29..8.31 rows=1 width=14) (actual time=0.015..0.017 rows=1 loops=1)
         Index Cond: (id = 1)
   ->  Nested Loop  (cost=0.42..41.30 rows=10 width=14) (actual time=0.002..0.003 rows=1 loops=100000)
         ->  WorkTable Scan on "connect"  (cost=0.00..0.02 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=100000)
         ->  Index Scan using ind_test_recursive_pid on test_recursive  (cost=0.42..41.18 rows=10 width=14) (actual time=0.002..0.002 rows=1 loops=100000)
               Index Cond: (pid = (PRIOR test_recursive.id))
 Planning Time: 0.185 ms
 Execution Time: 746.102 ms
(9 rows)

  

三、Kingbase with recursive 查詢

1、排序資料:用時302ms

explain analyze with recursive tmp1 as (
select id,pid,name from test_recursive where id=1
union all
select a.id,a.pid,a.name from test_recursive a inner join tmp1 b on a.pid=b.id )
select * from tmp1;
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on tmp1  (cost=4013.94..4033.96 rows=1001 width=40) (actual time=0.020..297.856 rows=100000 loops=1)
   CTE tmp1
     ->  Recursive Union  (cost=0.29..4013.94 rows=1001 width=14) (actual time=0.018..257.298 rows=100000 loops=1)
           ->  Index Scan using ind_test_recursive_id on test_recursive  (cost=0.29..8.31 rows=1 width=14) (actual time=0.016..0.018 rows=1 loops=1)
                 Index Cond: (id = 1)
           ->  Nested Loop  (cost=0.42..398.56 rows=100 width=14) (actual time=20.529..38.777 rows=16666 loops=6)
                 ->  WorkTable Scan on tmp1 b  (cost=0.00..0.20 rows=10 width=4) (actual time=0.003..2.150 rows=16667 loops=6)
                 ->  Index Scan using ind_test_recursive_pid on test_recursive a  (cost=0.42..39.74 rows=10 width=14) (actual time=0.001..0.002 rows=1 loops=100000)
                       Index Cond: (pid = b.id)
 Planning Time: 0.207 ms
 Execution Time: 302.244 ms
(11 rows)

2、非排序資料:440ms

test=# explain analyze with recursive tmp1 as (
test(# select id,pid,name from test_recursive_random where id=1
test(# union all
test(# select a.id,a.pid,a.name from test_recursive_random a inner join tmp1 b on a.pid=b.id )
test-# select * from tmp1;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on tmp1  (cost=4206.87..4226.89 rows=1001 width=40) (actual time=0.020..434.721 rows=100000 loops=1)
   CTE tmp1
     ->  Recursive Union  (cost=0.29..4206.87 rows=1001 width=14) (actual time=0.018..397.456 rows=100000 loops=1)
           ->  Index Scan using ind_test_recursive_random_id on test_recursive_random  (cost=0.29..8.31 rows=1 width=14) (actual time=0.017..0.018 rows=1 loops=1)
                 Index Cond: (id = 1)
           ->  Nested Loop  (cost=4.50..417.85 rows=100 width=14) (actual time=33.080..62.311 rows=16666 loops=6)
                 ->  WorkTable Scan on tmp1 b  (cost=0.00..0.20 rows=10 width=4) (actual time=0.007..2.412 rows=16667 loops=6)
                 ->  Bitmap Heap Scan on test_recursive_random a  (cost=4.50..41.67 rows=10 width=14) (actual time=0.002..0.003 rows=1 loops=100000)
                       Recheck Cond: (pid = b.id)
                       Heap Blocks: exact=99557
                       ->  Bitmap Index Scan on ind_test_recursive_random_pid  (cost=0.00..4.49 rows=10 width=0) (actual time=0.001..0.001 rows=1 loops=100000)
                             Index Cond: (pid = b.id)
 Planning Time: 0.304 ms
 Execution Time: 439.563 ms
(14 rows)

3、使用hash join:260ms

test=# set enable_nestloop=off;
SET
test=# explain analyze with recursive tmp1 as (
test(# select id,pid,name from test_recursive where id=1
test(# union all
test(# select a.id,a.pid,a.name from test_recursive a inner join tmp1 b on a.pid=b.id )
test-# select * from tmp1;
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on tmp1  (cost=24101.58..24121.60 rows=1001 width=40) (actual time=0.018..255.766 rows=100000 loops=1)
   CTE tmp1
     ->  Recursive Union  (cost=0.29..24101.58 rows=1001 width=14) (actual time=0.016..218.427 rows=100000 loops=1)
           ->  Index Scan using ind_test_recursive_id on test_recursive  (cost=0.29..8.31 rows=1 width=14) (actual time=0.015..0.017 rows=1 loops=1)
                 Index Cond: (id = 1)
           ->  Hash Join  (cost=0.33..2407.32 rows=100 width=14) (actual time=13.828..32.571 rows=16666 loops=6)
                 Hash Cond: (a.pid = b.id)
                 ->  Seq Scan on test_recursive a  (cost=0.00..2031.00 rows=100000 width=14) (actual time=0.005..8.240 rows=100000 loops=6)
                 ->  Hash  (cost=0.20..0.20 rows=10 width=4) (actual time=5.114..5.114 rows=16667 loops=6)
                       Buckets: 131072 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3073kB
                       ->  WorkTable Scan on tmp1 b  (cost=0.00..0.20 rows=10 width=4) (actual time=0.004..2.068 rows=16667 loops=6)
 Planning Time: 0.196 ms
 Execution Time: 260.360 ms
(13 rows)

四、執行計劃差異分析

  • connect by 查詢執行邏輯:查詢是通過 pid = prior id ,也就是將前條記錄的 id 作為值,傳給 pid 進行索引掃描。邏輯上可以看做是逐個分支查詢,上個分支查詢結束,再進行下個分支掃描。loop = 100000,就是表示針對每條記錄,都要訪問一次索引。
  • with recursive 查詢邏輯:是按層次查詢,上層結果都返回後,再執行下層查詢。每層可以根據所有ctid進行排序,也就是Bitmap Index Scan,將所有ctid都返回,排序,再訪問表,效率提高。另外,由於是每層資料返回後,再去關聯查詢下層資料,可以使用hash join,提升訪問效率。 rows=16666 loop = 6,表示需要訪問6個批次,每次平均 16666 條記錄。

五、Oracle connect by 查詢效能

以下是同樣資料量的情況下,Oracle connect by 查詢的效能:

SQL> select id,pid,name from test_recursive start with id=1 connect by prior id = pid ;

100000 rows selected.

Elapsed: 00:00:00.98

Execution Plan
----------------------------------------------------------
Plan hash value: 2099392185

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |    12 |   384 |    18  (12)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING            |                        |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | TEST_RECURSIVE         |     1 |    32 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                   | IND_TEST_RECURSIVE_ID  |     1 |       |     1   (0)| 00:00:01 |
|   4 |   NESTED LOOPS                        |                        |    11 |   495 |    14   (0)| 00:00:01 |
|   5 |    CONNECT BY PUMP                    |                        |       |       |            |          |
|   6 |    TABLE ACCESS BY INDEX ROWID BATCHED| TEST_RECURSIVE         |    11 |   352 |    12   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN                  | IND_TEST_RECURSIVE_PID |    11 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PID"=PRIOR "ID")
   3 - access("ID"=1)
   7 - access("connect$_by$_pump$_002"."prior id "="PID")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     101983  consistent gets
          0  physical reads
          0  redo size
    2337649  bytes sent via SQL*Net to client
      73769  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          8  sorts (memory)
          0  sorts (disk)
     100000  rows processed