PostgreSQL Bind Variable Peeking

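Today we look at custom execution plans and generic execution plans. In Oracle, this technique is called bind variable peeking. Kingbase has no such term. Strictly speaking, Kingbase calls them custom plans and generic plans.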

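What is a custom plan, and what is a generic plan? Let's start with an example. I created a table with 100011 rows and two columns, id and name. The name column holds only two values: 'aaa', which fills a full 100000 rows, and 'bbb', which appears in just 11 rows. This is what we usually call data skew. In an Oracle database, bind variable peeking only works well if we also collect a histogram on the skewed column.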

The tests below were run on this version:

KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

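create table a(id numeric,name varchar(40));
-- 100000 rows with name = 'aaa'
insert into a select i, 'aaa' from generate_series (1,100000) i;
-- only 11 rows with name = 'bbb' (the skewed minority)
insert into a select i, 'bbb' from generate_series (100001,100011) i;
create index idx_a1 on a(name);
-- collect optimizer statistics for the table
analyze a;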

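The next step is to use a PREPARE statement, which avoids parsing the statement over and over. This is similar to bind variables in Oracle. (After one hard parse, the execution plan produced in the library cache can be reused by later executions of the same SQL, avoiding repeated hard parses. Finding and reusing the same plan, identified by its plan hash value, is called a soft parse. There is also the "soft soft parse", which I skip here.)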
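The parse-once idea in the parenthesis above can be illustrated with a toy model. This is a hypothetical sketch (the class ToyStatementCache and its counters are made up for illustration), not how Oracle's library cache or Kingbase actually work. A statement text that has been seen before counts as a soft parse, and a new text counts as a hard parse.

```python
# Toy model of "parse once, reuse afterwards". Hypothetical; it does not
# reproduce Oracle's library cache or Kingbase internals.
from dataclasses import dataclass, field


@dataclass
class ToyStatementCache:
    entries: dict = field(default_factory=dict)
    hard_parses: int = 0
    soft_parses: int = 0

    def parse(self, sql: str) -> str:
        if sql in self.entries:
            self.soft_parses += 1  # same text seen before: reuse the cached entry
        else:
            self.hard_parses += 1  # new text: do the expensive work once
            self.entries[sql] = f"parsed<{sql}>"  # stand-in for a parse tree/plan
        return self.entries[sql]


# With a bind parameter, the statement text is identical on every call.
cache = ToyStatementCache()
for _ in range(6):
    cache.parse("select * from a where name = $1")
print(cache.hard_parses, cache.soft_parses)  # 1 5

# With literals inlined, each distinct value produces a new statement text.
cache2 = ToyStatementCache()
for value in ["aaa", "bbb", "aaa"]:
    cache2.parse(f"select * from a where name = '{value}'")
print(cache2.hard_parses, cache2.soft_parses)  # 2 1
```

The second loop shows why inlined literals defeat the cache: every new value is a new statement.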

test=# prepare test_stmt as select * from a where name = $1;
PREPARE
test=# select * from pg_prepared_statements;

We run the following statement six times in a row, each time querying the rows where name is 'aaa'. Note: six times.

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..25.862 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.217 ms
 Execution Time: 34.710 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..16.401 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.073 ms
 Execution Time: 23.340 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..30.001 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.093 ms
 Execution Time: 39.383 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..23.365 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.073 ms
 Execution Time: 32.397 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..19.287 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.099 ms
 Execution Time: 27.462 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.082..35.540 rows=100000 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.114 ms
 Execution Time: 45.546 ms
(4 rows)


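Because 'aaa' makes up almost all of the table, the optimizer chooses a full table scan (Seq Scan). This follows from the optimizer's cost model, and it is reasonable. On the sixth execution, however, look at the condition line: the Filter (name)::text = 'aaa'::text has become the Index Cond (name)::text = $1. At this point the optimizer has generated a generic plan that keeps the bind parameter, while the previous five executions used custom plans. Why does the generic plan appear only on the sixth execution? The explanation is in PostgreSQL's plancache.c source: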

The logic for choosing generic or custom plans is in choose_custom_plan

In the choose_custom_plan function we find:

/* Generate custom plans until we have done at least 5 (arbitrary) */
if (plansource->num_custom_plans < 5)
    return true;

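Note the condition: while fewer than 5 custom plans have been built, the function returns true and a custom plan is used. Once 5 exist, PostgreSQL compares the generic plan's estimated cost with the average cost of those custom plans and uses the generic plan if it is cheaper, which is the case here. From then on the plan stays fixed. But why does the generic plan on the sixth execution switch to an index scan? The generic plan is built without knowing the value of $1, so its row estimate (rows=50006) no longer reflects 'aaa', and for that estimate the index scan (cost 1710.52) is cheaper than the seq scan (cost 1794.14). Whether the planner may switch automatically like this is governed by the parameter plan_cache_mode, whose current value is auto.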

test=# show plan_cache_mode;
 plan_cache_mode
-----------------
 auto
(1 row)
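The decision can be modeled in a few lines. This is a simplified sketch based on PostgreSQL's choose_custom_plan() in plancache.c, assuming Kingbase behaves the same. It leaves out one-shot plans and cursor options, and the Python function names are hypothetical. One detail taken from the PostgreSQL source: the recorded cost of each custom plan includes a planning surcharge of 1000 * cpu_operator_cost * (number of relations + 1), while the generic plan's cost does not. Plugging in the costs from the transcripts above reproduces the switch on the sixth execution.

```python
# Simplified model of choose_custom_plan() in PostgreSQL's plancache.c.
# Assumption: Kingbase follows the same logic. One-shot plans, cursor options
# and the "no parameters" shortcut are left out.

CPU_OPERATOR_COST = 0.0025  # PostgreSQL default for cpu_operator_cost


def recorded_custom_cost(plan_total_cost: float, nrelations: int = 1) -> float:
    """Cost recorded for a custom plan: plan cost plus a planning surcharge."""
    return plan_total_cost + 1000.0 * CPU_OPERATOR_COST * (nrelations + 1)


def choose_custom_plan(plan_cache_mode: str, custom_costs: list,
                       generic_cost: float) -> bool:
    """Return True to build a custom plan, False to use the generic plan."""
    if plan_cache_mode == "force_generic_plan":
        return False
    if plan_cache_mode == "force_custom_plan":
        return True
    # auto: generate custom plans until we have done at least 5 (arbitrary)
    if len(custom_costs) < 5:
        return True
    avg_custom_cost = sum(custom_costs) / len(custom_costs)
    # Keep planning custom plans unless the generic plan is cheaper than the
    # average custom plan. (PostgreSQL builds the generic plan on the sixth
    # call to learn its cost; here that cost is simply passed in.)
    return generic_cost >= avg_custom_cost


# Costs from the transcripts: five Seq Scan custom plans for 'aaa' (1794.14)
# and the generic Index Scan plan (1710.52); table a is the only relation.
history = [recorded_custom_cost(1794.14)] * 5
generic = 1710.52

decisions = [choose_custom_plan("auto", history[:n], generic) for n in range(6)]
print(decisions)  # [True, True, True, True, True, False]
print(choose_custom_plan("force_custom_plan", history, generic))  # True
```

Under this model, had the first five executions used 'bbb' (custom cost 8.65 in the force_custom_plan test), the average custom cost would be about 13.65, and the generic plan (1710.52) would not be chosen. The order of the first five values therefore matters.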

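With the parameter at auto, the plan is the same whether I pass 'aaa' or 'bbb': the plan has become fixed. Always choosing an index scan no matter how the value changes is clearly not what we want. With skewed data, a plan that never changes is unwise and leads to inefficient execution.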

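As shown below, once the generic plan is in use, the estimated row count is rows=50006 whatever the value. Interestingly, this is half of the table's 100011 rows. From the sixth execution on, this estimate (and the cost 0.42..1710.52) never changes, which is clearly unreasonable. Presumably the optimizer judged that the generic plan costs no more than the custom plans it had built, so it fixed the plan as the generic plan.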
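Why exactly 50006? When the value is unknown, PostgreSQL estimates "column = parameter" by assuming the non-NULL rows are spread evenly over the column's distinct values, capped at the frequency of the most common value. The sketch below is a simplified model of that rule. The function name is hypothetical, the unique-column special case is ignored, and we assume Kingbase estimates the same way. With 100011 rows and 2 distinct values it yields exactly the rows=50006 seen in the generic plans.

```python
# Simplified model of the row estimate for "column = $1" when the value is
# unknown at plan time (generic plan). Assumptions: Kingbase estimates the
# same way as PostgreSQL; the unique-column special case is ignored.


def generic_eq_rows(reltuples: float, n_distinct: float,
                    null_frac: float = 0.0, mcv_freqs: tuple = ()) -> int:
    """Estimated rows for column = <unknown value>."""
    selectivity = 1.0 - null_frac  # fraction of non-NULL rows
    if n_distinct > 1:
        selectivity /= n_distinct  # assume every distinct value is equally common
    # Never estimate more than the most common value's frequency.
    if mcv_freqs and selectivity > mcv_freqs[0]:
        selectivity = mcv_freqs[0]
    return round(reltuples * selectivity)  # round to a whole row count


# Table a: 100011 rows, no NULLs, 2 distinct values ('aaa' and 'bbb').
print(generic_eq_rows(100011, 2))  # 50006
```

The custom plans, by contrast, know the actual value and use its own statistics, which is why they estimate rows=99994 for 'aaa' and rows=13 for 'bbb'.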

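Another key point: once the generic plan is used, Planning Time becomes very small (about 0.015 ms, versus 0.07 to 0.22 ms for the custom plans). Isn't this the "soft parse" effect? The time spent generating the plan drops sharply.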


test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.021..0.024 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.015 ms
 Execution Time: 0.048 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.020..0.022 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.014 ms
 Execution Time: 0.041 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..23.592 rows=100000 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.013 ms
 Execution Time: 31.333 ms
(4 rows)

Setting plan_cache_mode=force_custom_plan

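Next we test another case: set plan_cache_mode to force_custom_plan. Now the plan changes according to how the bind value is distributed, which is reasonable. The price is that the statement has to be re-planned on every execution. In Oracle this is a hard parse, and we have all heard the saying that hard parsing is the root of all evil. Correspondingly, in Kingbase, when the data is skewed and the predicate values change often, custom plans are the better choice.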

set plan_cache_mode=force_custom_plan;


test=# explain (analyze) execute test_stmt ('bbb');
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..8.65 rows=13 width=10) (actual time=0.020..0.022 rows=11 loops=1)
   Index Cond: ((name)::text = 'bbb'::text)
 Planning Time: 0.079 ms
 Execution Time: 0.036 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99998 width=10) (actual time=0.010..16.897 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.077 ms
 Execution Time: 24.020 ms
(5 rows)
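The trade-off can be tallied from the timings recorded in this test. This quick sketch adds Planning Time and Execution Time for each value under the generic plan (plan_cache_mode=auto, sixth execution onward) and under force_custom_plan. These are single runs copied from the transcripts, so the numbers are only indicative, not a benchmark.

```python
# Planning + execution time (ms) copied from the EXPLAIN ANALYZE output above.
# Single runs only: treat the differences as indicative, not as benchmarks.
timings = {
    ("generic", "bbb"): (0.015, 0.048),
    ("generic", "aaa"): (0.013, 31.333),
    ("custom", "bbb"): (0.079, 0.036),
    ("custom", "aaa"): (0.077, 24.020),
}


def total_ms(mode: str, value: str) -> float:
    """Planning Time + Execution Time for one plan mode and one bind value."""
    planning, execution = timings[(mode, value)]
    return planning + execution


for value in ("bbb", "aaa"):
    generic = total_ms("generic", value)
    custom = total_ms("custom", value)
    faster = "custom" if custom < generic else "generic"
    print(f"{value}: generic {generic:.3f} ms, custom {custom:.3f} ms -> {faster}")
```

For 'bbb' both modes use the same index scan, so the generic plan's cheaper planning wins. For 'aaa' the custom plan's seq scan saves far more than its extra planning time.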

Setting plan_cache_mode=force_generic_plan

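As you can see, in this case the plan is fixed. It is the same as the plan from the sixth execution at the start: whatever the condition, the optimizer uses the generic plan.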

test=# set plan_cache_mode =force_generic_plan ;
test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.022..0.024 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.016 ms
 Execution Time: 0.044 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..0.035 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.015 ms
 Execution Time: 0.055 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.037..23.191 rows=100000 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.016 ms
 Execution Time: 30.997 ms
(4 rows)

Release the prepared statements:
deallocate all;

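Conclusion:

If you use PREPARE statements in Kingbase (similar to bind variables):

this feature suits cases where the data is evenly distributed and the parameters change often.

For skewed data, I recommend setting plan_cache_mode to force_custom_plan, or not using this feature.

Of course, test thoroughly before putting any feature into use.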