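Bind Variable Peeking in PostgreSQL
Today we look at custom execution plans and generic execution plans. In Oracle, the related technique is known as bind variable peeking. Kingbase has no such term; strictly speaking, it distinguishes between custom plans and generic plans.
What is a custom plan, and what is a generic plan? Let's start with an example. I created a table with 100011 rows and two columns, id and name. The name column holds only two distinct values: 'aaa' appears in exactly 100000 rows, while 'bbb' appears in only 11. This is what we usually call data skew. In Oracle, bind variable peeking is typically paired with histograms collected on the skewed columns.
The tests below were run on this version: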
KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
create table a(id numeric,name varchar(40));
insert into a select i, 'aaa' from generate_series (1,100000) i;
insert into a select i, 'bbb' from generate_series (100001,100011) i;
create index idx_a1 on a(name);
analyze a;
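The next step is to use a PREPARE statement, which avoids parsing the same statement over and over. This is similar to bind variables in Oracle. (There, after one hard parse, the execution plan stored in the library cache can be reused by later executions of the same SQL, which avoids repeated hard parses; finding an existing plan with the same plan hash value is called a soft parse. There is also the "soft soft parse", which we skip here.)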
test=# prepare test_stmt as select * from a where name = $1;
PREPARE
select * from pg_prepared_statements;
Now we run the following statement six times in a row, each time querying for name = 'aaa'. Note: six times.
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..25.862 rows=100000 loops=1)
Filter: ((name)::text = 'aaa'::text)
Rows Removed by Filter: 11
Planning Time: 0.217 ms
Execution Time: 34.710 ms
(5 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..16.401 rows=100000 loops=1)
Filter: ((name)::text = 'aaa'::text)
Rows Removed by Filter: 11
Planning Time: 0.073 ms
Execution Time: 23.340 ms
(5 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..30.001 rows=100000 loops=1)
Filter: ((name)::text = 'aaa'::text)
Rows Removed by Filter: 11
Planning Time: 0.093 ms
Execution Time: 39.383 ms
(5 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..23.365 rows=100000 loops=1)
Filter: ((name)::text = 'aaa'::text)
Rows Removed by Filter: 11
Planning Time: 0.073 ms
Execution Time: 32.397 ms
(5 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..19.287 rows=100000 loops=1)
Filter: ((name)::text = 'aaa'::text)
Rows Removed by Filter: 11
Planning Time: 0.099 ms
Execution Time: 27.462 ms
(5 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.082..35.540 rows=100000 loops=1)
Index Cond: ((name)::text = $1)
Planning Time: 0.114 ms
Execution Time: 45.546 ms
(4 rows)
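Because 'aaa' accounts for almost all rows in the table, the optimizer chooses a sequential scan for the first five executions, which is reasonable given its cost model. On the sixth execution, look at the condition: the Filter on (name)::text = 'aaa'::text has become an Index Cond on (name)::text = $1. At this point the optimizer has produced a generic plan, which keeps the bind parameter $1 instead of a literal value. The previous five plans are custom plans. Why does the generic plan appear only on the sixth execution? The explanation is in plancache.c in the PostgreSQL source:
The logic for choosing generic or custom plans is in choose_custom_plan
In the choose_custom_plan function we find:
/* Generate custom plans until we have done at least 5 (arbitrary) */
if (plansource->num_custom_plans < 5)
    return true;
Note the threshold: while fewer than five custom plans have been built, the function returns true and a custom plan is used. After that the generic plan becomes a candidate. In the PostgreSQL source, the same function then compares the generic plan's estimated cost with the average cost of the custom plans built so far and prefers the generic plan when it is cheaper. That is what happens here: the generic index scan is costed at 1710.52, below the 1794.14 of the custom sequential scans, which is why the sixth execution switches to an index scan. Once the generic plan has been chosen it is cached and reused, so the plan is effectively fixed. All of this applies only when the plan_cache_mode parameter leaves the choice to the optimizer. Its current value is auto.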
test=# show plan_cache_mode;
plan_cache_mode
-----------------
auto
(1 row)
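The selection logic can be sketched in a few lines of Python. This is a simplified model, not the actual C code: the PlanSource class and the replay helper are hypothetical names introduced for illustration. In the PostgreSQL source, once five custom plans exist, the function also compares the generic plan's cost with the average custom-plan cost, and the sketch includes that comparison. It leaves out the planning-cost charge that the real code adds to each custom plan. The costs are the ones shown in the EXPLAIN output of this article.

```python
from dataclasses import dataclass


@dataclass
class PlanSource:
    """Hypothetical stand-in for the per-statement bookkeeping in plancache.c."""
    num_custom_plans: int = 0
    total_custom_cost: float = 0.0
    generic_cost: float = -1.0  # -1 means the generic plan has not been costed yet


def choose_custom_plan(ps: PlanSource, mode: str = "auto") -> bool:
    """Return True if a custom plan should be built for this execution."""
    if mode == "force_generic_plan":
        return False
    if mode == "force_custom_plan":
        return True
    # Build custom plans until at least 5 exist (arbitrary threshold).
    if ps.num_custom_plans < 5:
        return True
    avg_custom_cost = ps.total_custom_cost / ps.num_custom_plans
    # Prefer the generic plan only if it is cheaper than the average custom plan.
    return not (ps.generic_cost < avg_custom_cost)


def replay(custom_costs, generic_cost, mode="auto"):
    """Simulate successive EXECUTEs; custom_costs[i] is the custom plan cost for call i."""
    ps = PlanSource()
    kinds = []
    for cost in custom_costs:
        use_custom = choose_custom_plan(ps, mode)
        if not use_custom and ps.generic_cost < 0:
            # First time the generic plan is considered: cost it, then re-check.
            ps.generic_cost = generic_cost
            use_custom = choose_custom_plan(ps, mode)
        if use_custom:
            ps.num_custom_plans += 1
            ps.total_custom_cost += cost
            kinds.append("custom")
        else:
            kinds.append("generic")
    return kinds


# Six executions with 'aaa': custom seq scan cost 1794.14, generic index scan cost 1710.52.
kinds = replay([1794.14] * 6, generic_cost=1710.52)
print(kinds)  # ['custom', 'custom', 'custom', 'custom', 'custom', 'generic']
```

In this model, had the generic plan been more expensive than the average custom plan, the sixth execution would have stayed on a custom plan.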
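With plan_cache_mode set to auto, the plan is now the same whether I pass 'aaa' or 'bbb': it has become fixed. Choosing an index scan every time, regardless of the parameter value, is clearly not what we want. With skewed data, a plan that never changes is unwise and leads to inefficient executions.
As shown below, once the generic plan is in use the estimated row count is 50006, whichever value is passed. Interestingly, this is half the number of rows in the table, presumably because the planner does not know the parameter value and treats each of the two distinct values as equally likely. From the sixth execution on, this estimate never changes, which is clearly unreasonable. One possible explanation for fixing the plan is that the optimizer sees only a small difference between the estimated costs of the two scan methods (1710.52 versus 1794.14) and therefore settles on the generic plan.
Another key point is that Planning Time becomes very small once the generic plan is in use. This looks like the equivalent of a soft parse: the cached plan is reused, so the time spent generating a plan drops sharply.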
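The estimate of 50006 rows can be reproduced with a little arithmetic. The following is a minimal Python sketch. It assumes that, for an equality condition on a parameter whose value is unknown at planning time, the planner falls back to a selectivity of 1/n_distinct. That assumption matches the numbers in this test but has not been verified against the Kingbase source, and the function name is hypothetical.

```python
def generic_eq_rows(total_rows: int, n_distinct: int) -> int:
    """Estimated rows for "col = $1" when the parameter value is unknown.

    Assumption: every distinct value is treated as equally likely,
    so the selectivity is 1 / n_distinct.
    """
    selectivity = 1.0 / n_distinct
    return round(total_rows * selectivity)


# Table a: 100011 rows, 2 distinct values in the name column.
estimate = generic_eq_rows(100011, 2)
print(estimate)  # 50006
```

Under this assumption the estimate depends only on the table size and the number of distinct values, which is why it is identical for 'aaa' (100000 actual rows) and 'bbb' (11 actual rows).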
test=# explain (analyze) execute test_stmt ('bbb');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.021..0.024 rows=11 loops=1)
Index Cond: ((name)::text = $1)
Planning Time: 0.015 ms
Execution Time: 0.048 ms
(4 rows)
test=# explain (analyze) execute test_stmt ('bbb');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.020..0.022 rows=11 loops=1)
Index Cond: ((name)::text = $1)
Planning Time: 0.014 ms
Execution Time: 0.041 ms
(4 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..23.592 rows=100000 loops=1)
Index Cond: ((name)::text = $1)
Planning Time: 0.013 ms
Execution Time: 31.333 ms
(4 rows)
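Setting plan_cache_mode=force_custom_plan
Let's test another case and set plan_cache_mode to force_custom_plan. Now the plan changes with the distribution of the bound value, which is the sensible behaviour. The price is that the statement has to be re-planned on every execution. In Oracle this would be called a hard parse, and we have all heard the saying that hard parsing is the root of all evil. Even so, when the data in Kingbase is skewed and the predicate value changes often, custom plans are the better choice.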
set plan_cache_mode=force_custom_plan;
test=# explain (analyze) execute test_stmt ('bbb');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..8.65 rows=13 width=10) (actual time=0.020..0.022 rows=11 loops=1)
Index Cond: ((name)::text = 'bbb'::text)
Planning Time: 0.079 ms
Execution Time: 0.036 ms
(4 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..1794.14 rows=99998 width=10) (actual time=0.010..16.897 rows=100000 loops=1)
Filter: ((name)::text = 'aaa'::text)
Rows Removed by Filter: 11
Planning Time: 0.077 ms
Execution Time: 24.020 ms
(5 rows)
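Setting plan_cache_mode=force_generic_plan
As you can see, the plan is fixed in this mode. It is the same plan we got on the sixth execution at the beginning: whatever the condition, the optimizer uses the generic plan.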
test=# set plan_cache_mode =force_generic_plan ;
test=# explain (analyze) execute test_stmt ('bbb');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.022..0.024 rows=11 loops=1)
Index Cond: ((name)::text = $1)
Planning Time: 0.016 ms
Execution Time: 0.044 ms
(4 rows)
test=# explain (analyze) execute test_stmt ('bbb');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..0.035 rows=11 loops=1)
Index Cond: ((name)::text = $1)
Planning Time: 0.015 ms
Execution Time: 0.055 ms
(4 rows)
test=# explain (analyze) execute test_stmt ('aaa');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_a1 on a (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.037..23.191 rows=100000 loops=1)
Index Cond: ((name)::text = $1)
Planning Time: 0.016 ms
Execution Time: 30.997 ms
(4 rows)
Releasing the prepared statements
deallocate all;
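Conclusion:
Prepared statements in Kingbase (similar to bind variables) are a good fit when the data is evenly distributed and the parameter values change frequently.
For skewed data, consider setting plan_cache_mode to force_custom_plan, or do not use prepared statements at all.
As always, test thoroughly before putting any of this into production.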