PostgreSQL Bind Variable Peeking

阿新 • Published: 2022-05-20

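Today we look at custom execution plans and generic execution plans. Oracle has a related technique called bind variable peeking. Kingbase has no such term; strictly speaking, it distinguishes between custom plans and generic plans.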

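What is a custom plan, and what is a generic plan? Let's start with an example. I created a table with 100011 rows and two columns, id and name. The name column holds only two distinct values: 'aaa' appears in exactly 100000 rows, while 'bbb' appears in only 11. This is what we usually call data skew. In Oracle, bind variable peeking usually has to be paired with a histogram collected on the skewed column.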

The tests below were run on the following version:

KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

create table a(id numeric,name varchar(40));
insert into a select i, 'aaa' from generate_series (1,100000) i;
insert into a select i, 'bbb' from generate_series (100001,100011) i;
create index idx_a1 on a(name);
analyze a;

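The next step is to use a PREPARE statement, which avoids parsing the same statement repeatedly. This is similar to bind variables in Oracle: after one hard parse, the execution plan stored in the library cache can be reused by later executions of the same SQL, avoiding further hard parses. Finding and reusing that existing plan (identified by its plan hash value) is called a soft parse. There is also the so-called soft soft parse, which we skip here.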

test=# prepare test_stmt as select * from a where name = $1;
PREPARE
 select * from pg_prepared_statements;

We now run the following statement six times in a row, each time querying the rows whose name is 'aaa'. Note that it is six times.

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..25.862 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.217 ms
 Execution Time: 34.710 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..16.401 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.073 ms
 Execution Time: 23.340 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..30.001 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.093 ms
 Execution Time: 39.383 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..23.365 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.073 ms
 Execution Time: 32.397 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..19.287 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.099 ms
 Execution Time: 27.462 ms
(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.082..35.540 rows=100000 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.114 ms
 Execution Time: 45.546 ms
(4 rows)

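Because 'aaa' accounts for almost all rows in the table, the optimizer chooses a sequential scan (full table scan), which is a reasonable outcome of its cost model. On the sixth execution, look at the condition line: Filter: ((name)::text = 'aaa'::text) has become Index Cond: ((name)::text = $1). At this point the optimizer has produced a generic plan, which keeps the bind variable as a parameter. The first five plans are custom plans. Why does the generic plan appear only on the sixth execution? The explanation is in plancache.c in the PostgreSQL source code: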

The logic for choosing generic or custom plans is in choose_custom_plan

Inside the choose_custom_plan function we find: /* Generate custom plans until we have done at least 5 (arbitrary) */ if (plansource->num_custom_plans < 5) return true;

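Note the condition: while fewer than 5 custom plans have been built, the function returns true and a custom plan is chosen. From the sixth execution on, the generic plan becomes a candidate. In the PostgreSQL source it is used when its estimated cost is lower than the average cost of the custom plans built so far. Here the generic plan's cost (1710.52) is below the custom plans' cost (1794.14), so after five executions the plan becomes fixed. Why does the sixth execution, using the generic plan, switch to an index scan? The generic plan is built without knowing the parameter value, and whether it is used at all is governed by the plan_cache_mode parameter, whose current value is auto.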
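The decision described above can be sketched as a small simulation. This is a simplified model written for illustration, not the actual code in plancache.c: the names PlanSource and simulate are hypothetical, the real function also adds a planning-cost charge to custom plans and handles further cases (one-shot plans, statements without parameters, cursor options), and the costs passed in are the ones printed in this article's EXPLAIN output.

```python
from dataclasses import dataclass


@dataclass
class PlanSource:
    """Hypothetical stand-in for the per-statement bookkeeping."""
    num_custom_plans: int = 0
    total_custom_cost: float = 0.0
    generic_cost: float = -1.0  # -1 means the generic plan has not been costed yet


def choose_custom_plan(src: PlanSource, mode: str = "auto") -> bool:
    """Return True to build a custom plan, False to use the generic plan."""
    if mode == "force_generic_plan":
        return False
    if mode == "force_custom_plan":
        return True
    # Always build custom plans until five of them exist.
    if src.num_custom_plans < 5:
        return True
    avg_custom_cost = src.total_custom_cost / src.num_custom_plans
    # Prefer the generic plan when it is cheaper than the average custom plan.
    return not (src.generic_cost < avg_custom_cost)


def simulate(custom_cost, generic_cost, executions, mode="auto"):
    """Return the kind of plan used for each execution (simplified model)."""
    src = PlanSource()
    choices = []
    for _ in range(executions):
        use_custom = choose_custom_plan(src, mode)
        if not use_custom:
            # The generic plan is costed when it is first considered,
            # then the decision is re-checked against its real cost.
            src.generic_cost = generic_cost
            use_custom = choose_custom_plan(src, mode)
        if use_custom:
            src.num_custom_plans += 1
            src.total_custom_cost += custom_cost
            choices.append("custom")
        else:
            choices.append("generic")
    return choices


# Costs taken from the plans above: Seq Scan 1794.14, generic Index Scan 1710.52.
print(simulate(1794.14, 1710.52, 7))
```

Under these assumptions the first five executions use custom plans and the sixth switches to the generic plan, matching the session above. If the generic plan were estimated as more expensive than the custom average, the model would keep producing custom plans.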

test=# show plan_cache_mode;
 plan_cache_mode
-----------------
 auto
(1 row)

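With the parameter set to auto, once the generic plan has been chosen the plan is the same whether I pass 'aaa' or 'bbb': it is fixed. Choosing an index scan no matter how the value changes is clearly not what we want. With skewed data, a plan that never changes is unwise and leads to inefficient executions.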

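As shown below, once the generic plan is in use, the estimated row count is always rows=50006, whatever value is passed. Interestingly, that is about half of the table's actual row count. From the sixth execution on, neither this estimate nor the cost (0.42..1710.52) changes again, which is clearly unreasonable for 'bbb', which returns only 11 rows. The optimizer settles on the generic plan because its estimated cost is lower than that of the custom plans; the actual Execution Time plays no part in the choice.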
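The rows=50006 figure can be reproduced with simple arithmetic. The sketch below assumes that, when the parameter value is unknown, the planner treats every distinct value of the column as equally likely. The real estimate is derived from the statistics gathered by ANALYZE and may include further adjustments, so take this as an approximation rather than the planner's exact formula.

```python
total_rows = 100_000 + 11   # rows inserted in the setup above
n_distinct = 2              # the name column holds only 'aaa' and 'bbb'

# Assumption: with the value unknown, each distinct value gets an equal share.
generic_rows = round(total_rows / n_distinct)
print(generic_rows)

# Actual row counts per value, for comparison with the estimate.
actual_rows = {"aaa": 100_000, "bbb": 11}
for value, rows in actual_rows.items():
    print(value, "actual:", rows, "estimated:", generic_rows)
```

The estimate is off by a factor of two for 'aaa' and by several thousand for 'bbb', which is why one fixed plan cannot suit both values.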

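Another key point is that Planning Time is very small once the generic plan is used. This resembles the effect of a soft parse: the time spent producing an execution plan drops sharply.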


test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.021..0.024 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.015 ms
 Execution Time: 0.048 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.020..0.022 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.014 ms
 Execution Time: 0.041 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..23.592 rows=100000 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.013 ms
 Execution Time: 31.333 ms
(4 rows)

Setting plan_cache_mode=force_custom_plan

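Let's test another case by setting plan_cache_mode to force_custom_plan. The execution plan now changes with the distribution of the bind variable's value, which is the reasonable behavior here. The price is that the statement is planned again on every execution (it is still parsed only once, at PREPARE). In Oracle the comparable cost is called a hard parse, and we have all heard the saying that hard parsing is the root of all evil. Correspondingly, in Kingbase, when the data is skewed and the predicate values change often, custom plans are the better choice.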

set plan_cache_mode=force_custom_plan;


test=# explain (analyze) execute test_stmt ('bbb');
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..8.65 rows=13 width=10) (actual time=0.020..0.022 rows=11 loops=1)
   Index Cond: ((name)::text = 'bbb'::text)
 Planning Time: 0.079 ms
 Execution Time: 0.036 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..1794.14 rows=99998 width=10) (actual time=0.010..16.897 rows=100000 loops=1)
   Filter: ((name)::text = 'aaa'::text)
   Rows Removed by Filter: 11
 Planning Time: 0.077 ms
 Execution Time: 24.020 ms
(5 rows)
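To put the planning overhead in perspective, the following sketch adds up planning and execution time for repeated 'bbb' lookups. The timings are copied from single EXPLAIN ANALYZE runs in this article, so they are noisy. The workload size and the helper total_ms are hypothetical and chosen only for illustration.

```python
# Timings in milliseconds, copied from the 'bbb' runs in this article.
custom_planning_ms = 0.079    # force_custom_plan
custom_execution_ms = 0.036
generic_planning_ms = 0.015   # generic plan under auto
generic_execution_ms = 0.048


def total_ms(planning_ms, execution_ms, executions):
    """Total time if every execution pays both planning and execution."""
    return (planning_ms + execution_ms) * executions


executions = 10_000  # hypothetical workload size
custom_total = total_ms(custom_planning_ms, custom_execution_ms, executions)
generic_total = total_ms(generic_planning_ms, generic_execution_ms, executions)
print(f"custom: {custom_total:.0f} ms, generic: {generic_total:.0f} ms")
```

For a very selective lookup like this one, planning takes longer than execution, so replanning on every call is a real cost. For 'aaa', by contrast, execution takes tens of milliseconds and the planning time is negligible.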

Setting plan_cache_mode=force_generic_plan

In this case the execution plan is fixed. As with the sixth execution at the beginning, the optimizer uses the generic plan no matter how the condition changes.

test=# set plan_cache_mode =force_generic_plan ;
test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.022..0.024 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.016 ms
 Execution Time: 0.044 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('bbb');
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..0.035 rows=11 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.015 ms
 Execution Time: 0.055 ms
(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.037..23.191 rows=100000 loops=1)
   Index Cond: ((name)::text = $1)
 Planning Time: 0.016 ms
 Execution Time: 30.997 ms
(4 rows)

Release the prepared statements:
deallocate all;

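Conclusion:

If you use PREPARE statements (similar to bind variables) in Kingbase:

The feature is well suited to evenly distributed data with frequently changing parameters.

For skewed data, it is recommended to set plan_cache_mode to force_custom_plan, or not to use the feature at all.

As always, test thoroughly before rolling out any feature.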
