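Fixing MySQL Table Statistics for a Better Execution Plan
Case study:
Incorrect table statistics lead the optimizer to choose the wrong execution plan
A customer reported that a query had suffered a catastrophic performance drop even though neither the code nor the configuration had changed. For brevity, and to avoid leaking information, the data in this article has been edited and altered. The case is published with the customer's permission.
Here are the execution plan of the query and its output: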
mysql> explain
-> SELECT count(con.id) ,
-> MAX(DAYNAME(con.date)) ,
-> now() ,
-> pcz.type,
-> pcz.c_c
-> FROM con AS con
-> join orders o on con.o_id = o.id
-> JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id
-> left join c c on con.c_id = c.id
-> WHERE con.date = current_date() and pcz.type = "T_D"
-> GROUP BY con.date, pcz.c_c, pcz.type;
+----+-------------+-------+------------+--------+-------------------+----------+---------+----------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------+----------+---------+----------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | pcz | NULL | ALL | PRIMARY | NULL | NULL | NULL | 194 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | o | NULL | ref | PRIMARY,dpcz_FK | dpcz_FK | 9 | custom.pcz.id | 1642 | 100.00 | Using index |
| 1 | SIMPLE | con | NULL | ref | FK_order,IDX_date | FK_order | 8 | custom.o.id | 1 | 4.23 | Using where |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.con.c_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+-------------------+----------+---------+----------------------------+------+----------+----------------------------------------------+
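Looking more closely at the query, we notice the condition con.date = current_date(). This condition looks like a highly selective filter, so why did the MySQL optimizer skip the index on it? Let's look at the execution plan again, this time with an index hint (USE INDEX) on the con.date column. The EXPLAIN output is: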
mysql> explain
-> SELECT count(con.id) ,
-> MAX(DAYNAME(con.date)) ,
-> now() ,
-> pcz.type,
-> pcz.c_c
-> FROM con AS con USE INDEX(IDX_date)
-> join orders o on con.o_id = o.id
-> JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id
-> left join c c on con.c_id = c.id
-> WHERE con.date = current_date() and pcz.type = "T_D"
-> GROUP BY con.date, pcz.c_c, pcz.type;
+----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------------------------+--------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------------------------+--------+----------+---------------------------------+
| 1 | SIMPLE | con | NULL | ref | IDX_date | IDX_date | 3 | const | 110446 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.con.c_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | o | NULL | eq_ref | PRIMARY,dpcz_FK | PRIMARY | 8 | custom.con.o_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | pcz | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.o.d_p_c_z_id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------------------------+--------+----------+---------------------------------+
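With the index hint, the estimated number of rows to examine is 110446 × (1 × 10%) ≈ 11045, taking rows × filtered from the con and pcz lines of the plan (the other two lines each contribute a factor of 1). The same calculation for the first plan gives (194 × 10%) × 1642 × (1 × 4.23%) ≈ 1347.
By these estimates, 1347 is roughly one-tenth of 11045, so MySQL naturally prefers the first plan.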
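The two estimates can be reproduced with a few lines of arithmetic. The sketch below multiplies rows × filtered down the lines of each EXPLAIN output. It is a simplified reading of the plan, not the optimizer's actual cost model, and the function name estimated_rows is made up for illustration.

```python
def estimated_rows(plan):
    """Multiply rows * filtered% over every line of an EXPLAIN output.

    `plan` is a list of (rows, filtered_percent) pairs, one per table,
    in the order EXPLAIN prints them. This is only a back-of-the-envelope
    model, not MySQL's real cost calculation.
    """
    total = 1.0
    for rows, filtered in plan:
        total *= rows * filtered / 100.0
    return total


# (rows, filtered) pairs copied from the two EXPLAIN outputs above.
plan_one = [(194, 10.00), (1642, 100.00), (1, 4.23), (1, 100.00)]
plan_two = [(110446, 100.00), (1, 100.00), (1, 100.00), (1, 10.00)]

print(round(estimated_rows(plan_one)))  # first plan, no index hint
print(round(estimated_rows(plan_two)))  # second plan, USE INDEX(IDX_date)
```

The smaller product wins, which is why the optimizer picks the first plan as long as it trusts these row counts.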
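However, the actual execution times tell a different story. The second plan returned its result within the expected response time, while the first plan ran far longer than expected.
Digging further into the table structures and the first execution plan, we see that the pcz table has only 194 rows. Yet according to the plan, the index orders.dpcz_FK returns 1642 rows from orders for each of them. Because of the foreign key constraint orders_ibfk_10 (shown below), this implies that orders should hold about 194 × 1642 = 318548 rows. The actual row count is 32508150, roughly 100 times the estimate of 318548.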
CREATE TABLE `orders` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
...
`d_p_c_z_id` bigint(20) DEFAULT NULL,
...,
PRIMARY KEY (`id`),
...
KEY `dpcz_FK` (`d_p_c_z_id`),
...
CONSTRAINT `orders_ibfk_10` FOREIGN KEY (`d_p_c_z_id`) REFERENCES `p_c_z` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
...
) ENGINE=InnoDB ....
mysql> select * from mysql.innodb_table_stats where database_name='custom' and table_name='orders';
+---------------+------------+---------------------+----------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+----------+----------------------+--------------------------+
| custom | orders | 2022-03-03 21:58:18 | 32508150 | 349120 | 697618 |
+---------------+------------+---------------------+----------+----------------------+--------------------------+
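A quick sanity check makes the mismatch concrete. The sketch below uses only numbers already shown (the first EXPLAIN output and n_rows from mysql.innodb_table_stats) and compares the table size implied by the plan with the recorded row count. The variable names are illustrative.

```python
# Numbers taken from the first EXPLAIN output and from innodb_table_stats.
pcz_rows = 194                 # rows EXPLAIN reports for pcz
orders_rows_per_key = 1642     # rows EXPLAIN expects per d_p_c_z_id value
n_rows = 32508150              # n_rows recorded for custom.orders

# If every pcz row really matched 1642 orders rows, orders would be this big.
implied_rows = pcz_rows * orders_rows_per_key
ratio = n_rows / implied_rows

print(implied_rows)
print(f"actual / implied = {ratio:.1f}")
```

A gap of about two orders of magnitude is too large to blame on ordinary sampling noise, which points at the index statistics themselves.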
We therefore suspect that the statistics for the index orders.dpcz_FK are inaccurate. We can check with the following queries:
mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom | orders | dpcz_FK | 2022-02-28 12:35:30 | n_diff_pfx01 | 19498 | 50 | d_p_c_z_id |
| custom | orders | dpcz_FK | 2022-02-28 12:35:30 | n_diff_pfx02 | 32283087 | 128 | d_p_c_z_id,id |
| custom | orders | dpcz_FK | 2022-02-28 12:35:30 | n_leaf_pages | 55653 | NULL | Number of leaf pages in the index |
| custom | orders | dpcz_FK | 2022-02-28 12:35:30 | size | 63864 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
mysql> select count(distinct d_p_c_z_id) from orders;
+----------------------------------------------+
| count(distinct d_p_c_z_id) |
+----------------------------------------------+
| 195 |
+----------------------------------------------+
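Bingo! The actual cardinality of column d_p_c_z_id in orders is 195. In mysql.innodb_index_stats, however, the stat_value of n_diff_pfx01 for index dpcz_FK is 19498, which is wrong and far from the real value. This stat_value is what the optimizer takes as the cardinality of the column.
With the correct stat_value of 195, the row estimate on the second line of the first plan becomes 32508150 / 195 = 166708, and the number of rows to examine becomes (194 × 10%) × 166708 × (1 × 4.23%) = 136804. That is more than ten times 11045, the estimate for the second plan, so MySQL would now choose the second plan without any index hint.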
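The effect of the wrong cardinality can be checked the same way. In this sketch the rows-per-key estimate for dpcz_FK is taken as n_rows divided by the index cardinality, which is the simplification used in the text. The helper names are hypothetical, and the results are rough estimates rather than the optimizer's exact numbers.

```python
N_ROWS = 32508150  # n_rows of custom.orders from mysql.innodb_table_stats


def rows_per_key(cardinality):
    """Average rows per distinct d_p_c_z_id, assuming a uniform distribution."""
    return N_ROWS / cardinality


def plan_one_estimate(orders_rows):
    """Rows examined by the first plan: (194 x 10%) x orders_rows x (1 x 4.23%)."""
    return 194 * 0.10 * orders_rows * 1 * 0.0423


stale = rows_per_key(19498)   # cardinality stored in innodb_index_stats
actual = rows_per_key(195)    # cardinality from COUNT(DISTINCT d_p_c_z_id)

print(round(stale), round(plan_one_estimate(stale)))
print(round(actual), round(plan_one_estimate(actual)))
```

With the stale cardinality the first plan looks cheap; with the real one its estimate exceeds the second plan's 11045 by more than a factor of ten.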
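But why did MySQL compute the wrong statistics, and how can we fix them?
To answer that, we first need to know how MySQL estimates statistics and which parameters affect the estimate. After that, the path to a solution is easy to find.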
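How does InnoDB estimate table statistics?
Table statistics can be collected automatically or explicitly. Most people leave innodb_stats_auto_recalc enabled (it is on by default), so persistent statistics are recalculated automatically after a substantial share of the table's data changes: InnoDB recalculates them once more than 10% of the rows in the table have been modified. We can also run ANALYZE TABLE to recalculate the statistics explicitly.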
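To get a feel for the 10% rule on a table of this size, the sketch below checks whether a given number of modified rows would cross the threshold. It only mirrors the rule as stated above. The function name is made up, and InnoDB's internal bookkeeping of modifications is not modeled.

```python
def exceeds_recalc_threshold(modified_rows, total_rows, threshold=0.10):
    """Return True once more than `threshold` of the rows have been modified."""
    return modified_rows > total_rows * threshold


total_rows = 32508150  # n_rows of custom.orders in this case study

print(exceeds_recalc_threshold(3_000_000, total_rows))
print(exceeds_recalc_threshold(3_300_000, total_rows))
```

On a table with more than 32 million rows, over 3.25 million rows have to change before an automatic recalculation is due.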
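InnoDB uses a sampling technique known as a random dive: it samples random pages of an index to estimate the index's cardinality. innodb_stats_persistent_sample_pages controls how many pages are sampled. See https://dev.mysql.com/doc/refman/5.7/en/innodb-persistent-stats.html
The sampling is not completely random; the sampled pages are picked by the sampling algorithm. In the end, the total number of distinct key values, that is, the stat_value of the index, is computed with the formula N * R * N_DIFF_AVG_LEAF
where:
N: the number of leaf pages
R: the ratio of the number of distinct keys on level LA to the total number of records on level LA. Level LA is found by starting at the root page and descending one level at a time, scanning all child pages, until reaching a level that contains at least A*10 distinct keys, where A is the number of pages to sample.
N_DIFF_AVG_LEAF: the average number of distinct key values across the A sampled leaf pages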
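As a sketch of the formula, the function below simply multiplies the three factors. All input values are invented purely to show how the factors combine; they are not measurements from the customer's table, and the function name is hypothetical.

```python
def estimate_n_diff(n_leaf_pages, r, n_diff_avg_leaf):
    """Estimated number of distinct keys: N * R * N_DIFF_AVG_LEAF."""
    return n_leaf_pages * r * n_diff_avg_leaf


# Hypothetical inputs, chosen only for illustration.
n = 50_000
print(estimate_n_diff(n, r=0.004, n_diff_avg_leaf=1.0))
print(estimate_n_diff(n, r=0.4, n_diff_avg_leaf=1.0))
```

Because the result is a plain product, any factor that is off by some multiple skews the final stat_value by the same multiple.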
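With this algorithm in mind, we can see that once a table's indexes become fragmented, both the number of leaf pages and the ratio of distinct keys to total records on level LA drift further and further from reality, so the estimated stat_value becomes inaccurate. Once fragmentation has set in, unless innodb_stats_persistent_sample_pages is changed or the index is rebuilt, even an explicit recalculation (running ANALYZE TABLE manually) cannot produce an accurate stat_value.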
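Solution: how to correct the table statistics and keep them from going wrong again
According to the sampling algorithm, only two factors affect the estimate: the innodb_stats_persistent_sample_pages parameter, and how the index is organized.
To let InnoDB obtain correct statistics, we can either increase innodb_stats_persistent_sample_pages or rebuild the index. The most direct way to rebuild the index is to rebuild the table, for example by running an ALTER TABLE.
Let's look at the following three examples: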
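1. ANALYZE TABLE without rebuilding the table, keeping innodb_stats_persistent_sample_pages = 128. The stat_value was updated to 19582, close to the original inaccurate value of 19498. The number of leaf pages in the index went from 55653 to 55891, and the number of pages in the index from 63864 to 64248:
mysql> show variables like 'innodb_stats_persistent_sample_pages';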
+--------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------+-------+
| innodb_stats_persistent_sample_pages | 128 |
+--------------------------------------+-------+
mysql> analyze table orders;
+---------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| custom.orders | analyze | status | OK |
+---------------+---------+----------+----------+
mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom | orders | dpcz_FK | 2022-03-03 21:58:18 | n_diff_pfx01 | 19582 | 50 | d_p_c_z_id |
| custom | orders | dpcz_FK | 2022-03-03 21:58:18 | n_diff_pfx02 | 32425512 | 128 | d_p_c_z_id,id |
| custom | orders | dpcz_FK | 2022-03-03 21:58:18 | n_leaf_pages | 55891 | NULL | Number of leaf pages in the index |
| custom | orders | dpcz_FK | 2022-03-03 21:58:18 | size | 64248 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
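2. ANALYZE TABLE without rebuilding the table, but with innodb_stats_persistent_sample_pages raised from 128 to 512. The resulting stat_value is 192, very close to the real cardinality of 195. The number of leaf pages in the index changed substantially, from 55653 to 44188, and so did the number of pages in the index, from 63864 to 50304: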
mysql> show variables like '%persistent_sample%';
+--------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------+-------+
| innodb_stats_persistent_sample_pages | 512 |
+--------------------------------------+-------+
mysql> analyze table orders;
+---------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| custom.orders | analyze | status | OK |
+---------------+---------+----------+----------+
mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom | orders | dpcz_FK | 2022-03-09 06:54:29 | n_diff_pfx01 | 192 | 179 | d_p_c_z_id |
| custom | orders | dpcz_FK | 2022-03-09 06:54:29 | n_diff_pfx02 | 31751321 | 512 | d_p_c_z_id,id |
| custom | orders | dpcz_FK | 2022-03-09 06:54:29 | n_leaf_pages | 44188 | NULL | Number of leaf pages in the index |
| custom | orders | dpcz_FK | 2022-03-09 06:54:29 | size | 50304 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
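3. Rebuild the table, keeping innodb_stats_persistent_sample_pages = 128. The resulting stat_value is 187, close to the real cardinality of 195. The number of leaf pages in the index changed substantially, from 55653 to 43733, and the number of pages in the index went from 63864 to 50111:
mysql> show variables like 'innodb_stats_persistent_sample_pages';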
+--------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------+-------+
| innodb_stats_persistent_sample_pages | 128 |
+--------------------------------------+-------+
mysql> alter table orders engine=innodb;
Query OK, 0 rows affected (11 min 16.37 sec)
mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| custom | orders | dpcz_FK | 2022-03-07 18:44:43 | n_diff_pfx01 | 187 | 128 | d_p_c_z_id |
| custom | orders | dpcz_FK | 2022-03-07 18:44:43 | n_diff_pfx02 | 31531493 | 128 | d_p_c_z_id,id |
| custom | orders | dpcz_FK | 2022-03-07 18:44:43 | n_leaf_pages | 43733 | NULL | Number of leaf pages in the index |
| custom | orders | dpcz_FK | 2022-03-07 18:44:43 | size | 50111 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
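The three experiments can be put side by side by comparing each resulting n_diff_pfx01 with the true cardinality of 195. The sketch below uses only the values reported above; the labels are informal descriptions of each experiment.

```python
TRUE_CARDINALITY = 195  # from SELECT COUNT(DISTINCT d_p_c_z_id) FROM orders

# n_diff_pfx01 of dpcz_FK after each experiment.
results = {
    "1. ANALYZE, 128 sample pages": 19582,
    "2. ANALYZE, 512 sample pages": 192,
    "3. rebuild, 128 sample pages": 187,
}

for label, stat_value in results.items():
    ratio = stat_value / TRUE_CARDINALITY
    print(f"{label}: stat_value={stat_value}, estimate/actual={ratio:.2f}")
```

Only the first experiment leaves the estimate about 100 times too high; the other two land within a few percent of the real value.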
Once the table statistics are correct, the MySQL optimizer chooses the right execution plan:
mysql> explain
SELECT count(con.id) ,
MAX(DAYNAME(con.date)) ,
now() ,
pcz.type,
pcz.c_c
FROM con AS con
join orders o on con.order_id = o.id
JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id
left join c c on con.c_id = c.id
WHERE con.date = current_date()
and pcz.type = "T_D"
GROUP BY con.date, pcz.c_c, pcz.type;
+----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------------------------+------+----------+---------------------------------+
| 1 | SIMPLE | con | NULL | ref | FK_order,IDX_date | IDX_date | 3 | const | 3074 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.con.c_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | o | NULL | eq_ref | PRIMARY,dpcz_FK | PRIMARY | 8 | custom.con.order_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | pcz | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.o.d_p_c_z_id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------------------------+------+----------+---------------------------------+
4 rows in set, 1 warning (0.01 sec)
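Conclusion
The MySQL optimizer depends on accurate table statistics to choose the optimal execution plan. We can control the accuracy of table statistics by changing innodb_stats_persistent_sample_pages.
We can also force the statistics to be recalculated by rebuilding the table, which rebuilds its indexes and thereby improves the accuracy of the statistics.
To rebuild the table, we can use ALTER TABLE or a Percona Toolkit tool such as pt-online-schema-change.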