
MySQL: Optimizing select ... in Subqueries

The demonstrations below are based on MySQL 5.7.27.

I. An overview of MySQL's subquery optimization strategies:

Subquery optimization strategies

The optimizer chooses different strategies for different types of subqueries.

1. For IN and =ANY subqueries, the optimizer can choose among the following strategies:

  • semijoin
  • Materialization
  • exists

2. For NOT IN and <>ALL subqueries, the optimizer can choose among the following strategies:

  • Materialization
  • exists

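3. For derived tables, the optimizer can choose among the following strategies:

  • derived_merge: merge the derived table into the outer query (introduced in 5.7);
  • materialize the derived table into an internal temporary table, which the outer query then uses.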

Note: in MySQL 5.7, subqueries in UPDATE and DELETE statements cannot use the semijoin or materialization strategies.

II. Creating data for the demonstration

To make the analysis easier, first create two tables and insert some sample data:

CREATE TABLE `test02` (
 `id` int(11) NOT NULL,
 `a` int(11) DEFAULT NULL,
 `b` int(11) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `a` (`a`)
) ENGINE=InnoDB;

drop procedure if exists idata;
delimiter ;;
create procedure idata()
begin
 declare i int;
 set i=1;
 while(i<=10000)do
  insert into test02 values(i,i,i);
  set i=i+1;
 end while;
end;;
delimiter ;
call idata();

create table test01 like test02;
insert into test01 select * from test02 where id<=1000;
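You can reproduce the result set without a MySQL 5.7 server. The following minimal sketch rebuilds the same two tables with Python's built-in sqlite3 module and runs the example IN query.

SQLite is only a stand-in chosen for illustration. Its query planner is not MySQL's, so only the rows returned carry over, not the execution plans discussed below. The index names test01_a and test02_a are hypothetical, because SQLite index names must be unique per database.

```python
import sqlite3


def build_test_tables(conn: sqlite3.Connection) -> None:
    """Recreate test02 with rows (i, i, i) for i = 1..10000 and test01 with the first 1000."""
    conn.execute("CREATE TABLE test02 (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.execute("CREATE INDEX test02_a ON test02 (a)")
    # Set-based equivalent of the idata() loop.
    conn.executemany(
        "INSERT INTO test02 VALUES (?, ?, ?)",
        ((i, i, i) for i in range(1, 10001)),
    )
    # SQLite has no CREATE TABLE ... LIKE, so the structure is spelled out again.
    conn.execute("CREATE TABLE test01 (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.execute("CREATE INDEX test01_a ON test01 (a)")
    conn.execute("INSERT INTO test01 SELECT * FROM test02 WHERE id <= 1000")
    conn.commit()


conn = sqlite3.connect(":memory:")
build_test_tables(conn)
rows = conn.execute(
    "SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)"
).fetchall()
print(len(rows))  # 9
```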

III. Analyzing the example SQL

The example subquery:

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

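Most people would simply assume that MySQL first runs the subquery:

SELECT test02.b FROM test02 WHERE id < 10

which returns 1,2,3,4,5,6,7,8,9, and then runs the outer query with that list:

SELECT * FROM test01 WHERE test01.a IN (1,2,3,4,5,6,7,8,9);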

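But that is not what MySQL actually does. MySQL pushes the related outer table down into the subquery, because the optimizer believes this is more efficient. In other words, the optimizer rewrites the SQL above as: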

select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);
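The Python sketch below checks that the EXISTS rewrite returns the same rows as the IN form by modelling both on in-memory lists. These are assumptions made for this illustration:

- Each row uses the layout (id, a, b).
- The helper names in_form, exists_form and range_id_lt_10 are hypothetical.
- The lists only play the role of the tables. They say nothing about how MySQL stores or reads data.

```python
from itertools import takewhile

# Hypothetical stand-ins for the two tables; each row is (id, a, b),
# kept in primary-key order like an InnoDB clustered index.
test02 = [(i, i, i) for i in range(1, 10001)]
test01 = [row for row in test02 if row[0] <= 1000]


def range_id_lt_10(table):
    """Rows with id < 10, read in id order (stands in for the PRIMARY range scan)."""
    return takewhile(lambda r: r[0] < 10, table)


def in_form(outer, inner):
    """What most people expect: run the subquery once, then test membership."""
    values = {b for (_id, _a, b) in range_id_lt_10(inner)}
    return [row for row in outer if row[1] in values]


def exists_form(outer, inner):
    """The rewritten form: re-run the correlated subquery (test01.a = test02.b) per outer row."""
    return [
        row for row in outer
        if any(row[1] == b for (_id, _a, b) in range_id_lt_10(inner))
    ]


print(in_form(test01, test02) == exists_form(test01, test02))  # True
```

The two forms agree on the result. What differs, as the plans below show, is how often the subquery runs.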

Note: this applies to MySQL 5.5 and earlier.

The execution plan below shows that this SQL performs a full table scan of test01 (1000 rows), which is inefficient:

root@localhost [dbtest01]>desc select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type        | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY            | test01 | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |   1000 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | test02 | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |      9 |    10.00 | Using where |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

Yet when you actually run the SQL below, it is not slow at all. Isn't that a contradiction? Not so fast; let's keep analyzing:

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

Its execution plan is as follows:

root@localhost [dbtest01]>desc SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type  | table       | partitions | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL          | NULL |   100.00 | Using where |
|  1 | SIMPLE       | test01      | NULL       | ref   | a             | a       | 5       | <subquery2>.b |    1 |   100.00 | NULL        |
|  2 | MATERIALIZED | test02      | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL          |    9 |   100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

The optimizer used the MATERIALIZED strategy, so I looked up and studied the documentation on this strategy:
https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html

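The reason is that starting with MySQL 5.6 (inclusive), the optimizer introduced new strategies controlled by materialization=[off|on] and semijoin=[off|on] (off disables a strategy, on enables it).
You can check which strategies the optimizer uses with show variables like 'optimizer_switch';. These flags can also be changed dynamically at runtime.
For example, set global optimizer_switch='materialization=on,semijoin=on'; enables the materialization and semijoin strategies. SET GLOBAL applies to sessions opened after the change; to change only the current connection, use SET SESSION instead.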

Default optimizer strategies in MySQL 5.7.27:

root@localhost [dbtest01]>show variables like 'optimizer_switch';                                                               
+------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name  | Value                                                                                                                                                                                                      |
+------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
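Because optimizer_switch is a single comma-separated string, it can help to parse it when checking flags from a script. The sketch below is hypothetical: parse_optimizer_switch and set_optimizer_switch are illustrative names, not part of any MySQL client library.

The sketch does two things:

- It turns the 5.7.27 default value shown above into a dictionary.
- It builds a SET statement that names only the flags to change, since MySQL leaves any flag that is not named at its current value.

```python
from typing import Dict

# The MySQL 5.7.27 default value quoted in the output above.
DEFAULT_5_7_27 = (
    "index_merge=on,index_merge_union=on,index_merge_sort_union=on,"
    "index_merge_intersection=on,engine_condition_pushdown=on,"
    "index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,"
    "batched_key_access=off,materialization=on,semijoin=on,loosescan=on,"
    "firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,"
    "use_index_extensions=on,condition_fanout_filter=on,derived_merge=on"
)


def parse_optimizer_switch(value: str) -> Dict[str, bool]:
    """Turn 'flag=on,flag2=off,...' into {'flag': True, 'flag2': False, ...}."""
    flags: Dict[str, bool] = {}
    for item in value.split(","):
        name, _, state = item.strip().partition("=")
        if state not in ("on", "off"):
            raise ValueError(f"unexpected optimizer_switch item: {item!r}")
        flags[name] = state == "on"
    return flags


def set_optimizer_switch(scope: str, **flags: bool) -> str:
    """Build a SET statement that names only the flags to change."""
    if scope.upper() not in ("GLOBAL", "SESSION"):
        raise ValueError("scope must be GLOBAL or SESSION")
    body = ",".join(f"{name}={'on' if on else 'off'}" for name, on in flags.items())
    return f"SET {scope.upper()} optimizer_switch='{body}';"


flags = parse_optimizer_switch(DEFAULT_5_7_27)
print(flags["materialization"], flags["semijoin"])  # True True
print(set_optimizer_switch("session", materialization=False, semijoin=False))
# SET SESSION optimizer_switch='materialization=off,semijoin=off';
```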

So on MySQL 5.6 and later, the SQL below is not slow, because the optimizer's materialization and semijoin strategies optimize it:

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

Now let's turn off the materialization and semijoin strategies and test again. This time the SQL really does scan all of test01 (1000 rows):

set global optimizer_switch='materialization=off,semijoin=off';

The execution plan below confirms that test01 is fully scanned:

root@localhost [dbtest01]>desc SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type        | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY            | test01 | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |   1000 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | test02 | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |      9 |    10.00 | Using where |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

Let's analyze this execution plan:

Reminder again: on MySQL 5.5 and earlier, or on MySQL 5.6 and later with the optimizer strategies turned off (materialization=off,semijoin=off), you get the same execution plan as the one below:

root@localhost [dbtest01]>desc select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type        | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | test01 | NULL       | ALL   | NULL          | NULL    | NULL    | NULL | 1000 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | test02 | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    9 |    10.00 | Using where |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

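The uncorrelated subquery has become a correlated subquery (select_type: DEPENDENT SUBQUERY). The subquery matches b against the outer table test01, so it needs a column from test01 and cannot be executed first. The execution flow is: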

  1. Scan test01 and fetch a row R;
  2. Take column a from row R and run the subquery; if the result is TRUE, put R into the result set;
  3. Repeat steps 1 and 2 until all rows have been processed.

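The total number of rows scanned is 1000+1000*9=10000. That is the theoretical value; the actual value is somewhat below 10000. I never worked out exactly why, but the pattern is that each additional row in the subquery's result set lowers the total scan count by a few rows.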
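The execution flow above can be sketched as a nested loop in Python. This is a simplified model under these assumptions:

- Each visited row counts as one row scanned.
- The subquery's PRIMARY range scan over id < 10 is modelled as reading test02 in id order.
- dependent_subquery is a hypothetical name.

With stop_at_first_match=False, the sketch reproduces the theoretical 10000.

With stop_at_first_match=True, each subquery stops at its first hit, which an EXISTS test is allowed to do. That gives 9964. This is one plausible, unverified explanation for the slightly lower real count, because the savings grow as the subquery returns more rows.

```python
from itertools import takewhile

# Hypothetical stand-ins for the tables; rows are (id, a, b) in primary-key order.
test02 = [(i, i, i) for i in range(1, 10001)]
test01 = [row for row in test02 if row[0] <= 1000]


def dependent_subquery(outer, inner, stop_at_first_match):
    """Simulate the DEPENDENT SUBQUERY plan; returns (result_rows, rows_scanned)."""
    # Rows the PRIMARY range scan (id < 10) visits, in id order.
    inner_range = list(takewhile(lambda r: r[0] < 10, inner))
    result, scanned = [], 0
    for row in outer:                     # step 1: fetch a row R from test01
        scanned += 1
        matched = False
        for (_id, _a, b) in inner_range:  # step 2: run the subquery with R.a
            scanned += 1
            if b == row[1]:
                matched = True
                if stop_at_first_match:   # an EXISTS test may stop at the first hit
                    break
        if matched:
            result.append(row)            # keep R when the subquery is TRUE
    return result, scanned                # step 3: loop until test01 is exhausted


print(dependent_subquery(test01, test02, stop_at_first_match=False)[1])  # 10000
print(dependent_subquery(test01, test02, stop_at_first_match=True)[1])   # 9964
```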

Semi-join optimization:

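This creates a problem: if the outer table is very large, the subquery has to run once for every outer row, and the query performs very badly. The obvious idea is to rewrite it as a join to improve efficiency: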

select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;

# Check the execution plan of this SQL:

desc select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;

root@localhost [dbtest01]>EXPLAIN extended select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref               | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+
|  1 | SIMPLE      | test02 | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL              |    9 |   100.00 | Using where |
|  1 | SIMPLE      | test01 | NULL       | ref   | a             | a       | 5       | dbtest01.test02.b |    1 |   100.00 | NULL        |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

This lets test02 act as the driving table, and because the join column a on test01 is indexed, the query is very efficient.

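But there is a catch: a join can return duplicate rows, while the in(select ...) subquery semantics never produce duplicates.
A semijoin is a special kind of join that solves exactly this duplicate problem.
For a subquery, the optimizer can recognize that the IN clause only needs to find one matching value for each outer row. In that case it can use semijoin to optimize the subquery and improve query efficiency.
This feature was added in MySQL 5.6; before 5.6, the optimizer had only the exists strategy to "optimize" subqueries.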
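A tiny example makes the duplicate problem concrete. The sketch below uses Python's sqlite3 module as a stand-in database.

It deliberately inserts the value b = 1 twice into test02, which the original test data never does; that hypothetical data is the point of the example. The join returns test01 row 1 twice, while the IN subquery returns it once. Returning it once is the semantics a semijoin has to preserve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test01 (id INTEGER PRIMARY KEY, a INTEGER)")
conn.execute("CREATE TABLE test02 (id INTEGER PRIMARY KEY, b INTEGER)")
conn.executemany("INSERT INTO test01 VALUES (?, ?)", [(1, 1), (2, 2)])
# Hypothetical data: b = 1 appears twice in test02.
conn.executemany("INSERT INTO test02 VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

# Plain join: one output row per matching pair, so test01 row 1 appears twice.
join_ids = [r[0] for r in conn.execute(
    "SELECT test01.id FROM test01 JOIN test02 ON test01.a = test02.b ORDER BY test01.id")]
# IN subquery: each test01 row is returned at most once.
in_ids = [r[0] for r in conn.execute(
    "SELECT id FROM test01 WHERE a IN (SELECT b FROM test02) ORDER BY id")]

print(join_ids)  # [1, 1, 2]
print(in_ids)    # [1, 2]
```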

After semijoin optimization, the SQL and its execution plan are:

root@localhost [dbtest01]>desc SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type  | table       | partitions | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL          | NULL |   100.00 | Using where |
|  1 | SIMPLE       | test01      | NULL       | ref   | a             | a       | 5       | <subquery2>.b |    1 |   100.00 | NULL        |
|  2 | MATERIALIZED | test02      | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL          |    9 |   100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
select 
  `test01`.`id`,`test01`.`a`,`test01`.`b` 
from `test01` semi join `test02` 
where
  ((`test01`.`a` = `<subquery2>`.`b`) 
  and (`test02`.`id` < 10)); 

## Note: this is the SQL as rewritten by the optimizer; the semi join syntax cannot be used from a client

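The semijoin optimization is complicated to implement and is further divided into strategies such as FirstMatch and Materialize. In the execution plan above, select_type=MATERIALIZED means the semijoin was implemented with the Materialize strategy.
After semijoin optimization, the execution flow here is:

  1. Run the subquery first and store the result in a temporary table; the temporary table has a primary key used to remove duplicates;
  2. Fetch a row R from the temporary table;
  3. Take column b from row R and look it up in the driven table test01; if the condition is met, put the row into the result set;
  4. Repeat steps 2 and 3 until done.

This way the subquery returns 9 rows, so the temporary table also has 9 rows (there are no duplicates here). The total number of rows scanned is 9+9+9*1=27, far fewer than the original 10000.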
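The four Materialize steps above can be modelled in Python as follows. This is a hedged sketch, not MySQL's implementation. Its assumptions:

- A dict keyed by b stands in for the deduplicating temporary table.
- index_a (a hypothetical name) emulates the KEY `a` on test01.
- Every row read counts as one row scanned, whether it comes from test02, from the temporary table, or through the index.

Under those assumptions, the sketch arrives at the same 9+9+9*1=27.

```python
from collections import defaultdict

# Hypothetical stand-ins: rows are (id, a, b) in primary-key order.
test02 = [(i, i, i) for i in range(1, 10001)]
test01 = [row for row in test02 if row[0] <= 1000]
index_a = defaultdict(list)       # emulates KEY `a` on test01: value of a -> rows
for row in test01:
    index_a[row[1]].append(row)


def materialize_semijoin(outer_index, inner):
    """Simulate the Materialize semijoin strategy; returns (result_rows, rows_scanned)."""
    scanned = 0
    temp = {}                     # step 1: temp table; keying on b removes duplicates
    for (id_, _a, b) in inner:
        if id_ >= 10:             # the PRIMARY range scan for id < 10 ends here
            break
        scanned += 1
        temp.setdefault(b, None)
    result = []
    for b in temp:                # step 2: read one temp-table row R
        scanned += 1
        matches = outer_index.get(b, [])  # step 3: probe test01 through index a
        scanned += len(matches)
        result.extend(matches)
    return result, scanned        # step 4: repeat until the temp table is exhausted


rows, scanned = materialize_semijoin(index_a, test02)
print(len(rows), scanned)  # 9 27
```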

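Another optimization feature added in MySQL 5.6 is materialization. It materializes the subquery result into a temporary table and then substitutes it into the outer query to speed up execution. The in-memory temporary table has a primary key (hash index) that removes duplicate rows and keeps the table small.
If the subquery result is too large and exceeds tmp_table_size, the temporary table is converted to an on-disk temporary table. For in-memory internal temporary tables, the effective limit is the smaller of tmp_table_size and max_heap_table_size. With materialization, the subquery runs only once instead of once for every row of the outer query.
Note, however, that the outer query still cannot use an index to find matching rows quickly; it can only use a full table scan or a full index scan.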
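For contrast, plain materialization (without semijoin) can be sketched the same way. Again the names are hypothetical and the counting model is simplified.

The subquery runs once into a hash set, but the outer query still reads every test01 row and probes the set. So the count becomes 9 + 1000 = 1009 rather than 27.

```python
# Hypothetical stand-ins: rows are (id, a, b) in primary-key order.
test02 = [(i, i, i) for i in range(1, 10001)]
test01 = [row for row in test02 if row[0] <= 1000]


def materialized_in(outer, inner):
    """Simulate subquery materialization without semijoin; returns (result_rows, rows_scanned)."""
    scanned = 0
    temp = set()                  # temp table with a hash key: duplicates collapse
    for (id_, _a, b) in inner:
        if id_ >= 10:             # the subquery runs once, via the PRIMARY range id < 10
            break
        scanned += 1
        temp.add(b)
    result = []
    for row in outer:             # the outer query still reads every test01 row
        scanned += 1
        if row[1] in temp:        # hash probe replaces re-running the subquery
            result.append(row)
    return result, scanned


rows, scanned = materialized_in(test01, test02)
print(len(rows), scanned)  # 9 1009
```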

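Whether semijoin and materialization are enabled is controlled by the semijoin={on|off} and materialization={on|off} flags of the optimizer_switch variable.
The different execution plans above were produced by switching semijoin and materialization on and off.
In general, for a subquery the optimizer first checks whether it meets the conditions of each optimization strategy. For example, a subquery that contains a union cannot use the semijoin optimization.
It then chooses among the candidates by cost. When nothing else applies, it falls back to the exists strategy to "optimize" the subquery; there is no flag to turn the exists strategy on or off.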

Next, a subquery example involving delete:

Fill the two test tables with about 3.5 million and 500,000 rows respectively to test the delete statement:

root@localhost [dbtest01]>select count(*) from test02;
+----------+
| count(*) |
+----------+
|  3532986 |
+----------+
1 row in set (0.64 sec)
root@localhost [dbtest01]>create table test01 like test02;
Query OK, 0 rows affected (0.01 sec)

root@localhost [dbtest01]>insert into test01 select * from test02 where id<=500000;

root@localhost [dbtest01]>select count(*) from test01;
+----------+
| count(*) |
+----------+
|   500000 |
+----------+

The delete statement took almost 5 seconds to run:

root@localhost [dbtest01]>delete FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);
Query OK, 9 rows affected (4.86 sec)

The execution plan shows a full table scan of test01 (about 500,000 estimated rows):

root@localhost [dbtest01]>desc delete FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type        | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | DELETE             | test01 | NULL       | ALL   | NULL          | NULL    | NULL    | NULL | 499343 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | test02 | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |      9 |    10.00 | Using where |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set (0.00 sec)

So rewrite the delete statement above as a join:

root@localhost [dbtest01]>desc delete test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref               | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+
|  1 | SIMPLE      | test02 | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL              |    9 |   100.00 | Using where |
|  1 | DELETE      | test01 | NULL       | ref   | a             | a       | 5       | dbtest01.test02.b |    1 |   100.00 | NULL        |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+
2 rows in set (0.01 sec)

It runs very fast:
root@localhost [dbtest01]>delete test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;
Query OK, 9 rows affected (0.01 sec)

root@localhost [dbtest01]>select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;
Empty set (0.00 sec)

The statement below is also very slow, because it essentially does a full table scan of test01:

root@localhost [dbtest01]>desc delete FROM test01 WHERE id IN (SELECT id FROM test02 WHERE id='350000');
+----+--------------------+--------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
| id | select_type        | table  | partitions | type  | possible_keys | key     | key_len | ref   | rows   | filtered | Extra       |
+----+--------------------+--------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
|  1 | DELETE             | test01 | NULL       | ALL   | NULL          | NULL    | NULL    | NULL  | 499343 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | test02 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |      1 |   100.00 | Using index |
+----+--------------------+--------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
2 rows in set (0.00 sec)

With a join instead, the statement is very efficient:

root@localhost [dbtest01]>desc delete test01.* FROM test01 inner join test02 WHERE test01.id=test02.id and test02.id=350000 ;
+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | DELETE      | test01 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | test02 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
2 rows in set (0.01 sec)

root@localhost [dbtest01]> desc delete test01.* from test01 join test02 on test01.a=test02.b and test02.id=350000;
+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | test02 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
|  1 | DELETE      | test01 | NULL       | ref   | a             | a       | 5       | const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
2 rows in set (0.00 sec)

References:

https://www.cnblogs.com/zhengyun_ustc/p/slowquery1.html
https://www.jianshu.com/p/3989222f7084
https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html
