MySQL Optimization, Part 13: Hash Join
阿新 • Published: 2020-11-30
1 Applicable scenarios
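Pure equi-join with no usable index
Starting with MySQL 8.0.18, MySQL implements hash join for joins on equality conditions where no index can be applied to the join condition. For example: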
SELECT *
FROM t1
JOIN t2
ON t1.c1=t2.c1;
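To make the idea behind the statement above concrete, here is a minimal Python sketch of a textbook in-memory hash join (build phase, then probe phase). It is an illustration only, not MySQL's implementation; the function name `hash_join` and the sample rows are assumptions made for this example.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join: hash the build side, then probe it row by row."""
    # Build phase: bucket every build-side row by its join key.
    table = defaultdict(list)
    for row in build_rows:
        key = row[build_key]
        if key is not None:  # SQL NULL never compares equal, so skip it
            table[key].append(row)
    # Probe phase: look up each probe-side key in the hash table.
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            result.append((match, row))
    return result

# Hypothetical sample data standing in for t1 and t2.
t1 = [{"c1": 1, "c2": 10}, {"c1": 2, "c2": 20}, {"c1": 2, "c2": 25}]
t2 = [{"c1": 2, "c2": 30}, {"c1": 3, "c2": 40}]
joined = hash_join(t1, t2, "c1", "c1")
```

Each table is read once and no index on c1 is needed, which is why this approach suits the no-index case.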
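Equi-join where indexes serve single-table predicates
A hash join can also be used when one or more indexes are usable for single-table predicates, that is, conditions that reference only one table (such as a WHERE filter on t1.c2). Such an index narrows the rows read from that table, while the join itself is still done by hashing. The original sentence in the MySQL documentation reads: "A hash join can also be used when there are one or more indexes that can be used for single-table predicates."
Compared with the Block Nested Loop algorithm (BNL below), hash join is usually faster and covers the same use cases, so starting with 8.0.20 BNL support is removed and hash join is used wherever BNL would have been chosen.
In EXPLAIN output, the Extra column typically shows: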
Extra: Using where; Using join buffer (hash join)
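which indicates that a hash join was used.
Multiple joins, each with at least one equality condition (non-equalities allowed)
Although hash join targets equi-joins, in a query with several joins it is enough, in principle, for each pair of joined tables to have at least one equi-join condition; MySQL can then use hash join to speed up the query. For example: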
SELECT *
FROM t1
JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)  -- this join also carries a non-equi condition
JOIN t3 ON (t2.c1 = t3.c1);
The output of EXPLAIN FORMAT=TREE is as follows:
EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1)
    -> Table scan on t3 (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1)
            -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
                -> Table scan on t2 (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1 (cost=0.35 rows=1)
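The plan above hashes on the equality and applies the non-equi condition as a filter on the matched rows. A rough Python sketch of that pattern follows; the name `hash_join_with_filter`, the `extra` callback, and the sample rows are assumptions for illustration and do not reflect MySQL internals.

```python
from collections import defaultdict

def hash_join_with_filter(build_rows, probe_rows, key, extra):
    """Equi-join on `key`, keeping only pairs that also satisfy `extra`."""
    # Build a hash table on the equality column only.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    out = []
    for probe in probe_rows:
        for build in table.get(probe[key], []):
            # The non-equi condition cannot be hashed; it is checked per match.
            if extra(build, probe):
                out.append((build, probe))
    return out

# Hypothetical rows for t1 (build side) and t2 (probe side).
t1 = [{"c1": 1, "c2": 5}, {"c1": 1, "c2": 9}, {"c1": 2, "c2": 1}]
t2 = [{"c1": 1, "c2": 7}, {"c1": 2, "c2": 0}]
rows = hash_join_with_filter(t1, t2, "c1", lambda a, b: a["c2"] < b["c2"])
```

Only pairs that share c1 are ever tested against the `<` condition, so the filter costs far less than it would on a full cross product.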
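Join conditions with no equality at all (from 8.0.20)
Before MySQL 8.0.20, if any pair of joined tables lacked an equi-join condition, BNL was used instead. From 8.0.20 on, a hash join can also be used for a statement like the following, where the second join has no equality and its condition is applied as a filter:
mysql> EXPLAIN FORMAT=TREE
    -> SELECT * FROM t1
    ->     JOIN t2 ON (t1.c1 = t2.c1)
    ->     JOIN t3 ON (t2.c1 < t3.c1)\G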
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t3.c1) (cost=1.05 rows=1)
    -> Inner hash join (no condition) (cost=1.05 rows=1)
        -> Table scan on t3 (cost=0.35 rows=1)
        -> Hash
            -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
                -> Table scan on t2 (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1 (cost=0.35 rows=1)
Cartesian product
Hash join also applies to a Cartesian product (no join condition specified):
mysql> EXPLAIN FORMAT=TREE
-> SELECT *
-> FROM t1
-> JOIN t2
-> WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (cost=0.70 rows=1)
    -> Table scan on t2 (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 > 50) (cost=0.35 rows=1) // the WHERE condition is applied early, before the join
            -> Table scan on t1 (cost=0.35 rows=1)
Plain inner join with no equality at all
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t2.c1) (cost=4.70 rows=12) // the join condition becomes a filter
    -> Inner hash join (no condition) (cost=4.70 rows=12)
        -> Table scan on t2 (cost=0.08 rows=6)
        -> Hash
            -> Table scan on t1 (cost=0.85 rows=6)
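When a join has no equality, there is nothing to hash on. Conceptually the "hash join (no condition)" node buffers one input and pairs it with every row of the other, and the non-equi condition is then applied as a filter. The Python sketch below illustrates that reading of the plan; the function name and data are assumptions, and the real executor is more involved.

```python
def hash_join_no_condition(build_rows, probe_rows):
    """Pair every buffered build row with every probe row (cross product)."""
    buffered = list(build_rows)  # stands in for the in-memory join buffer
    for probe in probe_rows:
        for build in buffered:
            yield build, probe

# Hypothetical rows for t1 (buffered side) and t2 (scanned side).
t1 = [{"c1": 1}, {"c1": 4}]
t2 = [{"c1": 3}, {"c1": 5}]
pairs = list(hash_join_no_condition(t1, t2))
# The join condition t1.c1 < t2.c1 runs as a filter over the produced pairs.
filtered = [(a, b) for a, b in pairs if a["c1"] < b["c1"]]
```

The work grows with the product of the two row counts, so this form helps mainly by buffering rows efficiently, not by avoiding comparisons.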
Semijoin (the EXPLAIN output in the MySQL documentation was wrong at the time of writing; corrected here)
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1
-> WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash semijoin (t2.c2 = t1.c1) (cost=0.70 rows=1)
    -> Table scan on t1 (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t2 (cost=0.35 rows=1)
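As a rough illustration of what a hash semijoin does for the IN query above, the Python sketch below hashes the subquery side into a set and keeps each outer row at most once. The name `hash_semijoin` and the sample rows are assumptions for this example, not MySQL's code.

```python
def hash_semijoin(outer_rows, inner_rows, outer_key, inner_key):
    """Keep each outer row once if its key appears anywhere on the inner side."""
    # A set is enough: a semijoin only asks whether a match exists.
    seen = {row[inner_key] for row in inner_rows if row[inner_key] is not None}
    return [row for row in outer_rows if row[outer_key] in seen]

# Hypothetical rows; t2 holds a duplicate value on purpose.
t1 = [{"c1": 1}, {"c1": 2}, {"c1": 3}]
t2 = [{"c2": 2}, {"c2": 2}, {"c2": 3}]
semi = hash_semijoin(t1, t2, "c1", "c2")
```

Unlike an inner join, duplicates in t2 do not multiply the rows of t1 in the result.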
Antijoin (the EXPLAIN output in the MySQL documentation was wrong at the time of writing; corrected here)
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2
    ->     WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.c1 = t2.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash antijoin (t1.c1 = t2.c1) (cost=0.70 rows=1)
    -> Table scan on t2 (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1 (cost=0.35 rows=1)
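The antijoin is the mirror image of the semijoin: a row from the outer table survives only when the hash lookup finds no match. A minimal Python sketch for the NOT EXISTS query above follows, assuming a single join column named c1; the function name and data are illustrative only.

```python
def hash_antijoin(outer_rows, inner_rows, key):
    """Keep outer rows whose key has no equal on the inner side (NOT EXISTS)."""
    keys = {row[key] for row in inner_rows if row[key] is not None}
    # A NULL outer key matches nothing, so NOT EXISTS keeps that row too.
    return [row for row in outer_rows if row[key] not in keys]

# Hypothetical rows: t2 is the outer table, t1 the hashed (subquery) side.
t1 = [{"c1": 1}, {"c1": 2}]
t2 = [{"c1": 2}, {"c1": 5}, {"c1": None}]
anti = hash_antijoin(t2, t1, "c1")
```

This matches NOT EXISTS semantics; NOT IN treats NULL differently and would need extra handling.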
Left outer join
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.c1 = t1.c1) (cost=3.99 rows=36)
    -> Table scan on t1 (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t2 (cost=0.14 rows=6)
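In the plan above the right-hand table t2 is hashed and t1 is scanned, so every t1 row can be emitted even without a match. The Python sketch below shows that shape; `left_hash_join` and the rows are hypothetical, and `None` stands in for the NULL-extended right side.

```python
from collections import defaultdict

def left_hash_join(left_rows, right_rows, key):
    """Left outer equi-join: hash the right side, probe with the left side."""
    table = defaultdict(list)
    for row in right_rows:
        if row[key] is not None:
            table[row[key]].append(row)
    out = []
    for left in left_rows:
        matches = table.get(left[key], [])
        if matches:
            out.extend((left, right) for right in matches)
        else:
            out.append((left, None))  # no match: pad the right side with NULL
    return out

# Hypothetical rows for t1 (left) and t2 (right).
t1 = [{"c1": 1}, {"c1": 2}]
t2 = [{"c1": 2}, {"c1": 2}]
left_joined = left_hash_join(t1, t2, "c1")
```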
Right outer join (MySQL rewrites every right outer join as a left outer join):
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t1.c1 = t2.c1) (cost=3.99 rows=36)
    -> Table scan on t2 (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t1 (cost=0.14 rows=6)
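Related configuration
The join_buffer_size system variable controls how much memory a hash join may use. If a hash join needs more memory than this, it spills to files on disk, which is noticeably slower; at that point the query needs tuning, for example by increasing join_buffer_size.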
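To give a feel for why spilling costs more, here is a generic partitioned hash join sketched in Python. It is an assumption-laden simplification, not MySQL's on-disk algorithm: plain lists stand in for the files on disk, `max_rows_in_memory` is a hypothetical stand-in for join_buffer_size, and skewed keys can still make one partition larger than the limit.

```python
from collections import defaultdict

def partitioned_hash_join(build_rows, probe_rows, key, max_rows_in_memory):
    """Split both inputs by hash(key), then join one partition at a time."""
    # Ceiling division picks a partition count; at least one partition.
    partitions = max(1, -(-len(build_rows) // max_rows_in_memory))
    build_chunks = defaultdict(list)   # stand-ins for build-side spill files
    probe_chunks = defaultdict(list)   # stand-ins for probe-side spill files
    for row in build_rows:
        build_chunks[hash(row[key]) % partitions].append(row)
    for row in probe_rows:
        probe_chunks[hash(row[key]) % partitions].append(row)
    out = []
    for p in range(partitions):
        # Equal keys always land in the same partition, so each pair of
        # chunks can be joined independently with a small hash table.
        table = defaultdict(list)
        for row in build_chunks[p]:
            table[row[key]].append(row)
        for row in probe_chunks[p]:
            out.extend((b, row) for b in table.get(row[key], []))
    return out, partitions

# Hypothetical data: 10 build rows with room for only 4 rows "in memory".
build = [{"c1": i} for i in range(10)]
probe = [{"c1": i} for i in range(5, 15)]
spilled, parts = partitioned_hash_join(build, probe, "c1", 4)
```

Every row is written out and read back once more than in the in-memory case, which is the extra I/O that a larger join buffer avoids.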