MySQL Optimization, Part 13 (Hash Join)

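1 Applicable scenarios

Pure equi-join, no usable index

Starting with MySQL 8.0.18, MySQL uses a hash join for queries in which every join has an equi-join condition and no index can be applied to the join conditions, for example: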

SELECT *
    FROM t1
    JOIN t2
        ON t1.c1=t2.c1;
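
To make the idea concrete, here is a minimal Python sketch of an in-memory hash join for a statement like the one above. It is not MySQL's implementation: the function name `hash_join`, the dict-based rows, and the sample data are all assumptions made for illustration. One input is hashed on the join column (build phase), and the other input is scanned once and looked up in that hash table (probe phase).

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Inner equi-join of two lists of dict rows on column `key` (illustrative only)."""
    # Build phase: hash one input on the join column.
    table = defaultdict(list)
    for row in build_rows:
        if row[key] is not None:  # in SQL, NULL never equals anything
            table[row[key]].append(row)

    # Probe phase: scan the other input once and look up matching rows.
    result = []
    for probe in probe_rows:
        for build in table.get(probe[key], []):
            result.append((build, probe))
    return result


# Hypothetical sample data standing in for t1 and t2.
t1 = [{"c1": 1, "c2": 10}, {"c1": 2, "c2": 20}]
t2 = [{"c1": 2, "c2": 5}, {"c1": 3, "c2": 7}]
joined = hash_join(t1, t2, "c1")
```

Because each input is read only once, no index on the join column is needed, which is why this strategy fits the no-index case.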

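Equi-join, with indexes in use

A hash join can also be used when one or more indexes exist that apply only to single-table predicates. The MySQL manual puts it this way: "A hash join can also be used when there are one or more indexes that can be used for single-table predicates." A single-table predicate is a condition that references only one table, such as t1.c2 > 50. An index may serve that filter while the join itself, which has no usable index on the join columns, is still executed as a hash join.

Compared with the Block Nested-Loop algorithm (BNL below), a hash join is usually faster and covers the same use cases, so starting with 8.0.20 BNL has been removed and hash join is used in its place.

Typically, the Extra column of the EXPLAIN output contains a description like the following: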

Extra: Using where; Using join buffer (hash join)

This indicates that a hash join is being used.

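Multiple join conditions with at least one equi-join condition (non-equi conditions may also be present)

Although hash join is designed for equi-joins, it also applies to queries with multiple joins: as long as each pair of joined tables has at least one equi-join condition, MySQL can use a hash join to speed up the query, for example: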

SELECT * FROM t1
    JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)  -- this join also contains a non-equi condition
    JOIN t3 ON (t2.c1 = t3.c1);

The output of EXPLAIN FORMAT=TREE is:

EXPLAIN: -> Inner hash join (t3.c1 = t1.c1)  (cost=1.05 rows=1)
    -> Table scan on t3  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 < t2.c2)  (cost=0.70 rows=1)
            -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
                -> Table scan on t2  (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1  (cost=0.35 rows=1)
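
The plan above shows the non-equi condition as a Filter sitting on top of the hash join. A minimal Python sketch of that shape follows, under the assumption that rows are plain dicts; `hash_join_with_filter` and the sample rows are hypothetical names and data, not anything MySQL exposes. The equality drives the hash lookup, and the remaining condition is checked on each candidate pair.

```python
from collections import defaultdict


def hash_join_with_filter(build_rows, probe_rows, key, residual):
    """Equi-join on `key`, then keep only pairs that satisfy `residual` (illustrative only)."""
    table = defaultdict(list)
    for row in build_rows:
        if row[key] is not None:
            table[row[key]].append(row)

    result = []
    for probe in probe_rows:
        for build in table.get(probe[key], []):
            # The non-equi condition cannot be hashed, so it is applied as a filter.
            if residual(build, probe):
                result.append((build, probe))
    return result


# Hypothetical rows: both t1 rows match t2 on c1, but only one passes t1.c2 < t2.c2.
t1 = [{"c1": 1, "c2": 10}, {"c1": 1, "c2": 99}]
t2 = [{"c1": 1, "c2": 50}]
rows = hash_join_with_filter(t1, t2, "c1", lambda a, b: a["c2"] < b["c2"])
```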

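A pair of joined tables with no equi-join condition at all (8.0.20 and later)

Before MySQL 8.0.20, if any pair of joined tables lacked an equi-join condition, hash join could not be used and BNL was applied instead. From 8.0.20 onward, hash join can also be used for statements like the one below, where the join condition on t3 contains no equality and is therefore applied as a filter:

mysql> EXPLAIN FORMAT=TREE
    -> SELECT * FROM t1
    ->     JOIN t2 ON (t1.c1 = t2.c1)
    ->     JOIN t3 ON (t2.c1 < t3.c1)\G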
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t3.c1)  (cost=1.05 rows=1)
    -> Inner hash join (no condition)  (cost=1.05 rows=1)
        -> Table scan on t3  (cost=0.35 rows=1)
        -> Hash
            -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
                -> Table scan on t2  (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1  (cost=0.35 rows=1)

Cartesian product

Hash join also applies to a Cartesian product, that is, a join with no join condition specified:

mysql> EXPLAIN FORMAT=TREE
    -> SELECT *
    ->     FROM t1
    ->     JOIN t2
    ->     WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 > 50)  (cost=0.35 rows=1)  // the WHERE condition is applied early, before hashing
            -> Table scan on t1  (cost=0.35 rows=1)

Plain inner join with no equi-join condition

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t2.c1)  (cost=4.70 rows=12)  // the join condition becomes a filter
    -> Inner hash join (no condition)  (cost=4.70 rows=12)
        -> Table scan on t2  (cost=0.08 rows=6)
        -> Hash
            -> Table scan on t1  (cost=0.85 rows=6)
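
A rough way to read "Inner hash join (no condition)" followed by a Filter: with no equality to hash on, every row of one input is paired with every buffered row of the other, and the non-equi condition then discards the pairs that fail it. The sketch below illustrates only that logical behaviour; `no_condition_join` and the sample rows are hypothetical, and the buffering MySQL actually performs is not modelled.

```python
def no_condition_join(build_rows, probe_rows, predicate):
    """Pair every probe row with every buffered build row, then filter (illustrative only)."""
    buffered = list(build_rows)  # stands in for the in-memory join buffer
    result = []
    for probe in probe_rows:
        for build in buffered:
            if predicate(build, probe):  # the former join condition, now a filter
                result.append((build, probe))
    return result


# Hypothetical rows for t1 and t2; the predicate mirrors t1.c1 < t2.c1.
t1 = [{"c1": 1}, {"c1": 5}]
t2 = [{"c1": 3}, {"c1": 9}]
pairs = no_condition_join(t1, t2, lambda a, b: a["c1"] < b["c1"])
```

With an always-true predicate the same function yields the full Cartesian product.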

Semijoin (the EXPLAIN output in the MySQL documentation is wrong; it is corrected here)

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 
    ->     WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash semijoin (t2.c2 = t1.c1)  (cost=0.70 rows=1)
    -> Table scan on t1  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t2  (cost=0.35 rows=1)
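
For an IN subquery like the one above, a semijoin returns each outer row at most once, no matter how many inner rows match. A minimal Python sketch of that behaviour, assuming dict rows and the hypothetical helper name `hash_semijoin`:

```python
def hash_semijoin(outer_rows, inner_rows, outer_key, inner_key):
    """Return outer rows that have at least one match in inner_rows (illustrative only)."""
    # Only the distinct key values of the inner side matter, so a set is enough.
    seen = {row[inner_key] for row in inner_rows if row[inner_key] is not None}
    return [row for row in outer_rows if row[outer_key] in seen]


# Hypothetical rows: t2.c2 contains the value 2 twice, yet t1's row appears once.
t1 = [{"c1": 1}, {"c1": 2}]
t2 = [{"c2": 2}, {"c2": 2}, {"c2": 7}]
matched = hash_semijoin(t1, t2, "c1", "c2")
```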

Antijoin (the EXPLAIN output in the MySQL documentation is wrong; it is corrected here)

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2 
    ->     WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.c1 = t2.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash antijoin (t1.c1 = t2.c1)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)
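
An antijoin is the mirror image of a semijoin: it keeps the outer rows that have no match on the inner side, which is what NOT EXISTS asks for. The sketch below assumes dict rows; `hash_antijoin` and the sample data are hypothetical.

```python
def hash_antijoin(outer_rows, inner_rows, key):
    """Return outer rows with no matching inner row on `key` (illustrative only)."""
    seen = {row[key] for row in inner_rows if row[key] is not None}
    # An outer row whose key is None matches nothing, so NOT EXISTS keeps it.
    return [row for row in outer_rows if row[key] not in seen]


# Hypothetical rows: t2 is the outer table, t1 the inner (hashed) table.
t2 = [{"c1": 1}, {"c1": 2}, {"c1": None}]
t1 = [{"c1": 2}]
unmatched = hash_antijoin(t2, t1, "c1")
```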

Left outer join

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.c1 = t1.c1)  (cost=3.99 rows=36)
    -> Table scan on t1  (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t2  (cost=0.14 rows=6)
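
In the plan above, t2 is hashed and t1 is scanned, and every t1 row must appear in the result even when nothing matches. A minimal Python sketch of that behaviour follows; `left_hash_join` is a hypothetical name, and `None` stands in for the NULL-extended right side.

```python
from collections import defaultdict


def left_hash_join(left_rows, right_rows, key):
    """Left outer equi-join: hash the right input, probe with the left (illustrative only)."""
    table = defaultdict(list)
    for row in right_rows:
        if row[key] is not None:
            table[row[key]].append(row)

    result = []
    for left in left_rows:
        matches = table.get(left[key], [])
        if matches:
            result.extend((left, right) for right in matches)
        else:
            result.append((left, None))  # no match: right side is NULL-extended
    return result


# Hypothetical rows for t1 (left) and t2 (right).
t1 = [{"c1": 1}, {"c1": 2}]
t2 = [{"c1": 2, "c2": 20}]
outer = left_hash_join(t1, t2, "c1")
```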

Right outer join (MySQL rewrites every right outer join as a left outer join):

mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t1.c1 = t2.c1)  (cost=3.99 rows=36)
    -> Table scan on t2  (cost=0.85 rows=6)
    -> Hash
        -> Table scan on t1  (cost=0.14 rows=6)

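Related configuration

The amount of memory a hash join may use is controlled by the join_buffer_size system variable. If the join needs more memory than this, it spills to files on disk, at which point it becomes noticeably slower and the query or the setting needs tuning, for example by raising join_buffer_size.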
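
To show why spilling costs more, here is a sketch of one common way to join inputs that do not fit in memory: split both inputs into partitions by a hash of the join key, then join each pair of partitions separately (often called a grace hash join). Whether this matches MySQL's internals in detail is an assumption; `partitioned_hash_join` and `num_partitions` are hypothetical names, and in-memory lists stand in for the on-disk files.

```python
from collections import defaultdict


def partitioned_hash_join(build_rows, probe_rows, key, num_partitions=4):
    """Join two inputs partition by partition (lists stand in for files on disk)."""
    build_parts = defaultdict(list)
    probe_parts = defaultdict(list)

    # Pass 1: write every row to the partition chosen by the hash of its key.
    for row in build_rows:
        if row[key] is not None:
            build_parts[hash(row[key]) % num_partitions].append(row)
    for row in probe_rows:
        if row[key] is not None:
            probe_parts[hash(row[key]) % num_partitions].append(row)

    # Pass 2: rows with equal keys share a partition, so each pair of
    # partitions can be joined with a small in-memory hash table.
    result = []
    for part in range(num_partitions):
        table = defaultdict(list)
        for row in build_parts.get(part, []):
            table[row[key]].append(row)
        for probe in probe_parts.get(part, []):
            for build in table.get(probe[key], []):
                result.append((build, probe))
    return result
```

Every row is written once and read back once in addition to the original scan, which is the extra I/O that a large enough join buffer avoids.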