1. 程式人生 > 實用技巧 >mysql中的semi-join

mysql中的semi-join

1. 背景介紹

什麼是semi-join?
所謂的semi-join是指semi-join子查詢。 當一張表在另一張表找到匹配的記錄之後,半連線(semi-jion)返回第一張表中的記錄。與條件連線相反,即使在右節點中找到幾條匹配的記錄,左節點 的表也只會返回一條記錄。另外,右節點的表一條記錄也不會返回。半連線通常使用IN 或 EXISTS 作為連線條件。 該子查詢具有如下結構:

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

即在where條件的“IN”中的那個子查詢。
這種查詢的特點是我們只關心outer_table中與semi-join相匹配的記錄。
換句話說,最後的結果集是在outer_tables中的,而semi-join的作用只是對outer_tables中的記錄進行篩選。這也是我們進行 semi-join優化的基礎,即我們只需要從semi-join中獲取到最少量的足以對outer_tables記錄進行篩選的資訊就足夠了。
所謂的最少量,體現到優化策略上就是如何去重。
以如下語句為例:

select * from Country 
where 
  Country.Code in 
(select City.country 
                   from City 
                   where City.Population>1*1000*1000);

當中的semi-join: “

select City.country 
                   from City 
                   where City.Population>1*1000*1000

” 可能返回的結果集如下: China(Beijin), China(Shanghai), France(Paris)...
我們可以看到這裡有2個China,分別來至2條城市記錄Beijin和Shanghai, 但實際上我們只需要1個China就足夠對outer_table

Country進行篩選了。所以我們需要去重。

2. Mysql支援的Semi-join策略
Mysql支援的semi-join策略主要有5個,它們分別為:
1. DuplicateWeedout: 使用臨時表對semi-join產生的結果集去重。 Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.

對應的匹配條件為:


2. FirstMatch: 只選用內部表的第1條與外表匹配的記錄。 FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.

對應的匹配條件為:


3. LooseScan: 把inner-table資料基於索引進行分組,取每組第一條資料進行匹配。 LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

對應的匹配條件為:

4. Materializelookup: 將inner-table去重固化成臨時表,遍歷outer-table,然後在固化表上去尋找匹配。 Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
對應的匹配條件:



5. MaterializeScan: 將inner-table去重固化成臨時表,遍歷固化表,然後在outer-table上尋找匹配。

對應的條件:
Each of these strategies except Duplicate Weedout can be enabled or disabled using the
optimizer_switch
system variable. Thesemijoinflag controls whether semi-joins are used. If it is set toon, thefirstmatch,loosescan, andmaterializationflags enable finer control over the permitted semi-join strategies. These flags areonby default.

The use of semi-join strategies is indicated inEXPLAINoutput as follows:

  • Semi-joined tables show up in the outer select.EXPLAIN EXTENDEDplusSHOW WARNINGSshows the rewritten query, which displays the semi-join structure. From this you can get an idea about which tables were pulled out of the semi-join. If a subquery was converted to a semi-join, you will see that the subquery predicate is gone and its tables andWHEREclause were merged into the outer query join list andWHEREclause.

  • Temporary table use for Duplicate Weedout is indicated byStart temporaryandEnd temporaryin theExtracolumn. Tables that were not pulled out and are in the range ofEXPLAINoutput rows covered byStart temporaryandEnd temporarywill have theirrowidin the temporary table.

  • FirstMatch(tbl_name)in theExtracolumn indicates join shortcutting.

  • LooseScan(m..n)in theExtracolumn indicates use of the LooseScan strategy.mandnare key part numbers.

  • As of MySQL 5.6.7, temporary table use for materialization is indicated by rows with aselect_typevalue ofMATERIALIZEDand rows with atablevalue of<subqueryN>.

    Before MySQL 5.6.7, temporary table use for materialization is indicated in theExtracolumn byMaterializeif a single table is used, or byStart materializeandEnd materializeif multiple tables are used. IfScanis present, no temporary table index is used for table reads. Otherwise, an index lookup is used.

  • mysql> SELECT @@optimizer_switch\G
    *************************** 1. row ***************************
    @@optimizer_switch: index_merge=on,index_merge_union=on,
                        index_merge_sort_union=on,
                        index_merge_intersection=on,
                        engine_condition_pushdown=on,
                        index_condition_pushdown=on,
                        mrr=on,mrr_cost_based=on,
                        block_nested_loop=on,batched_key_access=off,
                        materialization=on,semijoin=on,loosescan=on,
                        firstmatch=on,
                        subquery_materialization_cost_based=on,
                        use_index_extensions=on

參考: http://blog.itpub.net/22664653/viewspace-672772/ http://blog.sina.com.cn/s/blog_4673e60301011qvx.html http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html 原文https://www.cnblogs.com/xiaotengyi/p/3908347.html