MySQL Experiment: Optimizing ORDER BY + LIMIT with an Inner Join, Then Improving It Again by Adding an Index

阿新 • Published: 2020-07-06

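While experimenting with subquery optimization of the two-argument LIMIT, it occurred to me to test ORDER BY + LIMIT, which is closer to what real production workloads need. Perhaps ORDER BY + LIMIT can be optimized in a similar way.

Preparation

The experiment uses employees, the large sample database published by MySQL (see the linked guide for how to import it).

We will use its employees table. First, look at the table structure and the number of rows in the table.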

mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.05 sec)

As we can see, only the primary key emp_no is indexed.
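
The Key column of desc only marks which columns take part in an index. As a small sketch (not part of the original experiment), the statements below list the indexes explicitly, which makes it easier to confirm that PRIMARY on emp_no is the only one at this point.

```sql
-- List every index on the table, one row per indexed column.
SHOW INDEX FROM employees;

-- Show the full table definition, including all keys.
SHOW CREATE TABLE employees;
```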

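Experiment

MySQL 5.7 official documentation: explanation of the EXPLAIN output columns

Official documentation: how ORDER BY is processed

Recommended blog post on EXPLAIN output in version 5.7

Recommended blog post on EXPLAIN in older versions (newer versions behave like EXPLAIN EXTENDED by default)

Further reading on the EXPLAIN columns

Explanation of the key value in MySQL EXPLAIN

Unoptimized ORDER BY + LIMIT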

mysql> select * from employees order by birth_date limit 200000,10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
| 494212 | 1960-09-25 | Susuma     | Baranowski | M      | 1989-05-15 |
| 496888 | 1960-09-25 | Rosalyn    | Rebaine    | M      | 1985-11-27 |
| 497766 | 1960-09-25 | Matt       | Atrawala   | F      | 1987-02-11 |
| 481404 | 1960-09-25 | Sanjeeva   | Eterovic   | F      | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel    | Pramanik   | F      | 1997-07-23 |
| 483270 | 1960-09-25 | Geoff      | Gulik      | F      | 1993-11-25 |
|  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
| 101264 | 1960-09-25 | Mansur     | Atchley    | F      | 1990-05-22 |
|  92453 | 1960-09-25 | Khalid     | Trystram   | M      | 1993-11-10 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.20 sec)

mysql> explain select * from employees order by birth_date limit 200000,10;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299468 |   100.00 | Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

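As we can see, the unoptimized query does a full table scan (type ALL) followed by a filesort, and takes 0.20 s.

Inner join optimization

The idea: let a subquery find only the primary keys (emp_no) of the rows selected by order by birth_date limit 200000,10, then use an inner join to fetch the full rows for just those 10 emp_no values through the primary key index. The expensive step, sorting and then skipping 200,000 rows, then works on narrow (birth_date, emp_no) entries instead of complete rows.

(An inner join returns only the rows that match the join condition. Since select emp_no from employees order by birth_date limit 200000,10 returns only 10 emp_no values, the joined result set contains the full rows for just those 10 emp_no values.)

(The join is on emp_no, so each of the 10 lookups is a primary key lookup. The subquery itself can only be served as a "covering index" read, which cuts disk I/O, once birth_date has an index; that step comes later in this post.)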

mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date  |
+--------+------------+------------+-----------+--------+------------+
| 427365 | 1960-09-24 | Yuping     | Sethi     | M      | 1990-06-21 |
| 424219 | 1960-09-25 | Woody      | Bernini   | M      | 1989-03-10 |
| 469218 | 1960-09-25 | George     | Plotkin   | M      | 1992-02-19 |
| 404121 | 1960-09-25 | Domenico   | Birnbaum  | M      | 1993-08-01 |
| 404266 | 1960-09-25 | Quingbo    | Jervis    | F      | 1985-03-15 |
| 409133 | 1960-09-25 | Nitsan     | Kleiser   | F      | 1985-05-18 |
| 409558 | 1960-09-25 | Shunichi   | Hofting   | F      | 1992-07-06 |
| 412045 | 1960-09-25 | Kristin    | Bolotov   | F      | 1985-06-28 |
| 481404 | 1960-09-25 | Sanjeeva   | Eterovic  | F      | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel    | Pramanik  | F      | 1997-07-23 |
+--------+------------+------------+-----------+--------+------------+
10 rows in set (0.10 sec)
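
The rows above are not the same ten rows that the unoptimized query returned. Many employees share the birth date 1960-09-25, and ORDER BY birth_date alone does not define an order among ties, so different execution plans may page through them differently. A minimal sketch of one way to make the page deterministic is to add emp_no as a tie-breaker, both in the subquery and in the outer query. The aliases e and page are illustrative names, not from the original experiment, and the timing of this variant was not measured here.

```sql
-- Deterministic variant: ties on birth_date are broken by the primary key.
SELECT e.*
FROM employees AS e
INNER JOIN (
    SELECT emp_no
    FROM employees
    ORDER BY birth_date, emp_no   -- unique sort key, so the page is stable
    LIMIT 200000, 10
) AS page USING (emp_no)
ORDER BY e.birth_date, e.emp_no;  -- the join does not guarantee output order
```

With the same ORDER BY birth_date, emp_no applied to the plain select * query, both forms should return the same ten rows.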

mysql> explain select * from employees inner join (select emp_no from employees order by birth_date limit 100000,10) as table_temp using (emp_no);
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref               | rows   | filtered | Extra          |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL              | 100010 |   100.00 | NULL           |
|  1 | PRIMARY     | employees  | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | table_temp.emp_no |      1 |   100.00 | NULL           |
|  2 | DERIVED     | employees  | NULL       | ALL    | NULL          | NULL    | NULL    | NULL              | 299468 |   100.00 | Using filesort |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
3 rows in set, 1 warning (0.00 sec)

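The query is now about twice as fast (0.10 s versus 0.20 s). The ten rows differ from those of the unoptimized query because many employees share the birth date 1960-09-25, and ORDER BY birth_date alone does not fix the order among ties. Note also that the EXPLAIN above was run with offset 100000 rather than 200000, which is why the derived table is estimated at 100010 rows. In the EXPLAIN output:

The third row has select_type DERIVED, meaning it is a subquery in the FROM clause. We can see that the subquery does not use an index either (type ALL, Using filesort).
<derived2> in the first row means that this row reads from the derived table produced by the query with id=N, here N=2. Now look at the second row:

The second row has type eq_ref, meaning a PRIMARY KEY or UNIQUE index is used for the join: for each row coming from the previous table, exactly one matching row is read. Here that means that once the subquery has found the emp_no values, the temporary table it produced is joined to the employees table through the primary key.
(This explanation only covers what the EXPLAIN columns mean. It does not seem possible to verify the optimization idea directly from EXPLAIN, so pointers from more experienced readers are welcome.)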
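
On the question of how to verify the idea: one possible approach, offered as a sketch rather than something the original experiment did, is to compare session status counters for the two queries. FLUSH STATUS resets the session counters, after which the Handler_read% counters show how rows were read and the Sort% counters show how much was sorted. No expected values are given because they were not measured here.

```sql
-- Reset the session status counters (requires the RELOAD privilege).
FLUSH STATUS;

SELECT * FROM employees ORDER BY birth_date LIMIT 200000, 10;

-- Rows read by scanning vs. by key lookup, and rows passed through the sort.
SHOW SESSION STATUS LIKE 'Handler_read%';
SHOW SESSION STATUS LIKE 'Sort%';

-- Repeat for the inner-join form and compare the counters.
FLUSH STATUS;

SELECT *
FROM employees
INNER JOIN (
    SELECT emp_no FROM employees ORDER BY birth_date LIMIT 200000, 10
) AS temp_table USING (emp_no);

SHOW SESSION STATUS LIKE 'Handler_read%';
SHOW SESSION STATUS LIKE 'Sort%';
```

If the counters for the two queries turn out similar, the time difference would have to come from the width of the rows being sorted rather than from the number of rows read, which EXPLAIN alone cannot show.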

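Adding an index on the sort column

Since the inner-join query sorts by birth_date and then looks rows up by emp_no, adding an index on the sort column may improve efficiency further.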

mysql> alter table employees add index birthdate_index (birth_date);
Query OK, 0 rows affected (0.75 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   | MUL | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
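
The post does not show an EXPLAIN for the subquery after the index was added. As a hedged sketch, running EXPLAIN on the subquery alone is one way to check whether it is now answered from birthdate_index. With InnoDB, a secondary index entry also stores the primary key, so I would expect key = birthdate_index and Extra = Using index, but that expectation is an assumption and was not captured in the original experiment.

```sql
-- Check the plan of the subquery on its own after adding birthdate_index.
EXPLAIN
SELECT emp_no
FROM employees
ORDER BY birth_date
LIMIT 200000, 10;
```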

Then we run both queries again: the unoptimized one and the one optimized with an inner join.

mysql> select * from employees order by birth_date limit 200000,10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
| 494212 | 1960-09-25 | Susuma     | Baranowski | M      | 1989-05-15 |
| 496888 | 1960-09-25 | Rosalyn    | Rebaine    | M      | 1985-11-27 |
| 497766 | 1960-09-25 | Matt       | Atrawala   | F      | 1987-02-11 |
| 481404 | 1960-09-25 | Sanjeeva   | Eterovic   | F      | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel    | Pramanik   | F      | 1997-07-23 |
| 483270 | 1960-09-25 | Geoff      | Gulik      | F      | 1993-11-25 |
|  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
| 101264 | 1960-09-25 | Mansur     | Atchley    | F      | 1990-05-22 |
|  92453 | 1960-09-25 | Khalid     | Trystram   | M      | 1993-11-10 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.20 sec)

mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
|  23102 | 1960-09-25 | Hsiangchu  | Harbusch   | M      | 1986-03-14 |
|  29961 | 1960-09-25 | Susumu     | Munoz      | F      | 1989-12-31 |
|  32061 | 1960-09-25 | Dipankar   | Buescher   | M      | 1992-10-24 |
|  36216 | 1960-09-25 | Xianlong   | Rassart    | F      | 1987-09-05 |
|  37058 | 1960-09-25 | Khue       | Osgood     | M      | 1991-11-04 |
|  38365 | 1960-09-25 | Sariel     | Ramsak     | M      | 1993-02-26 |
|  39901 | 1960-09-25 | Jianhui    | Ushiama    | M      | 1985-12-03 |
|  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
|  63784 | 1960-09-25 | Rosita     | Zyda       | M      | 1988-08-12 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.03 sec)

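As we can see, the plain query did not get any faster, but the inner-join query improved considerably, from 0.10 s down to 0.03 s. This suggests that the re-optimized inner join could probably cope with tables of a million rows (perhaps even tens of millions, since production hardware will be far better than my entry-level ECS instance), although this experiment only covers about 300,000 rows.

Why the plain query did not get faster after adding birthdate_index:

Because the query is select *, the indexes on emp_no and birth_date cannot supply the other columns, so every row read through birthdate_index would still need a lookup back into the table. As a result, even with the index in place, the plain query still uses a full table scan.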
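
One way to test this explanation, offered as a sketch that was not run for this post, is to compare the default plan with one where the optimizer is told to use birthdate_index through an index hint. If the forced version is no faster than 0.20 s, that would support the claim that the per-row table lookups cancel out the benefit of reading in index order.

```sql
-- Plan the optimizer picks on its own.
EXPLAIN SELECT * FROM employees ORDER BY birth_date LIMIT 200000, 10;

-- Plan when the index is forced; each row still needs a lookup for the other columns.
EXPLAIN SELECT * FROM employees FORCE INDEX (birthdate_index)
ORDER BY birth_date LIMIT 200000, 10;

-- Run the forced version to compare its time with the 0.20 s full scan.
SELECT * FROM employees FORCE INDEX (birthdate_index)
ORDER BY birth_date LIMIT 200000, 10;
```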

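Summary

The experiment shows that an inner join helps with ORDER BY + two-argument LIMIT, and that with a suitable inner-join subquery, adding the matching index improves performance further: from 0.20 s to 0.10 s and then to 0.03 s. Cutting the time by nearly an order of magnitude is a big improvement. (The end~)

Final notes

EXPLAIN does not tell you how triggers, stored procedures, or user-defined functions affect the query
EXPLAIN does not take the various caches into account
EXPLAIN does not show the optimizations MySQL performs while executing the query
Some of the statistics are estimates, not exact values
In old MySQL versions EXPLAIN could only explain SELECT, so other statements had to be rewritten as SELECT to see their plan; since MySQL 5.6, EXPLAIN also accepts DELETE, INSERT, REPLACE, and UPDATE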
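
For reference, in MySQL 5.6 and later EXPLAIN can be applied to a DML statement directly, and EXPLAIN FORMAT=JSON gives more detail than the tabular form, including cost estimates in 5.7. A short sketch follows; the emp_no value 10001 is assumed to exist in the sample data, and the UPDATE is only explained, not executed.

```sql
-- EXPLAIN on a DML statement: shows the plan without modifying any rows.
EXPLAIN UPDATE employees SET hire_date = hire_date WHERE emp_no = 10001;

-- JSON format exposes extra detail, such as cost estimates, for the same kind of query.
EXPLAIN FORMAT=JSON
SELECT emp_no FROM employees ORDER BY birth_date LIMIT 200000, 10;
```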
