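阿新 • Published: 2020-07-06
# MySQL experiment: optimizing order by + limit with an inner join, then improving it again with an index
While working on [optimizing two-argument limit with a subquery](https://www.cnblogs.com/G-Aurora/p/13254473.html), I got the idea of testing `ORDER BY + LIMIT`, which is closer to what real production workloads need. Perhaps `ORDER BY + LIMIT` can be optimized in a similar way.
## Setup
The experiment uses employees, MySQL's official large sample database. [See here for how to import it](https://www.cnblogs.com/G-Aurora/p/13171234.html).
We will use its employees table, so let's first look at the table structure and the number of rows.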
```
mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
```
```
mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
| 300024 |
+----------+
1 row in set (0.05 sec)
```
As we can see, only the primary key emp_no is indexed.
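As a quick cross-check (a sketch, not part of the original session), `SHOW INDEX` lists every index defined on a table. On a freshly imported employees database it should list only the `PRIMARY` key on `emp_no`.

```sql
-- List all indexes defined on the employees table.
SHOW INDEX FROM employees;
```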
## Experiment
[Official MySQL 5.7 documentation on the EXPLAIN output fields](https://dev.mysql.com/doc/refman/5.7/en/explain-output.html)
[Official documentation on how ORDER BY is optimized](https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html)
[Recommended blog post on the EXPLAIN fields in 5.7](https://blog.csdn.net/yhl_jxy/article/details/88570154)
[Recommended blog post on EXPLAIN in older versions](https://www.cnblogs.com/butterfly100/archive/2018/01/15/8287569.html) (newer versions behave like `explain extended` by default)
[Further reading on the EXPLAIN fields](https://blog.csdn.net/lzrit/article/details/81585941)
[Explanation of the EXPLAIN key value in MySQL](https://www.cnblogs.com/yy20141204bb/p/8421338.html)
### Unoptimized order by + limit
```
mysql> select * from employees order by birth_date limit 200000,10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla | Delgrange | M | 1989-12-08 |
| 494212 | 1960-09-25 | Susuma | Baranowski | M | 1989-05-15 |
| 496888 | 1960-09-25 | Rosalyn | Rebaine | M | 1985-11-27 |
| 497766 | 1960-09-25 | Matt | Atrawala | F | 1987-02-11 |
| 481404 | 1960-09-25 | Sanjeeva | Eterovic | F | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel | Pramanik | F | 1997-07-23 |
| 483270 | 1960-09-25 | Geoff | Gulik | F | 1993-11-25 |
| 59683 | 1960-09-25 | Supot | Millington | F | 1991-06-03 |
| 101264 | 1960-09-25 | Mansur | Atchley | F | 1990-05-22 |
| 92453 | 1960-09-25 | Khalid | Trystram | M | 1993-11-10 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.20 sec)
```
```
mysql> explain select * from employees order by birth_date limit 200000,10;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| 1 | SIMPLE | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 100.00 | Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
1 row in set, 1 warning (0.00 sec)
```
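As we can see, the unoptimized query does a full table scan (type `ALL`) followed by a filesort, and takes 0.2 s.
### Inner join optimization
**Optimization idea**: **let a subquery find only the primary keys (emp_no) of the rows selected by `order by birth_date limit 200000,10`, then use an inner join on the primary key index to fetch the full rows for those 10 emp_no values.**
(An inner join returns only the rows that match the join condition. Since `select emp_no from employees order by birth_date limit 200000,10` returns just 10 emp_no values, the joined result set contains the full rows for those 10 emp_no values and nothing else.)
(The subquery selects only emp_no, so it handles far less data per row than `select *`. Once birth_date is indexed, as is done later in this post, a "covering index" helps as well: assuming the table uses InnoDB, a secondary index also stores the primary key, so the subquery can be answered from the index alone, which reduces disk I/O.)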
```
mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 427365 | 1960-09-24 | Yuping | Sethi | M | 1990-06-21 |
| 424219 | 1960-09-25 | Woody | Bernini | M | 1989-03-10 |
| 469218 | 1960-09-25 | George | Plotkin | M | 1992-02-19 |
| 404121 | 1960-09-25 | Domenico | Birnbaum | M | 1993-08-01 |
| 404266 | 1960-09-25 | Quingbo | Jervis | F | 1985-03-15 |
| 409133 | 1960-09-25 | Nitsan | Kleiser | F | 1985-05-18 |
| 409558 | 1960-09-25 | Shunichi | Hofting | F | 1992-07-06 |
| 412045 | 1960-09-25 | Kristin | Bolotov | F | 1985-06-28 |
| 481404 | 1960-09-25 | Sanjeeva | Eterovic | F | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel | Pramanik | F | 1997-07-23 |
+--------+------------+------------+-----------+--------+------------+
10 rows in set (0.10 sec)
```
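Note that these ten rows are not the same ten the plain query returned. Many employees share the birth date 1960-09-25, and when the ORDER BY column contains ties MySQL may return the tied rows in any order, so the window at offset 200000 can hold different rows depending on the execution plan. If reproducible pages matter, a common fix is to add a unique column as a tie-breaker. A minimal sketch, assuming `emp_no` is acceptable as the secondary sort key (the alias `page` is made up for this example):

```sql
-- Plain query with a deterministic order: ties on birth_date are broken by emp_no.
SELECT *
FROM employees
ORDER BY birth_date, emp_no
LIMIT 200000, 10;

-- Inner-join variant with the same tie-breaker in the subquery.
-- The outer ORDER BY is there because a join does not guarantee that
-- rows come back in the order the derived table produced them.
SELECT e.*
FROM employees AS e
INNER JOIN (
    SELECT emp_no
    FROM employees
    ORDER BY birth_date, emp_no
    LIMIT 200000, 10
) AS page USING (emp_no)
ORDER BY e.birth_date, e.emp_no;
```

Because (birth_date, emp_no) is unique for every row, both statements select the same ten rows in the same order.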
```
mysql> explain select * from employees inner join (select emp_no from employees order by birth_date limit 100000,10) as table_temp using (emp_no);
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 100010 | 100.00 | NULL |
| 1 | PRIMARY | employees | NULL | eq_ref | PRIMARY | PRIMARY | 4 | table_temp.emp_no | 1 | 100.00 | NULL |
| 2 | DERIVED | employees | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 100.00 | Using filesort |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
3 rows in set, 1 warning (0.00 sec)
```
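The query is now about twice as fast (0.20 s down to 0.10 s). Note that the EXPLAIN above was run with `limit 100000,10` rather than `200000,10`, which is why the derived table is estimated at 100010 rows. In the EXPLAIN output:
- The third row has select_type `DERIVED`, meaning it is a subquery inside the FROM clause. Its type is `ALL` with `Using filesort`, so the subquery does not use an index either.
- `<derived2>`, the `table` value of the first row, means that this row reads from the result of the query with id=N, here N=2 (the third row). With that in mind, look at the second row:
  its type `eq_ref` means a primary key or unique index is used for the join: for each key value coming from the previous table, exactly one matching row is returned. Here that means that once the subquery has found the emp_no values, the temporary table it produced is joined to the employees table by primary key.
- (The notes above only explain the EXPLAIN fields. They do not seem to verify the optimization idea directly, so pointers from more experienced readers are welcome.)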
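One possible way to check the idea more directly, offered as a sketch rather than a definitive method, is to reset the session status counters, run one variant of the query, and then read the `Handler_read%` and `Sort%` counters. `FLUSH STATUS` (which requires the RELOAD privilege) and `SHOW SESSION STATUS` are standard MySQL statements. Which counters move, and by how much, depends on the server version and the chosen plan, so no expected output is shown here.

```sql
-- Reset the session-level status counters.
FLUSH STATUS;

-- Run the variant to measure (here: the inner-join version).
SELECT *
FROM employees
INNER JOIN (
    SELECT emp_no
    FROM employees
    ORDER BY birth_date
    LIMIT 200000, 10
) AS temp_table USING (emp_no);

-- Rows read through each access path (index lookups, sequential reads, ...).
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Sorting activity (rows sorted, merge passes, ...).
SHOW SESSION STATUS LIKE 'Sort%';
```

Repeating the same steps with the plain `select *` query gives a second set of counters to compare against.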
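### Adding an index on the sort column
Since the subquery in the inner join sorts by `birth_date` in order to find `emp_no`, adding an index on the sort column may improve performance further.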
```
mysql> alter table employees add index birthdate_index (birth_date);
Query OK, 0 rows affected (0.75 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | MUL | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
```
Then we run both statements again: the unoptimized query and the inner-join version.
```
mysql> select * from employees order by birth_date limit 200000,10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla | Delgrange | M | 1989-12-08 |
| 494212 | 1960-09-25 | Susuma | Baranowski | M | 1989-05-15 |
| 496888 | 1960-09-25 | Rosalyn | Rebaine | M | 1985-11-27 |
| 497766 | 1960-09-25 | Matt | Atrawala | F | 1987-02-11 |
| 481404 | 1960-09-25 | Sanjeeva | Eterovic | F | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel | Pramanik | F | 1997-07-23 |
| 483270 | 1960-09-25 | Geoff | Gulik | F | 1993-11-25 |
| 59683 | 1960-09-25 | Supot | Millington | F | 1991-06-03 |
| 101264 | 1960-09-25 | Mansur | Atchley | F | 1990-05-22 |
| 92453 | 1960-09-25 | Khalid | Trystram | M | 1993-11-10 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.20 sec)
```
```
mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla | Delgrange | M | 1989-12-08 |
| 23102 | 1960-09-25 | Hsiangchu | Harbusch | M | 1986-03-14 |
| 29961 | 1960-09-25 | Susumu | Munoz | F | 1989-12-31 |
| 32061 | 1960-09-25 | Dipankar | Buescher | M | 1992-10-24 |
| 36216 | 1960-09-25 | Xianlong | Rassart | F | 1987-09-05 |
| 37058 | 1960-09-25 | Khue | Osgood | M | 1991-11-04 |
| 38365 | 1960-09-25 | Sariel | Ramsak | M | 1993-02-26 |
| 39901 | 1960-09-25 | Jianhui | Ushiama | M | 1985-12-03 |
| 59683 | 1960-09-25 | Supot | Millington | F | 1991-06-03 |
| 63784 | 1960-09-25 | Rosita | Zyda | M | 1988-08-12 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.03 sec)
```
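As we can see, the plain query gained nothing, but the inner-join query improved considerably: its time dropped from 0.1 s to 0.03 s. In other words, the re-optimized inner join should be able to cope with tables in the millions of rows (possibly even tens of millions, since production hardware is sure to be far better than the entry-level ECS instance I am using).
Why the plain query did not get faster after adding `birthdate_index`:
Because the query is `select *`, the other columns still have to be fetched from the table for every row, even though emp_no and birth_date are both indexed. Walking `birthdate_index` up to offset 200000 would mean roughly 200,010 such lookups, which is presumably why the plain query still uses a full table scan even with the index in place.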
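To see which plan each statement gets once `birthdate_index` exists, EXPLAIN can be run on both. This is a sketch. The output is not reproduced because it was not captured in the original session. The columns to look at are `key` (whether `birthdate_index` was chosen) and `Extra` (`Using index` indicates the rows were read from the index alone, `Using filesort` that a separate sort was still needed).

```sql
-- Plan for the plain query after the index was added.
EXPLAIN
SELECT *
FROM employees
ORDER BY birth_date
LIMIT 200000, 10;

-- Plan for the inner-join version after the index was added.
EXPLAIN
SELECT *
FROM employees
INNER JOIN (
    SELECT emp_no
    FROM employees
    ORDER BY birth_date
    LIMIT 200000, 10
) AS temp_table USING (emp_no);
```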
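## Summary
The experiment shows that an inner join helps with order by + two-argument limit, and that adding a matching index on top of a suitable inner-join subquery improves performance further. Going from 0.2 s to 0.1 s and then to 0.03 s is close to an order-of-magnitude reduction, which is a big win. (That's a wrap!)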
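## Final notes
- EXPLAIN does not tell you anything about triggers or stored procedures, or about how user-defined functions affect the query
- EXPLAIN does not take the various caches into account
- EXPLAIN does not show the optimizations MySQL performs while actually executing the query
- Some of the statistics are estimates, not exact values
- In older versions EXPLAIN could only explain SELECT, so other statements had to be rewritten as a SELECT to see their execution plan. From MySQL 5.6.3 on, EXPLAIN also accepts DELETE, INSERT, REPLACE and UPDATE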