MySQL效能優化之filesort
阿新 • • 發佈:2018-12-02
MySQL效能優化之filesort
0. 問題
因為專案需求,需要使用一個group + join操作,但是因為表的資料量特別大,導致執行SQL的過程極其緩慢。在azkaban中呼叫大概需要100min。
裡面的SQL怎麼寫的呢?主要如下:
select
t_1.*
,dd.department_type as department_type
,'${DATE_KEY}' as date_key
,current_timestamp as ETL_Created_Date
,current_timestamp as ETL_Modified_Date
,'${PROCESS_ID}' as process_id
,'mnt_fact' as rec_source
,dd.department_name as bns_area_name
from
(select
0 as is_qr
,5 as qualifier_id
···
,sum(mf.bare_order_qt) as bare_order_qt
,sum(mf.bare_logic_bns_amount) as bare_logic_bns_amount
,sum(mf.bare_sell_devices) as bare_sell_devices
,count(mf. device_code) as device_qt
,count(distinct mf.shop_id) as shop_qt
from mnt_fact mf
where date_key <= '${DATE_KEY}'
and date_key >= date_format(date_sub('${DATE_KEY}', interval day('${DATE_KEY}')-1 day),'%Y%m%d')
group by
mf.bns_area1_id
,mf.device_type_class
)t_1
inner join datamart.vw_dim_department_cur dd
on t_1.bns_area_id = dd.department_id
使用explain 檢視執行計劃如下:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1564460
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: dim_department
partitions: NULL
type: ref
possible_keys: idx_department_id
key: idx_department_id
key_len: 8
ref: t_1.bns_area_id
rows: 5
filtered: 10.00
Extra: Using index condition
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: mf
partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 14082963
filtered: 11.11
Extra: Using where; Using filesort
3 rows in set, 1 warning (0.00 sec)
可以看到這裡的第三行,出現了Using filesort
,這個就是在group by時,我們就會出現檔案排序的問題。下面我詳細解釋一下這個filesort
1. filesort
是什麼?
2. 原因
2. 解決辦法
建立索引,避免使用filesort。
檢視原表索引如下:
mysql> show index from mnt_fact;
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| mnt_fact | 0 | PRIMARY | 1 | ID | A | 15902350 | NULL | NULL | | BTREE | | |
| mnt_fact | 0 | PRIMARY | 2 | date_key | A | 15978838 | NULL | NULL | | BTREE | | |
| mnt_fact | 1 | idx_device | 1 | device_code | A | 15914067 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_dept_key | 1 | department_key | A | 60584 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_emp_shop_devtype | 1 | employee_code | A | 66114 | NULL | NULL | | BTREE | | |
| mnt_fact | 1 | idx_emp_shop_devtype | 2 | shop_id | A | 8483195 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_emp_shop_devtype | 3 | device_type_class | A | 7341482 | NULL | NULL | YES | BTREE | | |
可以看到是沒有mf.bns_area1_id
和 mf.device_type_class
的索引,雖然這個索引的Cardinality
是非常的小,但是因為其可以在group by中起到非常重要的作用,所以在執行上述這個SQL時,仍然會按照索引group。
針對group by 欄位新增如下索引:
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 1 | bns_area6_id | A | 21778 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 2 | device_type_class | A | 66270 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 3 | employee_code | A | 176908 | NULL | NULL | | BTREE | | |
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 4 | shop_id | A | 7485426 | NULL | NULL | YES | BTREE | | |
執行計劃如下:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: mf
partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
type: index
possible_keys: idx_device_type_class_bns4,idx_bns1_bns2_bns3_bns4_bns5_bns6_deviceTypeClass
key: idx_device_type_class_bns4
key_len: 214
ref: NULL
rows: 14103728
filtered: 11.11
Extra: Using where
3 rows in set, 1 warning (0.00 sec)
可以看到這次沒有執行filesort
。然後執行SQL會從原先的100min 縮減到10 min。