03 MySQL index optimization-tuling

st: 2020-12-15
et: 2020-12-17

01 How MySQL chooses a suitable index

# Sample data from the employees table:
mysql> select * from employees order by id limit 10;
+----+-----------+-----+----------+---------------------+
| id | name      | age | position | hire_time           |
+----+-----------+-----+----------+---------------------+
|  4 | LiLei     |  22 | manager  | 2020-12-14 21:08:18 |
|  5 | HanMeimei |  23 | dev      | 2020-12-14 21:08:18 |
|  6 | Lucy      |  23 | dev      | 2020-12-14 21:08:18 |
|  7 | user7     |  21 | dev      | 2020-12-15 20:46:20 |
|  8 | user8     |  28 | dev      | 2020-12-15 20:46:20 |
|  9 | user9     |  17 | dev      | 2020-12-15 20:46:20 |
| 10 | user10    |  23 | dev      | 2020-12-15 20:46:20 |
| 11 | user11    |  29 | dev      | 2020-12-15 20:46:20 |
| 12 | user12    |  32 | dev      | 2020-12-15 20:46:20 |
| 13 | user13    |  21 | dev      | 2020-12-15 20:46:20 |
+----+-----------+-----+----------+---------------------+
10 rows in set (0.00 sec)
mysql> select * from employees order by id desc limit 3;     
+--------+------------+-----+----------+---------------------+
| id     | name       | age | position | hire_time           |
+--------+------------+-----+----------+---------------------+
| 432828 | user432828 |  31 | dev      | 2020-12-15 20:58:34 |
| 432827 | user432827 |  25 | dev      | 2020-12-15 20:58:34 |
| 432826 | user432826 |  57 | dev      | 2020-12-15 20:58:34 |
+--------+------------+-----+----------+---------------------+
3 rows in set (0.00 sec)
# The table has 432825 rows in total and all 432825 match the query below; rows=432390 is only an estimate.
mysql> explain select * from employees where name > 'a';
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys         | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_name_age_position | NULL | NULL    | NULL | 432390 |    50.00 | Using where |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

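If MySQL used the name index here, it would have to walk the composite index tree that starts with name and then, for every primary key value found, go back to the primary key index tree to fetch the full row. That costs more than a full table scan. A covering index avoids the problem: the query then only has to walk the composite index tree to get every column it needs, as follows: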

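note: think about this in terms of InnoDB's primary key B+Tree and secondary index B+Tree. Going through the index, the SQL above would search the secondary index B+Tree, obtain primary key values, and then still have to search the primary key B+Tree for hire_time, the one column in select * that is not part of the composite index (name, age, position).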

mysql> explain select name, age, position from employees where name > 'a';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL | 216195 |   100.00 | Using where; Using index |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
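
To make the difference concrete, here is a minimal Python sketch that counts primary key lookups for the two queries above. It is a toy model, not how InnoDB is implemented: the three sample rows, the `primary` and `secondary` structures, and the function `query_name_gt` are all illustrative assumptions.

```python
# Toy stand-in for the primary key (clustered) index: id -> full row
primary = {
    7: {"id": 7, "name": "user7", "age": 21, "position": "dev", "hire_time": "2020-12-15 20:46:20"},
    8: {"id": 8, "name": "user8", "age": 28, "position": "dev", "hire_time": "2020-12-15 20:46:20"},
    9: {"id": 9, "name": "user9", "age": 17, "position": "dev", "hire_time": "2020-12-15 20:46:20"},
}

# Toy stand-in for idx_name_age_position: sorted (name, age, position, id) entries
INDEX_COLUMNS = ("name", "age", "position", "id")
secondary = sorted((r["name"], r["age"], r["position"], pk) for pk, r in primary.items())


def query_name_gt(lower, columns):
    """Answer `select <columns> ... where name > lower` through the secondary index.

    Returns (rows, number of primary key lookups).
    """
    rows, lookups = [], 0
    for entry in secondary:
        if entry[0] <= lower:
            continue
        from_index = dict(zip(INDEX_COLUMNS, entry))
        if set(columns).issubset(from_index):
            # Covering index: every requested column is in the index entry
            source = from_index
        else:
            # Not covered: fetch the full row from the primary key index
            source = primary[from_index["id"]]
            lookups += 1
        rows.append(tuple(source[c] for c in columns))
    return rows, lookups


all_columns = ["id", "name", "age", "position", "hire_time"]
print(query_name_gt("a", all_columns)[1])                  # one lookup per matching row
print(query_name_gt("a", ["name", "age", "position"])[1])  # no lookups at all
```

In this model the select * variant pays one extra lookup per matching row, while the covered variant pays none, which is the effect the Using index flag in Extra signals.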

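Why does the following query use the index after all? Because 'usf' sorts after every 'user*' name, so the range matches almost no rows (likewise, 'v' would work, because 'v' sorts after every 'u*' name).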

mysql> explain select * from employees where name > 'usf';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL |    1 |   100.00 | Using index condition |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

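For the two results above, name > 'a' and name > 'usf', we can use the trace tool to find out whether MySQL ends up using the index and, when a table has several indexes, how it picks one. Enabling trace affects MySQL performance, so turn it on only temporarily while analyzing a SQL statement and turn it off as soon as you are done.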

(1.1) Using the trace tool

# Enable trace
mysql> set session optimizer_trace="enabled=on",end_markers_in_json=on; 

mysql> select * from employees where name > 'a' order by position;
mysql> SELECT * FROM information_schema.OPTIMIZER_TRACE;

# Disable trace as soon as the analysis is done
mysql> set session optimizer_trace="enabled=off";
{
  "steps": [
    {
      // Phase 1: SQL preparation
      "join_preparation": { 
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `employees`.`id` AS `id`,`employees`.`name` AS `name`,`employees`.`age` AS `age`,`employees`.`position` AS `position`,`employees`.`hire_time` AS `hire_time` from `employees` where (`employees`.`name` > 'a') order by `employees`.`position`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      // Phase 2: SQL optimization
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            // Condition processing
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`employees`.`name` > 'a')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "substitute_generated_columns": {
            } /* substitute_generated_columns */
          },
          {
            // Table dependency details
            "table_dependencies": [
              {
                "table": "`employees`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            // Estimated access cost for the table
            "rows_estimation": [
              {
                "table": "`employees`",
                "range_analysis": {
                  // Full table scan
                  "table_scan": {
                    // Rows scanned
                    "rows": 432390,
                    // Query cost
                    "cost": 87795
                  } /* table_scan */,
                  // Indexes the query might use
                  "potential_range_indexes": [
                    {
                      // Primary key index
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      // Secondary index
                      "index": "idx_name_age_position",
                      "usable": true,
                      "key_parts": [
                        "name",
                        "age",
                        "position",
                        "id"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indexes */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  // Cost analysis for each usable index
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_name_age_position",
                        // Range covered on the index
                        "ranges": [
                          "a < name"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
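                        // Whether rows fetched through this index come back in primary key order
                        "rowid_ordered": false,
                        "using_mrr": false,
                        // Whether a covering index is used
                        "index_only": false,
                        // Rows scanned through the index
                        "rows": 216195,
                        // Cost of using the index
                        "cost": 259435,
                        // Whether this index is chosen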
                        "chosen": false,
                        "cause": "cost"
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`employees`",
                // Best access path
                "best_access_path": {
                  // Access path finally selected
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 432390,
                      // Access type: scan, i.e. a full table scan
                      "access_type": "scan",
                      "resulting_rows": 432390,
                      "cost": 87793,
                      // Selected
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 432390,
                "cost_for_plan": 87793,
                "sort_cost": 432390,
                "new_cost_for_plan": 520183,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`employees`.`name` > 'a')",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`employees`",
                  "attached": "(`employees`.`name` > 'a')"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`employees`.`position`",
              "items": [
                {
                  "item": "`employees`.`position`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`employees`.`position`"
            } /* clause_processing */
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "index_order_summary": {
                "table": "`employees`",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "unknown",
                "plan_changed": false
              } /* index_order_summary */
            } /* reconsidering_access_paths_for_index_ordering */
          },
          {
            "refine_plan": [
              {
                "table": "`employees`"
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      // Phase 3: SQL execution
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`employees`",
                "field": "position"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 432825,
              "examined_rows": 432825,
              "number_of_tmp_files": 127,
              "sort_buffer_size": 262056,
              "sort_mode": "<sort_key, packed_additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}

Conclusion: the full table scan costs less than the index scan (87795 versus 259435), so MySQL chooses the full table scan.

# Enable trace
mysql> set session optimizer_trace="enabled=on",end_markers_in_json=on; 

mysql> select * from employees where name > 'usf' order by position;
mysql> SELECT * FROM information_schema.OPTIMIZER_TRACE;
 
# Disable trace as soon as the analysis is done
mysql> set session optimizer_trace="enabled=off";

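The trace column shows that the index scan costs less than the full table scan (2.21 versus 87795), so MySQL chooses the index scan. (The trace record follows:)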

{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `employees`.`id` AS `id`,`employees`.`name` AS `name`,`employees`.`age` AS `age`,`employees`.`position` AS `position`,`employees`.`hire_time` AS `hire_time` from `employees` where (`employees`.`name` > 'usf') order by `employees`.`position`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`employees`.`name` > 'usf')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`employees`.`name` > 'usf')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`employees`.`name` > 'usf')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`employees`.`name` > 'usf')"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "substitute_generated_columns": {
            } /* substitute_generated_columns */
          },
          {
            "table_dependencies": [
              {
                "table": "`employees`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [
              {
                "table": "`employees`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 432390,
                    "cost": 87795
                  } /* table_scan */,
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_name_age_position",
                      "usable": true,
                      "key_parts": [
                        "name",
                        "age",
                        "position",
                        "id"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indexes */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_name_age_position",
                        "ranges": [
                          "usf < name"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 1,
                        "cost": 2.21,
                        "chosen": true
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */,
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "idx_name_age_position",
                      "rows": 1,
                      "ranges": [
                        "usf < name"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 1,
                    "cost_for_plan": 2.21,
                    "chosen": true
                  } /* chosen_range_access_summary */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`employees`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 1,
                      "access_type": "range",
                      "range_details": {
                        "used_index": "idx_name_age_position"
                      } /* range_details */,
                      "resulting_rows": 1,
                      "cost": 2.41,
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 1,
                "cost_for_plan": 2.41,
                "sort_cost": 1,
                "new_cost_for_plan": 3.41,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`employees`.`name` > 'usf')",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`employees`",
                  "attached": "(`employees`.`name` > 'usf')"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`employees`.`position`",
              "items": [
                {
                  "item": "`employees`.`position`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`employees`.`position`"
            } /* clause_processing */
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "index_order_summary": {
                "table": "`employees`",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "idx_name_age_position",
                "plan_changed": false
              } /* index_order_summary */
            } /* reconsidering_access_paths_for_index_ordering */
          },
          {
            "refine_plan": [
              {
                "table": "`employees`",
                "pushed_index_condition": "(`employees`.`name` > 'usf')",
                "table_condition_attached": null
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`employees`",
                "field": "position"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 0,
              "examined_rows": 0,
              "number_of_tmp_files": 0,
              "sort_buffer_size": 262056,
              "sort_mode": "<sort_key, packed_additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}
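
The decision visible in the two traces boils down to comparing estimated costs. The sketch below replays that comparison with the cost figures copied from the traces. The function `choose_access_path` and the dictionary layout are hypothetical names for illustration; the real optimizer considers more than this single comparison (the sort cost, for example).

```python
def choose_access_path(candidates):
    """Pick the candidate with the lowest estimated cost (simplified model)."""
    return min(candidates, key=lambda c: c["cost"])


# Costs copied from the trace of: where name > 'a'
name_gt_a = [
    {"access": "table_scan", "cost": 87795},
    {"access": "range_scan on idx_name_age_position", "cost": 259435},
]

# Costs copied from the trace of: where name > 'usf'
name_gt_usf = [
    {"access": "table_scan", "cost": 87795},
    {"access": "range_scan on idx_name_age_position", "cost": 2.21},
]

print(choose_access_path(name_gt_a)["access"])    # the full table scan wins
print(choose_access_path(name_gt_usf)["access"])  # the range scan wins
```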

02 Common SQL optimizations in depth

(2.1) Optimizing order by and group by

(1) case: explain select * from employees where name = 'LiLei' and position = 'manager' order by age;

mysql> explain select * from employees where name = 'LiLei' and position = 'manager' order by age;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |    10.00 | Using index condition |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

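By the leftmost-prefix rule, the columns used must not skip one in the middle, so the lookup uses only the name part of the index, as key_len=74 confirms. The age index column is used for sorting, which is why Extra shows no Using filesort.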

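note: think about this in terms of the composite index B+Tree. The conditions on name and position narrow the entries down to a matching range, and within that range the entries are already ordered by age, so the age index column serves the sort.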

(2) case: explain select * from employees where name = 'LiLei' order by position;

mysql> explain select * from employees where name = 'LiLei' order by position;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

The explain output shows key_len=74, so the lookup uses the name part of the index. Sorting by position skips age, so Using filesort appears.

(3) case: explain select * from employees where name = 'LiLei' order by age, position;

mysql> explain select * from employees where name = 'LiLei' order by age, position;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

The lookup uses only the name part of the index; age and position are used for sorting, so there is no Using filesort.

(4) case: explain select * from employees where name = 'LiLei' order by position, age;

mysql> explain select * from employees where name = 'LiLei' order by position, age;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

The explain output matches case (3) except that Using filesort appears: the index was created in the order name, age, position, but the sort swaps age and position.

(5) case: explain select * from employees where name = 'LiLei' and age = 22 order by position, age;

mysql> explain select * from employees where name = 'LiLei' and age = 22 order by position, age;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-----------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref         | rows | filtered | Extra                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 78      | const,const |    1 |   100.00 | Using index condition |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

Compared with case (4), Extra shows no Using filesort. Because age is a constant, it is optimized out of the sort, so the index order is not violated.
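
The key_len values seen in the explain outputs so far (74 when only name is matched, 78 when name and age are both matched) can be reproduced with a little arithmetic. The sketch below assumes column definitions that the text does not show: name as varchar(24) NOT NULL with a 3-byte character set, and age as int NOT NULL. Both are guesses chosen to be consistent with the outputs, and the function `key_len` is a hypothetical helper.

```python
def key_len(columns):
    """Estimate EXPLAIN's key_len for the index columns actually used."""
    total = 0
    for col in columns:
        if col["type"] == "varchar":
            # Characters times bytes per character, plus a 2-byte length prefix
            total += col["length"] * col["bytes_per_char"] + 2
        elif col["type"] == "int":
            total += 4
        else:
            raise ValueError(f"unsupported type: {col['type']}")
        if col.get("nullable", False):
            total += 1  # one extra byte for the NULL flag
    return total


# Assumed definitions, not taken from the text
name = {"type": "varchar", "length": 24, "bytes_per_char": 3}
age = {"type": "int"}

print(key_len([name]))       # only the name part of the index is used
print(key_len([name, age]))  # the name and age parts are used
```

Under these assumptions the two calls give 74 and 78, matching the explain outputs for the lookups on name alone and on name plus age.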

(6) case: explain select * from employees where name = 'LiLei' order by age asc, position desc;

mysql> explain select * from employees where name = 'LiLei' order by age asc, position desc;
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | ref  | idx_name_age_position | idx_name_age_position | 74      | const |    1 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

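The sort columns are in the same order as the index and order by defaults to ascending, but position desc is descending here, which differs from the sort direction of the index and therefore produces Using filesort. MySQL 8 and later support descending indexes, which can serve this kind of query.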

(7) case: explain select * from employees where name in ('LiLei','user10') order by age, position;

mysql> explain select * from employees where name in ('LiLei','user10') order by age, position;
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL |    2 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.01 sec)

For sorting purposes, multiple equality conditions (an in list) also count as a range query.

(8) case: explain select * from employees where name > 'a' order by name;

mysql> explain select * from employees where name > 'a' order by name;
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table     | partitions | type | possible_keys         | key  | key_len | ref  | rows   | filtered | Extra                       |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-----------------------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_name_age_position | NULL | NULL    | NULL | 432390 |    50.00 | Using where; Using filesort |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)

# This can be optimized with a covering index
mysql> explain select name, age, position from employees where name > 'a' order by name;
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL | 216195 |   100.00 | Using where; Using index |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

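(2.2) Summary of order by optimization

  1. mysql supports two ways of sorting: filesort and index. Using index means mysql completes the sort by scanning the index itself. index is efficient, filesort is less efficient.
  2. order by uses Using index in two situations.
    • (1) The order by clause follows the leftmost-prefix rule of the index.
    • (2) The where clause columns combined with the order by columns follow the leftmost-prefix rule of the index.
  3. Try to do the sorting on indexed columns, following the leftmost-prefix rule in the order in which the index columns were defined.
  4. If the order by columns are not covered by an index, Using filesort appears.
  5. Use a covering index whenever possible.
  6. group by is very similar to order by: in essence it sorts first and then groups, following the leftmost-prefix rule of the index definition order. For group by, if no sorting is needed, add order by null to suppress the sort. Note that where takes precedence over having: a condition that can be written in where should not be left to having.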

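(2.3) How Using filesort works in detail

Filesort sort modes: (tuning ideas for when a filesort cannot be turned into an index sort)

(1) Single-pass sort: fetches all fields of every matching row in one go, then sorts them in the sort buffer. With the trace tool, sort_mode shows <sort_key, additional_fields> or <sort_key, packed_additional_fields>.

note: in <sort_key, additional_fields>, sort_key is the sort column and additional_fields are the other columns.

(2) Two-pass sort (also called row-lookup sort mode): first fetches, for the rows matching the condition, only the sort column and a field that can locate the row directly (such as the primary id or a unique index), sorts those in the sort buffer, and after sorting goes back to fetch the other required fields. With the trace tool, sort_mode shows <sort_key, rowid>.

note: in <sort_key, rowid>, sort_key is the sort column and rowid is the field that can locate the row directly.

mysql decides which sort mode to use by comparing the system variable max_length_for_sort_data (default 1024 bytes) with the total size of the queried fields. (Total size of the queried fields: for name, age, position, for example, the total is 140.)

(1) If max_length_for_sort_data is larger than the total length of the queried fields, single-pass sort is used;

(2) If max_length_for_sort_data is smaller than the total length of the queried fields, two-pass sort is used.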
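
The selection rule above can be sketched in a few lines of Python. This is only an illustration of the comparison described in these notes, not mysql's actual implementation: the function name choose_sort_mode is hypothetical, the per-column byte sizes are an assumed split of the 140-byte total, and treating the "equal" case as single-pass is an assumption, since the notes only cover "larger" and "smaller".

```python
def choose_sort_mode(field_sizes, max_length_for_sort_data=1024):
    """Return the sort_mode string that the rule above would select.

    field_sizes maps each queried column to its size in bytes.
    The equal case is treated as single-pass (an assumption).
    """
    total = sum(field_sizes.values())
    if total <= max_length_for_sort_data:
        return "<sort_key, additional_fields>"  # single-pass sort
    return "<sort_key, rowid>"                  # two-pass sort


# Assumed split of the 140-byte total mentioned above.
sizes = {"name": 74, "age": 4, "position": 62}

print(choose_sort_mode(sizes))                               # default threshold
print(choose_sort_mode(sizes, max_length_for_sort_data=10))  # lowered threshold
```

Lowering the threshold to 10, as the trace experiment below does, flips the result from the single-pass mode to the two-pass mode.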

set session optimizer_trace="enabled=on",end_markers_in_json=on; 

select * from employees where name = 'user' order by position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

{
  // sql execution phase
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "filesort_information": [
          {
            "direction": "asc",
            "table": "`employees`",
            "field": "position"
          }
        ] /* filesort_information */,
        "filesort_priority_queue_optimization": {
          "usable": false,
          "cause": "not applicable (no LIMIT)"
        } /* filesort_priority_queue_optimization */,
        "filesort_execution": [
        ] /* filesort_execution */,
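        // filesort information
        "filesort_summary": {
          // estimated number of rows scanned
          "rows": 0,
          // rows that took part in the sort
          "examined_rows": 0,
          // number of temporary files used; 0 means the sort ran entirely in sort_buffer memory, otherwise disk files were used
          "number_of_tmp_files": 0,
          // size of the sort cache (i.e. the size of sort_buffer)
          "sort_buffer_size": 262056,
          // sort mode; single-pass sort is used here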
          "sort_mode": "<sort_key, packed_additional_fields>"
        } /* filesort_summary */
      }
    ] /* steps */
  } /* join_execution */
}
set max_length_for_sort_data = 10;

select * from employees where name = 'user' order by position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

set session optimizer_trace="enabled=off";
{
  "join_execution": {
    "select#": 1,
    "steps": [
      {
        "filesort_information": [
          {
            "direction": "asc",
            "table": "`employees`",
            "field": "position"
          }
        ] /* filesort_information */,
        "filesort_priority_queue_optimization": {
          "usable": false,
          "cause": "not applicable (no LIMIT)"
        } /* filesort_priority_queue_optimization */,
        "filesort_execution": [
        ] /* filesort_execution */,
        "filesort_summary": {
          "rows": 0,
          "examined_rows": 0,
          "number_of_tmp_files": 0,
          "sort_buffer_size": 262136,
          // two-pass sort
          "sort_mode": "<sort_key, rowid>"
        } /* filesort_summary */
      }
    ] /* steps */
  } /* join_execution */
}
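  • Single-pass sort in detail:
  1. From index name, find the primary key id of the first entry satisfying name = 'user';
  2. Fetch the whole row by primary key id and store the values of all fields in sort_buffer;
  3. From index name, find the primary key id of the next entry satisfying name = 'user';
  4. Repeat steps 2 and 3 until name = 'user' no longer holds;
  5. Sort the data in sort_buffer by the field position;
  6. Return the result to the client;
  • Two-pass sort in detail:
  1. From index name, find the primary key id of the first entry satisfying name = 'user';
  2. Fetch the whole row by primary key id and put only the sort field position and the primary key id into sort buffer;
  3. From index name, take the primary key id of the next entry satisfying name = 'user';
  4. Repeat steps 2 and 3 until name = 'user' no longer holds;
  5. Sort the position and primary key id pairs in sort_buffer by the field position;
  6. Walk the sorted ids and positions, go back to the table by id to fetch the values of all fields, and return them to the client;

Comparing the two modes: single-pass sort puts every queried field into sort buffer, while two-pass sort puts only the primary key and the sort field into sort buffer, sorts them, and then goes back to the table by primary key to fetch the required fields.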
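
The two procedures above can be mimicked in Python to make the difference concrete. This is a toy model under stated assumptions, not how mysql is implemented: TABLE stands in for the clustered index, NAME_INDEX for a secondary index on name, and the sample rows and function names are made up for illustration.

```python
# Toy clustered index: primary key id -> (name, age, position).
TABLE = {
    1: ("user", 30, "dev"),
    2: ("user", 25, "manager"),
    3: ("other", 40, "dev"),
    4: ("user", 41, "architect"),
}

# Toy secondary index on name: name -> list of primary key ids.
NAME_INDEX = {}
for pk, row in TABLE.items():
    NAME_INDEX.setdefault(row[0], []).append(pk)


def single_pass_sort(name):
    """Sort buffer holds the sort key plus every field of the row."""
    sort_buffer, lookups = [], 0
    for pk in NAME_INDEX.get(name, []):
        row = TABLE[pk]                       # one trip to the table per row
        lookups += 1
        sort_buffer.append((row[2], pk, row))
    sort_buffer.sort(key=lambda entry: entry[0])
    return [(pk,) + row for _, pk, row in sort_buffer], lookups


def two_pass_sort(name):
    """Sort buffer holds only the sort key and the row id."""
    sort_buffer, lookups = [], 0
    for pk in NAME_INDEX.get(name, []):
        row = TABLE[pk]                       # first trip: read the sort key
        lookups += 1
        sort_buffer.append((row[2], pk))
    sort_buffer.sort(key=lambda entry: entry[0])
    result = []
    for _, pk in sort_buffer:
        result.append((pk,) + TABLE[pk])      # second trip: fetch all fields
        lookups += 1
    return result, lookups


rows_single, lookups_single = single_pass_sort("user")
rows_two, lookups_two = two_pass_sort("user")
print(rows_single == rows_two, lookups_single, lookups_two)
```

Both functions return the same ordered rows; the two-pass variant keeps smaller entries in the buffer but pays for a second lookup per row.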

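If the memory available for sorting in mysql is small (so sort_buffer is small) and cannot be increased, consider lowering max_length_for_sort_data so that the optimizer picks the two-pass algorithm. More rows then fit in sort_buffer per sort, at the cost of going back to the table by primary key for the data.

If plenty of memory can be given to sorting, consider raising max_length_for_sort_data so that the optimizer prefers the all-fields (single-pass) sort, placing the required fields in sort_buffer so that the query result is returned straight from memory after sorting.

So mysql uses the max_length_for_sort_data parameter to control sorting, picking different sort modes in different scenarios to improve sort efficiency. (That is, adjust this parameter so that the sort fits in sort_buffer and happens in memory.)

note: sorting entirely in sort_buffer memory is generally faster than sorting with disk files, but that is no reason to enlarge sort_buffer casually (the default is 256 KB, consistent with the sort_buffer_size values in the traces above). Many mysql parameter defaults are already tuned; do not change them lightly.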

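(2.4) Paging query optimization

Business systems often implement paging with sql like this:

select * from employees limit 10000, 10;

This takes 10 rows from table employees starting at row 10001. It looks as if only 10 rows are queried, but the statement actually reads 10010 rows, discards the first 10000, and returns the last 10. Querying the later pages of a large table this way is therefore very inefficient.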

(1) Paging queries sorted by an auto-increment, contiguous primary key

mysql> select * from employees limit 400000, 5;
+--------+------------+-----+----------+---------------------+
| id     | name       | age | position | hire_time           |
+--------+------------+-----+----------+---------------------+
| 400004 | user400004 |  30 | dev      | 2020-12-15 20:57:38 |
| 400005 | user400005 |  84 | dev      | 2020-12-15 20:57:38 |
| 400006 | user400006 |  28 | dev      | 2020-12-15 20:57:38 |
| 400007 | user400007 |  31 | dev      | 2020-12-15 20:57:38 |
| 400008 | user400008 |  16 | dev      | 2020-12-15 20:57:38 |
+--------+------------+-----+----------+---------------------+
5 rows in set (0.10 sec)

mysql> explain select * from employees limit 400000, 5;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 432390 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
1 row in set, 1 warning (0.00 sec)

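This sql queries five rows starting from row 400001. There is no explicit order by, so in this case the rows come back in primary key order. Looking at table employees, if the primary key is auto-increment and contiguous, the query can be rewritten to fetch the five rows starting from row 400001 by primary key, as follows: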

mysql> select * from employees where id > 400000 limit 5;
+--------+------------+-----+----------+---------------------+
| id     | name       | age | position | hire_time           |
+--------+------------+-----+----------+---------------------+
| 400001 | user400001 |  28 | dev      | 2020-12-15 20:57:38 |
| 400002 | user400002 |  19 | dev      | 2020-12-15 20:57:38 |
| 400003 | user400003 |  95 | dev      | 2020-12-15 20:57:38 |
| 400004 | user400004 |  30 | dev      | 2020-12-15 20:57:38 |
| 400005 | user400005 |  84 | dev      | 2020-12-15 20:57:38 |
+--------+------------+-----+----------+---------------------+
5 rows in set (0.00 sec)
mysql> explain select * from employees where id > 400000 limit 5;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 66638 |   100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

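Clearly the rewritten sql uses the index, scans far fewer rows, and runs faster. However, the rewrite is impractical in many scenarios: once some rows have been deleted, the primary key has gaps and the results no longer match. (The two results above do indeed differ.)

The two statements return different results, so if the primary key is not contiguous, the optimization above cannot be used. In addition, if the original sql orders by a non-primary-key column, rewriting it this way also makes the two results differ. The rewrite therefore requires both of the following conditions:

  • The primary key is auto-increment and contiguous
  • The result is sorted by primary key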
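
The effect of gaps in the primary key can be reproduced with a small Python model. It is a sketch only: the id lists are made-up data, and offset_page / seek_page are hypothetical helpers that imitate "limit offset, size" and "where id > last_id limit size" on an id-ordered table.

```python
def offset_page(ids, offset, size):
    """Imitates: order by id limit offset, size (walk past offset rows)."""
    return sorted(ids)[offset:offset + size]


def seek_page(ids, last_id, size):
    """Imitates: where id > last_id order by id limit size."""
    return [i for i in sorted(ids) if i > last_id][:size]


contiguous = list(range(1, 21))                          # ids 1..20, no gaps
with_gaps = [i for i in contiguous if i not in (3, 7)]   # rows 3 and 7 deleted

# With contiguous ids starting at 1, both forms return the same page.
print(offset_page(contiguous, 10, 3), seek_page(contiguous, 10, 3))

# With gaps, the offset form and the id form drift apart.
print(offset_page(with_gaps, 10, 3), seek_page(with_gaps, 10, 3))
```

The same drift is visible in the employees results above, where the ids start at 4 rather than 1.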

(2) Paging queries sorted by a non-primary-key column

Now look at a paging query sorted by a non-primary-key column:

mysql> select * from employees order by name limit 400000, 5;
+-------+-----------+-----+----------+---------------------+
| id    | name      | age | position | hire_time           |
+-------+-----------+-----+----------+---------------------+
| 70456 | user70456 |  22 | dev      | 2020-12-15 20:48:20 |
| 70457 | user70457 |  26 | dev      | 2020-12-15 20:48:20 |
| 70458 | user70458 |  23 | dev      | 2020-12-15 20:48:20 |
| 70459 | user70459 |  12 | dev      | 2020-12-15 20:48:20 |
|  7046 | user7046  |  36 | dev      | 2020-12-15 20:46:33 |
+-------+-----------+-----+----------+---------------------+
5 rows in set (0.39 sec)

mysql> explain select * from employees order by name limit 400000, 5;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 432389 |   100.00 | Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

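The index on the name column is not used (key is NULL). The reason: scanning the whole index and then looking up the non-indexed columns of each row (which may mean traversing more than one index tree) costs more than scanning the whole table, so the optimizer gives up on the index.

Knowing why the index is skipped, how do we optimize? The key is to make the sort return as few columns as possible (that is, use a covering index where possible instead of select *). So let the sort and paging first fetch only the primary keys (this can be served entirely from idx_name_age_position, which already contains the primary key, so it is a covering-index query), then fetch the matching rows by primary key. The sql is rewritten as follows:

# Although it queries twice (the first query, then a row lookup for the second), it is still faster than the query above.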
mysql> select * from employees e inner join (select id from employees order by name limit 400000, 5) ed on e.id = ed.id;
+-------+-----------+-----+----------+---------------------+-------+
| id    | name      | age | position | hire_time           | id    |
+-------+-----------+-----+----------+---------------------+-------+
| 70456 | user70456 |  22 | dev      | 2020-12-15 20:48:20 | 70456 |
| 70457 | user70457 |  26 | dev      | 2020-12-15 20:48:20 | 70457 |
| 70458 | user70458 |  23 | dev      | 2020-12-15 20:48:20 | 70458 |
| 70459 | user70459 |  12 | dev      | 2020-12-15 20:48:20 | 70459 |
|  7046 | user7046  |  36 | dev      | 2020-12-15 20:46:33 |  7046 |
+-------+-----------+-----+----------+---------------------+-------+
5 rows in set (0.10 sec)

mysql> explain select * from employees e inner join (select id from employees order by name limit 400000, 5) ed on e.id = ed.id;
+----+-------------+------------+------------+--------+---------------+-----------------------+---------+-------+--------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys | key                   | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+------------+------------+--------+---------------+-----------------------+---------+-------+--------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL                  | NULL    | NULL  | 400005 |   100.00 | NULL        |
|  1 | PRIMARY     | e          | NULL       | eq_ref | PRIMARY       | PRIMARY               | 4       | ed.id |      1 |   100.00 | NULL        |
|  2 | DERIVED     | employees  | NULL       | index  | NULL          | idx_name_age_position | 140     | NULL  | 400005 |   100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+-----------------------+---------+-------+--------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

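The result matches the original sql and the execution time dropped by more than half. Comparing the execution plans before and after, the original sql sorts with filesort, while the optimized sql sorts via the index.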
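
The idea behind the rewrite (page through the narrow index first, then fetch full rows only for the ids on the page) can be sketched in Python. The data and helper names below are illustrative assumptions: TABLE models the clustered index, and NAME_INDEX models the (name, id) entries of a secondary index kept in sorted order.

```python
# Toy clustered index: id -> (name, age, position); names are unique here.
TABLE = {pk: ("user%d" % pk, 20 + pk % 50, "dev") for pk in range(1, 1001)}

# Toy secondary index: (name, id) pairs kept in index order.
NAME_INDEX = sorted((row[0], pk) for pk, row in TABLE.items())


def naive_page(offset, size):
    """Sort whole rows by name, then cut the page (the filesort-style plan)."""
    rows = sorted(((pk,) + row for pk, row in TABLE.items()),
                  key=lambda r: r[1])
    return rows[offset:offset + size]


def deferred_join_page(offset, size):
    """Cut the page from the index alone, then look up only `size` rows."""
    page_ids = [pk for _, pk in NAME_INDEX[offset:offset + size]]
    return [(pk,) + TABLE[pk] for pk in page_ids]


print(deferred_join_page(400, 5))
print(deferred_join_page(400, 5) == naive_page(400, 5))
```

Both helpers return the same page; the second one only touches full rows for the five ids it needs.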

(2.5) join query optimization

Create the test tables: (to build a composite value such as user100, you can use CONCAT('user',100))

CREATE TABLE `t1`(
`id` int(11) NOT NULL AUTO_INCREMENT,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

create table t2 like t1;

# Note: dropping a stored procedure does not take the trailing parentheses.
drop procedure if exists insert_t;
delimiter ;;
create procedure insert_t()
begin
	declare i int;
	set i=1;
	while(i<=10000)do
		insert into t1(a,b) values(i,i);
		set i=i+1;
	end while;
end;;
delimiter ;
call insert_t();

# Insert 100 rows into t2; t1 received 10000 rows.
drop procedure if exists insert_t;
delimiter ;;
create procedure insert_t()
begin
	declare i int;
	set i=1;
	while(i<=100)do
		insert into t2(a,b) values(i,i);
		set i=i+1;
	end while;
end;;
delimiter ;
call insert_t();

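mysql has two common table join algorithms:

  • Nested-Loop Join algorithm
  • Block Nested-Loop Join algorithm

(1) Nested-Loop Join (NLJ) algorithm

Reads rows one at a time in a loop from the first table (the driving table), takes the join column from each row, fetches the matching rows from the other table (the driven table) by that column, and returns the combined result of the two tables.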

mysql> select * from t1 inner join t2 on t1.a = t2.a;
+-----+------+------+-----+------+------+
| id  | a    | b    | id  | a    | b    |
+-----+------+------+-----+------+------+
|   1 |    1 |    1 |   1 |    1 |    1 |
|   2 |    2 |    2 |   2 |    2 |    2 |
|   3 |    3 |    3 |   3 |    3 |    3 |
|  .. |   .. |   .. |  .. |   .. |   .. |
|  99 |   99 |   99 |  99 |   99 |   99 |
| 100 |  100 |  100 | 100 |  100 |  100 |
+-----+------+------+-----+------+------+

# Executed top to bottom: t2 is read first, then joined against t1. (Rows with the same id in the execution plan run in top-to-bottom order.)
mysql> explain select * from t1 inner join t2 on t1.a = t2.a;
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | ALL  | idx_a         | NULL  | NULL    | NULL        |  100 |   100.00 | Using where |
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_a         | idx_a | 5       | tuling.t2.a |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

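The execution plan tells us:

  • The driving table is t2 and the driven table is t1. The optimizer generally prefers the smaller table as the driving table, so with inner join the table written first is not necessarily the driving table.
  • The NLJ algorithm is used. In general, if Using join buffer does not appear in Extra, the join algorithm is NLJ.

The rough flow of the sql above:

  • (1) Read one row from table t2;
  • (2) Take the join column a from that row and look it up in table t1;
  • (3) Take the matching rows from t1, merge them with the row from t2, and return the result to the client;
  • (4) Repeat the three steps above.

The whole process reads all of t2 (100 rows scanned), then for each row uses the value of a to probe the index of t1 (100 index probes into t1; each probe can be treated as reading one full row of t1, so 100 rows of t1 are scanned in total). The whole process therefore scans 200 rows.

Note: because probing the index of t1 is fast, one probe is counted as producing one row of t1. In reality a B+Tree index is 2-4 levels high, so strictly speaking the index is scanned 200-400 times, giving a total of 300-500 scans.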
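
The 200-row figure can be checked with a toy Python model of the flow above. It is a simplification under stated assumptions: a dict stands in for idx_a on the driven table, each index probe is counted as one row read (as in the note above), and nested_loop_join is a hypothetical helper, not mysql code.

```python
from collections import defaultdict

# Rows are (id, a, b), mirroring the test tables t1 and t2.
t1 = [(i, i, i) for i in range(1, 10001)]   # 10000 rows
t2 = [(i, i, i) for i in range(1, 101)]     # 100 rows


def nested_loop_join(driving, driven):
    """Join on column a, probing an index on the driven table."""
    index_a = defaultdict(list)             # stands in for idx_a (already built)
    for row in driven:
        index_a[row[1]].append(row)

    rows_scanned, result = 0, []
    for outer in driving:                   # step 1: read one driving row
        rows_scanned += 1
        for inner in index_a.get(outer[1], []):   # step 2: probe by a
            rows_scanned += 1
            result.append(inner + outer)    # step 3: merge and emit
    return result, rows_scanned


joined, scanned = nested_loop_join(t2, t1)
print(len(joined), scanned)
```

Swapping the arguments so that t1 drives would scan all 10000 rows of t1 before any probing, which is why the small table is preferred as the driving table.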

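If the join column of the driven table has no index, NLJ performs poorly and mysql chooses the Block Nested-Loop Join algorithm instead.

(2) Block Nested-Loop Join (BNL) algorithm

Reads the driving table's data into join_buffer, then scans the driven table, comparing each row of the driven table against the data in join_buffer.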

mysql>   explain select * from t1 inner join t2 on t1.b = t2.b;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   100 |   100.00 | NULL                                               |
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10337 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

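Using join buffer (Block Nested Loop) in Extra shows that this join uses the BNL algorithm.

The rough flow of the sql above:

  • (1) Put all of t2's data into join_buffer;
  • (2) Take each row of table t1 and compare it with the data in join_buffer;
  • (3) Return the rows that satisfy the join condition.

The whole process does one full scan of each of t1 and t2, so the total number of rows scanned is 10000 (rows in t1) + 100 (rows in t2) = 10100. The data in join_buffer is unordered, so each row of t1 needs 100 comparisons, which makes 100*10000 = 1,000,000 in-memory comparisons.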
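
The counts above (10100 rows scanned, 1,000,000 comparisons) can be reproduced with a toy Python model. It assumes the whole driving table fits in join_buffer in one block, as in the flow above; block_nested_loop_join is a hypothetical helper written for illustration.

```python
# Rows are (id, a, b), mirroring the test tables t1 and t2.
t1 = [(i, i, i) for i in range(1, 10001)]   # 10000 rows
t2 = [(i, i, i) for i in range(1, 101)]     # 100 rows


def block_nested_loop_join(driving, driven):
    """Join on column b (no index), buffering the driving table in memory."""
    join_buffer = list(driving)             # step 1: load the driving table
    rows_scanned = len(join_buffer)
    comparisons = 0
    result = []
    for inner in driven:                    # step 2: scan the driven table once
        rows_scanned += 1
        for outer in join_buffer:           # unordered buffer: compare with all
            comparisons += 1
            if inner[2] == outer[2]:
                result.append(inner + outer)   # step 3: emit matching rows
    return result, rows_scanned, comparisons


joined, scanned, compared = block_nested_loop_join(t2, t1)
print(len(joined), scanned, compared)
```

The comparisons dominate the work, but they all happen in memory, while each table is read only once.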

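Question: when the join column of the driven table has no index, why choose BNL rather than NLJ?

If the second sql above used NLJ, the number of rows scanned would be 100*10000 = 1,000,000, and those are disk scans. With BNL, 10100 rows are scanned and 1,000,000 comparisons are made. BNL clearly needs far fewer disk scans, and its in-memory comparisons are much faster than disk scans. So for joins where the driven table's join column has no index, mysql generally uses BNL. With an index it generally chooses NLJ, which outperforms BNL when an index is available.

Optimizing join sql:

  • (1) Index the join columns so that mysql picks NLJ for the join where possible;
  • (2) Let the small table drive the large table. When writing a multi-table join, if you know for certain which table is small, straight_join fixes the driving order and saves the optimizer the time of deciding.

About straight_join: it works like join but makes the left table drive the right table, which changes the order the optimizer would otherwise pick for the join. For example: select * from t2 straight_join t1 on t2.a = t1.a; tells mysql to use t2 as the driving table.

  • (1) straight_join applies only to inner join, not to left join or right join. (left join and right join already fix the execution order of the tables.)
  • (2) Let the optimizer decide wherever possible, because in most cases the mysql optimizer is smarter than a person. Use straight_join with great care, since a hand-picked execution order is not always more reliable than the optimizer's.

(2.6) in and exists optimization

Principle: the small table drives the large table, that is, the smaller data set drives the larger data set.

(1) in: when the data set of table B is smaller than that of table A, in is better than exists. (The form below suits this case.)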

select * from A where id in (select id from B);

# Equivalent to the following, because the fewer iterations of the for loop, the better
for(select id from B){
    select * from A where A.id = B.id;
}

Note: in both cases below t2 is read first; the sql inside in() is not necessarily executed first. mysql performs many optimizations under the hood.

mysql> explain select * from t1 where id in (select id from t2);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | index  | PRIMARY       | idx_a   | 5       | NULL         |  100 |   100.00 | Using index |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t2.id |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> explain select * from t2 where id in (select id from t1);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
|  1 | SIMPLE      | t2    | NULL       | ALL    | PRIMARY       | NULL    | NULL    | NULL         |  100 |   100.00 | NULL        |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t2.id |    1 |   100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

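(2) exists: when the data set of table A is smaller than that of table B, exists is better than in.

Each row of the outer query A is passed to the subquery on B for verification, and the result (true or false) decides whether the outer row is kept: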

select * from A where exists (select 1 from B where B.id = A.id);

# Equivalent to
for(select * from A){
    select * from B where B.id = A.id;
}
mysql> explain select * from t1 where exists (select 1 from t2 where t2.id = t1.id);
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref          | rows  | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------------+
|  1 | PRIMARY            | t1    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL         | 10337 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t1.id |     1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

mysql> explain select * from t2 where exists (select 1 from t1 where t2.id = t1.id);
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
|  1 | PRIMARY            | t2    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL         |  100 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tuling.t2.id |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------+------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)
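  • (1) exists(subquery) only returns true or false, so select * in the subquery can be replaced with select 1. According to the official documentation the select list is ignored at execution time, so there is no difference;
  • (2) The actual execution of an exists subquery may have been optimized and is not necessarily the row-by-row comparison we imagine (that is, it need not match the for loop above);
  • (3) An exists subquery can often be replaced by a join; which is best depends on the specific case.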
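
The two for-loop readings of in and exists can be written out in Python to show that they select the same rows and differ only in which side drives the loop. This mirrors the conceptual pseudocode in this section, not what the mysql optimizer actually executes; the sample data and function names are made up.

```python
A = [{"id": i} for i in range(1, 11)]       # larger outer table
B = [{"id": i} for i in (2, 4, 6, 20)]      # smaller table


def select_in(a_rows, b_rows):
    """select * from A where id in (select id from B): B is evaluated first."""
    b_ids = {row["id"] for row in b_rows}
    return [row for row in a_rows if row["id"] in b_ids]


def select_exists(a_rows, b_rows):
    """select * from A where exists (...): each A row is checked against B."""
    return [row for row in a_rows
            if any(b["id"] == row["id"] for b in b_rows)]


print(select_in(A, B))
print(select_in(A, B) == select_exists(A, B))
```

Only the ids present in both tables survive; the id 20 in B has no partner in A and is never returned.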

(2.7) count(*) query optimization

# To see the real time of repeated executions, temporarily turn off the mysql query cache
set global query_cache_size=0;
set global query_cache_type=0;
mysql> explain select count(1) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(id) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(name) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from employees;
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key                   | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | idx_name_age_position | 140     | NULL | 432389 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------------------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

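The four statements have the same execution plan, so their efficiency should be about the same. The difference is that counting a specific column does not count rows where that column is null. (That is, with count(name), if n rows have a null name, the result is lower by n.)

note: the scan walks the leaf nodes of the secondary index B+Tree and adds 1 for each non-null value.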
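
The null-handling difference can be illustrated with a tiny Python model, where None plays the role of sql null. The rows and the helpers count_star / count_column are made up for illustration.

```python
rows = [
    {"id": 1, "name": "LiLei"},
    {"id": 2, "name": None},     # null name
    {"id": 3, "name": "Lucy"},
    {"id": 4, "name": None},     # null name
]


def count_star(table):
    """count(*) and count(1): every row counts."""
    return len(table)


def count_column(table, column):
    """count(column): rows where the column is null are skipped."""
    return sum(1 for row in table if row[column] is not None)


print(count_star(rows), count_column(rows, "id"), count_column(rows, "name"))
```

With two null names out of four rows, count(name) comes out two lower than count(*).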

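As shown above, why does mysql end up choosing the secondary index rather than the clustered primary key index? Because a secondary index stores less data than the primary key index, so scanning it should be faster. This is the reasoning behind count(name) being able to beat count(id).

count(1) > count(name) ≈ count(*) > count(id); on version 5.7, count(*) is recommended. Since the four execution plans above are identical, any differences are small.

Common optimization methods:

(1) Query the total row count that mysql maintains itself

For MyISAM tables, a count query without a where condition is very fast, because mysql stores the table's total row count on disk and the query needs no computation.

# Create table t3 with the same structure as t2, and insert 100 rows.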
mysql> create table t3 like t2;
mysql> alter table t3 engine='MyISAM';

mysql> select count(*) from t3;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

mysql> explain select count(*) from t3;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

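For InnoDB tables mysql does not store the total row count, so count has to be computed at query time.

(2) show table status

If you only need an estimate of the table's total row count, show table status returns it very quickly.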

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   432824 |
+----------+

# Rows is only an estimate.
mysql> show table status like 'employees';
+-----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+-----------------+
| Name      | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment         |
+-----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+-----------------+
| employees | InnoDB |      10 | Dynamic    | 432389 |             49 |    21544960 |               0 |     22626304 |   9437184 |         432829 | 2020-12-15 21:21:40 | 2020-12-16 20:22:50 | NULL       | utf8_general_ci |     NULL |                | 員工記錄表      |
+-----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+-----------------+
1 row in set (0.01 sec)

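(3) Maintain the total in redis

When inserting or deleting rows, also update the counter key for the table's row count in redis (with the incr or decr command). This can be inaccurate, though, because it is hard to guarantee transactional consistency between the table operation and the redis operation.

(4) Add a counter table

When inserting or deleting rows, update a counter table as well, within the same transaction.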
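
A minimal sketch of the counter-table idea follows, using Python's built-in sqlite3 module as a stand-in for mysql so that it runs without a server. The table row_counts, its columns, and the helper functions are hypothetical names chosen for this example; the point is only that the data change and the counter update commit or roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employees (id integer primary key, name text not null)")
conn.execute("create table row_counts (table_name text primary key, cnt integer not null)")
conn.execute("insert into row_counts values ('employees', 0)")
conn.commit()


def insert_employee(db, name):
    """Insert a row and bump the counter in one transaction."""
    with db:  # commits on success, rolls back if any statement raises
        db.execute("insert into employees(name) values (?)", (name,))
        db.execute("update row_counts set cnt = cnt + 1 "
                   "where table_name = 'employees'")


def delete_employee(db, emp_id):
    """Delete a row and lower the counter by the number of rows removed."""
    with db:
        cur = db.execute("delete from employees where id = ?", (emp_id,))
        db.execute("update row_counts set cnt = cnt - ? "
                   "where table_name = 'employees'", (cur.rowcount,))


def counted_rows(db):
    """Read the maintained count instead of running count(*)."""
    return db.execute("select cnt from row_counts "
                      "where table_name = 'employees'").fetchone()[0]


for name in ("LiLei", "HanMeimei", "Lucy"):
    insert_employee(conn, name)
delete_employee(conn, 1)

real_count = conn.execute("select count(*) from employees").fetchone()[0]
print(counted_rows(conn), real_count)
```

Reading the counter is a single-row lookup, so it stays cheap however large the data table grows; the trade-off is one extra write on every insert and delete.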