MySQL 8.0 plan optimization: source code reading notes
These notes are based on the MySQL 8.0 Community Edition source code.
Background:
MySQL JOIN syntax: https://dev.mysql.com/doc/refman/8.0/en/join.html
Straight join: STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order. STRAIGHT_JOIN can be used in two ways. Written in place of JOIN, it is a special kind of INNER JOIN that hints the order of that one join. Written right after SELECT, it forces every JOIN under that SELECT to follow the table order given by the user. Judging from the optimizer code, the second usage cannot coexist with semi-join (Optimize_table_order::optimize_straight_join: DBUG_ASSERT(join->select_lex->sj_nests.is_empty())).
join order hint: Join-order hints affect the order in which the optimizer joins tables, including JOIN_FIXED_ORDER, JOIN_ORDER, JOIN_PREFIX and JOIN_SUFFIX.
JOIN types: INNER JOIN, OUTER JOIN, SEMI JOIN, LEFT/RIGHT JOIN, etc.
Materialization: usually happens for subqueries (sometimes known as semi-join). Materialization speeds up query execution by generating a subquery result as a temporary table, normally in memory.
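Statistics: metadata fetched from the storage engine, such as a table's row count, min/max/sum/avg and key range, used to assist plan optimization.
table dependencies: in A LEFT JOIN B, B depends on A and on A's own dependencies. (To be confirmed: whether DEPEND JOIN semantics are also expressed through table dependencies.)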
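table access path: An access path may use an index scan, a table scan, a range scan or ref access, known as the join type in EXPLAIN.
index scan: usually means a scan of a secondary index (in MySQL the primary key index stores the row data along with the key).
table scan: scans the table directly.
range scan: for conditions on indexed columns that can be turned into range lookups, MySQL tries to convert them into a range scan so that rows outside the ranges are not scanned. A range query over a single range resembles an index scan or table scan with the range condition pushed down; a range query can also extract multiple ranges.
ref: the join field is indexed, but the index is neither a primary key nor a UNIQUE NOT NULL index.
eq_ref: the join field is indexed and the index is a primary key or a UNIQUE NOT NULL index, so each record joins to at most one row of the right table.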
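The ref versus eq_ref distinction above can be sketched as a small classifier. This is only an illustration of the rule as stated in these notes: `IndexSketch` and `classify_join_type_sketch` are hypothetical names, not MySQL structures, and the sketch assumes every key part of the index is compared by equality.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of the index usable for a join condition.
struct IndexSketch {
  bool exists;      // the join field is covered by an index
  bool is_primary;  // the index is the primary key
  bool is_unique;   // the index is declared UNIQUE
  bool nullable;    // at least one key column may be NULL
};

// Returns the EXPLAIN join type suggested by the rule in the notes.
std::string classify_join_type_sketch(const IndexSketch &idx) {
  if (!idx.exists) return "ALL";  // no usable index: full table scan
  // A primary key or UNIQUE NOT NULL index matches at most one row.
  const bool at_most_one_row =
      idx.is_primary || (idx.is_unique && !idx.nullable);
  return at_most_one_row ? "eq_ref" : "ref";
}
```

A UNIQUE index over a nullable column falls back to ref here because several rows may hold NULL in that column.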
Layout of the tables stored in the JOIN object in the MySQL source (see the comment; the variable names alone are ambiguous):
/**
  Before plan has been created, "tables" denote number of input tables in the
  query block and "primary_tables" is equal to "tables".
  After plan has been created (after JOIN::get_best_combination()),
  the JOIN_TAB objects are enumerated as follows:
  - "tables" gives the total number of allocated JOIN_TAB objects
  - "primary_tables" gives the number of input tables, including
    materialized temporary tables from semi-join operation.
  - "const_tables" are those tables among primary_tables that are detected
    to be constant.
  - "tmp_tables" is 0, 1 or 2 (more if windows) and counts the maximum
    possible number of intermediate tables in post-processing (ie sorting and
    duplicate removal).
    Later, tmp_tables will be adjusted to the correct number of
    intermediate tables, @see JOIN::make_tmp_tables_info.
  - The remaining tables (ie. tables - primary_tables - tmp_tables) are
    input tables to materialized semi-join operations.
  The tables are ordered as follows in the join_tab array:
   1. const primary table
   2. non-const primary tables
   3. intermediate sort/group tables
   4. possible holes in array
   5. semi-joined tables used with materialization strategy
*/
uint tables;          ///< Total number of tables in query block
uint primary_tables;  ///< Number of primary input tables in query block
uint const_tables;    ///< Number of primary tables deemed constant
uint tmp_tables;      ///< Number of temporary tables used by query
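Source code walkthrough
JOIN represents the join plan of a query and is also passed around as the context of that plan. (As a result, in optimizations such as parallel query, only the top-level parent query is really worth optimizing; for the other queries JOIN acts more like a context.)
best_positions holds the final optimized table order.
best_read holds the final cost.
best_ref holds the input table sequence; the optimizer optimizes best_ref.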
make_join_plan
Called from JOIN::optimize; it computes the best join order and builds the join plan. The source comment summarizes its steps:
Here is an overview of the logic of this function:
- Initialize JOIN data structures and setup basic dependencies between tables.
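- Update dependencies based on join information. For tables involved in outer joins or recursion, propagate_dependencies() propagates the relationships (using a transitive closure algorithm) to build the complete dependency relation. (What "recursive" refers to here is not yet determined: nested joins? the WITH RECURSIVE syntax?)
- Make key descriptions (update_ref_and_keys()). This step is rather fiddly. Its purpose is to find the join conditions among the conditions and identify the keys (a key here is an index) related to them, in preparation for later deciding whether the join type is ref, ref_or_null, index and so on. MySQL also adds quite a few special cases in this step, such as special handling of key IS NULL.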
- Pull out semi-join tables based on table dependencies.
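- Extract tables with zero or one row as const tables. The four steps starting here are all const-table optimization: the core idea is to work out the const tables first and replace their columns with constants. Const tables are identified here from the fetched statistics.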
- Read contents of const tables, substitute columns from these tables with
actual data. Also keep track of empty tables vs. one-row tables.
- After const table extraction based on row count, more tables may
have become functionally dependent. Extract these as const tables.
- Add new sargable predicates based on retrieved const values.
- Calculate number of rows to be retrieved from each table. This is the step that fetches the statistics.
- Calculate cost of potential semi-join materializations.
- Calculate best possible join order based on available statistics. This is Optimize_table_order::choose_table_order, described below.
- Fill in remaining information for the generated join order.
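The dependency propagation step in the list above can be sketched as a Warshall-style transitive closure over bitmaps. This is a minimal sketch and not the MySQL implementation: `TableMap` and `propagate_dependencies_sketch` are hypothetical names, and the sketch assumes at most 64 tables, with bit i of `dep[t]` meaning that table t depends on table i.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using TableMap = std::uint64_t;  // bit i set: depends on table i

// Expands direct dependencies into the full (transitive) dependency sets.
// Returns false if some table ends up depending on itself, i.e. a cycle.
bool propagate_dependencies_sketch(std::vector<TableMap> &dep) {
  const std::size_t n = dep.size();
  for (std::size_t k = 0; k < n; ++k) {
    for (std::size_t i = 0; i < n; ++i) {
      // If i depends on k, i inherits everything k depends on.
      if (dep[i] & (TableMap{1} << k)) dep[i] |= dep[k];
    }
  }
  for (std::size_t i = 0; i < n; ++i) {
    if (dep[i] & (TableMap{1} << i)) return false;
  }
  return true;
}
```

For A LEFT JOIN B LEFT JOIN C, where B directly depends on A and C directly depends on B, the closure makes C depend on both A and B.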
Statistics
The core object is ha_statistics. Its most important field is records, which holds the table's row count.
class ha_statistics {
ulonglong data_file_length; /* Length off data file */
ulonglong max_data_file_length; /* Length off data file */
ulonglong index_file_length;
ulonglong max_index_file_length;
ulonglong delete_length; /* Free bytes */
ulonglong auto_increment_value;
/*
The number of records in the table.
0 - means the table has exactly 0 rows
other - if (table_flags() & HA_STATS_RECORDS_IS_EXACT)
the value is the exact number of records in the table
else
it is an estimate
*/
ha_rows records;
ha_rows deleted; /* Deleted records */
ulong mean_rec_length; /* physical reclength */
/* TODO: create_time should be retrieved from the new DD. Remove this. */
time_t create_time; /* When table was created */
ulong check_time;
ulong update_time;
uint block_size; /* index block size */
/*
number of buffer bytes that native mrr implementation needs,
*/
uint mrr_length_per_rec;
}
MyRocks updates stats in handler::info. info is called on writes other than inserts and in some query paths to refresh the statistics (there are more than ten call sites).
/**
General method to gather info from handler
::info() is used to return information to the optimizer.
SHOW also makes use of this data Another note, if your handler
doesn't proved exact record count, you will probably want to
have the following in your code:
if (records < 2)
records = 2;
The reason is that the server will optimize for cases of only a single
record. If in a table scan you don't know the number of records
it will probably be better to set records to two so you can return
as many records as you need.
Along with records a few more variables you may wish to set are:
records
deleted
data_file_length
index_file_length
delete_length
check_time
Take a look at the public variables in handler.h for more information.
See also my_base.h for a full description.
@param flag Specifies what info is requested
*/
virtual int info(uint flag) = 0;
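The `records < 2` advice in the comment above matters because the optimizer treats tables with zero or one row as const tables. A minimal sketch of the clamp, assuming a hypothetical `StatsSketch` struct in place of ha_statistics and a hypothetical `fill_records_sketch` helper in place of a real handler::info implementation:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the records field of ha_statistics.
struct StatsSketch {
  std::uint64_t records;
};

// Stores the row count; an inexact estimate below 2 is raised to 2 so that
// the table is not mistaken for a zero-row or one-row table.
void fill_records_sketch(StatsSketch &stats, std::uint64_t estimate,
                         bool records_is_exact) {
  stats.records = estimate;
  if (!records_is_exact && stats.records < 2) stats.records = 2;
}
```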
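// The possible flag bits are listed below. CONST is rarely used beyond initialization; VARIABLE is used in most cases, because the variables it covers are indeed updated frequently; ERRKEY is not used on the normal path and serves to look up information when reporting an error; AUTO is dedicated to the auto-increment value, which can be obtained from the table-level in-memory object.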
/*
Recalculate loads of constant variables. MyISAM also sets things
directly on the table share object.
Check whether this should be fixed since handlers should not
change things directly on the table object.
Monty comment: This should NOT be changed! It's the handlers
responsibility to correct table->s->keys_xxxx information if keys
have been disabled.
The most important parameters set here is records per key on
all indexes. block_size and primar key ref_length.
For each index there is an array of rec_per_key.
As an example if we have an index with three attributes a,b and c
we will have an array of 3 rec_per_key.
rec_per_key[0] is an estimate of number of records divided by
number of unique values of the field a.
rec_per_key[1] is an estimate of the number of records divided
by the number of unique combinations of the fields a and b.
rec_per_key[2] is an estimate of the number of records divided
by the number of unique combinations of the fields a,b and c.
Many handlers only set the value of rec_per_key when all fields
are bound (rec_per_key[2] in the example above).
If the handler doesn't support statistics, it should set all of the
above to 0.
update the 'constant' part of the info:
handler::max_data_file_length, max_index_file_length, create_time
sortkey, ref_length, block_size, data_file_name, index_file_name.
handler::table->s->keys_in_use, keys_for_keyread, rec_per_key
*/
#define HA_STATUS_CONST 8
/*
update the 'variable' part of the info:
handler::records, deleted, data_file_length, index_file_length,
check_time, mean_rec_length
*/
#define HA_STATUS_VARIABLE 16
/*
This flag is used to get index number of the unique index that
reported duplicate key.
update handler::errkey and handler::dupp_ref
see handler::get_dup_key()
*/
#define HA_STATUS_ERRKEY 32
/*
update handler::auto_increment_value
*/
#define HA_STATUS_AUTO 64
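The rec_per_key definition in the HA_STATUS_CONST comment above can be made concrete with a small computation over in-memory rows. This is a sketch of the definition only; a real storage engine estimates these values rather than counting exactly. `Row` and `rec_per_key_sketch` are hypothetical names, and the index is assumed to have exactly three columns (a, b, c).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = std::array<int, 3>;  // values of the index columns (a, b, c)

// rec_per_key[p] = number of rows / number of distinct (p+1)-column prefixes.
std::array<double, 3> rec_per_key_sketch(const std::vector<Row> &rows) {
  std::array<double, 3> out = {0.0, 0.0, 0.0};  // 0 means "no statistics"
  if (rows.empty()) return out;
  for (std::size_t parts = 1; parts <= 3; ++parts) {
    std::set<std::vector<int>> distinct;
    for (const Row &r : rows) {
      distinct.insert(std::vector<int>(r.begin(), r.begin() + parts));
    }
    out[parts - 1] = static_cast<double>(rows.size()) /
                     static_cast<double>(distinct.size());
  }
  return out;
}
```

A value of 1.0 for the full key means each key combination identifies a single row, which is what a unique index would report.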
Join reorder
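The Optimize_table_order class performs the actual join reordering. Its entry point is its only public method, choose_table_order, which is called from make_join_plan. Optimize_table_order relies on three preconditions:
- the table dependencies are already sorted
- the access paths are already sorted
- the statistics have already been collected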
choose_table_order
Steps:
- Initialize the cost of the const tables; if every table is a const table, return early.
- If this is the optimization of a semi-join materialization (sjm) plan, sort once so that the semi-join tables come first (that is, the subquery is precomputed ahead of time and can be materialized as needed).
- Otherwise, for a non-STRAIGHT_JOIN query, tables with no dependency between them are sorted by row count in ascending order:
  if (SELECT_STRAIGHT_JOIN option is set)
    reorder tables so dependent tables come after tables they depend on,
    otherwise keep tables in the order they were specified in the query
  else
    Apply heuristic: pre-sort all access plans with respect to the number of records accessed.
    Sort algo is merge-sort (tbl >= 5) or insert-sort (tbl < 5)
- If there is a where_cond, walk the columns it references and set them in the table->cond_set bitmap.
- For STRAIGHT_JOIN, optimize the tables with optimize_straight_join. STRAIGHT_JOIN means the user has fixed the join order, so the work here is, as the comment says: Select the best ways to access the tables in a query without reordering them. For non-STRAIGHT_JOIN queries, the heuristic greedy algorithm greedy_search performs the join reorder.
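The pre-sort heuristic in the steps above (dependent tables after the tables they depend on, otherwise fewer rows first) can be sketched as follows. This is a simplified illustration, not MySQL's comparator-based merge or insertion sort: it repeatedly picks, among the tables whose dependencies are already placed, the one with the smallest row count. `TableSketch` and `presort_sketch` are hypothetical names, and the sketch assumes at most 64 tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-table inputs; the vector index is the table number.
struct TableSketch {
  std::uint64_t dependent;  // bitmap of tables that must come earlier
  std::uint64_t records;    // estimated row count
};

// Returns table numbers in pre-sorted order.
std::vector<std::size_t> presort_sketch(const std::vector<TableSketch> &tables) {
  std::vector<std::size_t> order;
  std::vector<bool> used(tables.size(), false);
  std::uint64_t placed = 0;  // bitmap of tables already in the order
  while (order.size() < tables.size()) {
    std::size_t best = tables.size();  // sentinel: nothing chosen yet
    for (std::size_t i = 0; i < tables.size(); ++i) {
      if (used[i]) continue;
      if ((tables[i].dependent & ~placed) != 0) continue;  // dependency missing
      if (best == tables.size() || tables[i].records < tables[best].records) {
        best = i;
      }
    }
    if (best == tables.size()) break;  // cyclic dependencies: stop early
    used[best] = true;
    placed |= std::uint64_t{1} << best;
    order.push_back(best);
  }
  return order;
}
```

Ties on row count keep the earlier table first, so equal-sized independent tables stay in query order.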
optimize_straight_join:
Only straight_join is supported. DBUG_ASSERT(join->select_lex->sj_nests.is_empty()); means it is incompatible with semi-join, and it only looks at primary tables. For each JOIN_TAB, best_access_path computes its best access path; for an informal summary of how best_access_path reasons, see the table access path notes above and the description of join types in the EXPLAIN documentation. set_prefix_join_cost computes the cost of the current table under the chosen access path and adds it to the total cost. The cost is computed as follows:
m_row_evaluate_cost = 0.1  // default value
/*
  Cost of accessing the table in course of the entire complete join
  execution, i.e. cost of one access method use (e.g. 'range' or
  'ref' scan ) multiplied by estimated number of rows from tables
  earlier in the join sequence.
*/
read_cost = get_read_cost(table)
void set_prefix_join_cost(uint idx, const Cost_model_server *cm) {
  if (idx == 0) {
    prefix_rowcount = rows_fetched;
    prefix_cost = read_cost + prefix_rowcount * m_row_evaluate_cost;
  } else {
    // (this - 1) is the previous table in the plan
    prefix_rowcount = (this - 1)->prefix_rowcount * rows_fetched;
    prefix_cost = (this - 1)->prefix_cost + read_cost + prefix_rowcount * m_row_evaluate_cost;
  }
  // float filter_effect [0,1] means cond filters in executor may reduce rows.
  // 1 means nothing filtered, 0 means all rows filtered and no rows left.
  // It is used to calculate how many row combinations will be joined with the next table.
  prefix_rowcount *= filter_effect;
}
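To make the prefix cost recurrence above concrete, here is a small standalone sketch that applies it to a fixed join order. `PositionSketch` and `set_prefix_costs_sketch` are hypothetical names; the constant 0.1 is the default row evaluate cost quoted above, and the input numbers in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-table position in a fixed join order.
struct PositionSketch {
  double rows_fetched;     // rows read per row combination of the prefix
  double read_cost;        // cost of the chosen access method for the prefix
  double filter_effect;    // fraction of rows kept by conditions, in [0, 1]
  double prefix_rowcount;  // output: row combinations after this table
  double prefix_cost;      // output: accumulated cost up to this table
};

constexpr double kRowEvaluateCost = 0.1;  // default value quoted in the notes

// Applies the recurrence to every position, front to back.
void set_prefix_costs_sketch(std::vector<PositionSketch> &plan) {
  for (std::size_t i = 0; i < plan.size(); ++i) {
    PositionSketch &p = plan[i];
    // An empty prefix behaves like one row combination with zero cost.
    const double prev_rows = (i == 0) ? 1.0 : plan[i - 1].prefix_rowcount;
    const double prev_cost = (i == 0) ? 0.0 : plan[i - 1].prefix_cost;
    p.prefix_rowcount = prev_rows * p.rows_fetched;
    p.prefix_cost =
        prev_cost + p.read_cost + p.prefix_rowcount * kRowEvaluateCost;
    p.prefix_rowcount *= p.filter_effect;  // rows surviving the filters
  }
}
```

With a first table of 100 rows, read cost 20 and filter effect 0.5, followed by a table fetching 10 rows per prefix row at read cost 200, the prefix costs come out as 30 and 280: the filter halves the 100 rows to 50 before the second table multiplies them to 500.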
greedy_search:
bool Optimize_table_order::best_extension_by_limited_search( table_map remaining_tables, uint idx, uint current_search_depth);
procedure greedy_search
input: remaining_tables
output: partial_plan;
{
partial_plan = <>;
do {
(table, a) = best_extension_by_limited_search(partial_plan, remaining_tables, limit_search_depth);
partial_plan = concat(partial_plan, (table, a));
remaining_tables = remaining_tables - table;
} while (remaining_tables != {})
return partial_plan;
}
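// Put simply, each step finds the table that gives the best extension of the join path and appends it to the plan in order.
// This approach is heavily influenced by the choice of the first table (the first table has no join relation, so it can only be chosen by its cardinality after filtering, which usually means a small table), and picking a small table first is not necessarily optimal. A typical greedy optimizer evaluates the cost with each table as the first table and picks the cheapest of the N results as the final plan. MySQL just returns the first complete plan it finds.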
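The greedy loop above can be sketched in C++ with a search depth of one, that is, each step appends the single table whose extension is cheapest. This is an illustration under an assumed toy cost model, not MySQL's code: `TableCostSketch`, `PlanSketch` and `greedy_search_sketch` are hypothetical names, reading is assumed to cost one unit per row, and the row evaluate cost is the 0.1 default.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cost inputs for one table (not MySQL structures).
struct TableCostSketch {
  double scan_rows;  // rows read when the table is first in the plan
  double ref_rows;   // rows read per prefix row when it is joined later
};

struct PlanSketch {
  std::vector<std::size_t> order;  // chosen join order
  double cost;                     // accumulated prefix cost
};

constexpr double kRowEvaluateCost = 0.1;

PlanSketch greedy_search_sketch(const std::vector<TableCostSketch> &tables) {
  PlanSketch plan = {{}, 0.0};
  std::vector<bool> used(tables.size(), false);
  double prefix_rows = 1.0;
  while (plan.order.size() < tables.size()) {
    std::size_t best = tables.size();  // sentinel: nothing chosen yet
    double best_rows = 0.0;
    double best_cost = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < tables.size(); ++i) {
      if (used[i]) continue;
      const double fetched =
          plan.order.empty() ? tables[i].scan_rows : tables[i].ref_rows;
      const double rows = prefix_rows * fetched;
      // Assumed model: one cost unit per row read, plus row evaluation.
      const double cost = plan.cost + rows + rows * kRowEvaluateCost;
      if (cost < best_cost) {
        best = i;
        best_rows = rows;
        best_cost = cost;
      }
    }
    if (best == tables.size()) break;  // no finite-cost extension found
    used[best] = true;
    plan.order.push_back(best);
    prefix_rows = best_rows;
    plan.cost = best_cost;
  }
  return plan;
}
```

With the two tables {10, 1} and {11, 5}, the sketch starts with the smaller table and ends at cost 66, while the reverse order would cost 24.2 under the same model. That is the first-table sensitivity described in the comment above.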
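best_extension_by_limited_search is a heuristic search procedure; search_depth is the maximum depth it may search. The first half of best_extension_by_limited_search is similar to optimize_straight_join:
Compute best_access_path and the cost. If the cost at this point is already greater than best_read, prune immediately; there is no need to search further.
If prune_level=PRUNE_BY_TIME_OR_ROWS is enabled, check whether best_row_count and best_cost are already greater than the current rows and cost (note that newer versions combine the two with AND) and whether no later table depends on this table (the table can be seen as the last node on this path of the graph, so it can be pruned directly, though it is not necessarily the last node of the whole plan). If so, set best_row_count and best_cost to the current values. The code is written in a roundabout way; read together with the whole loop, it roughly picks, each time, the table that is best in the two dimensions (rowcount, cost), so the so-called pruning ends up behaving like a strengthened greedy step.
eq_ref is given priority: on meeting the first eq_ref, the code recursively collects all eq_ref joins. (The original author considers eq_ref a 1:1 mapping, so its cost can be treated as essentially constant; the eq_ref sequence is generated up front and can later be treated as one block placed at any position in the order. This assumes, of course, that the eq_refs can be consecutive.)
If there are still remaining_tables, recurse until none remain.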
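The first pruning rule above (abandon a partial plan once its cost reaches best_read) can be sketched as a depth-first search over join orders. This sketch covers only that cost bound: the search depth limit, the PRUNE_BY_TIME_OR_ROWS heuristic and the eq_ref handling are left out. `TableCostSketch`, `SearchSketch`, `extend_sketch` and `search_sketch` are hypothetical names, and the cost model (one unit per row read plus 0.1 per row evaluated) is an assumption for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cost inputs for one table (not MySQL structures).
struct TableCostSketch {
  double scan_rows;  // rows read when the table is first in the plan
  double ref_rows;   // rows read per prefix row when it is joined later
};

struct SearchSketch {
  std::vector<std::size_t> current;     // partial plan being extended
  std::vector<std::size_t> best_order;  // cheapest complete plan so far
  double best_read = std::numeric_limits<double>::max();
};

void extend_sketch(const std::vector<TableCostSketch> &tables,
                   std::vector<bool> &used, double prefix_rows,
                   double prefix_cost, SearchSketch &s) {
  if (s.current.size() == tables.size()) {  // complete plan
    if (prefix_cost < s.best_read) {
      s.best_read = prefix_cost;
      s.best_order = s.current;
    }
    return;
  }
  for (std::size_t i = 0; i < tables.size(); ++i) {
    if (used[i]) continue;
    const double fetched =
        s.current.empty() ? tables[i].scan_rows : tables[i].ref_rows;
    const double rows = prefix_rows * fetched;
    const double cost = prefix_cost + rows + rows * 0.1;
    if (cost >= s.best_read) continue;  // prune: cannot beat the best plan
    used[i] = true;
    s.current.push_back(i);
    extend_sketch(tables, used, rows, cost, s);  // recurse on remaining tables
    s.current.pop_back();
    used[i] = false;
  }
}

SearchSketch search_sketch(const std::vector<TableCostSketch> &tables) {
  SearchSketch s;
  std::vector<bool> used(tables.size(), false);
  extend_sketch(tables, used, 1.0, 0.0, s);
  return s;
}
```

For the two tables {10, 1} and {11, 5}, the search first completes the order 0, 1 at cost 66, then finds 1, 0 at cost 24.2 and keeps it as the best plan.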