
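MySQL Statistics: Study Notes

Statistics: Concepts

MySQL statistics are information about tables and indexes that the database gathers by sampling, for example the number of rows in a table, the number of pages in the clustered index, and the cardinality of columns. When generating an execution plan, MySQL uses index statistics to estimate costs and pick the plan with the lowest cost (the least overhead). MySQL supports only a limited set of index statistics, and how they are collected depends on the storage engine. The official MySQL documentation says very little about the concept of statistics itself, but anyone who has worked with other database systems should find it easy to grasp. Unlike some other databases, MySQL does not let you delete statistics manually. Before MySQL 8.0, MySQL had no histograms.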

Statistics Parameters

The InnoDB storage engine has 7 statistics-related parameters (8 in some versions), as shown below.

MySQL 5.6.41 has 8 parameters:

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name | Value |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc | ON |
| innodb_stats_include_delete_marked | OFF |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | OFF |
| innodb_stats_persistent | ON |
| innodb_stats_persistent_sample_pages | 20 |
| innodb_stats_sample_pages | 8 |
| innodb_stats_transient_sample_pages | 8 |
+--------------------------------------+-------------+
8 rows in set (0.00 sec)

MySQL 8.0.18 has 7 parameters:

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name | Value |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc | ON |
| innodb_stats_include_delete_marked | OFF |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | OFF |
| innodb_stats_persistent | ON |
| innodb_stats_persistent_sample_pages | 20 |
| innodb_stats_transient_sample_pages | 8 |
+--------------------------------------+-------------+

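The following table gives a rough summary of what each parameter does.

| Parameter | Meaning |
| innodb_stats_auto_recalc | Whether statistics are recalculated automatically. A recalculation is triggered once more than 10% of the data has been modified. |
| innodb_stats_include_delete_marked | Whether delete-marked records are taken into account when statistics are recalculated. |
| innodb_stats_method | How NULL values are treated when statistics are collected. |
| innodb_stats_on_metadata | Whether metadata operations trigger a statistics update. |
| innodb_stats_persistent | Whether statistics are persisted. |
| innodb_stats_sample_pages | Deprecated; replaced by innodb_stats_transient_sample_pages. |
| innodb_stats_persistent_sample_pages | Number of pages sampled for persistent statistics. |
| innodb_stats_transient_sample_pages | Number of pages sampled for transient (non-persistent) statistics. |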

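Parameter innodb_stats_auto_recalc

innodb_stats_auto_recalc controls whether statistics are recalculated automatically. When more than 10% of the data in a table has been modified, the statistics are recalculated. (Note that the recalculation runs asynchronously in the background, so it may be delayed rather than triggered immediately; see the official note further below.) If innodb_stats_auto_recalc is disabled, you need to run ANALYZE TABLE to keep the statistics accurate. Regardless of the global setting, even with innodb_stats_auto_recalc=OFF, adding a new index to a table causes the statistics of all indexes on that table to be recalculated and written to the innodb_index_stats table.

The test below verifies that, with the system variable innodb_stats_auto_recalc=OFF, creating an index triggers recalculation of the statistics for all indexes on the table.

mysql> set global innodb_stats_auto_recalc=off;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'innodb_stats_auto_recalc%';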
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_stats_auto_recalc | OFF |
+--------------------------+-------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats
 -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB | test | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_diff_pfx01 | 2 | 1 | DB_ROW_ID |
| MyDB | test | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| MyDB | test | GEN_CLUST_INDEX | 2019-10-28 14:54:48 | size | 1 | NULL | Number of pages in the index |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
mysql> create index ix_test_name on test(name);
mysql> select * from mysql.innodb_index_stats
 -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB | test | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_diff_pfx01 | 2 | 1 | DB_ROW_ID |
| MyDB | test | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| MyDB | test | GEN_CLUST_INDEX | 2019-10-28 22:02:07 | size | 1 | NULL | Number of pages in the index |
| MyDB | test | ix_test_name | 2019-10-28 22:02:07 | n_diff_pfx01 | 1 | 1 | name |
| MyDB | test | ix_test_name | 2019-10-28 22:02:07 | n_diff_pfx02 | 2 | 1 | name,DB_ROW_ID |
| MyDB | test | ix_test_name | 2019-10-28 22:02:07 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| MyDB | test | ix_test_name | 2019-10-28 22:02:07 | size | 1 | NULL | Number of pages in the index |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
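7 rows in set (0.00 sec)

Here is another test of mine: with the global variable innodb_stats_auto_recalc=ON, I set the table option STATS_AUTO_RECALC=0 and then created a new index. The test shows that the statistics of all indexes are recalculated in this case as well.

mysql> select * from mysql.innodb_index_stats
 -> where database_name='MyDB' and table_name = 'test';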
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB | test | PRIMARY | 2019-10-30 15:49:00 | n_diff_pfx01 | 0 | 1 | id |
| MyDB | test | PRIMARY | 2019-10-30 15:49:00 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| MyDB | test | PRIMARY | 2019-10-30 15:49:00 | size | 1 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.01 sec)
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.27 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from mysql.innodb_index_stats
 -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB | test | PRIMARY | 2019-10-30 15:49:00 | n_diff_pfx01 | 0 | 1 | id |
| MyDB | test | PRIMARY | 2019-10-30 15:49:00 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| MyDB | test | PRIMARY | 2019-10-30 15:49:00 | size | 1 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
mysql> CREATE INDEX ix_test_name ON test(name);
Query OK, 0 rows affected (1.41 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from mysql.innodb_index_stats
 -> where database_name='MyDB' and table_name = 'test';
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
| MyDB | test | PRIMARY | 2019-10-30 15:54:22 | n_diff_pfx01 | 0 | 1 | id |
| MyDB | test | PRIMARY | 2019-10-30 15:54:22 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| MyDB | test | PRIMARY | 2019-10-30 15:54:22 | size | 1 | NULL | Number of pages in the index |
| MyDB | test | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx01 | 999 | 17 | name |
| MyDB | test | ix_test_name | 2019-10-30 15:54:22 | n_diff_pfx02 | 999 | 17 | name,id |
| MyDB | test | ix_test_name | 2019-10-30 15:54:22 | n_leaf_pages | 17 | NULL | Number of leaf pages in the index |
| MyDB | test | ix_test_name | 2019-10-30 15:54:22 | size | 18 | NULL | Number of pages in the index |
+---------------+------------+--------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)

On the delay in recalculating statistics, the official documentation says:

Because of the asynchronous nature of automatic statistics recalculation, which occurs in the background, statistics may not be recalculated instantly after running a DML operation that affects more than 10% of a table, even when innodb_stats_auto_recalc is enabled. Statistics recalculation can be delayed by few seconds in some cases. If up-to-date statistics are required immediately, run ANALYZE TABLE to initiate a synchronous (foreground) recalculation of statistics

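Parameter innodb_stats_include_delete_marked

Controls whether delete-marked records are taken into account when statistics are recalculated.

innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

There is a recommendation online about innodb_stats_include_delete_marked, quoted below. I lack the experience to judge whether it is sound; personally I would leave the parameter at its default (OFF) unless a specific scenario really calls for it.

· It is recommended to enable innodb_stats_include_delete_marked, so that statistics also cover rows deleted by uncommitted transactions.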

By default, InnoDB reads uncommitted data when calculating statistics. In the case of an uncommitted transaction that deletes rows from a table, delete-marked records are excluded when calculating row estimates and index statistics, which can lead to non-optimal execution plans for other transactions that are operating on the table concurrently using a transaction isolation level other than READ UNCOMMITTED. To avoid this scenario, innodb_stats_include_delete_marked can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.

When innodb_stats_include_delete_marked is enabled, ANALYZE TABLE considers delete-marked records when recalculating statistics. innodb_stats_include_delete_marked is a global setting that affects all InnoDB tables, and it is only applicable to persistent optimizer statistics. innodb_stats_include_delete_marked was introduced in MySQL 5.6.34.

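Parameter innodb_stats_method

Specifies how InnoDB index statistics collection code should treat NULLs. Possible values are NULLS_EQUAL (default), NULLS_UNEQUAL and NULLS_IGNORED

· When the variable is set to nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).

· When the variable is set to nulls_unequal, NULL values are not treated as identical. Instead, each NULL value forms a separate value group of size 1.

· When the variable is set to nulls_ignored, NULL values are ignored.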
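The effect of the three settings on value groups can be illustrated with a small Python sketch. This is only a model of the rules listed above, not InnoDB's sampling code; the function names and the sample column are made up for illustration, and None stands in for SQL NULL.

```python
from collections import Counter

def value_groups(values, method="nulls_equal"):
    """Return the sizes of the value groups formed by a column's values.

    None stands in for SQL NULL. This only models the three documented
    settings; it is not how InnoDB samples index pages.
    """
    non_null = [v for v in values if v is not None]
    null_count = len(values) - len(non_null)
    groups = list(Counter(non_null).values())
    if method == "nulls_equal":
        # All NULLs fall into one shared value group.
        if null_count:
            groups.append(null_count)
    elif method == "nulls_unequal":
        # Every NULL forms its own value group of size 1.
        groups.extend([1] * null_count)
    elif method == "nulls_ignored":
        # NULLs contribute no value groups at all.
        pass
    else:
        raise ValueError(f"unknown method: {method}")
    return groups

def average_group_size(values, method):
    groups = value_groups(values, method)
    return sum(groups) / len(groups) if groups else 0.0

# Hypothetical column: two 'a', one 'b', four NULLs.
column = ["a", "a", "b", None, None, None, None]
for method in ("nulls_equal", "nulls_unequal", "nulls_ignored"):
    print(method, len(value_groups(column, method)), average_group_size(column, method))
```

With many NULLs, nulls_equal yields one large group and so a larger average group size, while nulls_unequal yields many groups of size 1 and a smaller average.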

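For more details, see the official documentation, InnoDB and MyISAM Index Statistics Collection. There is also a system variable, myisam_stats_method, that controls how NULL values are treated when collecting statistics for MyISAM tables.

mysql> show variables like 'myisam_stat%';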
+---------------------+---------------+
| Variable_name | Value |
+---------------------+---------------+
| myisam_stats_method | nulls_unequal |
+---------------------+---------------+
1 row in set (0.00 sec)

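Parameter innodb_stats_on_metadata

The innodb_stats_on_metadata parameter was enabled by default (default value ON) before MySQL 5.6.6. Whenever you query tables in the information_schema metadata database (for example information_schema.TABLES, information_schema.TABLE_CONSTRAINTS, ...) or run statements such as SHOW TABLE STATUS or SHOW INDEX, InnoDB also samples some index pages from every table in the other databases in order to refresh information_schema.STATISTICS, and only then returns the query result. With many large tables this takes a long time, and it pulls a lot of rarely accessed data into the InnoDB buffer pool, polluting it. Disabling the parameter speeds up access to the schema tables and also improves the stability of execution plans for queries on InnoDB tables. For these reasons the parameter defaults to OFF starting with MySQL 5.6.6.

Note that this option only takes effect when optimizer statistics are configured as non-persistent. When it is enabled, InnoDB updates the non-persistent statistics.

The official documentation describes it as follows: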

innodb_stats_on_metadata

| Property | Value |
| Command-Line Format | --innodb-stats-on-metadata[={OFF|ON}] |
| System Variable | innodb_stats_on_metadata |
| Scope | Global |
| Dynamic | Yes |
| Type | Boolean |
| Default Value | OFF |

This option only applies when optimizer statistics are configured to be non-persistent. Optimizer statistics are not persisted to disk when innodb_stats_persistent is disabled or when individual tables are created or altered with STATS_PERSISTENT=0. For more information, see Section 14.8.11.2, “Configuring Non-Persistent Optimizer Statistics Parameters”.

When innodb_stats_on_metadata is enabled, InnoDB updates non-persistent statistics when metadata statements such as SHOW TABLE STATUS or when accessing the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables. (These updates are similar to what happens for ANALYZE TABLE.) When disabled, InnoDB does not update statistics during these operations. Leaving the setting disabled can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

To change the setting, issue the statement SET GLOBAL innodb_stats_on_metadata=mode, where mode is either ON or OFF (or 1 or 0). Changing the setting requires privileges sufficient to set global system variables (see Section 5.1.8.1, “System Variable Privileges”) and immediately affects the operation of all connections

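Parameter innodb_stats_persistent

This parameter controls whether statistics are persisted. When it is enabled, statistics are stored in the innodb_table_stats and innodb_index_stats tables of the mysql database. Starting with MySQL 5.6.6, MySQL uses persistent statistics by default, that is, INNODB_STATS_PERSISTENT=ON by default (from the official documentation: "Persistent optimizer statistics were introduced in MySQL 5.6.2 and were made the default in MySQL 5.6.6"). With this parameter set, statistics no longer need to be collected on the fly; on-the-fly collection can hurt performance under high concurrency and can cause execution plans to vary.

In addition, table-level options (the STATS_PERSISTENT, STATS_AUTO_RECALC, and STATS_SAMPLE_PAGES clauses) can override the values set by the system variables. These options can be specified in CREATE TABLE or ALTER TABLE statements. Options specified on a table take precedence over the global variables. For example:

mysql> ALTER TABLE test STATS_PERSISTENT=1;
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE test STATS_AUTO_RECALC=0;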
Query OK, 0 rows affected (0.27 sec)
Records: 0 Duplicates: 0 Warnings: 0
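
The precedence rule (a table option, when given, wins over the global variable) can be modelled with a minimal Python sketch. The dictionary keys and the function name are hypothetical, and None is used here to model an option that is left at DEFAULT or not specified on the table.

```python
# Global values, mirroring the defaults shown earlier in this article.
GLOBAL_DEFAULTS = {
    "stats_persistent": True,    # innodb_stats_persistent=ON
    "stats_auto_recalc": True,   # innodb_stats_auto_recalc=ON
    "stats_sample_pages": 20,    # innodb_stats_persistent_sample_pages=20
}

def effective_settings(table_options, global_values=GLOBAL_DEFAULTS):
    """Merge table-level options over the global values.

    A table option set to None models "not specified / DEFAULT",
    in which case the global value applies.
    """
    merged = dict(global_values)
    for name, value in table_options.items():
        if name not in merged:
            raise KeyError(f"unknown option: {name}")
        if value is not None:
            merged[name] = value
    return merged

# Models: ALTER TABLE test STATS_AUTO_RECALC=0 while the global variable is ON.
print(effective_settings({"stats_auto_recalc": False}))
```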

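Persistent statistics are stored in mysql.innodb_index_stats and mysql.innodb_table_stats. The columns of these two tables are as follows:

innodb_table_stats

| Column name | Description |
| database_name | Database name |
| table_name | Table name, partition name, or subpartition name |
| last_update | Timestamp of the last statistics update |
| n_rows | Number of rows in the table |
| clustered_index_size | Number of pages in the clustered index |
| sum_of_other_index_sizes | Number of pages in the non-clustered indexes |

innodb_index_stats

| Column name | Description |
| database_name | Database name |
| table_name | Table name, partition name, or subpartition name |
| index_name | Index name |
| last_update | Timestamp of the last update |
| stat_name | Name of the statistic |
| stat_value | Value of the statistic named in stat_name (for n_diff_pfxNN, a number of distinct values) |
| sample_size | Number of pages sampled |
| stat_description | Description |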

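Non-persistent optimizer statistics are stored in memory and are lost when the server shuts down. They are also updated periodically by certain operations and under certain conditions. But what exactly does "stored in memory" mean here?

Optimizer statistics are not persisted to disk when innodb_stats_persistent=OFF or when individual tables are created or altered with STATS_PERSISTENT=0. Instead, statistics are stored in memory, and are lost when the server is shut down. Statistics are also updated periodically by certain operations and under certain conditions.

As far as I can tell, this refers to in-memory tables (MEMORY tables), which are touched on briefly below; the official documentation itself only says that the statistics are held in memory.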

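Parameter innodb_stats_persistent_sample_pages

When innodb_stats_persistent is ON, this parameter sets the number of pages sampled each time ANALYZE TABLE updates the Cardinality values. The default is 20 pages. Too small a value makes the statistics inaccurate; too large a value makes the analysis slow.

When creating a table you can specify, per table, the number of sample pages, whether statistics are persisted to disk, and whether statistics are collected automatically, as shown below:

CREATE TABLE `test` (
`id` int(8) NOT NULL auto_increment,
`data` varchar(255),
`date` datetime,
PRIMARY KEY (`id`),
INDEX `DATE_IX` (`date`)
) ENGINE=InnoDB,
 STATS_PERSISTENT=1,
 STATS_AUTO_RECALC=1,
 STATS_SAMPLE_PAGES=25;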

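Parameter innodb_stats_sample_pages

Deprecated. Replaced by innodb_stats_transient_sample_pages.

Parameter innodb_stats_transient_sample_pages

innodb_stats_transient_sample_pages controls the number of pages sampled; the default is 8. It can be changed at runtime.

innodb_stats_transient_sample_pages affects sampling when innodb_stats_persistent=0. Points to note:

1. If the value is too small, the estimates are inaccurate.

2. If the value is too large, disk reads increase.

3. Different values can produce quite different execution plans, because the statistics differ.

There is one more parameter, information_schema_stats_expiry, which works as follows:

· In 8.0, the information in the STATISTICS and TABLES tables under INFORMATION_SCHEMA is cached to improve query performance. The information_schema_stats_expiry parameter sets how long the cached data remains valid; the default is 86400 seconds. A query against these two tables first looks in the cache; if there is no cached data, or the cached data has expired, the query fetches the latest data from the storage engine. To get the latest data on demand, set information_schema_stats_expiry to 0 or run ANALYZE TABLE.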
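
The caching behaviour described above (serve from the cache until the entry expires, otherwise go back to the storage engine) can be pictured with a small Python model. This is a sketch of the idea only, not MySQL's implementation; the class, the fetch callback, and the explicit `now` argument are all hypothetical, and `now` is passed in by hand so the example is deterministic.

```python
class StatsCache:
    """Toy model of cached INFORMATION_SCHEMA statistics with an expiry time."""

    def __init__(self, fetch, expiry_seconds=86400):
        self.fetch = fetch                    # stands in for asking the storage engine
        self.expiry_seconds = expiry_seconds  # models information_schema_stats_expiry
        self._entries = {}                    # table name -> (value, time cached)

    def get(self, table, now):
        entry = self._entries.get(table)
        if entry is not None and self.expiry_seconds > 0:
            value, cached_at = entry
            if now - cached_at < self.expiry_seconds:
                return value                  # cache hit, possibly stale
        # No entry, expired entry, or expiry set to 0: fetch the latest value.
        value = self.fetch(table)
        self._entries[table] = (value, now)
        return value

    def analyze(self, table, now):
        """Models ANALYZE TABLE refreshing the cached value right away."""
        value = self.fetch(table)
        self._entries[table] = (value, now)
        return value

# Hypothetical row counts held by the "storage engine".
row_counts = {"test": 100}
calls = []

def fetch_row_count(table):
    calls.append(table)
    return row_counts[table]

cache = StatsCache(fetch_row_count)
first = cache.get("test", now=0)       # miss: fetched from the engine
row_counts["test"] = 250               # the table changes afterwards
stale = cache.get("test", now=3600)    # still within 86400 s: old value
fresh = cache.get("test", now=90000)   # expired: fetched again
print(first, stale, fresh)
```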

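Viewing Statistics

Statistics are either persistent (PERSISTENT) or non-persistent (TRANSIENT). Where is each kind stored?

· Persistent statistics

Stored in mysql.innodb_index_stats and mysql.innodb_table_stats.

· Non-persistent statistics

Before MySQL 8.0, they are stored in information_schema.INDEXES and information_schema.TABLES. And from MySQL 8.0 on? In INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.STATISTICS, and INNODB_INDEXES. The official documentation says non-persistent statistics are kept in memory; as I understand it, that means in memory tables (MEMORY tables).

The script below shows the persistent statistics. To read the data in mysql.innodb_index_stats, you need to understand what stat_name and stat_value mean:

select * from mysql.innodb_index_stats
where table_name = 'test';
select * from mysql.innodb_index_stats
where database_name='MyDB' and table_name = 'test';

When stat_name=size, stat_value is the number of pages in the index (Number of pages in the index).

When stat_name=n_leaf_pages, stat_value is the number of leaf pages in the index (Number of leaf pages in the index).

When stat_name=n_diff_pfxNN, stat_value is the number of distinct values in the index columns, as explained below:

* In n_diff_pfxNN, NN stands for a number (for example 01, 02). When stat_name is n_diff_pfxNN, stat_value shows the number of distinct values counted over the leading columns of the index, starting from the first column in the index definition. For example, when NN is 01, stat_value is the number of distinct values in the first index column; when NN is 02, stat_value is the number of distinct combinations of the first and second index columns; and so on. In addition, when stat_name = n_diff_pfxNN, the stat_description column shows a comma-separated list of the index columns that were counted.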
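
To make the prefix counting concrete, here is a minimal Python sketch that computes the n_diff_pfxNN values for a tiny in-memory table. It counts exactly over all rows, whereas InnoDB estimates these numbers from sampled pages; the function name and the sample rows are made up for illustration. The column list is ["name", "id"] because, as the earlier output shows, a secondary index on name is reported with the primary key column appended (name,id).

```python
def n_diff_prefix_stats(rows, index_columns):
    """Count distinct values for every leading prefix of the index columns.

    Returns {stat_name: (stat_value, stat_description)}.
    """
    stats = {}
    for n in range(1, len(index_columns) + 1):
        prefix = index_columns[:n]
        distinct = {tuple(row[col] for col in prefix) for row in rows}
        stats[f"n_diff_pfx{n:02d}"] = (len(distinct), ",".join(prefix))
    return stats

# Hypothetical table contents.
rows = [
    {"id": 1, "name": "kerry"},
    {"id": 2, "name": "kerry"},
    {"id": 3, "name": "jimmy"},
]
stats = n_diff_prefix_stats(rows, ["name", "id"])
for stat_name, (stat_value, stat_description) in stats.items():
    print(stat_name, stat_value, stat_description)
```

Here n_diff_pfx01 counts distinct name values, and n_diff_pfx02 counts distinct (name, id) pairs.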

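MySQL Histograms

MySQL 8.0 introduced histograms. Histogram data is found in the information_schema.column_statistics system table; each row holds the histogram for one column, stored in JSON format. A new parameter, histogram_generation_max_mem_size, was also added to configure the amount of memory available for building a histogram.

In general, a histogram is an accurate representation of the distribution of numerical data. In an RDBMS, a histogram is an approximation of the data distribution within a particular column.

mysql> show variables like 'histogram_generation_max_mem_size';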
+-----------------------------------+----------+
| Variable_name | Value |
+-----------------------------------+----------+
| histogram_generation_max_mem_size | 20000000 |
+-----------------------------------+----------+
1 row in set (0.01 sec)
mysql> 
mysql> desc information_schema.column_statistics;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| SCHEMA_NAME | varchar(64) | NO | | NULL | |
| TABLE_NAME | varchar(64) | NO | | NULL | |
| COLUMN_NAME | varchar(64) | NO | | NULL | |
| HISTOGRAM | json | NO | | NULL | |
+-------------+-------------+------+-----+---------+-------+
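4 rows in set (0.00 sec)

MySQL has two kinds of histograms: singleton histograms and equi-height histograms. In a singleton histogram, each bucket stores a single value and its cumulative frequency; in an equi-height histogram, each bucket stores the lower and upper bounds of its value range, the cumulative frequency, and the number of distinct values. MySQL chooses the histogram type automatically; you can sometimes influence the choice by picking a suitable number of buckets.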
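
The two bucket layouts can be sketched in Python as follows. This is a simplified illustration, not MySQL's algorithm: the function name is made up, and the selection rule used here (singleton when the number of distinct values fits within the requested buckets, equi-height otherwise) is an assumption of the sketch.

```python
from collections import Counter

def build_histogram(values, num_buckets=100):
    """Build a toy singleton or equi-height histogram over a list of values."""
    counts = sorted(Counter(values).items())  # (value, occurrences), ordered by value
    total = len(values)
    buckets = []
    running = 0
    if len(counts) <= num_buckets:
        # Singleton: one bucket per distinct value -> [value, cumulative frequency].
        for value, count in counts:
            running += count
            buckets.append([value, running / total])
        return {"histogram-type": "singleton", "buckets": buckets}
    # Equi-height: pack consecutive values so each bucket holds a similar row share.
    current = []
    for position, (value, count) in enumerate(counts):
        current.append(value)
        running += count
        is_last = position == len(counts) - 1
        # Integer form of: running / total >= (buckets so far + 1) / num_buckets
        if is_last or running * num_buckets >= total * (len(buckets) + 1):
            # [lower bound, upper bound, cumulative frequency, distinct values]
            buckets.append([current[0], current[-1], running / total, len(current)])
            current = []
    return {"histogram-type": "equi-height", "buckets": buckets}

small = build_histogram([1] * 5 + [2] * 3 + [3] * 2)      # 3 distinct values
large = build_histogram(list(range(1, 9)), num_buckets=4)  # 8 distinct values, 4 buckets
print(small)
print(large)
```

In the singleton case the number of buckets equals the number of distinct values, which is why a small table can end up with fewer buckets than requested.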

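Creating and Dropping Histograms

Is histogram data generated automatically? No. MySQL histograms are somewhat unusual: they are not generated when an index is created. You have to run a statement of the form ANALYZE TABLE [table] UPDATE HISTOGRAM ... manually to build histograms on the table's columns. By default this is replicated to the replicas.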

ANALYZE TABLE tbl_name UPDATE HISTOGRAM ON col_name [, col_name] WITH N BUCKETS;

ANALYZE TABLE tbl_name DROP HISTOGRAM ON col_name [, col_name];

ANALYZE TABLE test UPDATE HISTOGRAM ON create_date,name WITH 16 BUCKETS;

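Note: the BUCKETS value is optional. Its valid range is 1 to 1024, and if it is omitted the default is 100.

In the test below, we first drop all existing histogram data and then generate histogram data with the following SQL.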

ANALYZE TABLE test UPDATE HISTOGRAM ON name;
SELECT SCHEMA_NAME
 ,TABLE_NAME
 ,COLUMN_NAME
 ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
 ,HISTOGRAM->>'$."sampling-rate"' AS SAMPLING_RATE
 ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
 ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
 ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE TABLE_NAME = 'test';

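In fact the default number of BUCKETS is not always 100. If I delete rows until only 49 remain and then create the histogram, the number of buckets is 49, so the value also depends on how much data the table holds. With a larger amount of data, the default is 100.

Also, as the following test shows, if BUCKETS exceeds 1024, MySQL reports ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'.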

mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1024 BUCKETS;
+-----------+-----------+----------+-------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status | Histogram statistics created for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.13 sec)
mysql> ANALYZE TABLE test UPDATE HISTOGRAM ON name WITH 1025 BUCKETS;
ERROR 1690 (22003): Number of buckets value is out of range in 'ANALYZE TABLE'
mysql> 

Dropping Histograms

-- Drop the histogram statistics on a column

ANALYZE TABLE test DROP HISTOGRAM ON create_date;

mysql> ANALYZE TABLE test DROP HISTOGRAM ON name;
+-----------+-----------+----------+-------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------+-----------+----------+-------------------------------------------------+
| MyDB.test | histogram | status | Histogram statistics removed for column 'name'. |
+-----------+-----------+----------+-------------------------------------------------+
1 row in set (0.10 sec)

Viewing Histogram Information

Histogram data is stored as JSON, and raw JSON output is hard to read. A few SQL queries make it easier to inspect.

SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) 
FROM information_schema.column_statistics 
WHERE TABLE_NAME='test'\G
SELECT SCHEMA_NAME
 ,TABLE_NAME
 ,COLUMN_NAME
 ,HISTOGRAM->>'$."data-type"' AS 'DATA-TYPE'
 ,HISTOGRAM->>'$."sampling-rate"' AS SAMPLING_RATE
 ,HISTOGRAM->>'$."last-updated"' AS LAST_UPDATED
 ,HISTOGRAM->>'$."histogram-type"' AS HISTOGRAM_TYPE
 ,HISTOGRAM->>'$."number-of-buckets-specified"' AS NUM_BUCKETS_SPECIFIED
 ,JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'BUCKET-COUNT'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE TABLE_NAME = 'test';
SELECT FROM_BASE64(SUBSTRING_INDEX(v, ':', -1)) value, concat(round(c*100,1),'%') cumulfreq,
 CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets',
 '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist
WHERE schema_name = 'MyDB' and table_name = 'test' and column_name = 'name';
SELECT v value, concat(round(c*100,1),'%') cumulfreq,
 CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets',
 '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist
WHERE schema_name = 'MyDB' and table_name = 'test' and column_name = 'name';
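
The last two queries turn each bucket's cumulative frequency into a per-bucket frequency by subtracting the previous bucket's value (the LAG call), and the first one also decodes base64-encoded string values. The same arithmetic can be sketched in Python. The bucket data below is made up, and the "base64:type254:" prefix on string values is an assumption inferred from the SUBSTRING_INDEX(v, ':', -1) call in the query.

```python
import base64

def decode_value(v):
    """Decode a string bucket value of the assumed form 'base64:<type>:<payload>'."""
    if isinstance(v, str) and v.startswith("base64:"):
        payload = v.rsplit(":", 1)[-1]   # same idea as SUBSTRING_INDEX(v, ':', -1)
        return base64.b64decode(payload).decode("utf-8")
    return v

def bucket_frequencies(buckets):
    """Return (value, cumulative %, per-bucket %) for singleton-style buckets."""
    rows = []
    previous = 0.0                       # plays the role of LAG(c, 1, 0)
    for value, cumulative in buckets:
        rows.append((
            decode_value(value),
            round(cumulative * 100, 1),
            round((cumulative - previous) * 100, 1),
        ))
        previous = cumulative
    return rows

def encode(text):
    """Helper that builds a sample value in the assumed encoded form."""
    return "base64:type254:" + base64.b64encode(text.encode("utf-8")).decode("ascii")

# Hypothetical histogram buckets: [value, cumulative frequency].
sample_buckets = [[encode("jimmy"), 0.25], [encode("kerry"), 0.75], [encode("zoe"), 1.0]]
rows = bucket_frequencies(sample_buckets)
for row in rows:
    print(row)
```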

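Updating Statistics

Non-persistent statistics are also refreshed automatically. According to the official documentation, this happens in the following cases: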

Non-persistent optimizer statistics are updated when:
· Running ANALYZE TABLE.
· Running SHOW TABLE STATUS, SHOW INDEX, or querying the INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS tables with the innodb_stats_on_metadata option enabled.
  The default setting for innodb_stats_on_metadata is OFF. Enabling innodb_stats_on_metadata may reduce access speed for schemas that have a large number of tables or indexes, and reduce stability of execution plans for queries that involve InnoDB tables. innodb_stats_on_metadata is configured globally using a SET statement.
  SET GLOBAL innodb_stats_on_metadata=ON
  Note: innodb_stats_on_metadata only applies when optimizer statistics are configured to be non-persistent (when innodb_stats_persistent is disabled).
· Starting a mysql client with the --auto-rehash option enabled, which is the default. The auto-rehash option causes all InnoDB tables to be opened, and the open table operations cause statistics to be recalculated.
  To improve the start up time of the mysql client and to updating statistics, you can turn off auto-rehash using the --disable-auto-rehash option. The auto-rehash feature enables automatic name completion of database, table, and column names for interactive users.
· A table is first opened.
· InnoDB detects that 1 / 16 of table has been modified since the last time statistics were updated.

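In brief:

1. ANALYZE TABLE is run.

2. With innodb_stats_on_metadata=ON, SHOW TABLE STATUS or SHOW INDEX is run, or the TABLES or STATISTICS table under INFORMATION_SCHEMA is queried.

3. A mysql client logs in with --auto-rehash enabled.

4. A table is opened for the first time.

5. 1/16 of the table's data has been modified since the statistics were last updated.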
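
The two modification thresholds mentioned in this article (more than 10% of the data for persistent statistics with automatic recalculation, and 1/16 of the table for non-persistent statistics) can be compared with a rough Python sketch. The function and its arguments are hypothetical; how InnoDB actually counts modifications internally is not covered here, so treat the exact comparisons as assumptions.

```python
def needs_recalc(rows_modified, table_rows, persistent=True):
    """Return True when the modified share of the table crosses the threshold.

    persistent=True  models innodb_stats_auto_recalc with persistent statistics
                     (more than 10% of the rows modified).
    persistent=False models non-persistent statistics (more than 1/16 modified).
    Integer arithmetic is used to avoid floating-point rounding at the boundary.
    """
    if table_rows <= 0:
        return rows_modified > 0
    if persistent:
        return rows_modified * 10 > table_rows
    return rows_modified * 16 > table_rows

# Hypothetical table with 1000 rows.
for modified in (50, 80, 101):
    print(modified,
          needs_recalc(modified, 1000, persistent=True),
          needs_recalc(modified, 1000, persistent=False))
```

Because 1/16 (6.25%) is lower than 10%, the same amount of change triggers a refresh of non-persistent statistics earlier than it triggers a persistent recalculation.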

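How persistent statistics are updated was covered above. The other approach is to update statistics manually:

1. Update statistics manually. Note that a read lock is taken on the table while this runs:

ANALYZE TABLE TABLE_NAME;

2. If the statistics are still inaccurate after the update, consider increasing the number of pages sampled for the table. There are two ways to change it:

1) The global variable INNODB_STATS_PERSISTENT_SAMPLE_PAGES, which defaults to 20;

2) A per-table sample size:

ALTER TABLE TABLE_NAME STATS_SAMPLE_PAGES=100;

In testing, the maximum value of STATS_SAMPLE_PAGES here is 65535; anything larger is rejected with an error.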

mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65535;
Query OK, 0 rows affected (0.12 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE test STATS_SAMPLE_PAGES=65536;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '65536' at line 1
mysql>

References:

https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html

https://dev.mysql.com/doc/refman/8.0/en/index-statistics.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-performance-optimizer-statistics.html

https://www.percona.com/blog/2019/10/29/column-histograms-on-percona-server-and-mysql-8-0/ (key reference)

http://chinaunix.net/uid-31396856-id-5787793.html

https://mysqlserverteam.com/histogram-statistics-in-mysql/

https://mp.weixin.qq.com/s/698g5lm9CWqbU0B_p0nLMw