MySQL Waiting for table metadata lock故障分析
源起
線上執行ALTER時,通過show processlist檢視到出現Waiting for table metadata lock ,導致後面的查詢都無法執行。
5217122 | create_table_04 | 172.100.207.148:31291 | finance | Query | 1829 | Waiting for table metadata lock | ALTER TABLE `Pay` MODIFY COLUMN `pay` smallint(6) NULL DEFAULT 0 COMMENT '付??' 5217155 | bx_live_dml | 172.100.210.4:34730 | finance | Prepare | 1714 | Waiting for table metadata lock | select * from Pay where (pay =4 or pay =8) and projectId ='CSZY0'
由於當時直接把ALTER給kill了,所以基本沒有現場,所以下面模擬什麼情況MySQL產生這種事故。
MySQL元資料鎖
MySQL DBA對於Waiting for table metadata lock肯定不會陌生,一般都是進行alter操作時被堵住了,導致了我們在show processlist 時,看到執行緒的狀態是在等metadata lock。
為了在併發環境下維護表元資料的資料一致性,在表上有活動事務(顯式或隱式)的時候,不可以對元資料進行寫入操作。因此從MySQL5.5版本開始引入了MDL鎖(metadata lock),來保護表的元資料資訊,用於解決或者保證DDL操作與DML操作之間的一致性。對於引入MDL,其主要解決了2個問題,一個是事務隔離問題,比如在可重複隔離級別下,會話A在2次查詢期間,會話B對錶結構做了修改,兩次查詢結果就會不一致,無法滿足可重複讀的要求;另外一個是資料複製的問題,比如會話A執行了多條更新語句期間,另外一個會話B做了表結構變更並且先提交,就會導致slave在重做時,先重做alter,再重做update時就會出現複製錯誤的現象。
所以在對錶進行上述操作時,如果表上有活動事務(未提交或回滾),請求寫入的會話會等待在Metadata lock wait 。例如下面的這種情形:
若沒有MDL鎖的保護,則事務2可以直接執行DDL操作,並且導致事務1出錯,5.1版本即是如此。5.5版本加入MDL鎖就在於保護這種情況的發生,由於事務1開啟了查詢,那麼獲得了MDL鎖,鎖的模式為SHARED_READ,事務2要執行DDL,則需獲得EXCLUSIVE鎖,兩者互斥,所以事務2需要等待。
注:支援事務的InnoDB引擎表和不支援事務的MyISAM引擎表,都會出現Metadata Lock Wait等待現象。一旦出現Metadata Lock Wait等待現象,後續所有對該表的訪問都會阻塞在該等待上,導致連線堆積,業務受影響。
引起Metadata lock wait出現的場景:Waiting for table metadata lock
場景一:當前有執行DML操作時執行ALTRE操作。
# SESSION A mysql> insert into sbtest2 select * from sbtest1; # SESSION B mysql> alter table sbtest2 add test1 int; //等待SESSION A執行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | 267 | root | localhost | sbtest | Query | 7 | Sending data | insert into sbtest2 select * from sbtest1 | | 271 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | alter table sbtest2 add test1 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ 3 rows in set (0.00 sec) # SESSION D mysql> select * from sbtest2 limit 10; //等待元資料鎖; # SESSION E mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | 267 | root | localhost | sbtest | Query | 20 | Sending data | insert into sbtest2 select * from sbtest1 | | 271 | root | localhost | sbtest | Query | 13 | Waiting for table metadata lock | alter table sbtest2 add test1 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 308 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | select * from sbtest2 limit 10 | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ 4 rows in set (0.00 sec)
從上述例子可以看出,我們在執行DDL語句的時候得事先看一下,程序中是否已經存在某些DML語句佔用了表的元資料鎖,這樣會導致DDL語句處於鎖等待狀態。一旦出現Waiting for table metadata lock等待現象,後續所有對該表的訪問都會阻塞在該等待上,包括讀操作,導致連線堆積,業務受影響。
場景二:當前有對錶的長時間查詢或使用mysqldump/mysqlpump時,使用alter會被堵住。
# SESSION A mysql> select *,sleep(10) from sbtest2; # SESSION B mysql> alter table sbtest2 add test2 int; //等待SESSION A執行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ | 267 | root | localhost | sbtest | Query | 12 | User sleep | select *,sleep(10) from sbtest2 | | 271 | root | localhost | sbtest | Query | 8 | Waiting for table metadata lock | alter table sbtest2 add test3 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 311 | root | localhost | NULL | Query | 3 | Waiting for table metadata lock | select * from sbtest.sbtest2 limit 10 | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ 4 rows in set (0.00 sec)
場景三:顯示或者隱式開啟事務後未提交或回滾,比如查詢完成後未提交或者回滾,使用alter會被堵住。
# SESSION A mysql> begin; mysql> select * from sbtest2; # SESSION B mysql> alter table sbtest2 add test2 int; //等待SESSION A執行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | 267 | root | localhost | sbtest | Sleep | 36 | | NULL | | 271 | root | localhost | sbtest | Query | 30 | Waiting for table metadata lock | alter table sbtest2 add test2 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ 3 rows in set (0.00 sec)
場景四:表上有失敗的查詢事務,比如查詢不存在的列,語句失敗返回,但是事務沒有提交,此時alter仍然會被堵住。
# SESSION A mysql> begin; mysql> select error from sbtest2; ERROR 1054 (42S22): Unknown column 'error' in 'field list' # SESSION B mysql> alter table sbtest2 add test3 int; //等待SESSION A提交或回滾; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | 267 | root | localhost | sbtest | Sleep | 7 | | NULL | | 271 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | alter table sbtest2 add test3 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 311 | root | localhost | NULL | Sleep | 413 | | NULL | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ 4 rows in set (0.00 sec) # SESSION D mysql> select * from information_schema.innodb_trx; Empty set (0.00 sec)
其實SESSION A中的事務並未開啟,但是由於select獲取表元資料的語句,語法上是有效的,雖然執行失敗了,但是任然不會釋放元資料鎖,故而導致SESSION B的alter動作被阻塞。
通過SESSION D檢視當前開啟事務時,你會發現沒有,從而找不到原因。所以當出現這種場景時,如何判斷是哪個程序導致的呢,我們可以嘗試查看錶performance_schema. events_statements_current,分析程序狀態來進行判斷。
mysql> select * from performance_schema. events_statements_current\G *************************** 1. row *************************** THREAD_ID: 293 EVENT_ID: 32 END_EVENT_ID: 32 EVENT_NAME: statement/sql/select SOURCE: socket_connection.cc:101 TIMER_START: 212721717099954000 TIMER_END: 212721717213807000 TIMER_WAIT: 113853000 LOCK_TIME: 0 SQL_TEXT: select error from sbtest2 DIGEST: 0bbb2d5d1be45e77debea68111264885 DIGEST_TEXT: SELECT ERROR FROM `sbtest2` CURRENT_SCHEMA: sbtest OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 1054 RETURNED_SQLSTATE: 42S22 MESSAGE_TEXT: Unknown column 'error' in 'field list' ERRORS: 1
然後找到其sid, kill掉該session,也可以kill掉DDL所在的session解決可以解決此問題。
另外,測試時SESSION A要顯式開啟一個事務,否則查詢會隱式回滾結束,無法重現上面的場景。SESSION B執行alter後,沒有立即阻塞住,而是立馬開始copy to tmp table,這個過程結束後,才進行了MDL鎖等待。這怎麼解釋呢,應該是執行alter操作主要分為建立臨時新表->插入老表的資料->臨時新表rename to老表三個步驟,在這種情況下,到最後一步才需要MDL鎖,所以copy過程中不會阻塞。由於沒有查詢在進行,而且查詢也沒有進入innodb層 (失敗返回),所以show processlist和information_schema.innodb_trx沒有可以參考的資訊。
出現以上幾種情況時,這個時候如果進行如下操作就會引起MDL:
- 建立、刪除索引。
- 修改表結構。
- 表維護操作(optimize table、repair table等)。
- 刪除表。
- 獲取表上表級寫鎖 (lock table tab_name write)。
使用Profile分析場景三:顯示或者隱式開啟事務後未提交或回滾,比如查詢完成後未提交或者回滾,使用alter會被堵住
# SESSION A mysql> set profiling=on; mysql> begin; mysql> select * from sbtest.sbtest2 limit 1; # SESSION B mysql> set profiling=on; mysql> alter table sbtest.sbtest5 add test2 int; //等待SESSION A執行完; # SESSION C mysql> set profiling=on; mysql> select * from sbtest.sbtest2 limit 1; # SESSION D mysql> set profiling=on; mysql> show processlist; +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ | 325 | root | localhost | NULL | Query | 25 | Waiting for table metadata lock | alter table sbtest.sbtest2 add test5 int | | 326 | root | localhost | NULL | Query | 3 | Waiting for table metadata lock | select * from sbtest.sbtest2 limit 1 | | 327 | root | localhost | NULL | Query | 0 | starting | show processlist | | 328 | root | localhost | NULL | Sleep | 50 | | NULL | +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ 4 rows in set (0.00 sec)
然後回滾SESSION A,等待SESSION B和SESSION C執行完,檢視profile。
檢視SESSION A
# SESSION A mysql> show profiles; +----------+-------------+------------------------------------------+ | Query_ID | Duration | Query | +----------+-------------+------------------------------------------+ | 1 | 42.81646375 | alter table sbtest.sbtest2 add test5 int | +----------+-------------+------------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show profile for query 1; +--------------------------------+----------+ | Status | Duration | +--------------------------------+----------+ | starting | 0.000069 | | checking permissions | 0.000004 | | checking permissions | 0.000004 | | init | 0.000005 | | Opening tables | 0.000128 | | setup | 0.000032 | | creating table | 0.000654 | | After create | 0.000053 | | Waiting for table metadata loc | 1.000069 | ..................... | After create | 0.000013 | | Waiting for table metadata loc | 0.871435 | | After create | 0.000042 | | System lock | 0.000013 | | preparing for alter table | 0.002475 | | altering table | 9.752928 | | committing alter table to stor | 0.185624 | | end | 0.000021 | | query end | 0.000010 | | closing tables | 0.000007 | | freeing items | 0.000020 | | cleaning up | 0.000011 | +--------------------------------+----------+ 83 rows in set, 1 warning (0.00 sec)
檢視SESSION C
mysql> show profiles; +----------+-------------+--------------------------------------+ | Query_ID | Duration | Query | +----------+-------------+--------------------------------------+ | 1 | 10.75216050 | select * from sbtest.sbtest2 limit 1 | +----------+-------------+--------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show profile for query 1; +--------------------------------+-----------+ | Status | Duration | +--------------------------------+-----------+ | starting | 0.000080 | | checking permissions | 0.000007 | | Opening tables | 0.000012 | | Waiting for table metadata loc | 10.751829 | | Opening tables | 0.000094 | | init | 0.000019 | | System lock | 0.000010 | | optimizing | 0.000004 | | statistics | 0.000011 | | preparing | 0.000009 | | executing | 0.000003 | | Sending data | 0.000040 | | end | 0.000006 | | query end | 0.000008 | | closing tables | 0.000008 | | freeing items | 0.000014 | | cleaning up | 0.000009 | +--------------------------------+-----------+ 17 rows in set, 1 warning (0.00 sec)
從上述測試可以看出,SESSION C需要開啟表時碰到了元資料鎖。MySQL不論SESSION A執行的是select還是delete,此時alter table語句無法獲取到metadata獨佔鎖,會進行等待;所以會影響SESSION C的讀取。
這是最基本的一種情形,這個和MySQL 5.6中的online ddl並不衝突。一般alter table的操作過程中,在after create步驟會獲取metadata獨佔鎖,當進行到altering table的過程時(通常是最花時間的步驟),對該表的讀寫都可以正常進行,這就是online ddl的表現,並不會像之前在整個alter table過程中阻塞寫入,當然並不是所有ALTER語句都支援online ddl。
總之,alter table的語句是很危險的(其實他的危險其實是未提交事物或者長事務導致的),在操作之前最好確認對要操作的表沒有任何進行中的操作、沒有未提交事務、也沒有顯式事務中的報錯語句。如果有alter table的維護任務,在無人監管的時候執行,最好通過lock_wait_timeout設定好超時時間,避免長時間的metedata鎖等待。
原文來自微信公眾號:運維那點事