1. 程式人生 > 資料庫 >MySQL DeadLock故障排查全過程記錄

MySQL DeadLock故障排查全過程記錄

【作者】

劉博:攜程技術保障中心資料庫高階經理,主要關注Sql server和Mysql的運維和故障處理。

【環境】

版本號:5.6.21

隔離級別:REPEATABLE READ

【問題描述】

接到監控報警,有一個線上的應用DeadLock報錯,每15分鐘會準時出現,報錯統計如下圖:


登入Mysql伺服器檢視日誌:

mysql> show engine innodb status\G

*** (1) TRANSACTION:

TRANSACTION 102973,ACTIVE 11 sec starting index read

mysql tables in use 3,locked 3

LOCK WAIT 4 lock struct(s),heap size 1136,3 row lock(s)

MySQL thread id 6,OS thread handle 140024996574976,query id 83 localhost us updating

UPDATE TestTable

SET column1 = 1,Column2 = sysdate(),Column3= '026'

Column4 = 0

AND column5 = 485

AND column6 = 'SEK'

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 417 page no 1493 n bits 1000 index idx_column6 of table test.TestTable trx id 102973 lock_mode X waiting

Record lock,heap no 859 PHYSICAL RECORD: n_fields 2; compact format; info bits 0

0: len 3; hex 53454b; asc SEK;;

1: len 8; hex 80000000007e1452; asc ~ R;;

*** (2) TRANSACTION:

TRANSACTION 102972,ACTIVE 26 sec starting index read

mysql tables in use 3,locked 3

219 lock struct(s),heap size 24784,2906 row lock(s),undo log entries 7

MySQL thread id 5,OS thread handle 140024996841216,query id 84 localhost us updating

UPDATE TestTable

SET Column1 = 1,Column3 = '026'

Column4 = 0

AND Column5 = 485

AND Column6 = 'SEK'

*** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 417 page no 1493 n bits 1000 index idx_Column6 of table test.TestTable trx id 102972 lock_mode X

Record lock,heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0

0: len 8; hex 73757072656d756d; asc supremum;;


Record lock,heap no 859 PHYSICAL RECORD: n_fields 2; compact format; info bits 0

0: len 3; hex 53454b; asc SEK;;

1: len 8; hex 80000000007e1452; asc ~ R;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 601 page no 89642 n bits 1000 index idx_column6 of table test.TestTable trx id 32231892482 lock_mode X locks rec but not gap waiting

Record lock,heap no 38 PHYSICAL RECORD: n_fields 2; compact format; info bits 0

0: len 3; hex 53454b; asc SEK;;

1: len 8; hex 80000000007eea14; asc ~ ;;

大致一看,更新同一索引的同一行,應該是一個Block,報TimeOut的錯才對,怎麼會報DeadLock?

【初步分析】

先分析下(2) TRANSACTION,TRANSACTION 32231892482。

等待的鎖資訊為:

0: len 3; hex 53454b; asc SEK;;

1: len 8; hex 80000000007eea14; asc

持有的鎖資訊為:

0: len 3; hex 53454b; asc SEK;;

1: len 8; hex 80000000007eeac4; asc

再先分析下(1) TRANSACTION,TRANSACTION 32231892617。

等待的鎖資訊為:

0: len 3; hex 53454b; asc SEK;;

1: len 8; hex 80000000007eeac4; asc

於是可以畫出的死鎖表,兩個資源相互依賴,造成死鎖:

TRANSACTION Hold Wait
32231892617 53454b\80000000007eea14 53454b\80000000007eeac4
32231892482 53454b\80000000007eeac4 53454b\80000000007eea14

讓我們再看一下explain結果:

mysql>desc UPDATE TestTable SET Column1=1,Column3 = '025' Column4 = 0 AND Column5 = 477 AND Column6 = 'SEK' \G;

*************************** 1. row ***************************

id: 1

select_type: UPDATE

table: TestTable

partitions: NULL

type: index_merge

possible_keys: column5_index,idx_column5_column6_Column1,idxColumn6

key: column5_index,idxColumn6

key_len: 8,9

ref: NULL

rows: 7

filtered: 100.00

Extra: Using intersect(column5_index,idxColumn6); Using where

可以看到 EXTRA 列:

Using intersect(column5_index,idxColumn6)

從5.1開始,引入了 index merge 優化技術,對同一個表可以使用多個索引分別進行條件掃描。

相關文件:http://dev.mysql.com/doc/refman/5.7/en/index-merge-optimization.html

The Index Merge method is used to retrieve rows with several range scans and to merge their results into one. The merge can produce unions,intersections,or unions-of-intersections of its underlying scans. This access method merges index scans from a single table; it does not merge scans across multiple tables.

【模擬與驗證】

根據以上初步分析,猜測應該就是intersect造成的,於是在測試環境模擬驗證,開啟2個session模擬死鎖:

時間序列 Session1 Session2
1 Begin;
2 UPDATE TestTable SET Column2 = sysdate() Column4 = 0 AND Column5 = 47 AND Column6 = 'SEK
執行成功,影響7行
3 Begin;
4 UPDATE TestTable SET Column2 = sysdate(),Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK';
被Blocking
5 UPDATE TestTable SET Column2 = sysdate(),Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK';
執行成功
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

依據以上資訊可以發現Session2雖然被Block了,但也獲取了一些Session1在時間序列5時所需資源的X鎖,可以再開啟一個查詢select count(Column5) from TestTable where Column5 = 485,設定SET TRANSACTION ISOLATION LEVEL SERIALIZABLE,去查詢Column5 = 485的行,觀察鎖等待的資訊:
mysql> SELECT r.trx_id waiting_trx_id,r.trx_mysql_thread_id waiting_thread,r.trx_query waiting_query,b.trx_id

mysql> SELECT r.trx_id waiting_trx_id,b.trx_id blocking_trx_id,b.trx_mysql_thread_id blocking_thread,b.trx_query blocking_query FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id \G;

*************************** 1. row ***************************

waiting_trx_id: 103006

waiting_thread: 36

waiting_query: UPDATE TestTable SET Column1 = 1,Column3 = '026' Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK'

blocking_trx_id: 103003

blocking_thread: 37

blocking_query: NULL

*************************** 2. row ***************************

waiting_trx_id: 421500433538672

waiting_thread: 39

waiting_query: select count(Column5) from TestTable where Column5 = 485

blocking_trx_id: 103006

blocking_thread: 36

blocking_query: UPDATE TestTable SET Column1 = 1,Column3 = '026' Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK'

2 rows in set,1 warning (0.00 sec)

mysql> select * from information_schema.innodb_lock_waits \G;

*************************** 1. row ***************************

requesting_trx_id: 103006

requested_lock_id: 103006:417:1493:859

blocking_trx_id: 103003

blocking_lock_id: 103003:417:1493:859

*************************** 2. row ***************************

requesting_trx_id: 421500433538672

requested_lock_id: 421500433538672:417:749:2

blocking_trx_id: 103006

blocking_lock_id: 103006:417:749:2

2 rows in set,1 warning (0.00 sec)
mysql> select * from INNODB_LOCKS \G;

*************************** 1. row ***************************

lock_id: 103006:417:1493:859

lock_trx_id: 103006

lock_mode: X

lock_type: RECORD

lock_table: test.TestTable

lock_index: idxColumn6

lock_space: 417

lock_page: 1493

lock_rec: 859

lock_data: 'SEK',8262738

*************************** 2. row ***************************

lock_id: 103003:417:1493:859

lock_trx_id: 103003

lock_mode: X

lock_type: RECORD

lock_table:test.TestTable

lock_index: idxColumn6

lock_space: 417

lock_page: 1493

lock_rec: 859

lock_data: 'SEK',8262738

*************************** 3. row ***************************

lock_id: 421500433538672:417:749:2

lock_trx_id: 421500433538672

lock_mode: S

lock_type: RECORD

lock_table: test.TestTable

lock_index: column5_index

lock_space: 417

lock_page: 749

lock_rec: 2

lock_data: 485,8317620

*************************** 4. row ***************************

lock_id: 103006:417:749:2

lock_trx_id: 103006

lock_mode: X

lock_type: RECORD

lock_table: test.TestTable

lock_index: column5_index

lock_space: 417

lock_page: 749

lock_rec: 2

lock_data: 485,8317620

4 rows in set,1 warning (0.00 sec)

可以看到Session2,trx_id 103006阻塞了trx_id 421500433538672,而trx_id 421500433538672 requested_lock也正好是lock_data: 485,8317620。由此可見Session2雖然別block了,但是還是獲取到了Index column5_index相關的鎖。被Block是因為intersect的原因,還需要idxColumn6的鎖,至此思路已經清晰,對整個分配鎖的資訊簡化一下,如下表格(請求到的鎖用青色表示,需獲取但未獲取到的鎖用紅色表示):

時間點 Session1 Session2
1 477 SEK
2 485SEK
3 485SEK 死鎖發生

可以看到485 SEK這兩個資源形成了一個環狀,最終發生死鎖。

【解決方法】

  • 最佳的方法是新增column5和Column6的聯合索引。
  • 我們環境當時的情況發現Column6的篩選度非常低,就刪除了Column6的索引。
    10:55左右刪除索引後,報錯沒有再發生:

總結

以上就是這篇文章的全部內容了,希望本文的內容對大家的學習或者工作具有一定的參考學習價值,如果有疑問大家可以留言交流,謝謝大家對我們的支援。