1. 程式人生 > 其它 >mysql鎖排查

mysql鎖排查

一、說明

本文將通過實驗介紹mysql鎖該如何排查,本實驗的mysql版本為8.0.23,隔離級別為RC。

二、實驗

場景一:會話a開啟事務,刪除全部資料。會話b開始事務,刪除部分資料

會話A:

(root@localhost)[hello]> begin;
(root@localhost)[hello]> delete from x1;
(root@localhost)[hello]> delete from x2;
(root@localhost)[hello]> select * from x1;
(root@localhost)[hello]> select * from x2;

會話B:

(root@localhost)[hello]> begin;
(root@localhost)[hello]> delete from x1 limit 1;  # 此處發生等待

再起一個視窗,檢視會話列表
ID 34即會話A已執行完成,狀態為Sleep。ID 37即會話B正在執行,狀態為Query。

(root@localhost)[(none)]> select * from information_schema.processlist;
+----+------+-----------+-------+---------+------+-----------+----------------------------------------------+
| ID | USER | HOST      | DB    | COMMAND | TIME | STATE     | INFO                                         |
+----+------+-----------+-------+---------+------+-----------+----------------------------------------------+
| 34 | root | localhost | hello | Sleep   |   49 |           | NULL                                         |
| 36 | root | localhost | NULL  | Query   |    0 | executing | select * from information_schema.processlist |
| 37 | root | localhost | hello | Query   |   12 | updating  | delete from x1 limit 1                       |
+----+------+-----------+-------+---------+------+-----------+----------------------------------------------+

檢視會話阻塞情況,waiting_pid、waiting_trx_id、blocking_pid、blocking_trx_id,分別表示被阻塞的事務和導致阻塞的事務。
結果顯示事務23641阻塞了事務23654。

(root@localhost)[(none)]> select locked_table_schema, locked_table_name, waiting_query, waiting_pid, waiting_trx_id, blocking_pid, blocking_trx_id from sys.innodb_lock_waits;
+---------------------+-------------------+------------------------+-------------+----------------+--------------+-----------------+
| locked_table_schema | locked_table_name | waiting_query          | waiting_pid | waiting_trx_id | blocking_pid | blocking_trx_id |
+---------------------+-------------------+------------------------+-------------+----------------+--------------+-----------------+
| hello               | x1                | delete from x1 limit 1 |          37 |          23654 |           34 |           23641 |
+---------------------+-------------------+------------------------+-------------+----------------+--------------+-----------------+

檢視鎖的資訊,engine_transaction_id對應事務編號,thread_id對應執行緒編號。
前面我們查到事務編號23641阻塞了事務編號23654,這裡顯示事務編號23641對應執行緒85,事務編號23654對應執行緒88。

(root@localhost)[(none)]> select distinct engine_transaction_id, thread_id, object_schema, object_name, lock_type, lock_mode, lock_status from performance_schema.data_locks;
+-----------------------+-----------+---------------+-------------+-----------+---------------+-------------+
| engine_transaction_id | thread_id | object_schema | object_name | lock_type | lock_mode     | lock_status |
+-----------------------+-----------+---------------+-------------+-----------+---------------+-------------+
|                 23654 |        88 | hello         | x1          | TABLE     | IX            | GRANTED     |
|                 23654 |        88 | hello         | x1          | RECORD    | X,REC_NOT_GAP | WAITING     |
|                 23641 |        85 | hello         | x1          | TABLE     | IX            | GRANTED     |
|                 23641 |        85 | hello         | x1          | RECORD    | X,REC_NOT_GAP | GRANTED     |
|                 23641 |        85 | hello         | x2          | TABLE     | IX            | GRANTED     |
|                 23641 |        85 | hello         | x2          | RECORD    | X,REC_NOT_GAP | GRANTED     |
+-----------------------+-----------+---------------+-------------+-----------+---------------+-------------+

檢視造成阻塞的執行緒歷史執行的語句,其中thread_id對應performance_schema.data_locks中的thread_id
結果顯示執行緒85執行過“delete from x1”,這才是執行緒88執行“delete from x1 limit 1”等待的原因。

(root@localhost)[(none)]> select timer_start, thread_id, sql_text, message_text from performance_schema.events_statements_history where event_name like 'statement/sql%' and thread_id in (85) order by timer_start desc;
+-------------------+-----------+----------------------------------+--------------+
| timer_start       | thread_id | sql_text                         | message_text |
+-------------------+-----------+----------------------------------+--------------+
| 22461001917916000 |        85 | select * from x2                 | NULL         |
| 22460129443671000 |        85 | select * from x1                 | NULL         |
| 22460050982468000 |        85 | delete from x2                   | NULL         |
| 22460020034606000 |        85 | delete from x1                   | NULL         |
| 22459990044910000 |        85 | begin                            | NULL         |
| 22416295653612000 |        85 | select USER()                    | NULL         |
| 22416294129231000 |        85 | select @@version_comment limit 1 | NULL         |
+-------------------+-----------+----------------------------------+--------------+

將上面所有的語句進行合併,就可以得到鎖的完整資訊了

(root@localhost)[(none)]> select distinct a.wait_started, a.locked_table, a.waiting_query, a.waiting_pid, a.blocking_pid, c.sql_text 
from sys.innodb_lock_waits a 
join performance_schema.data_locks b on (a.blocking_trx_id = b.engine_transaction_id and a.locked_table_schema = b.object_schema and a.locked_table_name = b.object_name)
join performance_schema.events_statements_history c on (b.thread_id = c.thread_id and c.sql_text regexp b.object_name);
+---------------------+--------------+------------------------+-------------+--------------+------------------+
| wait_started        | locked_table | waiting_query          | waiting_pid | blocking_pid | sql_text         |
+---------------------+--------------+------------------------+-------------+--------------+------------------+
| 2022-02-16 16:26:25 | `hello`.`x1` | delete from x1 limit 1 |          37 |           34 | delete from x1   |
| 2022-02-16 16:26:25 | `hello`.`x1` | delete from x1 limit 1 |          37 |           34 | select * from x1 |
+---------------------+--------------+------------------------+-------------+--------------+------------------+

場景二:會話a對錶加元資料鎖。會話b開始事務,刪除部分資料

會話A:

(root@localhost)[hello]> lock table x1 write;

會話B:

(root@localhost)[hello]> delete from x1 limit 1;  # 此處發生等待

檢視會話列表
ID 39即會話A已執行完成,狀態為Sleep。ID 40即會話B正在執行,狀態為Query。

(root@localhost)[(none)]> select * from information_schema.processlist;
+----+------+-----------+-------+---------+------+---------------------------------+----------------------------------------------+
| ID | USER | HOST      | DB    | COMMAND | TIME | STATE                           | INFO                                         |
+----+------+-----------+-------+---------+------+---------------------------------+----------------------------------------------+
| 49 | root | localhost | hello | Sleep   |   81 |                                 | NULL                                         |
| 51 | root | localhost | NULL  | Query   |    0 | executing                       | select * from information_schema.processlist |
| 52 | root | localhost | hello | Query   |    9 | Waiting for table metadata lock | delete from x1 limit 1                       |
+----+------+-----------+-------+---------+------+---------------------------------+----------------------------------------------+

檢視innodb_lock_waits和data_locks,都沒有資料。

(root@localhost)[(none)]> select * from sys.innodb_lock_waits;
Empty set (0.00 sec)

(root@localhost)[(none)]> select * from performance_schema.data_locks;
Empty set (0.00 sec)

檢視當前資料庫執行過的sql,能找到鎖表的語句,注意這裡的thread_id並不對應processlist表中的id。

(root@localhost)[(none)]> select thread_id, event_name, sql_text from performance_schema.events_statements_current;
+-----------+---------------------------+------------------------------------------------------------------------------------------+
| thread_id | event_name                | sql_text                                                                                 |
+-----------+---------------------------+------------------------------------------------------------------------------------------+
|       100 | statement/sql/lock_tables | lock table x1 write                                                                      |
|       102 | statement/sql/select      | select thread_id, event_name, sql_text from performance_schema.events_statements_current |
|       103 | statement/sql/delete      | delete from x1 limit 1                                                                   |
+-----------+---------------------------+------------------------------------------------------------------------------------------+

檢視threads表,這裡的thread_id對應events_statements_current表中thread_id,processlist_id對應processlist表中id。

(root@localhost)[(none)]> select thread_id, processlist_id, processlist_user, processlist_db, processlist_command, processlist_time, processlist_state, processlist_info from performance_schema.threads where type = 'FOREGROUND';
+-----------+----------------+------------------+----------------+---------------------+------------------+---------------------------------+------------------------+
| thread_id | processlist_id | processlist_user | processlist_db | processlist_command | processlist_time | processlist_state               | processlist_info       |
+-----------+----------------+------------------+----------------+---------------------+------------------+---------------------------------+------------------------+
|        48 |              6 | NULL             | NULL           | Daemon              |            31219 | Suspending                      | NULL                   |
|       100 |             49 | root             | hello          | Sleep               |              172 | NULL                            | NULL                   |
|       103 |             52 | root             | hello          | Query               |              100 | Waiting for table metadata lock | delete from x1 limit 1 |
+-----------+----------------+------------------+----------------+---------------------+------------------+---------------------------------+------------------------+

將以上sql進行合併,從sql_text中找鎖表的語句,殺掉對應的會話即可。

(root@localhost)[(none)]> select b.processlist_id, b.processlist_user, b.processlist_db, b.processlist_command, b.processlist_time, b.processlist_state, b.processlist_info, a.event_name, a.sql_text
from performance_schema.events_statements_current a
join performance_schema.threads b on (a.thread_id = b.thread_id);
+----------------+------------------+----------------+---------------------+------------------+---------------------------------+-------------------------+---------------------------+------------------------+
| processlist_id | processlist_user | processlist_db | processlist_command | processlist_time | processlist_state               | processlist_info        | event_name                | sql_text               |
+----------------+------------------+----------------+---------------------+------------------+---------------------------------+-------------------------+---------------------------+------------------------+
|             49 | root             | hello          | Sleep               |              289 | NULL                            | NULL                    | statement/sql/lock_tables | lock table x1 write    |
|             52 | root             | hello          | Query               |              217 | Waiting for table metadata lock | delete from x1 limit 1  | statement/sql/delete      | delete from x1 limit 1 |
+----------------+------------------+----------------+---------------------+------------------+---------------------------------+-------------------------+---------------------------+------------------------+

三、監控

檢視事務資訊

(root@localhost)[(none)]> select trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query from information_schema.innodb_trx;
+--------+-----------+---------------------+---------------------+------------------------+
| trx_id | trx_state | trx_started         | trx_mysql_thread_id | trx_query              |
+--------+-----------+---------------------+---------------------+------------------------+
|  23654 | LOCK WAIT | 2022-02-16 16:26:25 |                  37 | delete from x1 limit 1 |
|  23641 | RUNNING   | 2022-02-16 16:25:47 |                  34 | NULL                   |
+--------+-----------+---------------------+---------------------+------------------------+

因此可以通過下面的語句查出當前最長的事務。

(root@localhost)[(none)]> select max(time_to_sec(timediff(now(), trx_started))) max_trx_time from information_schema.innodb_trx;  
+--------------+
| max_trx_time |
+--------------+
|          571 |
+--------------+

四、總結

  1. 等待行鎖最大的等待時長引數為innodb_lock_wait_timeout,預設50s。等待元資料鎖最大的等待時長引數為lock_wait_timeout,預設31536000s(一年)。建議對lock_wait_timeout進行調整。
  2. 通過information_schema.innodb_trx對資料庫進行長事務監控。
  3. 通過sys.innodb_lock_waits檢視被阻塞的sql
  4. 元資料鎖通過performance_schema.events_statements_current,performance_schema.threads關聯起來看