
A Common Locking Problem in MySQL 5.7

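A few days ago I analyzed a deadlock problem, and after reading it a reader emailed me a question. Generally, readers who take the trouble to ask by email are very serious about it, because they have to prepare the logs, the exact steps, and what they have already tried. So I always analyze these questions carefully; if I get no result, I keep digging and wait a little longer. Come to think of it, quite a few of these questions have been pending for a long time.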

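The question this reader raised struck me as very strange, because it somewhat overturned my understanding of MySQL locking. What was I to make of it?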

The environment uses the RR (REPEATABLE READ) isolation level, the table has a primary key, and the query is a range query.

The reader provided the steps to reproduce the problem.

Create the table:
mysql> create table tt(a int not null primary key) engine=innodb;
mysql> insert into tt values(10),(20),(30),(40),(50);
Steps to reproduce:
session1:
mysql> set session tx_isolation='repeatable-read';
mysql> begin;
mysql> select * from tt where a > 15 and a < 35 for update;
+----+
| a  |
+----+
| 20 |
| 30 |
+----+
session2:
mysql>  insert into tt select 1;
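At this point the insert is blocked. If you reason it through along these lines, something always feels off.
Why is MySQL being so fussy here?
With this question in mind, I tested it on a freshly built MySQL 5.7 environment, and sure enough, the same thing happened.
The next task was to convince myself first, and then, once I understood it, to convince the reader.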

I ended up testing five scenarios in a row, and whenever the SQL changed even slightly, the results were very different.

#for update scenario 1

Let's start with a primary-key lookup and verify the most basic case first, to steady my nerves.

#session1

mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> select *from tt;
+----+
| a  |
+----+
| 10 |
| 20 |
| 30 |
| 40 |
| 50 |
+----+
5 rows in set (0.00 sec)

mysql> select * from tt where a =10 for update;
+----+
| a  |
+----+
| 10 |
+----+
1 row in set (0.00 sec)

#session2
mysql> insert into tt select 1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

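This is the most conservative usage. If even this had a problem, it would clearly prove the database itself is broken: with an equality match on the primary key and no range scan, it is bound to be fine.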

#for update scenario 2

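In this scenario we change the range from the original (15,35) to (10,30), and the results differ a lot. I cancelled some of the blocked statements by hand. The differences show up clearly, though they may leave you a little confused.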

#session1
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tt where a <30 and a>10 for update;
+----+
| a  |
+----+
| 20 |
+----+
1 row in set (0.00 sec)

session2:
mysql> insert into tt select 35;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into tt select 31;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into tt select 30;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted

mysql> insert into tt select 5;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into tt select 10;
ERROR 1062 (23000): Duplicate entry '10' for key 'PRIMARY'
mysql> insert into tt select 11;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted

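Here you can clearly sense a subtle relationship with how the data is distributed in the current environment. The range contains only 20, yet the locking reaches the neighboring keys as well (11 and 30 are blocked). At the very least, I cannot count on my query locking exactly the range I wrote.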

#for update scenario 3

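In this scenario I extended the problem I first ran into, to see whether values in other ranges behave the same way. I widened the set of values I tried to insert, and the result was clear, and somewhat unexpected.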

session1:
mysql> select *from tt;
+----+
| a  |
+----+
| 10 |
| 20 |
| 30 |
| 40 |
| 50 |
+----+
5 rows in set (0.00 sec)
mysql> begin;select * from tt where a <35 and a>15 for update;
Query OK, 0 rows affected (0.00 sec)

+----+
| a  |
+----+
| 20 |
| 30 |
+----+
2 rows in set (0.00 sec)

session2:
mysql>  insert into tt select 1;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 9;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 10;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 35;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 36;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 40;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 50;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted

#for update scenario 4

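Although I was getting a bit confused by this point, I patiently tested a few more scenarios. I changed the range from (15,35) to (15,30), and the result surprised me: the insert that had been blocked now succeeded.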


session1
mysql> begin;select * from tt where a <30 and a>15 for update;
Query OK, 0 rows affected (0.00 sec)

+----+
| a  |
+----+
| 20 |
+----+
1 row in set (0.00 sec)

session2
mysql>  insert into tt select 1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql>  insert into tt select 15;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 15;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 16;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 14;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 13;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 11;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 10;
ERROR 1062 (23000): Duplicate entry '10' for key 'PRIMARY'
mysql>  insert into tt select 9;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql>  insert into tt select 30;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
mysql>  insert into tt select 31;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

Comparing all of these results, intuitively the behavior is related to how the data in the table is distributed.

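Take scenario 4: when I change the range from (15,35) to (15,30), is there anything special about the data? My guess was that it had something to do with how the index is stored. While an insert was blocked, I looked at the details in information_schema.innodb_trx and innodb_locks, and both locks pointed to the same row.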

mysql> select * from INFORMATION_SCHEMA.innodb_locks\G
*************************** 1. row ***************************
    lock_id: 4081:36:3:2
lock_trx_id: 4081
  lock_mode: X,GAP
  lock_type: RECORD
 lock_table: `test`.`tt`
 lock_index: PRIMARY
 lock_space: 36
  lock_page: 3
   lock_rec: 2
  lock_data: 10
*************************** 2. row ***************************
    lock_id: 4078:36:3:2
lock_trx_id: 4078
  lock_mode: X
  lock_type: RECORD
 lock_table: `test`.`tt`
 lock_index: PRIMARY
 lock_space: 36
  lock_page: 3
   lock_rec: 2
  lock_data: 10
2 rows in set, 1 warning (0.00 sec)

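The output above shows that both locks point to heap record 2 on page 3, which is the row a = 10 (lock_data: 10): transaction 4078 has an X lock on that record and transaction 4081 an X,GAP lock on the gap before it. Since 10 lies outside the queried range, this clearly does not add up.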

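Since the problem showed up on MySQL 5.7, I still held on to a sliver of hope and tested it on MGR (MySQL Group Replication), where it also reproduced. Today I patiently went back to it and simulated it on 5.6, and 5.6 does not have this problem at all. That was the turning point: it is now certain that the problem reproduces on MySQL 5.7 and that MySQL 5.6 behaves normally.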

With that, the investigation had a direction, and I soon found a related bug on bugs.mysql.com (Bug #85749).

The bug report contains similar tests, the behavior reproduces, and the MySQL team has confirmed it.

[31 Mar 18:10] Sinisa Milivojevic

Hi!
I have run your test case and got the same results as you have.
Upon further analysis, I concluded that this is a bug.  A small bug , but a bug.
Verified.
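What makes it interesting is that the reporter traced the behavior to the relevant code, while still asking that the documentation say more about gap locks: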
No locks are released in this case, but we do request X lock on the gap before the next, non-matching record when non-unique secondary index is used. Check code starting from this line (https://github.com/mysql/mysql-server/blob/71f48ab393bce80a59e5a2e498cd1f46f6b43f9a/storag...):
			/* Try to place a gap lock on the next index record
			to prevent phantoms in ORDER BY ... DESC queries */
			const rec_t*	next_rec = page_rec_get_next_const(rec);

			offsets = rec_get_offsets(next_rec, index, offsets,
						  ULINT_UNDEFINED, &heap);
			err = sel_set_rec_lock(pcur,
					       next_rec, index, offsets,
					       prebuilt->select_lock_type,
					       LOCK_GAP, thr, &mtr);

in row_search_mvcc(). See the (potential) reason to set this gap lock in the comment above.
Maybe there is another reason for the behavior we see. Then it should be also documented.