A Brief Look at InnoDB's MVCC Strategy
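InnoDB is a storage engine built on multi-version concurrency control (MVCC). In an area of the tablespace called the "rollback segment", it records the old (pre-modification) versions of changed rows in order to support transactional features such as concurrency and rollback. InnoDB uses this information to perform the undo operations during a transaction rollback, and also to reconstruct the earlier values of a row during a consistent read.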
InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
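In InnoDB, MVCC takes effect only at the Repeatable Read and Read Committed isolation levels. The other two levels are not compatible with it: Read Uncommitted requires the storage engine to always read the latest version of a row (rather than selecting a version based on the current transaction's version), while Serializable requires transactions to execute as if serially, so reads take locks instead of reading old versions.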
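InnoDB automatically adds three fields to every row:
1. DB_TRX_ID (6 bytes) holds the transaction identifier of the last transaction that inserted or updated the row. A delete is treated internally as an update that sets a special bit in the row to mark it as deleted.
2. DB_ROLL_PTR (7 bytes) is called the roll pointer. It points to an undo log record in the rollback segment. If the row was updated, that undo log record contains the information needed to rebuild the row's content before the update.
3. DB_ROW_ID (6 bytes) contains a row ID that increases monotonically as new rows are inserted.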
Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.
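How the three hidden fields cooperate can be sketched in a few lines of Python. This is only an illustrative model, not InnoDB's actual on-disk format: the `Row` and `UndoRecord` classes and the `update` and `version_as_of` helpers are hypothetical names, and comparing transaction IDs against a single `max_trx_id` is a deliberate simplification of how a real read view decides visibility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UndoRecord:
    """Hypothetical undo log record: the row image before one update."""
    old_values: dict
    old_trx_id: int
    prev: Optional["UndoRecord"]  # next older undo record, if any


@dataclass
class Row:
    values: dict
    db_trx_id: int                     # last transaction that wrote the row
    db_roll_ptr: Optional[UndoRecord]  # newest undo record for this row
    db_row_id: int                     # monotonically increasing row ID
    deleted: bool = False              # delete mark (a delete is an update)


def update(row: Row, trx_id: int, new_values: dict) -> None:
    """Save the old image in an undo record, then overwrite the row in place."""
    row.db_roll_ptr = UndoRecord(dict(row.values), row.db_trx_id, row.db_roll_ptr)
    row.values = {**row.values, **new_values}
    row.db_trx_id = trx_id


def version_as_of(row: Row, max_trx_id: int) -> Optional[dict]:
    """Follow the roll pointer chain back to a version written by
    a transaction whose ID is <= max_trx_id (simplified visibility rule)."""
    values, trx_id, undo = row.values, row.db_trx_id, row.db_roll_ptr
    while trx_id > max_trx_id:
        if undo is None:
            return None  # the row did not exist yet for this reader
        values, trx_id, undo = undo.old_values, undo.old_trx_id, undo.prev
    return values


row = Row({"name": "a"}, db_trx_id=10, db_roll_ptr=None, db_row_id=1)
update(row, 20, {"name": "b"})
update(row, 30, {"name": "c"})
```

Here a reader that may only see transactions up to ID 25 gets the version written by transaction 20, rebuilt from the undo chain, while the row itself already holds the value written by transaction 30.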
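Building on this, under the Repeatable Read isolation level MVCC handles select, insert, delete and update as follows, in a simplified model where each row carries a creation version and a deletion version:
Select:
InnoDB only looks at rows whose version number is less than or equal to the current transaction's version number. This ensures that every row the transaction reads either existed before the transaction began or was inserted or modified by the transaction itself.
The row's deletion version must be either undefined or greater than the current transaction's version. This ensures that every row the transaction reads had not been deleted before the transaction began.
Only rows that satisfy both conditions are returned as the query result.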
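The two conditions above can be written as a single predicate. This is a minimal sketch of the simplified model only; `VersionedRow`, `created_version`, `deleted_version` and `is_visible` are hypothetical names chosen for illustration, not InnoDB identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionedRow:
    data: str
    created_version: int                   # version of the inserting transaction
    deleted_version: Optional[int] = None  # None means "undefined" (not deleted)


def is_visible(row: VersionedRow, current_version: int) -> bool:
    # Condition 1: the row was created by this transaction or an earlier one.
    created_ok = row.created_version <= current_version
    # Condition 2: the row is not deleted, or was deleted by a later transaction.
    not_deleted = (row.deleted_version is None
                   or row.deleted_version > current_version)
    return created_ok and not_deleted


rows = [
    VersionedRow("old row", created_version=1),
    VersionedRow("deleted later", created_version=2, deleted_version=7),
    VersionedRow("deleted earlier", created_version=2, deleted_version=4),
    VersionedRow("inserted later", created_version=9),
]
visible = [r.data for r in rows if is_visible(r, current_version=5)]
```

For a transaction at version 5, `visible` contains only the first two rows: the third was deleted before the transaction began, and the fourth was inserted after it.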
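Insert:
InnoDB records the current system version number (SVN) as the row version number of each newly inserted row.
Delete:
InnoDB records the current system version number as the deletion marker of each deleted row.
Update:
InnoDB treats an Update as an Insert plus a Delete: it inserts a new row whose row version number is the current system version number, and records the same system version number on the original row as its deletion marker.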
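The four rules can be tried out together in a small in-memory table. This is a sketch of the simplified version-number model only, under stated assumptions: `VersionedTable` and its methods are hypothetical, version numbers are handed out by a simple counter in `begin`, and commit and rollback are ignored entirely.

```python
from itertools import count


class VersionedTable:
    """Toy table where every row carries a creation and a deletion version."""

    def __init__(self):
        self._versions = count(1)  # stands in for the system version number
        self.rows = []             # dicts: key, value, created, deleted

    def begin(self) -> int:
        """Start a transaction and return its version number."""
        return next(self._versions)

    def insert(self, txn: int, key: str, value: str) -> None:
        self.rows.append({"key": key, "value": value,
                          "created": txn, "deleted": None})

    def delete(self, txn: int, key: str) -> None:
        for r in self.rows:
            if r["key"] == key and r["deleted"] is None:
                r["deleted"] = txn  # mark the row, do not remove it

    def update(self, txn: int, key: str, value: str) -> None:
        # Update = mark the old row deleted + insert a new row version.
        self.delete(txn, key)
        self.insert(txn, key, value)

    def select(self, txn: int) -> dict:
        return {r["key"]: r["value"] for r in self.rows
                if r["created"] <= txn
                and (r["deleted"] is None or r["deleted"] > txn)}


table = VersionedTable()
t1 = table.begin()
table.insert(t1, "k", "v1")
t2 = table.begin()            # a reader that starts before the update
t3 = table.begin()
table.update(t3, "k", "v2")
seen_by_reader = table.select(t2)
seen_by_writer = table.select(t3)
```

The reader `t2` still sees `v1` because the old row's deletion version (3) is greater than its own version (2), while the writer `t3` sees its own new row `v2`.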
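A consistent nonlocking read (Consistent Nonlocking Reads) means that InnoDB uses multi-version concurrency control to present to a query a snapshot of the database at a point in time, which improves concurrency.
The two isolation levels define the snapshot differently. Repeatable Read: every consistent read in a transaction reads the snapshot established by the first such read in that transaction. Read Committed: each consistent read sets and reads its own fresh snapshot, so it always sees the latest committed data.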
A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time.
The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions.
The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.
This exception causes the following anomaly: If you update some rows in a table, a SELECT sees the latest version of the updated rows, but it might also see older versions of any rows. If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database.
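Note that the snapshot is not taken when the transaction is opened. The read view is created at the first consistent read (a plain SELECT) in the transaction, and it covers all data committed before that point.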
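The difference in snapshot timing between the two isolation levels can be illustrated with a toy model. This is a sketch under loud assumptions: `Database` and `Transaction` are hypothetical classes, and copying the whole committed state stands in for a read view, which a real engine builds far more cheaply from transaction IDs.

```python
class Database:
    def __init__(self):
        self.committed = {}  # key -> latest committed value

    def commit(self, key, value):
        """Simulate another session committing a change."""
        self.committed[key] = value


class Transaction:
    def __init__(self, db, isolation):
        self.db = db
        self.isolation = isolation  # "REPEATABLE READ" or "READ COMMITTED"
        self.read_view = None       # nothing is captured when the transaction opens

    def select(self, key):
        # READ COMMITTED: take a fresh snapshot for every read.
        # REPEATABLE READ: take the snapshot at the first read, then reuse it.
        if self.isolation == "READ COMMITTED" or self.read_view is None:
            self.read_view = dict(self.db.committed)
        return self.read_view.get(key)


db = Database()
db.commit("x", 1)
rr = Transaction(db, "REPEATABLE READ")
rc = Transaction(db, "READ COMMITTED")

db.commit("x", 2)   # committed after both transactions opened, before any read
rr_first, rc_first = rr.select("x"), rc.select("x")

db.commit("x", 3)   # committed after the first reads
rr_second, rc_second = rr.select("x"), rc.select("x")
```

Both first reads return 2 rather than 1, because no snapshot exists until the first read. After the next commit, the Repeatable Read transaction keeps returning 2, while the Read Committed transaction returns 3.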
References:
(1)https://dev.mysql.com/doc/refman/5.5/en/innodb-multi-versioning.html
(2) High Performance MySQL, Chapter 1
(3)http://www.web520.cn/archives/29979
(4)http://coding-geek.com/how-databases-work/
(5)https://segmentfault.com/a/1190000008459057