How GTIDs Are Validated When the Master and Slave Reconnect in MySQL GTID Replication
- Environment: MySQL 5.7.18, multi-threaded replication
- First, check Executed_Gtid_Set on the master with show master status
root@localhost : (none) 01:37:02> show master status;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000028 | 4313 | | | 1a324bb7-4a61-11e7-811f-fa163e85255f:1-138 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
- Check Retrieved_Gtid_Set and Executed_Gtid_Set on the slave with show slave status
- Check show master status on the slave
Before running set global gtid_purged, @@GLOBAL.GTID_EXECUTED must be empty, which means running reset master first.
Note that Executed_Gtid_Set in show slave status, Executed_Gtid_Set in show master status, and select @@GLOBAL.GTID_EXECUTED all report the same value.
- Run reset master on the slave
root@localhost : (none) 01:42:49> reset master;
Query OK, 0 rows affected (0.09 sec)
- Check Retrieved_Gtid_Set and Executed_Gtid_Set in show slave status
Master_UUID: 1a324bb7-4a61-11e7-811f-fa163e85255f
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:125-138
Executed_Gtid_Set:
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
After reset master, Executed_Gtid_Set in show slave status is empty.
- Then run set global gtid_purged
root@localhost : (none) 01:43:28> set global gtid_purged='1a324bb7-4a61-11e7-811f-fa163e85255f:1-100';
Query OK, 0 rows affected (0.02 sec)
- Check Retrieved_Gtid_Set and Executed_Gtid_Set in show slave status again
Master_UUID: 1a324bb7-4a61-11e7-811f-fa163e85255f
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:125-138
Executed_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:1-100
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
root@localhost : (none) 01:43:36> show master status;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000001 | 154 | | | 1a324bb7-4a61-11e7-811f-fa163e85255f:1-100 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
root@localhost : (none) 01:58:53> select @@GLOBAL.GTID_EXECUTED;
+--------------------------------------------+
| @@GLOBAL.GTID_EXECUTED |
+--------------------------------------------+
| 1a324bb7-4a61-11e7-811f-fa163e85255f:1-100 |
+--------------------------------------------+
1 row in set (0.00 sec)
GTIDs listed in gtid_purged are treated as already executed and therefore appear in Executed_Gtid_Set.
- Now create a table on the master.
root@localhost : wukong 02:07:48> create table b(id int);
Query OK, 0 rows affected (10.31 sec)
- Check show master status on the master
root@localhost : wukong 02:08:07> show master status;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000028 | 4483 | | | 1a324bb7-4a61-11e7-811f-fa163e85255f:1-139 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
- Check show slave status on the slave
Master_UUID: 1a324bb7-4a61-11e7-811f-fa163e85255f
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:125-139
Executed_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:1-100:139
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
- Next, create another table on the master
root@localhost : wukong 02:12:56> create table c(id int);
Query OK, 0 rows affected (2.36 sec)
- Check show master status on the master
root@localhost : wukong 02:13:05> show master status;
+------------------+----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000028 | 4653 | | | 1a324bb7-4a61-11e7-811f-fa163e85255f:1-140 |
+------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
- Check show slave status on the slave again
root@localhost : (none) 02:12:30> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.12
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000028
Read_Master_Log_Pos: 4653
Relay_Log_File: mysql-relay-bin.000007
Relay_Log_Pos: 1730
Relay_Master_Log_File: mysql-bin.000028
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 4653
Relay_Log_Space: 5333
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 330612
Master_UUID: 1a324bb7-4a61-11e7-811f-fa163e85255f
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:125-140
Executed_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:1-100:139-140
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
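The slave's Executed_Gtid_Set resumes at 139. The reason: before gtid_purged was set, the slave's Executed_Gtid_Set was 1a324bb7-4a61-11e7-811f-fa163e85255f:1-138, so newly applied transactions continue from 139. Whenever the gtid_purged value differs from the original Executed_Gtid_Set, a hole like this (here 101-138) appears.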
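To make the hole concrete, the following Python sketch lists the gaps in an Executed_Gtid_Set. It is only an illustration: parse_intervals and find_holes are hypothetical helper names, and the input is the interval part of the slave's value shown above, handled for a single server UUID.

```python
def parse_intervals(text):
    """Parse the interval part of a GTID set, e.g. '1-100:139-140'."""
    intervals = []
    for item in text.split(":"):
        start, _, end = item.partition("-")
        # A single transaction such as '139' has no end; treat it as 139-139.
        intervals.append((int(start), int(end or start)))
    return sorted(intervals)


def find_holes(intervals):
    """Return the gaps between consecutive intervals as (first, last) pairs."""
    holes = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        if next_start > prev_end + 1:
            holes.append((prev_end + 1, next_start - 1))
    return holes


# Interval part of the slave's Executed_Gtid_Set after the second CREATE TABLE.
executed = parse_intervals("1-100:139-140")
print(find_holes(executed))  # [(101, 138)]
```

The reported gap, 101-138, is exactly the range the slave had applied before reset master but no longer records as executed.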
- Now run stop slave, followed by start slave
root@localhost : (none) 02:13:06> stop slave;
Query OK, 0 rows affected (0.21 sec)
root@localhost : (none) 02:13:35> start slave;
Query OK, 0 rows affected (1.05 sec)
root@localhost : (none) 02:13:39> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.12
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000028
Read_Master_Log_Pos: 4653
Relay_Log_File: mysql-relay-bin.000008
Relay_Log_Pos: 454
Relay_Master_Log_File: mysql-bin.000027
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 2 failed executing transaction '1a324bb7-4a61-11e7-811f-fa163e85255f:103' at master log mysql-bin.000027, end_log_pos 2231. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 1454
Relay_Log_Space: 11130
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 2 failed executing transaction '1a324bb7-4a61-11e7-811f-fa163e85255f:103' at master log mysql-bin.000027, end_log_pos 2231. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 330612
Master_UUID: 1a324bb7-4a61-11e7-811f-fa163e85255f
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 170728 02:13:40
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:101-140
Executed_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:1-102:139-140
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
- Check the error details
root@localhost : (none) 02:14:21> select * from performance_schema.replication_applier_status_by_worker\G;
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 1
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: 1a324bb7-4a61-11e7-811f-fa163e85255f:102
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
CHANNEL_NAME:
WORKER_ID: 2
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: 1a324bb7-4a61-11e7-811f-fa163e85255f:103
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Worker 2 failed executing transaction '1a324bb7-4a61-11e7-811f-fa163e85255f:103' at master log mysql-bin.000027, end_log_pos 2231; Could not execute Write_rows event on table pxs.dd; Duplicate entry '123123123' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 2231
LAST_ERROR_TIMESTAMP: 2017-07-28 14:13:40
If the slave is not stopped and restarted around the gtid_purged change, it keeps receiving new transactions from the master, continuing from its existing Retrieved_Gtid_Set. That is why no error appeared at first in this example.
After stop slave and start slave, however, the slave reconnects and sends UNION(@@global.gtid_executed, Retrieved_gtid_set) to the master. In this example that is UNION(Executed_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:1-100:139-140, Retrieved_Gtid_Set: 1a324bb7-4a61-11e7-811f-fa163e85255f:125-140) = 1a324bb7-4a61-11e7-811f-fa163e85255f:1-100:125-140. The master compares this with its own Executed_Gtid_Set, here 1a324bb7-4a61-11e7-811f-fa163e85255f:1-140, concludes that the slave has not executed the transactions for GTIDs 101-124 (note: not 101-138), and sends 101-124 to the slave. In reality the slave had already executed transaction 103 before the reset, so applying it again fails with a duplicate primary key error.
If the master's binlogs containing 101-124 had already been purged, the slave would instead report error 1236: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
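The set arithmetic in this explanation can be replayed in a few lines of Python. This is a simplified model, not MySQL's implementation: the helper names (parse_gtid_set, format_gtid_set, union, subtract) are made up for illustration, and the inputs are the values from this example.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-100:139-140[,uuid2:...]' into {uuid: set of numbers}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def format_gtid_set(gtid_set):
    """Render {uuid: numbers} back into interval notation."""
    parts = []
    for uuid in sorted(gtid_set):
        numbers = sorted(gtid_set[uuid])
        if not numbers:
            continue
        intervals = []
        start = prev = numbers[0]
        for n in numbers[1:]:
            if n != prev + 1:  # a gap closes the current interval
                intervals.append((start, prev))
                start = n
            prev = n
        intervals.append((start, prev))
        rendered = ":".join(f"{a}-{b}" if a != b else str(a) for a, b in intervals)
        parts.append(f"{uuid}:{rendered}")
    return ",".join(parts)


def union(a, b):
    return {u: a.get(u, set()) | b.get(u, set()) for u in set(a) | set(b)}


def subtract(a, b):
    return {u: a[u] - b.get(u, set()) for u in a}


UUID = "1a324bb7-4a61-11e7-811f-fa163e85255f"
slave_executed = parse_gtid_set(f"{UUID}:1-100:139-140")
slave_retrieved = parse_gtid_set(f"{UUID}:125-140")
master_executed = parse_gtid_set(f"{UUID}:1-140")

# What the slave reports on reconnect, and what the master sends back.
sent_by_slave = union(slave_executed, slave_retrieved)
resent_by_master = subtract(master_executed, sent_by_slave)

print(format_gtid_set(sent_by_slave))     # ...:1-100:125-140
print(format_gtid_set(resent_by_master))  # ...:101-124
```

On a live server, the built-in GTID_SUBTRACT() function performs the same subtraction on two GTID set strings.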
- How the master and slave validate GTIDs
When using GTIDs, the slave tells the master which transactions it has already received, executed, or both. To compute this set, it reads the global value of gtid_executed and the value of the Retrieved_gtid_set column from SHOW SLAVE STATUS. The GTID of the last transmitted transaction is included in Retrieved_gtid_set only when the full transaction is received. The slave computes the following set:
UNION(@@global.gtid_executed, Retrieved_gtid_set)
Prior to MySQL 5.7.5, the GTID of the last transmitted transaction was included in Retrieved_gtid_set even if the transaction was only partially transmitted, and the last received GTID was subtracted from this set. (Bug #17943188) Thus, the slave computed the following set:
UNION(@@global.gtid_executed, Retrieved_gtid_set - last_received_GTID)
This set is sent to the master as part of the initial handshake, and the master sends back all transactions that it has executed which are not part of the set. If any of these transactions have been already purged from the master’s binary log, the master sends the error ER_MASTER_HAS_PURGED_REQUIRED_GTIDS to the slave, and replication does not start.
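The master's side of this handshake might be modelled roughly as follows. This is a sketch under stated assumptions: it handles a single server UUID, the names expand and handshake are hypothetical, and the purged range 1-110 in the second call is invented purely to exercise the error path.

```python
def expand(text):
    """Expand interval notation such as '1-100:125-140' into a set of ints."""
    numbers = set()
    for item in filter(None, text.split(":")):
        start, _, end = item.partition("-")
        numbers.update(range(int(start), int(end or start) + 1))
    return numbers


def handshake(master_executed, master_purged, slave_known):
    """Decide what the master does with the set received from the slave."""
    # Transactions the master has executed that the slave did not report.
    missing = expand(master_executed) - expand(slave_known)
    # If any of them are no longer in the binary logs, replication cannot start.
    if missing & expand(master_purged):
        return "ER_MASTER_HAS_PURGED_REQUIRED_GTIDS", sorted(missing)
    return "SEND", sorted(missing)


# Values from this example: nothing purged on the master, so 101-124 is sent.
status, missing = handshake("1-140", "", "1-100:125-140")
print(status, missing[0], missing[-1])  # SEND 101 124

# Hypothetical case: the master has purged 1-110, which overlaps 101-124.
status, _ = handshake("1-140", "1-110", "1-100:125-140")
print(status)  # ER_MASTER_HAS_PURGED_REQUIRED_GTIDS
```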