PIKA Anomaly 01 - Master-slave relationship cannot be established and master-slave disconnection problems
阿新 • Published: 2020-12-17
Tags: pika
This article covers the solutions to three problems that prevent a master-slave relationship from being established.
1. Rename problem preventing the master-slave relationship from being established
Cause: the db directory was mounted on a disk while the dbsync directory was mounted on the local filesystem, which is equivalent to the two directories being mounted on two different disks.
Log: the slave node's pika.WARNING log below shows the rename error.
Explanation: the full-sync data that the slave receives from the master is hard-linked into the db directory, so the two directories must be on the same filesystem.
Solution: mount both db and dbsync on the disk; the rename problem goes away.
Log file created at: 2020/12/01 11:14:21
Running on machine: pika-test-20201128-001
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W1201 11:14:21.244900 67744 pika_partition.cc:304] Partition: db0, Failed to rename new db path when change db, error: Invalid cross-device link
W1201 11:14:21.244971 67744 pika_partition.cc:255] Partition: db0, Failed to change db
W1201 11:15:41.613822 67637 pika_repl_client_thread.cc:49] Master conn timeout : pika1:11221 try reconnect
W1201 11:20:19.930094 67744 pika_partition.cc:298] Partition: db0, Failed to rename db path when change db, error: No such file or directory
W1201 11:20:19.930109 67744 pika_partition.cc:255] Partition: db0, Failed to change db
W1201 11:21:38.866683 67637 pika_repl_client_thread.cc:49] Master conn timeout : pika1:11221 try reconnect
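A quick way to confirm whether the two directories really live on the same filesystem is to compare their device numbers or mount points. This is only a minimal sketch: the /data1/pika/... paths are examples, so substitute the db-path and db-sync-path values from your own pika.conf, and note that the %m format requires GNU coreutils.
# If the device numbers / mount points differ, hard links and rename() across the two directories fail with "Invalid cross-device link"
stat -c 'dev=%d mount=%m path=%n' /data1/pika/db /data1/pika/dbsync
# Or simply check which mount point each directory sits on
df -h /data1/pika/db /data1/pika/dbsync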
2. Timeout preventing the master-slave relationship from being established
Cause: after the master-slave link dropped, a full sync was started; the master's dump + transfer took too long, so the nodes kept restarting the full sync over and over.
Log: the master and slave logs below show the timeouts and the full sync being restarted repeatedly.
Explanation: the master's dump + transfer time exceeds the replication timeout, so the slave drops the connection before the full sync can complete.
Solution: slaveof pika1 9221 force (the force option makes the slave keep waiting until the master's dump completes and has been transferred to the slave, and then establishes replication).
Note: if replication was set up with a hostname, e.g. slaveof pika1 9221, the log may also suggest using the container id instead of the hostname (pika1).
Workaround: first establish replication with the container id: slaveof containerid 9221 force. Once the master-slave relationship is established, run slaveof no one, then slaveof pika1 9221 force, then config rewrite (see the redis-cli sketch just below). I cannot explain why this works yet, but it has been verified in practice.
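Since pika speaks the Redis protocol, the whole workaround can be driven from redis-cli. This is a minimal sketch under a few assumptions: the slave listens on 127.0.0.1:9221, <container-id> is a placeholder for the real master container id, and the exact field names in the INFO output vary slightly between pika versions.
# run against the slave; wait for the full sync to finish before switching back to the hostname
redis-cli -p 9221 slaveof <container-id> 9221 force
redis-cli -p 9221 info replication      # poll until the slave reports the master link as up
redis-cli -p 9221 slaveof no one
redis-cli -p 9221 slaveof pika1 9221 force
redis-cli -p 9221 config rewrite        # persist the final master into pika.conf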
Master node log:
I1130 10:44:48.794337 29011 pika_partition.cc:379] db0 bgsave_info: path=/data1/pika/dump/20201130/db0, filenum=2562, offset=13365700
I1130 10:48:02.246551 29011 pika_partition.cc:385] db0 create new backup finished.
I1130 10:48:02.246703 29011 pika_server.cc:1085] Partition: db0 Start Send files in /data1/pika/dump/20201130/db0 to 127.0.0.1
I1130 10:58:55.963577 29011 pika_server.cc:1186] Partition: db0 RSync Send Files Success
I1130 11:00:15.398463 26013 pika_repl_server_thread.cc:29] ServerThread Close Slave Conn, fd: 201, ip_port: 127.0.0.1:37504
I1130 11:00:15.398608 26013 pika_server.cc:740] Delete Slave Success, ip_port: 127.0.0.1:9221
I1130 11:00:15.398638 26013 pika_rm.cc:90] Remove Slave Node, Partition: (db0:0), ip_port: 127.0.0.1:9221
I1130 11:00:25.094928 26016 pika_repl_server_conn.cc:42] Receive MetaSync, Slave ip: 127.0.0.1, Slave port:9221
I1130 11:00:25.095026 26016 pika_server.cc:843] Add New Slave, 127.0.0.1:9221
I1130 11:00:25.233932 26014 pika_repl_server_conn.cc:108] Receive Trysync, Slave ip: 10.20.134.1, Slave port:9221, Partition: db0, filenum: 0, pro_offset: 0
I1130 11:00:25.233992 26014 pika_repl_server_conn.cc:263] Partition: db0 binlog has been purged, may need full sync
I1130 11:00:40.320998 26015 pika_repl_server_conn.cc:324] Handle partition DBSync Request
I1130 11:00:40.321120 26015 pika_rm.cc:79] Add Slave Node, partition: (db0:0), ip_port: 127.0.0.1:9221
I1130 11:00:40.322064 26015 pika_repl_server_conn.cc:347] Partition: db0_0 Handle DBSync Request Success, Session: 183
I1130 11:00:52.044495 29011 pika_partition.cc:376] db0 after prepare bgsave
I1130 11:00:52.044572 29011 pika_partition.cc:379] db0 bgsave_info: path=/data1/pika/dump/20201130/db0, filenum=2562, offset=13365700
I1130 11:04:03.152256 29011 pika_partition.cc:385] db0 create new backup finished.
I1130 11:04:03.152402 29011 pika_server.cc:1085] Partition: db0 Start Send files in /data1/pika/dump/20201130/db0 to 127.0.0.1
Slave node log:
I1130 10:44:35.124609 53402 pika_repl_client_conn.cc:182] Partition: db0 Need Wait To Sync
I1130 10:58:55.921267 53506 pika_partition.cc:236] Partition: db0 Information from dbsync info, master_ip: 127.0.0.1, master_port: 9221, filenum: 2562, offset: 13365700, term: 0, index: 0
I1130 10:58:55.921336 53506 pika_partition.cc:293] Partition: db0, Prepare change db from: /data2/pika/db/db0_bak
I1130 11:00:15.392289 53399 pika_repl_client_thread.cc:38] ReplClient Timeout conn, fd=95, ip_port=127.0.0.1:11221
I1130 11:00:25.087127 53506 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (127.0.0.1:9221)
I1130 11:00:25.088173 53403 pika_server.cc:618] Mark try connect finish
I1130 11:00:25.088215 53403 pika_repl_client_conn.cc:146] Finish to handle meta sync response
I1130 11:00:25.226863 53404 pika_repl_client_conn.cc:261] Partition: db0 Need To Try DBSync
I1130 11:00:40.315070 53405 pika_repl_client_conn.cc:182] Partition: db0 Need Wait To Sync
I1130 11:15:53.390866 53506 pika_partition.cc:236] Partition: db0 Information from dbsync info, master_ip: 127.0.0.1, master_port: 9221, filenum: 2562, offset: 13365700, term: 0, index: 0
I1130 11:15:53.392174 53506 pika_partition.cc:293] Partition: db0, Prepare change db from: /data2/pika/db/db0_bak
I1130 11:17:12.993613 53399 pika_repl_client_thread.cc:38] ReplClient Timeout conn, fd=70, ip_port=127.0.0.1:11221
I1130 11:17:22.057538 53506 pika_repl_client.cc:146] Try Send Meta Sync Request to Master (127.0.0.1:9221)
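To confirm that the master really is stuck in a dump-and-resync loop, follow the repeating pattern in its INFO log. A rough sketch, assuming the glog files sit under /data1/pika/log (adjust to your log-path setting):
# each repetition of "DBSync Request" -> "bgsave_info" -> "RSync Send Files Success" is one failed full-sync round;
# the time span of one round is the dump + transfer time that has to fit inside the replication timeout
grep -E 'DBSync Request|bgsave_info|RSync Send Files Success' /data1/pika/log/pika.INFO | tail -n 30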
3. Slave information already existing on the master and not being updated prevents the master-slave relationship from being established
Cause: the master and slave were disconnected because of network problems or other issues; when the connection is re-established, the master reports that the slave already exists.
Log: the slave node's log below shows Slave AlreadyExist.
Solution: upgrade pika to version 3.3.6, which fixes the Slave AlreadyExist problem; see the release notes on GitHub.
pika_repl_client.cc:145] Try Send Meta Sync Request to Master (pika1:9221)
pika_repl_client_conn.cc:100] Meta Sync Failed: Slave AlreadyExist
Sync error, set repl_state to PIKA_REPL_ERROR
pika_repl_client_thread.cc:21] ReplClient Close conn, fd=364, ip_port=pika1:11221
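Before and after the upgrade, the running version can be checked over the Redis protocol. A minimal sketch, assuming the node listens on port 9221; the exact field name in the INFO output may differ slightly between pika releases:
redis-cli -p 9221 info server | grep -i version    # should report 3.3.6 or later once the upgrade is done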
4. Summary
1. For the Slave AlreadyExist problem, upgrade to version 3.3.6.
2. For the rename problem, mount db and dbsync on the same filesystem.
3. For the ReplClient Timeout conn problem, use force when establishing the master-slave relationship.
4. It is recommended to mount dump, db, and dbsync together on the disk, which also shortens the dump time.