
Galera failure recovery

After a power outage in the machine room, the MySQL cluster was restarted, but the mysql/Galera node could not complete replication synchronization with the cluster.

The failure log is as follows:

170703 00:44:14 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170703 00:44:14 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.ImJsjZ' --pid-file='/var/lib/mysql/hh-yun-db-129042.-recover.pid'
170703 00:44:22 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
170703 0:44:22 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1'
170703 0:44:22 [Note] WSREP: Read nil XID from storage engines, skipping position init
170703 0:44:22 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
170703 0:44:22 [Note] WSREP: wsrep_load(): Galera 3.5(rXXXX) by Codership Oy <[email protected]> loaded successfully.
170703 0:44:22 [Note] WSREP: CRC-32C: using hardware acceleration.
170703 0:44:22 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
170703 0:44:22 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
170703 0:44:22 [Note] WSREP: Passing config to GCS: base_host = 240.10.129.42; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; proton
170703 0:44:22 [Note] WSREP: Service thread queue flushed.
170703 0:44:22 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
170703 0:44:22 [Note] WSREP: wsrep_sst_grab()
170703 0:44:22 [Note] WSREP: Start replication
170703 0:44:22 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
170703 0:44:22 [Note] WSREP: protonet asio version 0
170703 0:44:22 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
170703 0:44:22 [Note] WSREP: backend: asio
170703 0:44:22 [Note] WSREP: GMCast version 0
170703 0:44:22 [Note] WSREP: (b3573054-5f45-11e7-964d-3a604abdf474, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
170703 0:44:22 [Note] WSREP: (b3573054-5f45-11e7-964d-3a604abdf474, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
170703 0:44:22 [Note] WSREP: EVS version 0
170703 0:44:22 [Note] WSREP: PC version 0
170703 0:44:22 [Note] WSREP: gcomm: connecting to group 'yun_clutster', peer '240.10.129.41:'
170703 0:44:22 [Note] WSREP: declaring 456416f2-5f43-11e7-9f2c-7e2e30ef17dd stable
170703 0:44:22 [Note] WSREP: Node 456416f2-5f43-11e7-9f2c-7e2e30ef17dd state prim
170703 0:44:22 [Note] WSREP: view(view_id(PRIM,456416f2-5f43-11e7-9f2c-7e2e30ef17dd,2) memb { 456416f2-5f43-11e7-9f2c-7e2e30ef17dd,0 b3573054-5f45-11e7-964d-3a604abdf474,0 } joined { } left { } partitioned { })
170703 0:44:23 [Note] WSREP: gcomm: connected
170703 0:44:23 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
170703 0:44:23 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
170703 0:44:23 [Note] WSREP: Opened channel 'yun_clutster'
170703 0:44:23 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
170703 0:44:23 [Note] WSREP: Waiting for SST to complete.
170703 0:44:23 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
170703 0:44:23 [Note] WSREP: STATE EXCHANGE: sent state msg: b368f83d-5f45-11e7-b169-2e0622f98c98
170703 0:44:23 [Note] WSREP: STATE EXCHANGE: got state msg: b368f83d-5f45-11e7-b169-2e0622f98c98 from 0 (hh-yun-db-129041.)
170703 0:44:23 [Note] WSREP: STATE EXCHANGE: got state msg: b368f83d-5f45-11e7-b169-2e0622f98c98 from 1 (hh-yun-db-129042.)
170703 0:44:23 [Note] WSREP: Quorum results: version = 3, component = PRIMARY, conf_id = 1, members = 1/2 (joined/total), act_id = 3524, last_appl. = -1, protocols = 0/5/2 (gcs/repl/appl), group UUID = 45649783-5f43-11e7-a935-3e74e94aac23
170703 0:44:23 [Note] WSREP: Flow-control interval: [23, 23]
170703 0:44:23 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3524)
170703 0:44:23 [Note] WSREP: State transfer required: Group state: 45649783-5f43-11e7-a935-3e74e94aac23:3524 Local state: 00000000-0000-0000-0000-000000000000:-1
170703 0:44:23 [Note] WSREP: New cluster view: global state: 45649783-5f43-11e7-a935-3e74e94aac23:3524, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2
170703 0:44:23 [Warning] WSREP: Gap in state sequence. Need state transfer.
170703 0:44:25 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address 'xxxx.42' --auth 'xxxxx:xxxxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '162980''
170703 0:44:25 [Note] WSREP: Prepared SST request: rsync|240.10.129.42:4444/rsync_sst
170703 0:44:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703 0:44:25 [Note] WSREP: REPL Protocols: 5 (3, 1)
170703 0:44:25 [Note] WSREP: Service thread queue flushed.
170703 0:44:25 [Note] WSREP: Assign initial position for certification: 3524, protocol version: 3
170703 0:44:25 [Note] WSREP: Service thread queue flushed.
170703 0:44:25 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (45649783-5f43-11e7-a935-3e74e94aac23): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():447. IST will be unavailable.
170703 0:44:25 [Note] WSREP: Member 1.0 (hh-yun-db-129042.) requested state transfer from '*any*'. Selected 0.0 (hh-yun-db-129041.)(SYNCED) as donor.
170703 0:44:25 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3524)
170703 0:44:25 [Note] WSREP: Requesting state transfer: success, donor: 0
170703 0:45:12 [Note] WSREP: 0.0 (hh-yun-db-129041.): State transfer to 1.0 (hh-yun-db-129042.) complete.
170703 0:45:12 [Note] WSREP: Member 0.0 (hh-yun-db-129041.) synced with group.
WSREP_SST: [INFO] Joiner cleanup. (20170703 00:45:13.006)
WSREP_SST: [INFO] Joiner cleanup done. (20170703 00:45:14.276)
170703 0:45:14 [Note] WSREP: SST complete, seqno: 3524
170703 0:45:14 InnoDB: The InnoDB memory heap is disabled
170703 0:45:14 InnoDB: Mutexes and rw_locks use GCC atomic builtins
170703 0:45:14 InnoDB: Compressed tables use zlib 1.2.3
170703 0:45:14 InnoDB: Using Linux native AIO
170703 0:45:14 InnoDB: Initializing buffer pool, size = 16.0G
170703 0:45:15 InnoDB: Completed initialization of buffer pool
170703 0:45:15 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
170703 0:45:16 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
170703 0:45:16 InnoDB: Waiting for the background threads to start
170703 0:45:17 Percona XtraDB (http://www.percona.com) 5.5.36-MariaDB-33.0 started; log sequence number 3781527514
170703 0:45:17 [Note] Plugin 'FEEDBACK' is disabled.
170703 0:45:17 [Note] Server socket created on IP: '0.0.0.0'.
170703 0:45:17 [Note] Event Scheduler: Loaded 0 events
170703 0:45:17 [Note] WSREP: Signalling provider to continue.
170703 0:45:17 [Note] WSREP: SST received: 45649783-5f43-11e7-a935-3e74e94aac23:3524
170703 0:45:17 [Note] WSREP: 1.0 (hh-yun-db-129042.): State transfer from 0.0 (hh-yun-db-129041.) complete.
170703 0:45:17 [Note] WSREP: Shifting JOINER -> JOINED (TO: 3524)
170703 0:45:17 [Note] WSREP: Member 1.0 (hh-yun-db-129042.) synced with group.
170703 0:45:17 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3524)
170703 0:45:17 [Note] WSREP: Synchronized with group, ready for connections
170703 0:45:17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703 0:45:17 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.36-MariaDB-wsrep' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server, wsrep_25.9.r3961
170703 1:08:22 [Note] WSREP: Node b3573054-5f45-11e7-964d-3a604abdf474 state prim
170703 1:08:22 [Note] WSREP: view(view_id(PRIM,b3573054-5f45-11e7-964d-3a604abdf474,3) memb { b3573054-5f45-11e7-964d-3a604abdf474,0 } joined { } left { } partitioned { 456416f2-5f43-11e7-9f2c-7e2e30ef17dd,0 })
170703 1:08:22 [Note] WSREP: forgetting 456416f2-5f43-11e7-9f2c-7e2e30ef17dd (tcp://240.10.129.41:4567)
170703 1:08:22 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
170703 1:08:22 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 0d6d078d-5f49-11e7-bba6-bac92eaecf2e
170703 1:08:22 [Note] WSREP: STATE EXCHANGE: sent state msg: 0d6d078d-5f49-11e7-bba6-bac92eaecf2e
170703 1:08:22 [Note] WSREP: STATE EXCHANGE: got state msg: 0d6d078d-5f49-11e7-bba6-bac92eaecf2e from 0 (hh-yun-db-129042)
170703 1:08:22 [Note] WSREP: Quorum results: version = 3, component = PRIMARY, conf_id = 2, members = 1/1 (joined/total), act_id = 69983, last_appl. = 69915, protocols = 0/5/2 (gcs/repl/appl), group UUID = 45649783-5f43-11e7-a935-3e74e94aac23
170703 1:08:22 [Note] WSREP: Flow-control interval: [16, 16]
170703 1:08:22 [Note] WSREP: New cluster view: global state: 45649783-5f43-11e7-a935-3e74e94aac23:69983, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2
170703 1:08:22 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703 1:08:22 [Note] WSREP: REPL Protocols: 5 (3, 1)
170703 1:08:22 [Note] WSREP: Service thread queue flushed.
170703 1:08:22 [Note] WSREP: Assign initial position for certification: 69983, protocol version: 3
170703 1:08:22 [Note] WSREP: Service thread queue flushed.
170703 1:08:22 [Warning] WSREP: Releasing seqno 69983 before 69984 was assigned.
170703 1:08:27 [Note] WSREP: cleaning up 456416f2-5f43-11e7-9f2c-7e2e30ef17dd (tcp://240.10.129.41:4567)
170703 11:29:43 [Note] /usr/libexec/mysqld: Normal shutdown
170703 11:29:43 [Note] WSREP: Stop replication
170703 11:29:43 [Note] WSREP: Closing send monitor...
170703 11:29:43 [Note] WSREP: Closed send monitor.
170703 11:29:43 [Note] WSREP: gcomm: terminating thread
170703 11:29:43 [Note] WSREP: gcomm: joining thread
170703 11:29:43 [Note] WSREP: gcomm: closing backend
170703 11:29:43 [Note] WSREP: view((empty))
170703 11:29:43 [Note] WSREP: Received self-leave message.
170703 11:29:43 [Note] WSREP: gcomm: closed
170703 11:29:43 [Note] WSREP: Flow-control interval: [0, 0]
170703 11:29:43 [Note] WSREP: Received SELF-LEAVE. Closing connection.
170703 11:29:43 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 69983)
170703 11:29:43 [Note] WSREP: RECV thread exiting 0: Success
170703 11:29:43 [Note] WSREP: recv_thread() joined.
170703 11:29:43 [Note] WSREP: New cluster view: global state: 45649783-5f43-11e7-a935-3e74e94aac23:69983, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 2
170703 11:29:43 [Note] WSREP: Closing replication queue.
170703 11:29:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703 11:29:43 [Note] WSREP: Closing slave action queue.
170703 11:29:43 [Note] WSREP: applier thread exiting (code:0)
170703 11:29:43 [Note] WSREP: applier thread exiting (code:6)
170703 11:29:43 [Note] WSREP: applier thread exiting (code:6)
170703 11:29:43 [Note] WSREP: applier thread exiting (code:6)
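The key lines in the log are the zeroed saved state (Found saved state: 00000000-0000-0000-0000-000000000000:-1) and the follow-on warning that IST is unavailable: the node has lost its local state, so only a full SST can bring it back. A quick way to check for this condition before restarting a node is to inspect grastate.dat; the sketch below parses a hypothetical copy of the file whose contents mirror the log above (on a real node, read /var/lib/mysql/grastate.dat directly):

```shell
#!/bin/sh
# Parse a (sample) grastate.dat and report whether the node still has a
# usable local state. The sample contents mirror the failure log above.
cat > /tmp/grastate.dat <<'EOF'
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
EOF

uuid=$(awk '/^uuid:/ {print $2}' /tmp/grastate.dat)
seqno=$(awk '/^seqno:/ {print $2}' /tmp/grastate.dat)

# A zeroed uuid or seqno of -1 means the local state is gone.
if [ "$uuid" = "00000000-0000-0000-0000-000000000000" ] || [ "$seqno" = "-1" ]; then
  echo "local state lost: node will need a full SST to rejoin"
else
  echo "local state intact: uuid=$uuid seqno=$seqno (IST may be possible)"
fi
```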

To resolve the failure, the following two measures can be tried:

1. The number of transactions that must be replayed during synchronization is large, so a sufficiently large InnoDB log file (innodb_log_file_size) is needed; if the previous log files were too small, the node cannot be brought back to a synchronized state with the cluster.
2. While performing cluster recovery, use the option tc-heuristic-recover=ROLLBACK so that the pending transactions to be recovered are rolled back.
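As a back-of-the-envelope check on point 1, the total redo log capacity is innodb_log_file_size multiplied by innodb_log_files_in_group (the server default is 2); the figures below are purely illustrative, using the 2048M value configured in this recovery:

```shell
# Illustrative redo-log capacity check (example values, not measurements).
log_file_size_mb=2048          # innodb-log-file-size = 2048M
log_files_in_group=2           # server default for innodb_log_files_in_group
total_mb=$((log_file_size_mb * log_files_in_group))
echo "total redo log capacity: ${total_mb} MB"
```

If the write burst that must be replayed after the outage exceeds this capacity, recovery will struggle, which is the motivation for enlarging the log files.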

Procedure

Modify /etc/my.cnf:

innodb-log-file-size = 2048M
tc-heuristic-recover = ROLLBACK

As for the old InnoDB log files, they must be deleted manually; when the server is started, MySQL will automatically recreate the corresponding log files at the new size.
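Putting the steps together, the log-file reset looks roughly like the sketch below. It is rehearsed here against a throwaway scratch directory so it can be run safely; on a real node you would stop mysqld first and operate on the actual datadir (/var/lib/mysql). The old_logs backup directory name is our own choice, not anything MySQL requires.

```shell
#!/bin/sh
# Rehearsal of the redo-log reset in a scratch directory; substitute
# /var/lib/mysql for $datadir on a real node, with mysqld stopped.
datadir=$(mktemp -d)
touch "$datadir/ib_logfile0" "$datadir/ib_logfile1"   # stand-ins for the stale logs

# Move the stale redo logs aside instead of deleting them outright,
# so they can be restored if startup fails for an unrelated reason.
mkdir -p "$datadir/old_logs"
mv "$datadir"/ib_logfile* "$datadir/old_logs/"

ls "$datadir"   # only old_logs remains; mysqld recreates ib_logfile* at the new size on start
```

Once the node has started and rejoined the cluster, the tc-heuristic-recover line should be removed from /etc/my.cnf again, since heuristic recovery is intended as a one-shot operation.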
