PostgreSQL high availability with repmgr, part 5: manual failover and node rejoin with 1 primary + 1 standby

阿新 • Published: 2018-12-11

OS: Ubuntu 16.04; PostgreSQL: 9.6.8; repmgr: 4.1.1

192.168.56.101 node1
192.168.56.102 node2

Contents of /etc/repmgr.conf before the operation

The file on node1 is shown below; the file on node2 is similar.

$ cat /etc/repmgr.conf 

node_id=1
node_name=node1
conninfo='host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/9.6/main'
use_replication_slots=true
pg_bindir='/usr/lib/postgresql/9.6/bin'
service_start_command   = 'sudo pg_ctlcluster 9.6 main start'
service_stop_command    = 'sudo pg_ctlcluster 9.6 main stop'
service_restart_command = 'sudo pg_ctlcluster 9.6 main restart'
service_reload_command  = 'sudo pg_ctlcluster 9.6 main reload'
service_promote_command = 'sudo pg_ctlcluster 9.6 main promote'
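
Since node2's file is only described as similar, here is a minimal sketch that derives it from node1's file with sed. The values node_id=2, node_name=node2 and host 192.168.56.102 are assumptions taken from the cluster show output later in this post; the function name make_node2_conf is hypothetical.

```shell
# Sketch: print a node2 variant of node1's repmgr.conf on stdout.
# Assumes node2 uses node_id=2, node_name=node2 and host 192.168.56.102,
# and that every other setting is identical on both nodes.
make_node2_conf() {
    sed -e 's/^node_id=1$/node_id=2/' \
        -e 's/^node_name=node1$/node_name=node2/' \
        -e 's/host=192\.168\.56\.101/host=192.168.56.102/' \
        "$1"
}

# Usage (review the result before installing it on node2):
#   make_node2_conf /etc/repmgr.conf > /tmp/repmgr.node2.conf
```

Only the three node-specific settings are rewritten; paths and service commands pass through unchanged.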

Manually shut down the primary to simulate a failure

On node1:

$ pg_ctl -D /var/lib/postgresql/9.6/main -m fast stop
or
$ sudo pg_ctlcluster 9.6 main stop

$ repmgr -f /etc/repmgr.conf cluster show
ERROR: connection to database failed:
  could not connect to server: Connection refused
	Is the server running on host "192.168.56.101" and accepting
	TCP/IP connections on port 5432?

DETAIL: attempted to connect using:
  user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.101 fallback_application_name=repmgr

On node2:

$ repmgr -f /etc/repmgr.conf cluster show

 ID | Name  | Role    | Status        | Upstream | Location | Connection string                                              
----+-------+---------+---------------+----------+----------+-----------------------------------------------------------------
 1  | node1 | primary | ? unreachable |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running     | node1    | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node1" (ID: 1), following error encountered :
"could not connect to server: Connection refused
	Is the server running on host "192.168.56.101" and accepting
	TCP/IP connections on port 5432?"
  - node "node1" (ID: 1) is registered as an active primary but is unreachable

The output shows that the Status of node1 is unreachable.
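
Before promoting anything, it can help to confirm from node2 that the old primary really is down. The sketch below maps the documented exit status of pg_isready (a client tool shipped with PostgreSQL) to a readable message. The function name describe_pg_isready is hypothetical, and the usage lines assume pg_isready is on PATH.

```shell
# Sketch: translate pg_isready's documented exit status into a message.
describe_pg_isready() {
    case "$1" in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections (for example, during startup)" ;;
        2) echo "no response" ;;
        3) echo "no attempt made (for example, invalid parameters)" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Usage on node2 (on this setup the binary would live in
# /usr/lib/postgresql/9.6/bin if it is not on PATH):
#   pg_isready -h 192.168.56.101 -p 5432 -t 2
#   describe_pg_isready "$?"
```

A refused connection, as in the transcript above, corresponds to status 2.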

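Promote the standby to primary

PostgreSQL on node1 is now unavailable (manual shutdown, abnormal process termination, or host crash), so the standby on node2 must be promoted to primary. On node2: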

$ repmgr -f /etc/repmgr.conf standby promote

NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "sudo pg_ctlcluster 9.6 main promote"
DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
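
The promotion can also be checked at the SQL level: pg_is_in_recovery() returns false on a primary and true on a standby. A minimal sketch follows; the function name role_from_recovery_flag is hypothetical, and the usage lines assume the repmgr user can connect without a password prompt, as the conninfo setting implies.

```shell
# Sketch: map the output of "SELECT pg_is_in_recovery()" to a role name.
# psql -At prints the boolean as "t" (in recovery) or "f" (not in recovery).
role_from_recovery_flag() {
    case "$1" in
        f) echo "primary" ;;
        t) echo "standby" ;;
        *) echo "unknown" ;;
    esac
}

# Usage on node2, reusing the connection parameters from conninfo:
#   flag=$(psql -h 192.168.56.102 -U repmgr -d repmgr -Atc 'SELECT pg_is_in_recovery()')
#   role_from_recovery_flag "$flag"
```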

Check again on node2:

$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string                                              
----+-------+---------+-----------+----------+----------+-----------------------------------------------------------------
 1  | node1 | primary | - failed  |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node1" (ID: 1), following error encountered :
"could not connect to server: Connection refused
	Is the server running on host "192.168.56.101" and accepting
	TCP/IP connections on port 5432?"

Turn node1 into the new standby

On node1, start PostgreSQL:

# /etc/init.d/postgresql start
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status               | Upstream | Location | Connection string                                              
----+-------+---------+----------------------+----------+----------+-----------------------------------------------------------------
 1  | node1 | primary | * running            |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby | ! running as primary | node1    | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "node2" (ID: 2) is registered as standby but running as primary

On node2:

$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string                                            
----+-------+---------+-----------+----------+----------+-----------------------------------------------------------------
 1  | node1 | primary | ! running |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "node1" (ID: 1) is running but the repmgr node record is inactive

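Now there is a problem: cluster show reports a WARNING on both node1 and node2, because node1 has come back up still acting as a primary. Next, PostgreSQL on node1 must be pointed at the new primary.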
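
This two-primaries situation can be spotted mechanically from the cluster show table. The sketch below reads that table on stdin and prints every node that is either registered as primary and running, or flagged as "running as primary". The function name running_primaries is hypothetical, and the parsing assumes the column layout shown in the transcripts above.

```shell
# Sketch: list nodes that currently act as a primary, based on the
# "repmgr cluster show" table read from stdin.
running_primaries() {
    awk -F'|' '
        NF >= 7 {
            name = $2; role = $3; status = $4
            gsub(/^ +| +$/, "", name)   # trim padding around the node name
            gsub(/^ +| +$/, "", role)   # trim padding around the role
            if ((role == "primary" && status ~ /running/) ||
                status ~ /running as primary/)
                print name
        }
    '
}

# Usage on either node:
#   repmgr -f /etc/repmgr.conf cluster show | running_primaries
```

More than one line of output means more than one node is acting as primary.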

Stop PostgreSQL on node1:

$ sudo pg_ctlcluster 9.6 main stop

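Use repmgr node rejoin to add node1 back to the cluster; pg_rewind is available as an option. (This can optionally use pg_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary.) In the commands below, --force-rewind enables pg_rewind, and the first run adds --dry-run to check the prerequisites without changing anything.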
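
pg_rewind only works if the target cluster has wal_log_hints enabled or was initialized with data checksums, and full_page_writes must be on. Because node1 is stopped at this point, one way to check is to read its control file with pg_controldata. The sketch below parses that output from stdin; the function name rewind_prereqs_ok is hypothetical, and it assumes English (untranslated) pg_controldata output.

```shell
# Sketch: succeed if pg_controldata output (on stdin) shows that
# (wal_log_hints is on OR data checksums are enabled) AND
# full_page_writes is on.
rewind_prereqs_ok() {
    awk -F': *' '
        /^wal_log_hints setting:/      { hints = $2 }
        /^Data page checksum version:/ { checksums = $2 }
        /full_page_writes:/            { fpw = $2 }
        END { exit !((hints == "on" || checksums + 0 > 0) && fpw == "on") }
    '
}

# Usage on node1, with the paths from /etc/repmgr.conf:
#   sudo -u postgres /usr/lib/postgresql/9.6/bin/pg_controldata \
#       /var/lib/postgresql/9.6/main | rewind_prereqs_ok \
#       && echo "pg_rewind prerequisites look satisfied"
```

repmgr performs its own check during the dry run, so this is only an independent cross-check.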

$ repmgr -f /etc/repmgr.conf node rejoin -d 'host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose

NOTICE: using provided configuration file "/etc/repmgr.conf"
INFO: prerequisites for using pg_rewind are met
INFO: 0 files would have been copied to "/tmp/repmgr-config-archive-pgsql96"
INFO: temporary archive directory "/tmp/repmgr-config-archive-pgsql96" deleted
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  /usr/lib/postgresql/9.6/bin/pg_rewind -D '/var/lib/postgresql/9.6/main' --source-server='host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met
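
The dry run and the real run can be chained so that the rejoin only proceeds when the dry run reports success. This is a minimal sketch that keys on the final INFO line of the dry-run output shown above; the function name rejoin_prereqs_met and the variable upstream are hypothetical.

```shell
# Sketch: succeed only if repmgr's dry-run output (on stdin) reports that
# the NODE REJOIN prerequisites are met, as in the transcript above.
rejoin_prereqs_met() {
    grep -q 'prerequisites for executing NODE REJOIN are met'
}

# Usage on node1, with the same arguments as the commands in this post.
# 2>&1 captures the messages whether they go to stdout or stderr.
#   upstream='host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2'
#   if repmgr -f /etc/repmgr.conf node rejoin -d "$upstream" \
#          --force-rewind --dry-run 2>&1 | rejoin_prereqs_met; then
#       repmgr -f /etc/repmgr.conf node rejoin -d "$upstream" \
#           --force-rewind --verbose
#   fi
```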

$ repmgr -f /etc/repmgr.conf node rejoin -d 'host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

NOTICE: using provided configuration file "/etc/repmgr.conf"
INFO: prerequisites for using pg_rewind are met
INFO: 0 files copied to "/tmp/repmgr-config-archive-pgsql96"
NOTICE: executing pg_rewind
NOTICE: 0 files copied to /var/lib/postgresql/9.6/main
INFO: directory "/tmp/repmgr-config-archive-pgsql96" deleted
INFO: deleting "recovery.done"
NOTICE: setting node 1's primary to node 2
NOTICE: starting server using "sudo pg_ctlcluster 9.6 main start"
INFO: demoted primary is pingable
INFO: node 1 has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2

The result is as expected.
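
To confirm the rejoin from the new primary's side, pg_stat_replication on node2 can be queried for standbys in the streaming state. The sketch below filters psql's unaligned output (columns separated by "|"); the function name streaming_standbys is hypothetical, and it assumes the repmgr user is allowed to read the details in pg_stat_replication (it is typically a superuser).

```shell
# Sketch: from "application_name|state" rows on stdin, print the
# application names of connections whose state is "streaming".
streaming_standbys() {
    awk -F'|' '$2 == "streaming" { print $1 }'
}

# Usage on node2:
#   psql -h 192.168.56.102 -U repmgr -d repmgr -Atc \
#       'SELECT application_name, state FROM pg_stat_replication' \
#       | streaming_standbys
```

Running repmgr -f /etc/repmgr.conf cluster show on either node is the other obvious check; it should no longer print a WARNING.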
