
PostgreSQL high availability with repmgr: day-to-day management commands and notes

  In a PostgreSQL high-availability architecture, streaming replication is usually used to implement the primary/standby pair. EDB provides a reference for the performance impact of each synchronous_commit level:

  

  

  As the comparison shows, HA mode costs roughly 10%-30% of throughput. remote_write has very little impact because it only writes into the standby's memory; on depends entirely on the peer's disk performance, since it is essentially the WAL sequential write performed twice plus one network round trip. If the primary has not hit a bottleneck (CPU not saturated, IO not saturated) and the network is not a bottleneck, HA TPS will most likely equal single-instance TPS; HA only becomes a problem once the network becomes the bottleneck, for example during batch jobs that must replicate synchronously.
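These modes are selected via the synchronous_commit parameter. A minimal sketch of inspecting and switching the level on the primary, assuming standard PostgreSQL tooling (psql) and a superuser connection; it is not LightDB-specific:

```shell
# Show the current durability level and which standbys are attached (with sync state).
psql -U postgres -c "SHOW synchronous_commit;"
psql -U postgres -c "SELECT application_name, sync_state FROM pg_stat_replication;"

# Switch between the levels compared above (off / local / remote_write / on / remote_apply);
# synchronous_commit is reloadable, so no restart is needed.
psql -U postgres -c "ALTER SYSTEM SET synchronous_commit = 'remote_write';"
psql -U postgres -c "SELECT pg_reload_conf();"
```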

  In the author's environment, ltbench sustains a very stable 70,000 TPS with 1.5 ms response time, on PMEM storage over Mellanox RoCE; local, remote_apply and on differ by less than 10%, with primary CPU utilization around 80%.

  For automatic failover, repmgr is the usual choice (Patroni, written in Python, and pg_auto_failover are alternatives). We recommend repmgr for failover management, with keepalived providing client-transparent failover.

The main repmgr commands are:

repmgr primary register — initialise a repmgr installation and register the primary node

repmgr primary unregister — unregister an inactive primary node

repmgr standby clone — clone a PostgreSQL standby node from another PostgreSQL node

repmgr standby register — add a standby's information to the repmgr metadata

repmgr standby unregister — remove a standby's information from the repmgr metadata

repmgr standby promote — promote a standby to a primary

repmgr standby follow — attach a running standby to a new upstream node

repmgr standby switchover — promote a standby to primary and demote the existing primary to a standby

repmgr witness register — add a witness node's information to the repmgr metadata

repmgr witness unregister — remove a witness node's information from the repmgr metadata

=========== Day-to-day administration mainly uses the node, cluster and service command groups. The three groups above (primary, standby, witness) are mainly for initial installation and maintenance.

repmgr node status — show an overview of a node's basic information and replication status

repmgr node check — perform some health checks on a node from a replication perspective

repmgr node rejoin — rejoin a dormant (stopped) node to the replication cluster

repmgr node service — show or execute the system service command to stop/start/restart/reload/promote a node

repmgr cluster show — display information about each registered node in the replication cluster

repmgr cluster matrix — run repmgr cluster show on each node and summarize the output

repmgr cluster crosscheck — cross-check connections between each combination of nodes

repmgr cluster event — output a formatted list of cluster events

repmgr cluster cleanup — purge monitoring history

repmgr service status — display information about the status of repmgrd on each node in the cluster

repmgr service pause — instruct all repmgrd instances in the replication cluster to pause failover operations

repmgr service unpause — instruct all repmgrd instances in the replication cluster to resume failover operations

repmgr daemon start — start the repmgrd daemon on the local node

repmgr daemon stop — stop the repmgrd daemon on the local node

  The more important ones are:

  • View cluster node roles and statuses (runs on any node; shows information for all nodes):
repmgr cluster show -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                       | Role    | Status    | Upstream                  | Location | Priority | Timeline | Connection string                                                     
----+----------------------------+---------+-----------+---------------------------+----------+----------+----------+------------------------------------------------------------------------
 1  | 10.19.36.10-defaultcluster | standby |   running | 10.19.36.9-defaultcluster | default  | 100      | 5        | host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | 10.19.36.9-defaultcluster  | primary | * running |                           | default  | 100      | 5        | host=10.19.36.9 port=5432 user=repmgr dbname=repmgr connect_timeout=2 
  • View repmgrd daemon status (runs on any node; shows information for all nodes):
[lightdb@localhost ~]$ repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                       | Role    | Status    | Upstream                  | repmgrd | PID     | Paused? | Upstream last seen
----+----------------------------+---------+-----------+---------------------------+---------+---------+---------+--------------------
 1  | 10.19.36.10-defaultcluster | standby |   running | 10.19.36.9-defaultcluster | running | 3324044 | no      | 0 second(s) ago    
 2  | 10.19.36.9-defaultcluster  | primary | * running |                           | running | 4170296 | no      | n/a            
  • --csv produces CSV output, convenient for automated analysis:
[lightdb@localhost ~]$ repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --csv
1,10.19.36.10-defaultcluster,standby,1,1,3324044,0,100,1,default
2,10.19.36.9-defaultcluster,primary,1,1,4170296,0,100,-1,default
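That CSV output is easy to post-process. A hedged sketch, assuming only that the first three columns are node ID, node name and role (as the sample rows above suggest; the meaning of the remaining columns is not documented here):

```shell
# summarize_csv: reduce "repmgr service status --csv" rows to "node <id> <name> <role>".
summarize_csv() {
  awk -F',' 'NF >= 3 { printf "node %s %s %s\n", $1, $2, $3 }'
}

# Fed with the sample rows captured above:
printf '%s\n' \
  '1,10.19.36.10-defaultcluster,standby,1,1,3324044,0,100,1,default' \
  '2,10.19.36.9-defaultcluster,primary,1,1,4170296,0,100,-1,default' \
  | summarize_csv
# → node 1 10.19.36.10-defaultcluster standby
# → node 2 10.19.36.9-defaultcluster primary
```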
  • View the local node's role, status and replication lag (status alone is usually sufficient; shows information for the local node only):
[lightdb@localhost ~]$ repmgr node status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
Node "10.19.36.10-defaultcluster":
    LightDB version: 13.3
    Total data size: 9086 MB
    Conninfo: host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
    Role: standby
    WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)   # unless set to always, archiving stays disabled while the standby is replaying WAL
    Archive command: test ! -f /home/lightdb/lightdb-x/13.3-21.2/archive/%f && cp %p /home/lightdb/lightdb-x/13.3-21.2/archive/%f && find /home/lightdb/lightdb-x/13.3-21.2/archive -type f -mmin +10080 | xargs -i rm {}
    WALs pending archiving: 2 pending files
    Replication connections: 0 (of maximal 10)
    Replication slots: 0 physical (of maximal 10; 0 missing)
    Upstream node: 10.19.36.9-defaultcluster (ID: 2)
    Replication lag: 55 seconds     # the first reading is not very accurate and may be high
    Last received LSN: 6/F641CCE8
    Last replayed LSN: 6/F64159D0

  If in doubt, also run check, which performs real-time checks. For monitoring, to be safe, run both (shows information for the local node only).

[lightdb@localhost ~]$ repmgr node check -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
Node "10.19.36.10-defaultcluster":
    Server role: OK (node is standby)
    Replication lag: OK (0 seconds)
    WAL archiving: OK (2 pending archive ready files)
    Upstream connection: OK (node "10.19.36.10-defaultcluster" (ID: 1) is attached to expected upstream node "10.19.36.9-defaultcluster" (ID: 2))
    Downstream servers: OK (this node has no downstream nodes)
    Replication slots: OK (node has no physical replication slots)
    Missing physical replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/home/lightdb/data/defaultCluster")

  View connectivity between cluster nodes (runs on any node; shows information for all nodes).

[lightdb@localhost ~]$ repmgr cluster crosscheck -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
INFO: connecting to database
 Name                       | ID | 1 | 2
----------------------------+----+---+---
 10.19.36.10-defaultcluster | 1  | * | * 
 10.19.36.9-defaultcluster  | 2  | * | * 
  • Monitor standby failures (unless the primary's synchronous mode is local; if it is on, remote_apply or remote_write, the standbys must be monitored to prevent a standby failure from making the primary stop accepting writes and impacting production).
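A minimal cron-style sketch of such monitoring, assuming repmgr node check exits non-zero when any check fails; send_alert is a hypothetical alerting hook (mail, webhook, monitoring agent):

```shell
#!/bin/sh
# Run the same health check shown above from cron; a non-zero exit
# status from "repmgr node check" indicates at least one failed check.
CONF=/home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf

if ! repmgr node check -f "$CONF" > /tmp/repmgr_check.out 2>&1; then
  # send_alert is a placeholder for your alert channel.
  send_alert "repmgr node check failed on $(hostname)" < /tmp/repmgr_check.out
fi
```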

============================ The above covers monitoring; now for the day-to-day administration tasks.

  • Maintenance restart of a node.
[lightdb@lightdb1 ~]$ repmgr service pause -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
NOTICE: node 1 (10.20.137.41-defaultcluster) paused
NOTICE: node 2 (10.20.137.42-defaultcluster) paused
[lightdb@lightdb1 ~]$ repmgr service status -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                        | Role    | Status    | Upstream                    | repmgrd | PID    | Paused? | Upstream last seen
----+-----------------------------+---------+-----------+-----------------------------+---------+--------+---------+--------------------
 1  | 10.20.137.41-defaultcluster | primary | * running |                             | running | 38834  | yes     | n/a                
 2  | 10.20.137.42-defaultcluster | standby |   running | 10.20.137.41-defaultcluster | running | 185064 | yes     | 1 second(s) ago    

…… modify parameters and restart the instance ……
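Putting the whole maintenance window together, a hedged sketch of the flow: pause failover, perform the change, verify, resume (the restart tooling itself is environment-specific and left as a comment):

```shell
CONF=/mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf

# 1. Stop repmgrd from reacting to the planned outage.
repmgr service pause -f "$CONF"

# 2. Change parameters and restart the instance (use pg_ctl restart or
#    the vendor's equivalent wrapper for your installation).

# 3. Confirm all nodes are back, then re-enable automatic failover.
repmgr service status -f "$CONF"
repmgr service unpause -f "$CONF"
```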

  During this window, the standby's repmgr log keeps retrying the connection, as follows:

[2021-10-30 18:22:14] [INFO] node "10.20.137.42-defaultcluster" (ID: 2) monitoring upstream node "10.20.137.41-defaultcluster" (ID: 1) in normal state
[2021-10-30 18:22:14] [DETAIL] last monitoring statistics update was 2 seconds ago
[2021-10-30 18:23:12] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:12] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:12] [WARNING] unable to connect to upstream node "10.20.137.41-defaultcluster" (ID: 1)
[2021-10-30 18:23:12] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 1 of 3 attempts
[2021-10-30 18:23:12] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:12] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:12] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2021-10-30 18:23:17] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 2 of 3 attempts
[2021-10-30 18:23:17] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:17] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:17] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2021-10-30 18:23:22] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 3 of 3 attempts
[2021-10-30 18:23:22] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:22] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:22] [WARNING] unable to reconnect to node "10.20.137.41-defaultcluster" (ID: 1) after 3 attempts
[2021-10-30 18:23:22] [NOTICE] repmgrd on this node is paused
[2021-10-30 18:23:22] [DETAIL] no failover will be carried out
[2021-10-30 18:23:22] [HINT] execute "repmgr service unpause" to resume normal failover mode
[2021-10-30 18:23:22] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:22] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:22] [ERROR] unable to execute get_primary_current_lsn()
[2021-10-30 18:23:22] [DETAIL] 
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

[2021-10-30 18:23:22] [WARNING] unable to retrieve primary's current LSN
[2021-10-30 18:23:24] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:24] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:24] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:24] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:26] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:26] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-30 18:23:26] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:26] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-30 18:23:28] [NOTICE] upstream is available but upstream connection has gone away, resetting
[2021-10-30 18:23:28] [NOTICE] reconnected to upstream node "10.20.137.41-defaultcluster" (ID: 1) after 6 seconds, resuming monitoring
[2021-10-30 18:27:16] [INFO] node "10.20.137.42-defaultcluster" (ID: 2) monitoring upstream node "10.20.137.41-defaultcluster" (ID: 1) in normal state
[2021-10-30 18:27:16] [DETAIL] last monitoring statistics update was 2 seconds ago

  The PostgreSQL log shows:

2021-10-30 18:12:30.114559T  @  checkpointer  00000[2021-10-29 20:45:28 CST] 0 [115395] DETAIL:  Last completed transaction was at log time 2021-10-30 18:12:30.084333+08.
2021-10-30 18:14:41.898079T  @  walreceiver  00000[2021-10-30 17:58:15 CST] 0 [144662] LOG:  replication terminated by primary server
2021-10-30 18:14:41.898079T  @  walreceiver  00000[2021-10-30 17:58:15 CST] 0 [144662] DETAIL:  End of WAL reached on timeline 3 at 10/800000A0.
2021-10-30 18:14:41.898109T  @  walreceiver  XX000[2021-10-30 17:58:15 CST] 0 [144662] FATAL:  could not send end-of-streaming message to primary: no COPY in progress
2021-10-30 18:14:41.898250T  @  startup  00000[2021-10-29 20:45:28 CST] 0 [115394] LOG:  invalid record length at 10/800000A0: wanted 24, got 0
2021-10-30 18:14:41.909281T  @  walreceiver  XX000[2021-10-30 18:14:41 CST] 0 [158899] FATAL:  could not connect to the primary server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
2021-10-30 18:14:46.909030T  @  walreceiver  00000[2021-10-30 18:14:46 CST] 0 [158962] LOG:  started streaming WAL from primary at 10/80000000 on timeline 3
2021-10-30 18:15:30.175149T  @  checkpointer  00000[2021-10-29 20:45:28 CST] 0 [115395] LOG:  restartpoint starting: time
  • Starting and stopping the repmgrd daemon.
  • Switchover between primary and standby.
  • Failure recovery. After the failed primary is repaired, it is ideal if it can rejoin normally, but more often than not rejoin fails, as follows:
[lightdb@hs-10-19-36-9 log]$ repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf -d 'host=10.19.36.10 dbname=repmgr user=repmgr'
ERROR: this node cannot attach to rejoin target node 1
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 8/C00000A0
HINT: use --force-rewind to execute lt_rewind
[lightdb@hs-10-19-36-9 log]$ repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf -d 'host=10.19.36.10 dbname=repmgr user=repmgr' --force-rewind
NOTICE: lt_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 8/C00000A0
NOTICE: executing lt_rewind
DETAIL: lt_rewind command is "/home/lightdb/lightdb-x/13.3-21.2/bin/lt_rewind -D '/home/lightdb/data/defaultCluster' --source-server='host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
ERROR: lt_rewind execution failed
DETAIL: lt_rewind: servers diverged at WAL location 8/A0000000 on timeline 5
lt_rewind: error: could not open file "/home/lightdb/data/defaultCluster/pg_wal/000000050000000800000004": No such file or directory
lt_rewind: fatal: could not find previous WAL record at 8/8DF84C18
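When rejoin fails like this, the usual options are to put the missing WAL segment back from the archive (the archive_command shown earlier keeps 7 days of segments) and retry, or, as a last resort, re-clone the node. A hedged sketch using this article's paths:

```shell
# Restore the segment lt_rewind could not find, then retry the rejoin.
cp /home/lightdb/lightdb-x/13.3-21.2/archive/000000050000000800000004 \
   /home/lightdb/data/defaultCluster/pg_wal/

repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf \
  -d 'host=10.19.36.10 dbname=repmgr user=repmgr' --force-rewind

# If the WAL is no longer available anywhere, wipe the data directory and
# re-clone from the new primary instead:
# repmgr -h 10.19.36.10 -U repmgr -d repmgr \
#   -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf standby clone --force
```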

  Note: PostgreSQL's timeline is, strictly speaking, the same idea as Oracle's resetlogs: it is equivalent to a re-initialized instance. The first 8 hex digits of a WAL file name are the timeline ID. Every point-in-time recovery, pg_rewind or promote increments the timeline ID and generates a NewTimelineID.history file, so the first history file is 00000002.history. For point-in-time recovery (controlled by the recovery_target_timeline and recovery_target_lsn parameters), the history file records both the configured recovery target and the actual recovery point. For example:

cat 00000002.history
1 0/70000D8 after LSN 0/7000060

The fields mean:
1 <parentTLI>    0/70000D8 <switchpoint>    after LSN 0/7000060 <reason>

parentTLI:    ID of the parent timeline
switchpoint:    XLogRecPtr of the WAL location where the switch happened
reason :    human-readable explanation of why the timeline was changed

After multiple point-in-time recoveries, the file can therefore look like the above.

Or, seen another way:

Looking at pg_wal, the oldest remaining WAL segment already starts at 000000050000000800000005; the 000000050000000800000004 segment that lt_rewind needed has already been removed.

[lightdb@hs-10-19-36-9 defaultCluster]$ cd /home/lightdb/data/defaultCluster/pg_wal
[lightdb@hs-10-19-36-9 pg_wal]$ ll
total 13107224
-rw------- 1 lightdb lightdb       333 Oct 29 11:19 000000010000000000000002.00000028.backup
-rw------- 1 lightdb lightdb        42 Oct 29 12:13 00000002.history
-rw------- 1 lightdb lightdb        85 Oct 29 14:02 00000003.history
-rw------- 1 lightdb lightdb       128 Oct 29 15:49 00000004.history
-rw------- 1 lightdb lightdb 536870912 Oct 29 23:12 000000050000000800000005
-rw------- 1 lightdb lightdb 536870912 Oct 29 23:12 000000050000000800000006
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:39 000000050000000800000007
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:42 000000050000000900000000
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:44 000000050000000900000001
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:46 000000050000000900000002
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:47 000000050000000900000003
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:54 000000050000000900000004
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:49 000000050000000900000005
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:52 000000050000000900000006
-rw------- 1 lightdb lightdb 536870912 Oct 29 23:03 000000050000000900000007
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:39 000000050000000A00000000
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:39 000000050000000A00000001
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:40 000000050000000A00000002
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:41 000000050000000A00000003
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:43 000000050000000A00000004
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:46 000000050000000A00000005
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:48 000000050000000A00000006
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:51 000000050000000A00000007
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:40 000000050000000B00000000
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:41 000000050000000B00000001
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:43 000000050000000B00000002
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:45 000000050000000B00000003
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:50 000000050000000B00000004
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:53 000000050000000B00000005
-rw------- 1 lightdb lightdb       171 Oct 29 16:04 00000005.history
-rw------- 1 lightdb lightdb       214 Oct 29 23:12 00000006.history
drwx------ 2 lightdb lightdb       242 Oct 29 23:12 archive_status

PS: a WAL file name consists of three 8-digit hexadecimal parts, starting from 00000001 00000000 00000001. For example: 00000001 (the timeline ID), 0000000C (lsn / 0x100000000, i.e. the high 32 bits of the LSN), 000000CE ((lsn % 0x100000000) / wal_segsz_bytes). So WAL file names are generated systematically; the range of the third part depends on the WAL segment size.
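Under those rules, the segment file holding a given LSN can be derived mechanically. A small sketch (wal_filename is a hypothetical helper; the examples use this environment's 512MB, i.e. 536870912-byte, segment size):

```shell
# wal_filename TLI LSN SEGSIZE: print the WAL segment file name containing LSN.
# Part 2 is the high 32 bits of the LSN ("X" in "X/Y"); part 3 is the
# low 32 bits ("Y") divided by the segment size.
wal_filename() {
  tli=$1; lsn=$2; segsize=$3
  hi=${lsn%/*}; lo=${lsn#*/}
  printf '%08X%08X%08X\n' "$tli" "0x$hi" $(( 0x$lo / segsize ))
}

# The divergence point from the rejoin example above, with 512MB segments:
wal_filename 5 8/A0000000 536870912
# → 000000050000000800000005

# The previous WAL record lt_rewind needed, at 8/8DF84C18, falls in the
# segment the error message said was missing:
wal_filename 5 8/8DF84C18 536870912
# → 000000050000000800000004
```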

Let's look at the contents of each history file.

[lightdb@hs-10-19-36-9 pg_wal]$ tail -fn 100 00000005.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified

4    4/E00000A0    no recovery target specified
^C
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000005.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified

4    4/E00000A0    no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000004.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000003.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000006.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified

4    4/E00000A0    no recovery target specified

5    8/A0000000    no recovery target specified
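The timeline chain recorded above can also be walked mechanically; a small sketch (last_fork is a hypothetical helper) that prints the newest fork point in a history file:

```shell
# last_fork: print "<parentTLI> <switchpoint>" of the newest entry
# in a .history file (entries are whitespace-separated, one per line).
last_fork() {
  awk 'NF >= 2 { tli=$1; lsn=$2 } END { print tli, lsn }'
}

# Fed with rows like those in 00000006.history above:
printf '%s\n' '1 0/800000A0 no recovery target specified' \
              '2 0/A00000A0 no recovery target specified' \
              '5 8/A0000000 no recovery target specified' \
  | last_fork
# → 5 8/A0000000
```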