Deploying a PostgreSQL 12 Streaming Replication Cluster (with Repmgr)

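This article explains how to build a PostgreSQL Server streaming replication cluster with one primary and multiple standbys, based on a Master/Standby architecture.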

1. Overview

1.1 Streaming Replication and Hot Standby

Because the cluster relies on PostgreSQL's built-in streaming replication, the standby servers we build support Hot Standby, which makes them suitable for applications that need read/write splitting.

The official PostgreSQL documentation describes Hot Standby as follows:

Hot Standby is the term used to describe the ability to connect to the server and run read-only queries while the server is in archive recovery or standby mode. This is useful both for replication purposes and for restoring a backup to a desired state with great precision. The term Hot Standby also refers to the ability of the server to move from recovery through to normal operation while users continue running queries and/or keep their connections open.

Running queries in hot standby mode is similar to normal query operation, though there are several usage and administrative differences.

1.2 Repmgr

In this article, we use Repmgr to manage the PostgreSQL streaming replication cluster and to provide high availability (HA).

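**Repmgr, released by 2ndQuadrant in 2010, is one of the most popular failover tools for PostgreSQL.** It is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It uses an extension to enhance PostgreSQL's built-in hot standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.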

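Repmgr helps DBAs and system administrators manage PostgreSQL database clusters. By taking advantage of the Hot Standby feature introduced in PostgreSQL 9.0, repmgr greatly simplifies setting up and managing databases with high availability and scalability requirements.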

Repmgr simplifies administration and day-to-day management, improves productivity, and lowers the overall cost of a PostgreSQL cluster by:

  • monitoring the replication process;
  • allowing DBAs to issue high-availability operations such as switchover and failover.

2. Environment

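The environment in this example consists of four servers (referred to below as nodes):

Hostname | IP address        | OS                | Packages                 | Planned cluster role
---------+-------------------+-------------------+--------------------------+-------------------------
repmgr01 | 192.168.29.191/24 | CentOS 7.8 64-bit | postgresql12 + repmgr_12 | master (initial role)
repmgr02 | 192.168.29.192/24 | CentOS 7.8 64-bit | postgresql12 + repmgr_12 | standby1 (initial role)
repmgr03 | 192.168.29.193/24 | CentOS 7.8 64-bit | postgresql12 + repmgr_12 | standby2 (initial role)
repmgr04 | 192.168.29.194/24 | CentOS 7.8 64-bit | postgresql12 + repmgr_12 | witness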
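To keep hostnames and addresses consistent across nodes, the node plan above can be kept in a single list and used to generate the `/etc/hosts` entries needed in the system configuration steps. A minimal sketch, assuming bash; the `NODES` array and the `gen_hosts` function are hypothetical helpers, not part of PostgreSQL or repmgr:

```bash
#!/usr/bin/env bash
# Hypothetical single source of truth for the node plan: "hostname ip role"
NODES=(
  "repmgr01 192.168.29.191 primary"
  "repmgr02 192.168.29.192 standby"
  "repmgr03 192.168.29.193 standby"
  "repmgr04 192.168.29.194 witness"
)

# Print one /etc/hosts line ("ip  hostname") per entry in NODES.
gen_hosts() {
  local entry name ip
  for entry in "${NODES[@]}"; do
    read -r name ip _ <<< "$entry"   # third field (role) is ignored here
    printf '%s  %s\n' "$ip" "$name"
  done
}
```

As root, `gen_hosts >> /etc/hosts` would append the same four lines that step 4 of section 3.1 writes by hand.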

3. Basic Configuration

3.1 System Environment

  1. As root, run the following on all nodes to set the time zone and locale, and optionally add some shell aliases to suit your habits:

    echo "export TZ='Asia/Shanghai'" >> /etc/profile
    echo "export LANG=en_US.utf8" >> /etc/profile
    echo "alias df='df -hTP'" >> /etc/profile
    echo "alias ll='ls -l --color=auto '" >> /etc/profile
    source /etc/profile
    
  2. As root, run the following on all nodes to disable SELinux:

    cat << EOF > /etc/selinux/config 
    SELINUX=disabled
    SELINUXTYPE=targeted
    EOF
    setenforce 0
    
  3. As root, run the following on all nodes to disable the system firewall:

    systemctl disable firewalld.service && systemctl stop firewalld.service
    

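    In production, decide whether to disable the firewall based on your requirements. If the firewall stays enabled, open the required ports, such as 5432 for PostgreSQL.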

  4. As root, run the following on all nodes to configure local hostname resolution:

    cat << EOF >> /etc/hosts
    192.168.29.191  repmgr01
    192.168.29.192  repmgr02
    192.168.29.193  repmgr03
    192.168.29.194  repmgr04
    EOF
    
  5. To speed up package downloads, run the following as root on all nodes to switch the system to the Alibaba Cloud (Aliyun) CentOS mirror:

    mkdir -p /etc/yum.repos.d/repo_backup && mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/repo_backup
    wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
    yum clean all
    yum makecache fast
    
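  6. As root, run the following on all nodes to install the NTP client (plus the createrepo and sshpass packages) and synchronize the system clock with the Aliyun NTP server, so that clocks are consistent across the cluster: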

    yum install -y ntpdate createrepo sshpass
    ntpdate ntp1.aliyun.com
    echo "ntpdate ntp1.aliyun.com" >> /etc/rc.local
    chmod +x /etc/rc.local /etc/rc.d/rc.local
    

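    In production, it is recommended to configure one node in the cluster as the NTP server and have the other nodes synchronize with it. See the relevant manuals for details.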

  7. For convenience, we set up SSH trust (passwordless login) between the root users of all nodes. This step is optional:

    ssh-keygen -t rsa -P '' -f '/root/.ssh/id_rsa'
    for h in `grep repmgr /etc/hosts |awk '{print $2}'`; do sshpass -p r00tr00t ssh-copy-id -o StrictHostKeyChecking=no root@$h; done
    for h in `grep repmgr /etc/hosts |awk '{print $2}'`; do ssh root@$h date; done
    
  8. As root, run the following on all nodes to create the postgres user, which runs the PostgreSQL and repmgr processes, and to set up pairwise SSH trust for it:

    groupadd -g 5432 postgres
    useradd -u 5432 -g postgres postgres
    echo postgres |passwd --force --stdin postgres
    su - postgres
    ssh-keygen -t rsa -P '' -f '/home/postgres/.ssh/id_rsa'
    for h in `grep repmgr /etc/hosts |awk '{print $2}'`; do sshpass -p postgres ssh-copy-id -o StrictHostKeyChecking=no postgres@$h; done
    for h in `grep repmgr /etc/hosts |awk '{print $2}'`; do ssh postgres@$h date; done
    exit
    
  9. As root, run the following on all nodes to configure shell resource limits:

    sed -i "/^postgres/d" /etc/security/limits.conf && \
    sed -i "/^postgres/d" /etc/security/limits.d/20-nproc.conf && \
    cat << EOF >> /etc/security/limits.conf
    #### Added by root:
    *        soft    core      unlimited
    @root    hard    core      unlimited
    @root    soft    nproc     unlimited
    @root    hard    nproc     unlimited
    @root    soft    nofile    300000
    @root    hard    nofile    300000
    #### Added by root:
    @postgres    hard     nofile   65536
    @postgres    soft     nofile   65536
    @postgres    hard     nproc    65536
    @postgres    soft     nproc    65536
    @postgres    soft     core     unlimited
    @postgres    hard     core     unlimited
    EOF
    echo "@postgres    soft    nproc    65536" >> /etc/security/limits.d/20-nproc.conf
    


  10. As root, run the following on all nodes to tune kernel parameters:

    cat << EOF >> /etc/sysctl.conf
    kernel.sem = 5010 641280 5010 256
    #kernel.sem = 50100 64128000 50100 1280
    fs.aio-max-nr = 1048576
    fs.file-max = 6815744
    kernel.shmall = 2097152
    kernel.shmmax = 4294967295
    kernel.shmmni = 4096
    net.ipv4.ip_local_port_range = 9000 65500
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_timestamps = 1
    net.ipv4.tcp_fin_timeout = 30
    net.core.wmem_max = 1048576
    net.core.somaxconn = 1024
    vm.swappiness = 0
    vm.overcommit_memory = 2
    # vm.overcommit_ratio: mem/(mem+swap)
    vm.overcommit_ratio = 90
    # vm.dirty_background_ratio: DirtyPageTotal/Memory
    vm.dirty_background_ratio = 1
    # vm.dirty_ratio: WriteCache/Memory
    vm.dirty_ratio = 2
    net.ipv4.tcp_keepalive_time = 1200
    net.ipv4.tcp_keepalive_probes = 3
    net.ipv4.tcp_keepalive_intvl = 30
    net.ipv4.tcp_max_syn_backlog = 8192
    net.ipv4.tcp_max_tw_buckets = 6000
    net.core.netdev_max_backlog = 32768
    net.core.wmem_default = 8388608
    net.core.rmem_default = 8388608
    net.core.rmem_max = 16777216
    net.ipv4.tcp_synack_retries = 2
    net.ipv4.tcp_syn_retries = 2
    net.ipv4.route.gc_timeout = 100
    net.ipv4.tcp_wmem = 8192 436600 873200
    net.ipv4.tcp_rmem = 32768 436600 873200
    net.ipv4.tcp_mem = 94500000 91500000 92700000
    net.ipv4.tcp_max_orphans = 3276800
    EOF
    
    /sbin/sysctl -p
    systemctl set-property crond.service TasksMax=65535
    

    The kernel parameters above are only empirical values and do not suit every system. Adjust them to your own environment.

  11. As root, run the following on all nodes to set the I/O scheduler for the data disk:

    echo deadline > /sys/block/sda/queue/scheduler && \
    cat << EOF >> /etc/rc.d/rc.local
    echo deadline > /sys/block/sda/queue/scheduler
    #echo deadline > /sys/block/sdb/queue/scheduler
    EOF
    

    This example uses virtual machines with a single disk (sda). Adjust this for your actual environment.

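  12. As root, run the following on all nodes to prevent systemd from removing the user's IPC resources in certain situations, which would make the database service unavailable: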

    sed -i "/RemoveIPC=/d" /etc/systemd/logind.conf
    echo "RemoveIPC=no" >> /etc/systemd/logind.conf
    systemctl daemon-reload
    systemctl restart systemd-logind.service
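`sysctl -p` applies the lines of `/etc/sysctl.conf` in order, so a key defined twice silently ends up with its last value, and sysctl.conf(5) only recognizes comments on lines that start with `#` or `;`, so text after a value on the same line is likely to be read as part of that value. A minimal pre-flight check for both cases before running `sysctl -p`, assuming bash and awk; `check_sysctl` is a hypothetical helper and checks syntax only, not whether a value suits your hardware:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for sysctl.conf-style input read from stdin.
# Reports keys defined more than once and values followed by an inline '#'.
# Exit status: 0 if no issue was found, 1 otherwise.
check_sysctl() {
  awk '
    /^[[:space:]]*$/ || /^[[:space:]]*[#;]/ { next }   # blank and comment lines
    {
      key = $0
      sub(/=.*/, "", key)               # keep the text before the first "="
      gsub(/[[:space:]]/, "", key)      # trim whitespace around the key
      if (seen[key]++) {
        printf "duplicate: %s (line %d)\n", key, NR; bad++
      }
      if (substr($0, index($0, "=")) ~ /#/) {
        printf "inline comment: %s (line %d)\n", key, NR; bad++
      }
    }
    END { exit (bad > 0) }
  '
}
```

`check_sysctl < /etc/sysctl.conf` prints nothing and returns 0 when no issue is found.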
    

3.2 Software Installation

In this example, the required packages, both the PostgreSQL Server packages and the Repmgr packages, are installed from online YUM repositories:

  1. As root, run the following on all nodes to add the YUM repositories:

    cat << EOF > /etc/yum.repos.d/c7-devtoolset-7-x86_64.repo
    [c7-devtoolset-7]
    name=c7-devtoolset-7
    baseurl=https://buildlogs.centos.org/c7-devtoolset-7.x86_64/
    gpgcheck=0
    enabled=1
    [c7-llvm-toolset-7]
    name=c7-llvm-toolset-7
    baseurl=https://buildlogs.centos.org/c7-llvm-toolset-7.x86_64/
    gpgcheck=0
    enabled=1
    EOF
    
    cat << EOF > /etc/yum.repos.d/fedoraproject-epel-7.repo
    [fedoraproject-epel-7]
    name=fedoraproject-epel-7
    baseurl=https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/
    gpgcheck=0
    enabled=1
    EOF
    
    rpm -ivh https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
    
  2. As root, run the following on all nodes to install the packages:

    yum install -y postgresql12-server postgresql12-contrib postgresql12-test postgresql12-devel repmgr_12 repmgr_12-devel
    rpm -qa |grep -E '(postgres|repmgr)'
    

4. Cluster Configuration and Initialization

For convenience, run the following on all nodes to set environment variables for the postgres user:

su - postgres
cat << EOF >> /home/postgres/.bashrc
# Added by root for postgres user:
export PGHOME=/usr/pgsql-12
export PGDATA=/data/pgdata/12
export PATH=\$PGHOME/bin:\$PATH
alias cdhome="cd \$PGHOME"
alias cddata="cd \$PGDATA"
#alias tailf_log="tailf -n40 \$(ls -ltr \$PGDATA/log/*.log |tail -n1 |awk '{print \$NF}')"
EOF

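Repmgr runs as the postgres user, but the configuration files installed by the Repmgr package are owned by root, and postgres can only read them. To make later edits easier, change the owner of the repmgr configuration directory to postgres. As root, run the following on all nodes: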

chown -R postgres:postgres /etc/repmgr

4.1 Initializing the Streaming Replication Cluster

4.1.1 Initializing the Master Node

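Initialize the database instance on repmgr01: create the data directory as root, then switch to the postgres user to run initdb. This database is the source of the streaming replication cluster and will later be cloned to the standby nodes. On repmgr01: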

mkdir -p /data/pgdata/12
chmod 700 /data/pgdata/12
chown -R postgres:postgres /data
su - postgres
echo "postgres" > ~/.postgres_passwd
chmod 600 ~/.postgres_passwd
initdb -U postgres -E UTF8 --locale=en_US.utf8 -D $PGDATA --wal-segsize=16 --data-checksums --auth=md5 --pwfile=/home/postgres/.postgres_passwd
rm -f ~/.postgres_passwd

Edit the postgresql.conf and pg_hba.conf configuration files:

cat << EOF >> $PGDATA/postgresql.conf
# Added by root user:
listen_addresses = '*'
max_connections = 400
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'replica'
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
wal_keep_segments = 64
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9
track_activity_query_size = 4096
EOF
sed -i "/^shared_buffer/s/128/256/g" $PGDATA/postgresql.conf
sed -i "/^max_wal_size/s/1GB/2GB/g" $PGDATA/postgresql.conf
sed -i "/^min_wal_size/s/80MB/512MB/g" $PGDATA/postgresql.conf

echo "host    all             all         0.0.0.0/0               md5" >> $PGDATA/pg_hba.conf

For convenience, define a few shell aliases:

cat << EOF >> /home/postgres/.bashrc
# ALIAS FOR master & standby DB:
alias start_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log start"
alias stop_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log -m fast stop"
alias restart_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log -m fast restart"
alias reload_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log reload"
alias show_repmgr_cluster="repmgr cluster show --compact"
EOF
source /home/postgres/.bashrc

Start the PostgreSQL Server instance on the master node:

start_pgsql

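Repmgr stores its metadata in a dedicated database (repmgr), and it is generally recommended to use a dedicated user for streaming replication, such as repmgr. The user and database can have any names; for simplicity, we name both "repmgr". The repmgr user is created as a superuser, as 2ndQuadrant recommends, because some Repmgr operations require elevated privileges.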

The following commands only need to be run on the primary. On the master node (repmgr01), as the postgres user:

  • Create the pg_stat_statements extension:

    su - postgres
    psql -Upostgres -dtemplate1 -h127.0.0.1 -c "CREATE EXTENSION pg_stat_statements;"
    psql -Upostgres -dpostgres -h127.0.0.1 -c "CREATE EXTENSION pg_stat_statements;"
    
  • Edit postgresql.conf to enable synchronous replication:

    cat << EOF >> $PGDATA/postgresql.conf
    synchronous_commit = remote_apply
    synchronous_standby_names = '1(repmgr01,repmgr02,repmgr03)'
    EOF
    

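    Because the standbys will later be created by cloning the complete database from the primary, the primary_slot_name parameter in the standby configuration files has to be adjusted manually afterwards.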

  • Create the repmgr user and the repmgr metadata database, and set the search_path of the repmgr user:

    psql -Upostgres -dpostgres -h127.0.0.1 -c "CREATE USER repmgr WITH SUPERUSER CREATEDB CREATEROLE INHERIT LOGIN REPLICATION ENCRYPTED PASSWORD 'repmgr';"
    psql -Upostgres -dpostgres -h127.0.0.1 -c "CREATE DATABASE repmgr WITH OWNER repmgr TEMPLATE template1;"
    psql -Upostgres -dpostgres -h127.0.0.1 -c "ALTER USER repmgr SET search_path TO repmgr, public;"
    echo "shared_preload_libraries = 'repmgr, pg_stat_statements'" >> $PGDATA/postgresql.conf
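The value `'1(repmgr01,repmgr02,repmgr03)'` uses the priority form of `synchronous_standby_names`, equivalent to `FIRST 1 (repmgr01,repmgr02,repmgr03)`: among the connected, streaming standbys whose `application_name` appears in the list, the one listed first is synchronous, and the others are reported as `potential` in `pg_stat_replication.sync_state`. With `synchronous_commit = remote_apply`, commits on the primary wait for that standby to apply the WAL, so they block while no listed standby is connected. The bash sketch below only illustrates the selection rule; `pick_sync_standby` is hypothetical and ignores details such as the `*` wildcard, quoted names and standbys that are still catching up:

```bash
#!/usr/bin/env bash
# Hypothetical illustration of FIRST-1 priority selection.
# $1:   comma-separated priority list from synchronous_standby_names
# $2..: application_names of the standbys currently streaming
pick_sync_standby() {
  local list="$1"; shift
  local name connected
  local -a names
  IFS=',' read -r -a names <<< "$list"
  for name in "${names[@]}"; do          # walk the list in priority order
    for connected in "$@"; do
      if [ "$name" = "$connected" ]; then
        echo "$name"                     # first listed, connected name wins
        return 0
      fi
    done
  done
  return 1                               # none connected: commits would wait
}
```

For example, `pick_sync_standby 'repmgr01,repmgr02,repmgr03' repmgr03 repmgr02` prints `repmgr02`, because repmgr02 comes before repmgr03 in the list.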
    

4.2 Preparing the Standby Nodes

As root, on all standby nodes (repmgr02 and repmgr03 in this example), prepare the data directory and create the shell aliases:

mkdir -p /data/pgdata/12
chmod 700 /data/pgdata/12
chown -R postgres:postgres /data

cat << EOF >> /home/postgres/.bashrc
# ALIAS FOR master & standby DB:
alias start_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log start"
alias stop_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log -m fast stop"
alias restart_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log -m fast restart"
alias reload_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log reload"
alias show_repmgr_cluster="repmgr cluster show --compact"
EOF

4.3 Configuring Repmgr

For PostgreSQL 12 and Repmgr installed from the PGDG repository, the default Repmgr configuration directory is /etc/repmgr/12 and the configuration file is repmgr.conf.

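Repmgr must be configured on the primary and on every standby. Here we set only the minimum parameters required to manage streaming replication with Repmgr. Each machine needs a unique node ID and a name. The name can be anything, but keeping it identical to the hostname makes nodes easy to tell apart.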

We also need to specify a connection string for each node and the location of the PostgreSQL data directory.
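Since only `node_id`, `node_name`, the host address and `application_name` differ between the three database nodes in the steps below, the per-node settings can be generated from one template. A minimal sketch, assuming bash; `gen_repmgr_conf` is a hypothetical helper that reproduces the parameters used in this section, including the inline `password=repmgr` in `conninfo`:

```bash
#!/usr/bin/env bash
# Hypothetical generator for the per-node repmgr.conf settings used here.
# Usage: gen_repmgr_conf NODE_ID NODE_NAME HOST_IP
gen_repmgr_conf() {
  local id="$1" name="$2" ip="$3"
  # Unquoted EOF so that ${id}, ${name} and ${ip} are expanded.
  cat << EOF
node_id=${id}
node_name='${name}'
conninfo='host=${ip} port=5432 user=repmgr dbname=repmgr password=repmgr connect_timeout=3 application_name=${name} keepalives=1 keepalives_idle=6 keepalives_interval=3 keepalives_count=3'
data_directory='/data/pgdata/12'
location='my-dc1'
EOF
}
```

For example, as root on repmgr02: `gen_repmgr_conf 2 repmgr02 192.168.29.192 >> /etc/repmgr/12/repmgr.conf`.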

  1. Log in to repmgr01 as root and run the following:

    Configure /etc/repmgr/12/repmgr.conf:

    cat << EOF >> /etc/repmgr/12/repmgr.conf
    # Added by root user:
    node_id=1
    node_name='repmgr01'
    conninfo='host=192.168.29.191 port=5432 user=repmgr dbname=repmgr password=repmgr connect_timeout=3 application_name=repmgr01 keepalives=1 keepalives_idle=6 keepalives_interval=3 keepalives_count=3'
    data_directory='/data/pgdata/12'
    location='my-dc1'
    EOF
    

    Create the password file .pgpass to avoid exposing passwords directly in configuration files:

    cat << EOF > /home/postgres/.pgpass
    # hostname:port:database:username:password
    # For postgres superuser:
    127.0.0.1:5432:postgres:postgres:postgres
    localhost:5432:postgres:postgres:postgres
    repmgr01:5432:postgres:postgres:postgres
    192.168.29.191:5432:postgres:postgres:postgres
    repmgr02:5432:postgres:postgres:postgres
    192.168.29.192:5432:postgres:postgres:postgres
    repmgr03:5432:postgres:postgres:postgres
    192.168.29.193:5432:postgres:postgres:postgres
    # For repmgr user:
    127.0.0.1:5432:repmgr:repmgr:repmgr
    localhost:5432:repmgr:repmgr:repmgr
    repmgr01:5432:repmgr:repmgr:repmgr
    192.168.29.191:5432:repmgr:repmgr:repmgr
    repmgr02:5432:repmgr:repmgr:repmgr
    192.168.29.192:5432:repmgr:repmgr:repmgr
    repmgr03:5432:repmgr:repmgr:repmgr
    192.168.29.193:5432:repmgr:repmgr:repmgr
    # For stream replication:
    repmgr01:5432:replication:repmgr:repmgr
    192.168.29.191:5432:replication:repmgr:repmgr
    repmgr02:5432:replication:repmgr:repmgr
    192.168.29.192:5432:replication:repmgr:repmgr
    repmgr03:5432:replication:repmgr:repmgr
    192.168.29.193:5432:replication:repmgr:repmgr
    EOF
    
    chown postgres:postgres /home/postgres/.pgpass
    chmod 600 /home/postgres/.pgpass
    

    Edit /data/pgdata/12/pg_hba.conf to allow connections from the repmgr user:

    sed -i "/  replication /d" /data/pgdata/12/pg_hba.conf
    cat << EOF >> /data/pgdata/12/pg_hba.conf
    # Added by root user:
    local   replication     repmgr                              md5
    host    replication     repmgr      127.0.0.1/32            md5
    host    replication     repmgr      192.168.29.0/24         md5
    local   repmgr          repmgr                              md5
    host    repmgr          repmgr      127.0.0.1/32            md5
    host    repmgr          repmgr      192.168.29.0/24         md5
    EOF
    

    Restart the PostgreSQL Server instance on the master node:

    su - postgres
    restart_pgsql
    
  2. Log in to repmgr02 as root and run the following:

    Configure /etc/repmgr/12/repmgr.conf:

    cat << EOF >> /etc/repmgr/12/repmgr.conf
    # Added by root user:
    node_id=2
    node_name='repmgr02'
    conninfo='host=192.168.29.192 port=5432 user=repmgr dbname=repmgr password=repmgr connect_timeout=3 application_name=repmgr02 keepalives=1 keepalives_idle=6 keepalives_interval=3 keepalives_count=3'
    data_directory='/data/pgdata/12'
    location='my-dc1'
    EOF
    

    Create the password file .pgpass to avoid exposing passwords directly in configuration files:

    cat << EOF > /home/postgres/.pgpass
    #LINE-FORMAT: hostname:port:database:username:password
    # For postgres superuser:
    127.0.0.1:5432:postgres:postgres:postgres
    localhost:5432:postgres:postgres:postgres
    repmgr01:5432:postgres:postgres:postgres
    192.168.29.191:5432:postgres:postgres:postgres
    repmgr02:5432:postgres:postgres:postgres
    192.168.29.192:5432:postgres:postgres:postgres
    repmgr03:5432:postgres:postgres:postgres
    192.168.29.193:5432:postgres:postgres:postgres
    # For repmgr user:
    127.0.0.1:5432:repmgr:repmgr:repmgr
    localhost:5432:repmgr:repmgr:repmgr
    repmgr01:5432:repmgr:repmgr:repmgr
    192.168.29.191:5432:repmgr:repmgr:repmgr
    repmgr02:5432:repmgr:repmgr:repmgr
    192.168.29.192:5432:repmgr:repmgr:repmgr
    repmgr03:5432:repmgr:repmgr:repmgr
    192.168.29.193:5432:repmgr:repmgr:repmgr
    # For stream replication:
    repmgr01:5432:replication:repmgr:repmgr
    192.168.29.191:5432:replication:repmgr:repmgr
    repmgr02:5432:replication:repmgr:repmgr
    192.168.29.192:5432:replication:repmgr:repmgr
    repmgr03:5432:replication:repmgr:repmgr
    192.168.29.193:5432:replication:repmgr:repmgr
    EOF
    chown postgres:postgres /home/postgres/.pgpass
    chmod 600 /home/postgres/.pgpass
    
  3. Log in to repmgr03 as root and run the following:

    Configure /etc/repmgr/12/repmgr.conf:

    cat << EOF >> /etc/repmgr/12/repmgr.conf
    # Added by root user:
    node_id=3
    node_name='repmgr03'
    conninfo='host=192.168.29.193 port=5432 user=repmgr dbname=repmgr password=repmgr connect_timeout=3 application_name=repmgr03 keepalives=1 keepalives_idle=6 keepalives_interval=3 keepalives_count=3'
    data_directory='/data/pgdata/12'
    location='my-dc1'
    EOF
    

    Create the password file .pgpass to avoid exposing passwords directly in configuration files:

    cat << EOF > /home/postgres/.pgpass
    #LINE-FORMAT: hostname:port:database:username:password
    # For postgres superuser:
    127.0.0.1:5432:postgres:postgres:postgres
    localhost:5432:postgres:postgres:postgres
    repmgr01:5432:postgres:postgres:postgres
    192.168.29.191:5432:postgres:postgres:postgres
    repmgr02:5432:postgres:postgres:postgres
    192.168.29.192:5432:postgres:postgres:postgres
    repmgr03:5432:postgres:postgres:postgres
    192.168.29.193:5432:postgres:postgres:postgres
    # For repmgr user:
    127.0.0.1:5432:repmgr:repmgr:repmgr
    localhost:5432:repmgr:repmgr:repmgr
    repmgr01:5432:repmgr:repmgr:repmgr
    192.168.29.191:5432:repmgr:repmgr:repmgr
    repmgr02:5432:repmgr:repmgr:repmgr
    192.168.29.192:5432:repmgr:repmgr:repmgr
    repmgr03:5432:repmgr:repmgr:repmgr
    192.168.29.193:5432:repmgr:repmgr:repmgr
    # For stream replication:
    repmgr01:5432:replication:repmgr:repmgr
    192.168.29.191:5432:replication:repmgr:repmgr
    repmgr02:5432:replication:repmgr:repmgr
    192.168.29.192:5432:replication:repmgr:repmgr
    repmgr03:5432:replication:repmgr:repmgr
    192.168.29.193:5432:replication:repmgr:repmgr
    EOF
    chown postgres:postgres /home/postgres/.pgpass
    chmod 600 /home/postgres/.pgpass
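Every non-comment line of `.pgpass` needs exactly five colon-separated fields (`hostname:port:database:username:password`). A line with a missing field, such as one without the port, is effectively never matched, so libpq falls back to prompting for a password or authentication fails without pointing at the file. A minimal check, assuming bash and awk; `check_pgpass` is hypothetical and does not handle fields containing an escaped `\:`:

```bash
#!/usr/bin/env bash
# Hypothetical .pgpass syntax check: prints offending lines, exits 1 if any.
# Does not handle escaped "\:" inside fields.
check_pgpass() {
  awk -F':' '
    /^[[:space:]]*#/ || NF == 0 { next }   # skip comment and empty lines
    NF != 5 { printf "line %d: %d fields: %s\n", NR, NF, $0; bad++ }
    END { exit (bad > 0) }
  ' "${1:-$HOME/.pgpass}"
}
```

Running `check_pgpass /home/postgres/.pgpass` on each node prints nothing when every line is well-formed.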
    

4.4 Testing Repmgr Connectivity

On the master and all standby nodes (repmgr01, repmgr02 and repmgr03 in this example), test as the postgres user that each node can connect to the master node:

su - postgres
/usr/pgsql-12/bin/psql 'host=repmgr01 user=repmgr dbname=repmgr connect_timeout=3' -c "select version();"

4.5 Registering the Master Node with Repmgr

Log in to the master node (repmgr01) as postgres and run the following to register the local PostgreSQL Server instance with Repmgr and check its status:

su - postgres
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf primary register
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show

4.6 Cloning the Master Node to the Standby Nodes

Log in to both standby nodes as postgres and run the following on each to clone the complete database from the master node:

su - postgres
/usr/pgsql-12/bin/repmgr -h 192.168.29.191 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
/usr/pgsql-12/bin/repmgr -h 192.168.29.191 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone

Below is example output of a complete clone operation:

$ /usr/pgsql-12/bin/repmgr -h 192.168.29.191 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/data/pgdata/12" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.29.191 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met

$ /usr/pgsql-12/bin/repmgr -h 192.168.29.191 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone
NOTICE: destination directory "/data/pgdata/12" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.29.191 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/data/pgdata/12"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /data/pgdata/12 -h 192.168.29.191 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /data/pgdata/12 start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
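When every check passes, the `--dry-run` output ends with `INFO: all prerequisites for "standby clone" are met`, as in the example above. A minimal sketch that runs the real clone only when that line is present; `clone_if_ready`, `dry_run_ok`, `REPMGR` and `CLONE_ARGS` are hypothetical names, and matching on log text is an assumption that may break if repmgr rewords its messages:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: only run the real clone if the dry run reports success.
REPMGR="${REPMGR:-/usr/pgsql-12/bin/repmgr}"
CLONE_ARGS=(-h 192.168.29.191 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone)

# Return 0 if the dry-run text on stdin contains the success line.
dry_run_ok() {
  grep -q 'all prerequisites for "standby clone" are met'
}

clone_if_ready() {
  # repmgr logs to stderr, so merge it into the pipe before checking.
  if "$REPMGR" "${CLONE_ARGS[@]}" --dry-run 2>&1 | dry_run_ok; then
    "$REPMGR" "${CLONE_ARGS[@]}"
  else
    echo "dry run reported problems; not cloning" >&2
    return 1
  fi
}
```

Run `clone_if_ready` as postgres on each standby; if it refuses, rerun the `--dry-run` command by hand to read the full diagnostics.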

4.7 Registering the Standby Nodes

Next, on all standby nodes, start the PostgreSQL Server instance and register it with Repmgr as a standby:

start_pgsql
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register --upstream-node-id=1 --dry-run
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register --upstream-node-id=1

4.8 Checking Streaming Replication Status

Log in to the master node as postgres and run the following to check the streaming replication status:

psql -Upostgres -dpostgres -c "select * from pg_stat_replication;"
psql -Upostgres -dpostgres -c "select * from pg_replication_slots;"
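Selecting a few columns in unaligned, tuples-only mode (`psql -A -t`) instead of `select *` makes the replication status easy to check from a script. With the `synchronous_standby_names` setting configured on the master, you would expect every standby to be `streaming` and exactly one of them to have `sync_state = 'sync'`. A minimal sketch, assuming bash and awk; `replication_rows` and `check_replication` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Hypothetical query helper (run as postgres on the master).
# -A (unaligned) and -t (tuples only) print rows as "name|state|sync_state".
replication_rows() {
  psql -U postgres -d postgres -Atc \
    "select application_name, state, sync_state from pg_stat_replication;"
}

# Hypothetical check on those rows (stdin): all streaming, exactly one sync.
check_replication() {
  awk -F'|' '
    $2 != "streaming" { printf "%s is %s\n", $1, $2; bad++ }
    $3 == "sync"      { sync++ }
    END {
      if (sync != 1) { printf "expected 1 sync standby, found %d\n", sync; bad++ }
      exit (bad > 0)
    }
  '
}
```

On the master, as postgres: `replication_rows | check_replication`.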

4.9 Checking Repmgr Cluster Status

Log in to any master or standby node as postgres and run the following to check the Repmgr cluster status:

show_repmgr_cluster        # shell alias defined earlier
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show --compact
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster event
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster matrix
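`repmgr cluster show` prints a pipe-separated table, so a small filter can turn it into a pass/fail check, for example for a cron job or a monitoring script. A minimal sketch, assuming the column layout shown in the example `cluster show --compact` output in this article (ID, Name, Role, Status, ...); `check_cluster_show` is hypothetical and treats any status other than a plain `running` (optionally marked `*` for the primary) as a problem:

```bash
#!/usr/bin/env bash
# Hypothetical check on "repmgr cluster show --compact" output (stdin).
# Prints nodes whose Status column is not a plain "running"; exits 1 if any.
check_cluster_show() {
  awk -F'|' '
    NF < 4 || $1 ~ /ID/ { next }           # skip header, separator, messages
    {
      name = $2; status = $4
      gsub(/^ +| +$/, "", name); gsub(/^ +| +$/, "", status)
      if (status !~ /running/ || status ~ /[!?]/) {
        printf "%s: %s\n", name, status; bad++
      }
    }
    END { exit (bad > 0) }
  '
}
```

As postgres: `repmgr -f /etc/repmgr/12/repmgr.conf cluster show --compact | check_cluster_show`.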

5. Enabling Cluster HA

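Managing streaming replication with Repmgr is only half the job. To get true automatic failover, we must configure additional parameters in repmgr.conf and start the repmgr daemon (repmgrd), which monitors the state of every node in the cluster. That is what we do below.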

5.1 Setting Up the Witness Node

Log in to the witness node (repmgr04) as root and run the following to initialize it:

mkdir -p /data/pgdata/12
chmod 700 /data/pgdata/12
chown -R postgres:postgres /data
chown -R postgres:postgres /etc/repmgr
su - postgres
echo "postgres" > ~/.postgres_passwd && chmod 600 ~/.postgres_passwd
initdb -U postgres -E UTF8 --locale=en_US.utf8 -D /data/pgdata/12 --wal-segsize=16 --data-checksums --auth=md5 --pwfile=/home/postgres/.postgres_passwd
rm -f /home/postgres/.postgres_passwd
echo "listen_addresses = '*'" >> /data/pgdata/12/postgresql.conf
cat << EOF >> /home/postgres/.bashrc
# ALIAS FOR witness DB:
alias start_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log start"
alias stop_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log -m fast stop"
alias restart_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log -m fast restart"
alias reload_pgsql="pg_ctl -D \$PGDATA -l \$PGDATA/postgresql.log reload"
alias show_repmgr_cluster="repmgr cluster show --compact"
EOF
source /home/postgres/.bashrc
start_pgsql

Configure repmgr.conf on the witness node:

cat << EOF >> /etc/repmgr/12/repmgr.conf
node_id=9
node_name='repmgr04'
conninfo='host=192.168.29.194 port=5432 user=repmgr dbname=repmgr password=repmgr connect_timeout=3 application_name=repmgr04 keepalives=1 keepalives_idle=6 keepalives_interval=3 keepalives_count=3'
data_directory='/data/pgdata/12'
location='my-dc1'
EOF

Install useful extensions on the witness node, and edit its pg_hba.conf to allow connections from the repmgr user:

su - postgres
psql -Upostgres -dtemplate1 -c "CREATE EXTENSION pg_stat_statements;"
psql -Upostgres -dpostgres -c "CREATE EXTENSION pg_stat_statements;"
echo "shared_preload_libraries ='repmgr, pg_stat_statements'" >> /data/pgdata/12/postgresql.conf
sed -i "/  replication /d" /data/pgdata/12/pg_hba.conf
cat << EOF >> /data/pgdata/12/pg_hba.conf
local   replication    repmgr                      trust
host    replication    repmgr   127.0.0.1/32       trust
host    replication    repmgr   192.168.29.0/24    trust
local   repmgr         repmgr                      trust
host    repmgr         repmgr   127.0.0.1/32       trust
host    repmgr         repmgr   192.168.29.0/24    trust
EOF

On the witness node, create the dedicated repmgr user and configure a password file so that the password is not exposed in configuration files:

psql -Upostgres -dpostgres -W -c "CREATE USER repmgr WITH SUPERUSER CREATEDB CREATEROLE INHERIT LOGIN REPLICATION ENCRYPTED PASSWORD 'repmgr';"
psql -Upostgres -dpostgres -W -c "CREATE DATABASE repmgr WITH OWNER repmgr TEMPLATE template1;"
psql -Upostgres -dpostgres -W -c "ALTER USER repmgr SET search_path TO repmgr, public;"
psql -Urepmgr -drepmgr -W -c "show search_path;"

cat << EOF > /home/postgres/.pgpass
#LINE-FORMAT: hostname:port:database:username:password
# For repmgr user:
127.0.0.1:5432:repmgr:repmgr:repmgr
localhost:5432:repmgr:repmgr:repmgr
repmgr01:5432:repmgr:repmgr:repmgr
192.168.29.191:5432:repmgr:repmgr:repmgr
repmgr02:5432:repmgr:repmgr:repmgr
192.168.29.192:5432:repmgr:repmgr:repmgr
repmgr03:5432:repmgr:repmgr:repmgr
192.168.29.193:5432:repmgr:repmgr:repmgr
EOF
chown postgres:postgres /home/postgres/.pgpass
chmod 600 /home/postgres/.pgpass
restart_pgsql

On the master node (repmgr01), log in as postgres and check that it can connect to the witness node:

/usr/pgsql-12/bin/psql 'host=192.168.29.194 port=5432 user=repmgr dbname=repmgr connect_timeout=3' -c "show port;"

Back on the witness node, run repmgr witness register as postgres to register the witness with the Repmgr cluster. Note that the command uses the IP address of the primary node, not that of the witness:

/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf witness register -h 192.168.29.191

As postgres, run the following on any node in the cluster to check the current cluster status:

repmgr cluster show --compact
repmgr cluster matrix

Example cluster status in this setup:

$ repmgr cluster show --compact
 ID | Name     | Role    | Status    | Upstream | Location | Prio. | TLI
----+----------+---------+-----------+----------+----------+-------+-----
 1  | repmgr01 | primary | * running |          | default  | 100   | 1   
 2  | repmgr02 | standby |   running | repmgr01 | default  | 100   | 1   
 3  | repmgr03 | standby |   running | repmgr01 | default  | 100   | 1   
 9  | repmgr04 | witness | * running | repmgr01 | default  | 0     | n/a

$ repmgr cluster matrix
INFO: connecting to database
 Name     | ID | 1 | 2 | 3 | 9
----------+----+---+---+---+---
 repmgr01 | 1  | * | * | * | * 
 repmgr02 | 2  | * | * | * | * 
 repmgr03 | 3  | * | * | * | * 
 repmgr04 | 9  | * | * | * | * 
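In `repmgr cluster matrix`, each row shows the result of one node connecting to every other node, and `*` means the connection succeeded; other symbols such as `x` or `?` indicate a failed or unknown connection. A minimal sketch that reports every cell that is not `*`, assuming the layout of the example above; `check_cluster_matrix` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical check on "repmgr cluster matrix" output (stdin):
# every cell after the Name and ID columns should be "*".
check_cluster_matrix() {
  awk -F'|' '
    $1 ~ /Name/ {                          # header row: remember node IDs
      for (i = 3; i <= NF; i++) { id[i] = $i; gsub(/ /, "", id[i]) }
      next
    }
    NF < 3 { next }                        # INFO lines and separator row
    {
      name = $1; gsub(/ /, "", name)
      for (i = 3; i <= NF; i++) {
        cell = $i; gsub(/ /, "", cell)
        if (cell != "*") { printf "%s -> %s: %s\n", name, id[i], cell; bad++ }
      }
    }
    END { exit (bad > 0) }
  '
}
```

As postgres on any node: `repmgr cluster matrix | check_cluster_matrix`.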

5.2 Granting sudo Privileges to the postgres User

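So that Repmgr can run the system commands needed for failover, the postgres user needs sudo privileges and must be able to run some system commands without a password. We therefore add the postgres user to the sudoers file. As root, on all nodes (repmgr01, repmgr02, repmgr03 and repmgr04), give the postgres user passwordless sudo so that it can start and stop system services, including the postgresql and repmgr services. Note that the entry below grants passwordless sudo for all commands, not only for these services: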

chmod +w /etc/sudoers
cat << EOF >> /etc/sudoers 
postgres  ALL=NOPASSWD:ALL
EOF
chmod -w /etc/sudoers
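Appending to `/etc/sudoers` with `cat` skips the syntax check that `visudo` performs, and a syntax error in that file can break `sudo` for everyone. A narrower alternative, sketched here under the assumption that `/etc/sudoers` includes `/etc/sudoers.d` (the CentOS 7 default), is a drop-in file that only allows the `systemctl` commands used by `repmgrd_service_start_command` and `repmgrd_service_stop_command` in the HA parameters below. The function names, the file name `postgres-repmgr` and the command list are assumptions to adapt; add any other commands your configuration runs through `sudo`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: restricted sudoers drop-in for the postgres user.
VISUDO="${VISUDO:-visudo}"   # overridable only so the check can be stubbed

# Print the sudoers rule: only the repmgrd start/stop commands, no password.
sudoers_rule() {
  printf 'postgres ALL=(root) NOPASSWD: %s, %s\n' \
    '/usr/bin/systemctl start repmgr12.service' \
    '/usr/bin/systemctl stop repmgr12.service'
}

# Write the rule to a temp file, validate it, then install it with mode 0440.
write_sudoers_dropin() {
  local target="${1:-/etc/sudoers.d/postgres-repmgr}" tmp
  tmp="$(mktemp)"
  sudoers_rule > "$tmp"
  chmod 0440 "$tmp"
  if ! "$VISUDO" -cf "$tmp" > /dev/null; then   # syntax check before install
    rm -f "$tmp"
    return 1
  fi
  install -m 0440 "$tmp" "$target"
  rm -f "$tmp"
}
```

As root, `write_sudoers_dropin` installs `/etc/sudoers.d/postgres-repmgr` only if `visudo -cf` accepts it.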

5.3 Enabling HA Parameters

Next, log in to each node as postgres and edit repmgr.conf to enable the parameters required for failover. Refer to the official Repmgr documentation for the meaning of each parameter.

  1. Node repmgr01:

    cat << EOF >> /etc/repmgr/12/repmgr.conf
    failover='automatic'
    promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
    follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
    priority=60
    monitor_interval_secs=2
    connection_check_type='ping'
    reconnect_attempts=3
    reconnect_interval=3
    primary_visibility_consensus=true
    standby_disconnect_on_failover=true
    repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgr12.service'
    repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgr12.service'
    service_start_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log start'
    service_stop_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast stop'
    service_restart_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast restart'
    service_reload_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log reload'
    monitoring_history=yes
    log_status_interval=60
    log_level='INFO'
    log_facility='STDERR'
    log_file='/var/log/repmgr/repmgrd.log'
    replication_user='repmgr'
    replication_type='physical'
    use_replication_slots=yes
    EOF
    
  2. Node repmgr02:

    cat << EOF >> /etc/repmgr/12/repmgr.conf
    failover='automatic'
    promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
    follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
    priority=60
    monitor_interval_secs=2
    connection_check_type='ping'
    reconnect_attempts=3
    reconnect_interval=3
    primary_visibility_consensus=true
    standby_disconnect_on_failover=true
    repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgr12.service'
    repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgr12.service'
    service_start_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log start'
    service_stop_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast stop'
    service_restart_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast restart'
    service_reload_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log reload'
    monitoring_history=yes
    log_status_interval=60
    log_level='INFO'
    log_facility='STDERR'
    log_file='/var/log/repmgr/repmgrd.log'
    replication_user='repmgr'
    replication_type='physical'
    use_replication_slots=yes
    EOF
    
  3. Node repmgr03:

    cat << EOF >> /etc/repmgr/12/repmgr.conf
    failover='automatic'
    promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
    follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
    priority=40
    monitor_interval_secs=2
    connection_check_type='ping'
    reconnect_attempts=3
    reconnect_interval=3
    primary_visibility_consensus=true
    standby_disconnect_on_failover=true
    repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgr12.service'
    repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgr12.service'
    service_start_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log start'
    service_stop_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast stop'
    service_restart_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast restart'
    service_reload_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log reload'
    monitoring_history=yes
    log_status_interval=60
    log_level='INFO'
    log_facility='STDERR'
    log_file='/var/log/repmgr/repmgrd.log'
    replication_user='repmgr'
    replication_type='physical'
    use_replication_slots=yes
    EOF
    
  4. Node repmgr04:

    cat << EOF >> /etc/repmgr/12/repmgr.conf
    failover='automatic'
    promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
    follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
    #priority=60
    monitor_interval_secs=2
    connection_check_type='ping'
    reconnect_attempts=3
    reconnect_interval=3
    primary_visibility_consensus=true
    standby_disconnect_on_failover=true
    repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgr12.service'
    repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgr12.service'
    service_start_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log start'
    service_stop_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast stop'
    service_restart_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log -m fast restart'
    service_reload_command='/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log reload'
    monitoring_history=yes
    log_status_interval=60
    log_level='INFO'
    log_facility='STDERR'
    log_file='/var/log/repmgr/repmgrd.log'
    replication_user='repmgr'
    replication_type='physical'
    use_replication_slots=yes
    repmgrd_pid_file='/var/run/repmgr/repmgrd-12.pid'
    EOF
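With `use_replication_slots=yes`, repmgr manages one physical replication slot per standby, named `repmgr_slot_<node_id>` in the repmgr versions we are aware of, and on PostgreSQL 12 a standby refers to its slot through `primary_slot_name`. Standbys cloned before this setting was enabled were created without a slot (the clone output above reports `replication slot usage not requested`), so it is worth comparing the expected names with `pg_replication_slots` on the primary. A minimal sketch, assuming bash and that naming convention; `expected_slot` and `missing_slots` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Hypothetical helper: expected repmgr slot name for a node ID
# (assumes the repmgr_slot_<node_id> naming convention).
expected_slot() {
  printf 'repmgr_slot_%s\n' "$1"
}

# Print the node IDs (arguments) whose expected slot is missing from the
# slot names read on stdin, one per line.
missing_slots() {
  local slots id
  slots="$(cat)"
  for id in "$@"; do
    grep -qx "$(expected_slot "$id")" <<< "$slots" || echo "$id"
  done
}
```

On the primary, as postgres: `psql -Atc "select slot_name from pg_replication_slots;" | missing_slots 2 3` prints the IDs of standbys that have no slot yet.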
    

5.4 Starting the repmgr Daemon

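Now that the parameters are set on all cluster nodes and on the witness node, we can run Repmgr as a daemon. Before starting it for real, we run the command with the --dry-run option to detect problems that might prevent the repmgr daemon from starting.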

Test on the primary first, then on the two standbys, then on the witness. Note that the command must be run as the postgres user:

su - postgres
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start --dry-run

If --dry-run reports no errors on any node, start the daemon on all four nodes:

/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start

Example output of starting Repmgr as a daemon and checking the background process:

$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start
NOTICE: executing: "sudo /usr/bin/systemctl start repmgr12.service"
NOTICE: repmgrd was successfully started

$ ps -ef |grep repmgr |grep -v grep
postgres  1537  1270  0 19:17 ?        00:00:00 postgres: repmgr repmgr 192.168.29.194(53823) idle
postgres  1539     1  0 19:17 ?        00:00:00 /usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose
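The `ps` output above shows `repmgrd` running with the PID file passed through `-p`. A minimal sketch that checks that process through the PID file instead of grepping `ps` output; the default path is taken from the example output, and `repmgrd_running` is a hypothetical helper. Run it as postgres or root, since `kill -0` fails for another user's process:

```bash
#!/usr/bin/env bash
# Hypothetical check: is the process recorded in the repmgrd PID file alive?
repmgrd_running() {
  local pidfile="${1:-/run/repmgr/repmgrd-12.pid}" pid
  [ -r "$pidfile" ] || return 1          # no PID file: not running
  read -r pid < "$pidfile"
  # kill -0 sends no signal; it only tests that the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2> /dev/null
}
```

For example, `repmgrd_running && echo "repmgrd is up"`.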

We can also check the daemon start events from the primary or a standby:

/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster event --event=repmgrd_start

Finally, we can check the daemon output in syslog on any primary or standby node. By default, repmgrd logs to the system log file (/var/log/messages), for example:

cat /var/log/messages |grep --color=auto repmgrd |more

In this example, repmgrd's log is written to a separate file, /var/log/repmgr/repmgrd.log (set with the log_file parameter), so we only need to check that file:

cat /var/log/repmgr/repmgrd.log

6. Starting and Stopping the Cluster

Start and stop the cluster on each node as the postgres user.

6.1 Starting the Cluster

On all nodes (repmgr01-repmgr04), start the PostgreSQL Server instance:

start_pgsql
ps -ef |grep postgres

On the primary node, check the streaming replication status:

psql -Upostgres -dpostgres -c "select * from pg_stat_replication;"
psql -Upostgres -dpostgres -c "select * from pg_replication_slots;"

On all nodes (repmgr01-repmgr04), start the repmgr daemon:

/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start

6.2 Stopping the Cluster

On all nodes (repmgr01-repmgr04), stop the repmgr daemon:

repmgr daemon stop

On all nodes (repmgr01-repmgr04), stop the PostgreSQL Server instance:

stop_pgsql
ps -ef |grep postgres
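The order above matters: PostgreSQL is started on every node before any `repmgrd`, and every `repmgrd` is stopped before PostgreSQL is shut down, so that the daemons do not mistake a planned shutdown for a node failure. A minimal sketch that prints the per-node commands in that order for review, for example to run over the postgres SSH trust set up in step 8 of section 3.1; `cluster_plan`, `PLAN_NODES`, `PG_CTL_CMD` and `REPMGR_CMD` are hypothetical, and full paths are used because shell aliases are not expanded in non-interactive SSH sessions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the start or stop sequence for all nodes.
PLAN_NODES=(repmgr01 repmgr02 repmgr03 repmgr04)
PG_CTL_CMD="/usr/pgsql-12/bin/pg_ctl -D /data/pgdata/12 -l /data/pgdata/12/postgresql.log"
REPMGR_CMD="/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf"

cluster_plan() {
  local n
  case "$1" in
    start)   # PostgreSQL everywhere first, then repmgrd everywhere
      for n in "${PLAN_NODES[@]}"; do echo "ssh postgres@$n '$PG_CTL_CMD start'"; done
      for n in "${PLAN_NODES[@]}"; do echo "ssh postgres@$n '$REPMGR_CMD daemon start'"; done
      ;;
    stop)    # repmgrd everywhere first, then PostgreSQL everywhere
      for n in "${PLAN_NODES[@]}"; do echo "ssh postgres@$n '$REPMGR_CMD daemon stop'"; done
      for n in "${PLAN_NODES[@]}"; do echo "ssh postgres@$n '$PG_CTL_CMD -m fast stop'"; done
      ;;
    *)
      echo "usage: cluster_plan start|stop" >&2
      return 1
      ;;
  esac
}
```

After reviewing the output, `cluster_plan start | bash` would run the commands one after another.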