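KES V8R6 Cluster: Manual VIP Configuration Case Study

阿新 • Published: 2021-06-29

Users often ask how to add a VIP (virtual IP) to a V8R6 cluster that was deployed without one. This article walks through adding the VIP by hand.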

I. Operating system environment

Operating system (UOS):
root@uos01:~# cat /etc/issue
Uniontech OS Server 20 Enterprise \n \l

Database:
test=# select version();
                                                       version                                                      
-------------------------------------------------------------------------------
 KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

II. Cluster architecture

1. Initial deployment

No VIP was configured during the initial deployment.

2. Check the cluster node status

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3. Check the repmgr.conf file

kingbase@uos01:~/cluster/R6HA/kha/kingbase/etc$ cat repmgr.conf 
on_bmj=off
node_id=1
node_name='node238'
promote_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log'
data_directory='/home/kingbase/cluster/R6HA/kha/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6HA/kha/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=3
reconnect_interval=5
failover='automatic'
recovery='manual'
monitoring_history='no'
trusted_servers='192.168.7.1'
synchronous='quorum'
repmgrd_pid_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgrd.pid'
ping_path='/usr/bin'

=== The configuration file above contains no virtual_ip entry ===

4. Start the cluster with sys_monitor.sh

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
2021-03-01 12:07:25 Ready to stop all DB ...
Service process "node_export" was killed at process 12391
Service process "postgres_ex" was killed at process 12392
Service process "node_export" was killed at process 5229
Service process "postgres_ex" was killed at process 5230
2021-03-01 12:07:28 begin to stop repmgrd on "[192.168.7.238]".
2021-03-01 12:07:29 repmgrd on "[192.168.7.238]" stop success.
2021-03-01 12:07:29 begin to stop repmgrd on "[192.168.7.239]".
2021-03-01 12:07:29 repmgrd on "[192.168.7.239]" stop success.
2021-03-01 12:07:29 begin to stop DB on "[192.168.7.239]".
waiting for server to shut down.... done
server stopped
2021-03-01 12:07:30 DB on "[192.168.7.239]" stop success.
2021-03-01 12:07:30 begin to stop DB on "[192.168.7.238]".
waiting for server to shut down.... done
server stopped
2021-03-01 12:07:30 DB on "[192.168.7.238]" stop success.
2021-03-01 12:07:30 Done.
2021-03-01 12:07:30 Ready to start all DB ...
2021-03-01 12:07:30 begin to start DB on "[192.168.7.238]".
waiting for server to start.... done
server started
2021-03-01 12:07:31 execute to start DB on "[192.168.7.238]" success, connect to check it.
2021-03-01 12:07:32 DB on "[192.168.7.238]" start success.
2021-03-01 12:07:32 Try to ping trusted_servers on host 192.168.7.238 ...
2021-03-01 12:07:34 Try to ping trusted_servers on host 192.168.7.239 ...
2021-03-01 12:07:37 begin to start DB on "[192.168.7.239]".
waiting for server to start.... done
server started
2021-03-01 12:07:37 execute to start DB on "[192.168.7.239]" success, connect to check it.
2021-03-01 12:07:38 DB on "[192.168.7.239]" start success.
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2021-03-01 12:07:38 The primary DB is started.
2021-03-01 12:07:38 begin to start repmgrd on "[192.168.7.238]".
[2021-03-01 12:07:39] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 12:07:39] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"

2021-03-01 12:07:39 repmgrd on "[192.168.7.238]" start success.
2021-03-01 12:07:39 begin to start repmgrd on "[192.168.7.239]".
[2021-03-01 12:07:35] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 12:07:35] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"

2021-03-01 12:07:40 repmgrd on "[192.168.7.239]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node238 | primary | * running |          | running | 13285 | no      | n/a                
 2  | node239 | standby |   running | node238  | running | 5508  | no      | 0 second(s) ago    
2021-03-01 12:07:44 Done.

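=== The output above shows that cluster startup includes no VIP check. ===

III. Edit repmgr.conf to configure the VIP (must be done on every node)

1. Identify the network interface for the VIP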

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever

==== The VIP must be configured on the same network device as the physical IP. ====
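
One way to confirm which device carries the physical IP is to filter the one-line output of `ip -o -4 addr show`. The sketch below is illustrative only: `device_for_ip` is a hypothetical helper name, and the sample line is modelled on the interface shown above rather than captured from the host.

```shell
# Print the device that carries a given IPv4 address.
# Expects `ip -o -4 addr show` style lines on stdin:
#   "<index>: <device> inet <address>/<prefix> ..."
device_for_ip() {
    awk -v want="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == want) print $2 }'
}

# Sample line modelled on this host (an assumption, not captured output).
# On a live node, pipe in the real command instead: ip -o -4 addr show
sample_line='2: enp0s3    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3'
printf '%s\n' "$sample_line" | device_for_ip 192.168.7.238
```

The device name printed here is the one the VIP should be attached to.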

2. Determine the paths and permissions of the ip and arping executables

Locate the ip and arping executables:

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ which arping
/usr/bin/arping
root@uos01:~# which ip
/usr/sbin/ip

Check the arping version:

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ arping -V
arping utility, iputils-s20180629
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ls arping
arping
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ which arping
/usr/bin/arping

=== The operating system's arping version is fine. Any version string of the form iputils-xxxx is acceptable. ===

Set permissions on the ip and arping executables (setuid bit):

root@uos01:~# ls -lh /usr/bin/arping
-rwxr-xr-x 1 root root 27K Jan 14  2020 /usr/bin/arping
root@uos01:~# ls -lh /usr/bin/ip
-rwxr-xr-x 1 root root 575K Jun  4  2021 /usr/bin/ip
root@uos01:~# chmod 4755 /usr/bin/arping
root@uos01:~# chmod 4755 /usr/sbin/ip
root@uos01:~# ls -lh /usr/bin/arping
-rwsr-xr-x 1 root root 27K Jan 14  2020 /usr/bin/arping
root@uos01:~# ls -lh /usr/sbin/ip
lrwxrwxrwx 1 root root 7 Jun  4  2021 /usr/sbin/ip -> /bin/ip
root@uos01:~# ls -lh /bin/ip
-rwsr-xr-x 1 root root 575K Jun  4  2021 /bin/ip

Notes:

1) The ip command is used to add and remove the VIP.

2) The arping command is used to clear and test the ARP cache during VIP switchover.
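
Because /usr/sbin/ip is a symlink on this host, it helps to check the setuid bit on the file behind the path rather than on the link itself. A minimal sketch, assuming a hypothetical helper named `has_setuid`; it is demonstrated on a scratch file so that no system binary is touched:

```shell
# Succeeds when the file behind the given path has the setuid bit set.
# The -u test follows symlinks, so /usr/sbin/ip -> /bin/ip is handled.
has_setuid() {
    [ -u "$1" ]
}

# Demonstrate on a scratch file instead of a real system binary.
scratch_file=$(mktemp)
chmod 0755 "$scratch_file"
has_setuid "$scratch_file" && echo "setuid set" || echo "setuid missing"
chmod 4755 "$scratch_file"
has_setuid "$scratch_file" && echo "setuid set" || echo "setuid missing"
rm -f "$scratch_file"
```

On a cluster node the same function can be pointed at /usr/bin/arping and /usr/sbin/ip after running chmod 4755.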

3. Modify the repmgr.conf file
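
A minimal sketch of the entries this step would add, assuming the values seen elsewhere in this walkthrough (VIP 192.168.7.244/24, device enp0s3, arping in /usr/bin, ip in /usr/sbin). The names `virtual_ip` and `arping_path` appear in the article's own output. `net_device` and `ipaddr_path` are assumed parameter names and should be checked against the KingbaseES documentation for your build. The function `vip_settings` is hypothetical and only prints the lines.

```shell
# Print a candidate set of VIP entries for repmgr.conf.
# virtual_ip and arping_path are named in this article's output;
# net_device and ipaddr_path are ASSUMED names, verify before use.
vip_settings() {
cat <<'EOF'
virtual_ip='192.168.7.244/24'
net_device='enp0s3'
arping_path='/usr/bin'
ipaddr_path='/usr/sbin'
EOF
}

# Review the lines before adding them to etc/repmgr.conf on every node.
vip_settings
```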

IV. Restart the cluster (using sys_monitor.sh)

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
2021-03-01 12:22:39 Ready to stop all DB ...
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
2021-03-01 12:22:42 begin to stop repmgrd on "[192.168.7.238]".
2021-03-01 12:22:43 repmgrd on "[192.168.7.238]" already stopped.
2021-03-01 12:22:43 begin to stop repmgrd on "[192.168.7.239]".
2021-03-01 12:22:43 repmgrd on "[192.168.7.239]" already stopped.
2021-03-01 12:22:43 begin to stop DB on "[192.168.7.239]".
waiting for server to shut down.... done
server stopped
2021-03-01 12:22:44 DB on "[192.168.7.239]" stop success.
2021-03-01 12:22:44 begin to stop DB on "[192.168.7.238]".
waiting for server to shut down.... done
server stopped
2021-03-01 12:22:44 DB on "[192.168.7.238]" stop success.
2021-03-01 12:22:44 Done.
2021-03-01 12:22:44 Ready to start all DB ...
2021-03-01 12:22:44 begin to start DB on "[192.168.7.238]".
waiting for server to start.... done
server started
2021-03-01 12:22:45 execute to start DB on "[192.168.7.238]" success, connect to check it.
2021-03-01 12:22:46 DB on "[192.168.7.238]" start success.
2021-03-01 12:22:46 Try to ping trusted_servers on host 192.168.7.238 ...
2021-03-01 12:22:48 Try to ping trusted_servers on host 192.168.7.239 ...
2021-03-01 12:22:51 begin to start DB on "[192.168.7.239]".
waiting for server to start.... done
server started
2021-03-01 12:22:51 execute to start DB on "[192.168.7.239]" success, connect to check it.
2021-03-01 12:22:52 DB on "[192.168.7.239]" start success.
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2021-03-01 12:22:53 The primary DB is started.
2021-03-01 12:22:57 Success to load virtual ip [192.168.7.244/24] on primary host [192.168.7.238].
2021-03-01 12:22:57 Try to ping vip on host 192.168.7.238 ...
2021-03-01 12:22:59 Try to ping vip on host 192.168.7.239 ...
2021-03-01 12:23:02 begin to start repmgrd on "[192.168.7.238]".
[2021-03-01 12:23:02] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 12:23:02] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"

2021-03-01 12:23:02 repmgrd on "[192.168.7.238]" start success.
2021-03-01 12:23:02 begin to start repmgrd on "[192.168.7.239]".
[2021-03-01 12:22:58] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 12:22:58] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log"

2021-03-01 12:23:03 repmgrd on "[192.168.7.239]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node238 | primary | * running |          | running | 15043 | no      | n/a                
 2  | node239 | standby |   running | node238  | running | 6440  | no      | n/a                
2021-03-01 12:23:07 Done.

=== The output above shows that after the restart the cluster loads the VIP address [192.168.7.244/24] ===

V. Verify the cluster status

1. Check that the VIP is loaded

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.7.244/24 scope global secondary enp0s3:3
       valid_lft forever preferred_lft forever

=== The output above shows that the VIP was loaded successfully on the primary node ===
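
The same check can be scripted by looking for the VIP among the inet entries of the `ip addr show` output. This is a sketch under stated assumptions: `vip_loaded` is a hypothetical helper, and the sample text reuses the two inet lines shown above.

```shell
# Succeeds when the address (given without prefix length) appears as an
# inet entry in `ip addr show` style text read from stdin.
vip_loaded() {
    awk -v vip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }'
}

# The two inet lines from the primary node, used here as sample input.
# On a live node: ip addr show enp0s3 | vip_loaded 192.168.7.244
sample_addrs='    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
    inet 192.168.7.244/24 scope global secondary enp0s3:3'
if printf '%s\n' "$sample_addrs" | vip_loaded 192.168.7.244; then
    echo "VIP present"
else
    echo "VIP absent"
fi
```

The address is compared exactly after the prefix is split off, so 192.168.7.24 would not be mistaken for 192.168.7.244.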

2. Check the cluster node status

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3. Connect to the database through the VIP and check the streaming replication status

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./ksql -h 192.168.7.244 -U system test
ksql (V8.0)
Type "help" for help.

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_s
tart         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag |
 replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+
 14935 |    16384 | esrep   | node239          | 192.168.7.239 |                 |       58172 | 2021-03-01 12:22:
51.831920+08 |              | streaming | 0/6000670 | 0/6000670 | 0/6000670 | 0/6000670  |           |           |
            |             1 | quorum     | 2021-03-01 12:24:30.751707+08
(1 row)

VI. Primary/standby switchover test

1. Cluster node status before the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. Perform the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr standby switchover --siblings-follow
NOTICE: executing switchover on node "node239" (ID: 2)
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
INFO: pausing repmgrd on node "node238" (ID 1)
INFO: pausing repmgrd on node "node239" (ID 2)
NOTICE: local node "node239" (ID: 2) will be promoted to primary; current primary "node238" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node238" (ID: 1)
NOTICE: issuing CHECKPOINT
NOTICE: node (ID: 1) release the virtual ip 192.168.7.244/24 success
DETAIL: executing server command "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl  -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/7000028
NOTICE: PING 192.168.7.244 (192.168.7.244) 56(84) bytes of data.

--- 192.168.7.244 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 3ms


WARNING: ping host"192.168.7.244" failed
DETAIL: average RTT value is not greater than zero
NOTICE: new primary node (ID: 2) acquire the virtual ip 192.168.7.244/24 success
NOTICE: promoting standby to primary
DETAIL: promoting server "node239" (ID: 2) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node239" (ID: 2) was successfully promoted to primary
NOTICE: issuing CHECKPOINT
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/7000028; rejoin target node's fork point: 0/70000A0
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2021-03-01 12:29:42.971664
NOTICE: starting server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile start"
NOTICE: start server finish at 2021-03-01 12:29:43.087104
NOTICE: replication slot "repmgr_slot_2" deleted on node 1
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
NOTICE: switchover was successful
DETAIL: node "node239" is now primary and node "node238" is attached as standby
INFO: unpausing repmgrd on node "node238" (ID 1)
INFO: unpause node "node238" (ID 1) successfully
INFO: unpausing repmgrd on node "node239" (ID 2)
INFO: unpause node "node239" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully

3. Check the VIP after the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:c9:c0:27 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.239/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.7.244/24 scope global secondary enp0s3:3
       valid_lft forever preferred_lft forever

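=== The output above shows that the VIP is now loaded on the new primary ===

4. Check the node status after the switchover (the switchover completed normally)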

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | standby |   running | node239  | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | primary | * running |          | default  | 100      | 2        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

5. Check the VIP on the former primary (it has been removed)

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever

VII. Cluster failover test

1. Stop the database service on the primary

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./sys_ctl stop -D ../data
waiting for server to shut down.... done
server stopped

2. Check the cluster node status after the failover

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
 ID | Name    | Role    | Status               | Upstream  | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+----------------------+-----------+----------+----------+
 1  | node238 | standby | ! running as primary | ? node239 | default  | 100      | 3        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | primary | ? unreachable        |           | default  | 100      | ?        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - node "node238" (ID: 1) is registered as standby but running as primary
  - unable to connect to node "node238" (ID: 1)'s upstream node "node239" (ID: 2)
  - unable to determine if node "node238" (ID: 1) is attached to its upstream node "node239" (ID: 2)
  - unable to connect to node "node239" (ID: 2)
  - node "node239" (ID: 2) is registered as an active primary but is unreachable

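=== The output above shows that after the primary's database service went down, a failover took place and the former standby was promoted to the new primary; the former primary is reported as "unreachable". ===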
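
When scripting around a failover, the node that is actually serving as primary can be read from the Status column of `repmgr cluster show`. A minimal sketch, assuming a hypothetical helper `acting_primary` and sample rows copied from the output above with the connection strings shortened:

```shell
# Print the names of nodes whose Status column shows them acting as primary:
# "* running" (registered primary) or "! running as primary" (promoted standby).
acting_primary() {
    awk -F'|' '$1 ~ /^ *[0-9]+ *$/ {
        name = $2; status = $4
        gsub(/^ +| +$/, "", name)
        gsub(/^ +| +$/, "", status)
        if (status == "* running" || status == "! running as primary") print name
    }'
}

# Rows from the failover output above (connection strings shortened).
# On a live node: ./repmgr cluster show | acting_primary
failover_rows=' 1  | node238 | standby | ! running as primary | ? node239 | default  | 100      | 3        | host=192.168.7.238
 2  | node239 | primary | ? unreachable        |           | default  | 100      | ?        | host=192.168.7.239'
printf '%s\n' "$failover_rows" | acting_primary
```

Header, separator, and WARNING lines are skipped because their first field is not a numeric node ID.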

VIII. Errors encountered during configuration

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart

the dir "/sbin" has no execute file "arping", please set [arping_path] in /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
2021-03-01 12:19:27 Ready to stop all DB ...
Service process "node_export" was killed at process 13382
Service process "postgres_ex" was killed at process 13383
Service process "node_export" was killed at process 5575
Service process "postgres_ex" was killed at process 5576
2021-03-01 12:19:31 begin to stop repmgrd on "[192.168.7.238]".
2021-03-01 12:19:31 repmgrd on "[192.168.7.238]" stop success.
2021-03-01 12:19:31 begin to stop repmgrd on "[192.168.7.239]".
2021-03-01 12:19:32 repmgrd on "[192.168.7.239]" stop success.
2021-03-01 12:19:32 begin to stop DB on "[192.168.7.239]".
incorrect command permissions for the virtual ip.
waiting for server to shut down.... done
server stopped
2021-03-01 12:19:33 DB on "[192.168.7.239]" stop success.
2021-03-01 12:19:33 begin to stop DB on "[192.168.7.238]".
incorrect command permissions for the virtual ip.
waiting for server to shut down.... done
server stopped
2021-03-01 12:19:33 DB on "[192.168.7.238]" stop success.
2021-03-01 12:19:33 Done.
2021-03-01 12:19:33 Ready to start all DB ...
2021-03-01 12:19:33 begin to start DB on "[192.168.7.238]".
incorrect command permissions for the virtual ip.
waiting for server to start.... done
server started
2021-03-01 12:19:34 execute to start DB on "[192.168.7.238]" success, connect to check it.
2021-03-01 12:19:35 DB on "[192.168.7.238]" start success.
2021-03-01 12:19:35 Try to ping trusted_servers on host 192.168.7.238 ...
2021-03-01 12:19:37 Try to ping trusted_servers on host 192.168.7.239 ...
2021-03-01 12:19:40 begin to start DB on "[192.168.7.239]".
incorrect command permissions for the virtual ip.
waiting for server to start.... done
server started
2021-03-01 12:19:40 execute to start DB on "[192.168.7.239]" success, connect to check it.
2021-03-01 12:19:41 DB on "[192.168.7.239]" start success.
ERROR: No execute permission for "/usr/sbin/ip"
incorrect command permissions for the virtual ip.
2021-03-01 12:19:42 There is no primary DB running, will do nothing and exit.

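=== The errors above occur when the path of the arping executable is not set in the configuration file, and when the ip and arping executables lack the setuid bit ===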
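
The first error can be caught before restarting by checking that the configured directory really contains an executable of the expected name. The sketch below only approximates that idea; how sys_monitor.sh performs its own check is not shown in this article, and `tool_in_dir` is a hypothetical helper demonstrated on a scratch directory.

```shell
# Succeeds when <dir>/<name> is a regular file that the current user can execute.
tool_in_dir() {
    [ -f "$1/$2" ] && [ -x "$1/$2" ]
}

# Scratch directory standing in for the directory named by arping_path.
scratch_dir=$(mktemp -d)
printf '#!/bin/sh\n' > "$scratch_dir/arping"
chmod 0755 "$scratch_dir/arping"
for dir in "$scratch_dir" "$scratch_dir/missing"; do
    if tool_in_dir "$dir" arping; then
        echo "$dir: arping found"
    else
        echo "$dir: no executable arping, adjust arping_path"
    fi
done
rm -rf "$scratch_dir"
```

On this host the real calls would be `tool_in_dir /usr/bin arping` and `tool_in_dir /usr/sbin ip`.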

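IX. Summary of steps:

1) Choose the VIP address; it must be on the same subnet as the physical IP and not already in use.
2) Check the paths of the arping and ip executables and the arping version.
3) Set the setuid bit (s permission) on the ip and arping executables.
4) Add the VIP settings to the repmgr.conf file.
5) Restart the cluster and verify its status.
6) Test primary/standby switchover.
7) Test application access through the VIP.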
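
The same-subnet requirement in step 1 can be checked with plain shell arithmetic. A minimal sketch, assuming IPv4 only; `ip_to_int` and `same_subnet` are hypothetical helper names, and the check does not cover whether the address is already in use.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet <physical_ip> <vip> <prefix_len>
# Succeeds when both addresses share the same network part.
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Addresses from this walkthrough, plus one deliberately off-subnet VIP.
for vip in 192.168.7.244 192.168.8.244; do
    if same_subnet 192.168.7.238 "$vip" 24; then
        echo "$vip: same subnet as 192.168.7.238/24"
    else
        echo "$vip: different subnet"
    fi
done
```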
