KingbaseES R6叢集備庫網絡卡down測試案例
阿新 • • 發佈:2022-03-30
資料庫版本:
test=# select version(); version ---------------------------------------------------------------------------------------------------------------------- KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit (1 row)
主機節點資訊:
[kingbase@node101 bin]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.101 node101 , #主庫
192.168.1.102 node102 #備庫
叢集節點資訊:
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+---------+---------+-----------+----------+---------+-------+---------+-------------------- 1 | node101 | primary | * running | | running | 11180 | no | n/a 2 | node102 | standby | running | node101 | running | 9242 | no | 0 second(s) ago
一、檢視叢集狀態及配置資訊
1、叢集節點狀態
[kingbase@node101 bin]$ ./repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------- 1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 2 | node102 | standby | running | node101 | default | 100 | 1 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2、叢集配置資訊
二、將備庫網絡卡down測試
1、備庫網絡卡down[root@node102 ~]# ifconfig enp0s3 down
2、檢視備庫messages日誌
3、備庫hamgr.log
=日誌資訊顯示repmgrd服務被close,無法提供正常的服務。=
4、主庫檢視叢集節點狀態
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+---------------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=5 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | ? unreachable | node101 | default | 100 | ? | host=192.168.1.102 user=system dbname=esrep port=5 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
WARNING: following issues were detected
- unable to connect to node "node102" (ID: 2)
- node "node102" (ID: 2) is registered as an active standby but is unreachable
=== 從以上資訊所示,叢集沒有觸發主備庫的切換操作。===
三、備庫網絡卡恢復正常(up)
1、檢視叢集狀態資訊
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321nect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 1 | host=192.168.1.102 user=system dbname=esrep port=54321nect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2、檢視備庫hamgr.log
=如下日誌所示,備庫網絡卡恢復正常後,備庫通過接收wal日誌流執行recovery,和主庫同步。=
[2022-03-29 16:11:45] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
[2022-03-29 16:11:45] [ERROR] unable to determine if server is in recovery
[2022-03-29 16:11:45] [DETAIL]
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[2022-03-29 16:11:45] [DETAIL] query text is:
SELECT pg_catalog.pg_is_in_recovery()
[2022-03-29 16:11:47] [NOTICE] upstream is available but upstream connection has gone away, resetting
[2022-03-29 16:12:24] [ERROR] is_rep_sync_streaming(): get 2 tuples
[2022-03-29 16:12:45] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:45] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:16:47] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
四、總結
1、對於備庫,如果網絡卡down引起的網路故障,並不會觸發叢集的主備切換。當網絡卡正常後,叢集恢復正常。
2、如果備庫的資料庫服務down,在recovery=‘automatic | standby’配置時,會自動恢復備庫的資料庫服務。
3、本案例是在一主一備的架構下的測試,如果是一主多備的架構,對於同步狀態是‘sync’的備庫網絡卡down,會導致其他的備庫進行競選,將同步狀態提升為‘sync’。