1. 程式人生 > 其它 >KingbaseES R6叢集備庫網絡卡down測試案例

KingbaseES R6叢集備庫網絡卡down測試案例

資料庫版本:

test=# select version();
                                                       version
----------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

主機節點資訊:

[kingbase@node101 bin]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.101   node101  ,  #主庫
192.168.1.102   node102     #備庫

叢集節點資訊:

ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | primary | * running |          | running | 11180 | no      | n/a
 2  | node102 | standby |   running | node101  | running | 9242  | no      | 0 second(s) ago

一、檢視叢集狀態及配置資訊

1、叢集節點狀態

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                         
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2、叢集配置資訊

二、將備庫網絡卡down測試

1、備庫網絡卡down
[root@node102 ~]# ifconfig enp0s3 down

2、檢視備庫messages日誌

3、備庫hamgr.log

=日誌資訊顯示repmgrd服務被close,無法提供正常的服務。=

4、主庫檢視叢集節點狀態

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                 
----+---------+---------+---------------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running     |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=5 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby | ? unreachable | node101  | default  | 100      | ?        | host=192.168.1.102 user=system dbname=esrep port=5 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - unable to connect to node "node102" (ID: 2)
  - node "node102" (ID: 2) is registered as an active standby but is unreachable

=== 從以上資訊所示,叢集沒有觸發主備庫的切換操作。===

三、備庫網絡卡恢復正常(up)

1、檢視叢集狀態資訊

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                     
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321nect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321nect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2、檢視備庫hamgr.log

=如下日誌所示,備庫網絡卡恢復正常後,備庫通過接收wal日誌流執行recovery,和主庫同步。=

[2022-03-29 16:11:45] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
[2022-03-29 16:11:45] [ERROR] unable to determine if server is in recovery
[2022-03-29 16:11:45] [DETAIL]
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[2022-03-29 16:11:45] [DETAIL] query text is:
SELECT pg_catalog.pg_is_in_recovery()
[2022-03-29 16:11:47] [NOTICE] upstream is available but upstream connection has gone away, resetting
[2022-03-29 16:12:24] [ERROR] is_rep_sync_streaming(): get 2 tuples
[2022-03-29 16:12:45] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:45] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:16:47] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state

四、總結

 1、對於備庫,如果網絡卡down引起的網路故障,並不會觸發叢集的主備切換。當網絡卡正常後,叢集恢復正常。
 2、如果備庫的資料庫服務down,在recovery=‘automatic | standby’配置時,會自動恢復備庫的資料庫服務。
 3、本案例是在一主一備的架構下的測試,如果是一主多備的架構,對於同步狀態是‘sync’的備庫網絡卡down,會導致其他的備庫進行競選,將同步狀態提升為‘sync’。