KingbaseES R6 Cluster "Dual-Primary" Failure: A Resolution Case

阿新 • Published: 2021-06-17

Test environment for this case:

Operating system:
[kingbase@node1 bin]$ cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)

Database:
[kingbase@node1 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.

test=# select version();
                                                       version                                                    
----------------------------------------------------------------------------------------
 KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

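Summary of steps:

1. After checking and confirming which node is the primary and which is the standby, stop the cluster (cluster and db) services.
2. Change the system IP and the IP entries in /etc/hosts.
3. Update the physical IP and VIP settings in the cluster configuration file repmgr.conf.
4. Restart the system network service to apply the new physical IP.
5. Start the database service on the primary and the standby.
6. Register the primary with the cluster.
7. Stop the standby's database service, register the standby with the cluster, and rejoin the standby node to the cluster.
8. Check the cluster service status (cluster and db) and start the repmgrd service on the primary and the standby.
9. Restart the cluster service (sys_monitor.sh) to verify.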

I. "Dual-Primary" Failure After Cluster Startup


[kingbase@node1 bin]$ ./sys_monitor.sh restart
2021-03-01 13:30:03 Ready to stop all DB ...
Service process "node_export" was killed at process 8253
Service process "postgres_ex" was killed at process 8254
Service process "node_export" was killed at process 8131
Service process "postgres_ex" was killed at process 8132
2021-03-01 13:30:09 begin to stop repmgrd on "[192.168.7.248]".
2021-03-01 13:30:10 repmgrd on "[192.168.7.248]" stop success.
2021-03-01 13:30:10 begin to stop repmgrd on "[192.168.7.249]".
2021-03-01 13:30:11 repmgrd on "[192.168.7.249]" stop success.
2021-03-01 13:30:11 begin to stop DB on "[192.168.7.249]".
waiting for server to shut down..... done
server stopped
2021-03-01 13:30:13 DB on "[192.168.7.249]" stop success.
2021-03-01 13:30:13 begin to stop DB on "[192.168.7.248]".
waiting for server to shut down.... done
server stopped
2021-03-01 13:30:14 DB on "[192.168.7.248]" stop success.
2021-03-01 13:30:14 Done.
2021-03-01 13:30:14 Ready to start all DB ...
2021-03-01 13:30:14 begin to start DB on "[192.168.7.248]".
waiting for server to start.... done
server started
2021-03-01 13:30:16 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 13:30:17 DB on "[192.168.7.248]" start success.
2021-03-01 13:30:17 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 13:30:19 Try to ping trusted_servers on host 192.168.7.249 ...
2021-03-01 13:30:22 begin to start DB on "[192.168.7.249]".
waiting for server to start.... done
server started
2021-03-01 13:30:23 execute to start DB on "[192.168.7.249]" success, connect to check it.
2021-03-01 13:30:24 DB on "[192.168.7.249]" start success.
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+-------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | ! running |          | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
WARNING: following issues were detected
  - node "node249" (ID: 2) is running but the repmgr node record is inactive
2021-03-01 13:30:24 There are more than one primary DBs([2] DBs are running), will do nothing and exit.
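
The condition reported above (more than one node acting as primary) can also be detected from captured `repmgr cluster show` output. The following is a minimal sketch, not how sys_monitor.sh itself performs the check: the function name `acting_primaries` and the shortened connection strings in the sample are assumptions made for illustration.

```python
def acting_primaries(cluster_show_output):
    """Return the names of nodes that appear to be running as a primary."""
    names = []
    for line in cluster_show_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows start with a numeric node ID; skip header, separator, warnings.
        if len(cells) < 4 or not cells[0].isdigit():
            continue
        name, role, status = cells[1], cells[2], cells[3]
        # "running as primary" covers a node whose repmgr record says standby
        # but whose server is actually up as a primary.
        if "running as primary" in status:
            names.append(name)
        elif role == "primary" and status.endswith("running"):
            names.append(name)
    return names


# Rows copied from the output above; connection strings shortened for brevity.
SAMPLE = """\
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248
 2  | node249 | primary | ! running |          | default  | 100      | 4        | host=192.168.7.249
"""

if __name__ == "__main__":
    primaries = acting_primaries(SAMPLE)
    if len(primaries) > 1:
        print("dual primary detected:", ", ".join(primaries))
```

A result with more than one name means manual intervention is needed before the cluster can be started normally.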

Check the database service on the original standby:


[kingbase@node1 etc]$ ps -ef |grep kingbase

kingbase 20612     1  0 13:30 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbase/data
kingbase 20626 20612  0 13:30 ?        00:00:00 kingbase: logger   
kingbase 20628 20612  0 13:30 ?        00:00:00 kingbase: checkpointer   
kingbase 20629 20612  0 13:30 ?        00:00:00 kingbase: background writer   
kingbase 20630 20612  0 13:30 ?        00:00:00 kingbase: walwriter   
kingbase 20631 20612  0 13:30 ?        00:00:00 kingbase: autovacuum launcher   
kingbase 20632 20612  0 13:30 ?        00:00:00 kingbase: archiver   
kingbase 20633 20612  0 13:30 ?        00:00:00 kingbase: stats collector   
kingbase 20634 20612  0 13:30 ?        00:00:00 kingbase: ksh writer   
kingbase 20635 20612  0 13:30 ?        00:00:00 kingbase: ksh collector   
kingbase 20636 20612  0 13:30 ?        00:00:00 kingbase: sys_kwr collector   
kingbase 20637 20612  0 13:30 ?        00:00:00 kingbase: logical replication launcher

On the original primary, check the cluster node status and the database service:


[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+----------------------+----------+----------+----------+--------
 1  | node248 | standby | ! running as primary |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running            |          | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - node "node248" (ID: 1) is running as primary but the repmgr node record is inactive


[kingbase@node2 bin]$ ps -ef |grep kingbase

kingbase 20161     1  0 13:29 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbase/data
kingbase 20172 20161  0 13:29 ?        00:00:00 kingbase: logger   
kingbase 20176 20161  0 13:29 ?        00:00:00 kingbase: checkpointer   
kingbase 20177 20161  0 13:29 ?        00:00:00 kingbase: background writer   
kingbase 20178 20161  0 13:29 ?        00:00:00 kingbase: walwriter   
kingbase 20179 20161  0 13:29 ?        00:00:00 kingbase: autovacuum launcher   
kingbase 20180 20161  0 13:29 ?        00:00:00 kingbase: archiver   
kingbase 20181 20161  0 13:29 ?        00:00:00 kingbase: stats collector   
kingbase 20182 20161  0 13:29 ?        00:00:00 kingbase: ksh writer   
kingbase 20183 20161  0 13:29 ?        00:00:00 kingbase: ksh collector   
kingbase 20184 20161  0 13:29 ?        00:00:00 kingbase: sys_kwr collector   
kingbase 20185 20161  0 13:29 ?        00:00:00 kingbase: logical replication launcher

II. Compare the Control Files to Find the Data Differences Between the Nodes

New primary:


[kingbase@node1 bin]$ ./sys_controldata -D ../data
sys_control version number:            1201
Catalog version number:               201909212
Database system identifier:           6950158917747347623
Database cluster state:               in production
sys_control last modified:             Mon 01 Mar 2021 01:35:16 PM CST
Latest checkpoint location:           1/F2008980
Latest checkpoint's REDO location:    1/F2008948
Latest checkpoint's REDO WAL file:    0000000500000001000000F2
Latest checkpoint's TimeLineID:       5
Latest checkpoint's PrevTimeLineID:   5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:8813
Latest checkpoint's NextOID:          32951
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        839
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  8813
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon 01 Mar 2021 01:35:16 PM CST

Original primary:

[kingbase@node2 bin]$ ./sys_controldata -D ../data
sys_control version number:            1201
Catalog version number:               201909212
Database system identifier:           6950158917747347623
Database cluster state:               in production
sys_control last modified:             Mon 01 Mar 2021 01:34:45 PM CST
Latest checkpoint location:           1/F2002AC0
Latest checkpoint's REDO location:    1/F2002A88
Latest checkpoint's REDO WAL file:    0000000400000001000000F2
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:8810
Latest checkpoint's NextOID:          32951
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        839
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  8810
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon 01 Mar 2021 01:34:45 PM CST

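Comparing the control files shows that the new primary's timeline (5) is higher than the original primary's timeline (4), and that the new primary's next transaction ID (8813) is higher than the original primary's (8810). The new primary is therefore chosen as the cluster's primary node, and the original primary is demoted to standby.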
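
The comparison above can be sketched in code. This is only an illustration of the manual criterion used in this case (higher TimeLineID first, then higher NextXID), not an official selection rule. The helper names `parse_controldata` and `freshness_key` are hypothetical, and the samples are excerpts of the two `sys_controldata` outputs shown earlier.

```python
def parse_controldata(text):
    """Turn 'key: value' lines of sys_controldata output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as "0:8813" keep theirs.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def freshness_key(fields):
    """Return (timeline, xid epoch, next xid) for tuple comparison."""
    timeline = int(fields["Latest checkpoint's TimeLineID"])
    epoch, xid = fields["Latest checkpoint's NextXID"].split(":")
    return (timeline, int(epoch), int(xid))


# Excerpts from the control file output of each node.
NODE1 = """\
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0:8813
"""
NODE2 = """\
Latest checkpoint's TimeLineID:       4
Latest checkpoint's NextXID:          0:8810
"""

if __name__ == "__main__":
    candidates = {
        "node1": freshness_key(parse_controldata(NODE1)),
        "node2": freshness_key(parse_controldata(NODE2)),
    }
    # max() compares the tuples element by element, timeline first.
    print(max(candidates, key=candidates.get))
```

Comparing tuples keeps the priority order explicit: the transaction ID is only consulted when the timelines are equal.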

III. Rejoin the Original Primary to the Cluster


[kingbase@node2 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down.... done
server stopped
[kingbase@node2 bin]$ ./repmgr node rejoin -h 192.168.7.248 -U esrep -d esrep
ERROR: this node cannot attach to rejoin target node 1
DETAIL: rejoin target server's timeline 5 forked off current database system timeline 4 before current recovery point 1/F2002B70
HINT: use --force-rewind to execute sys_rewind

[kingbase@node2 bin]$ ./repmgr node rejoin -h 192.168.7.248 -U esrep -d esrep --force-rewind
NOTICE: sys_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 5 forked off current database system timeline 4 before current recovery point 1/F2002B70
NOTICE: executing sys_rewind
DETAIL: sys_rewind command is "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_rewind -D '/home/kingbase/cluster/R6HA/KHA/kingbase/data' --source-server='host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'"
sys_rewind: servers diverged at WAL location 1/F20000D8 on timeline 4
sys_rewind: rewinding from last common checkpoint at 1/F2000060 on timeline 4
sys_rewind: find last common checkpoint start time from 2021-03-01 14:06:28.539405 CST to 2021-03-01 14:06:28.577794 CST, in "0.038389" seconds.
sys_rewind: update the control file: minRecoveryPoint is '1/F2031590', minRecoveryPointTLI is '5', and database state is 'in archive recovery'
sys_rewind: we will remove the dir '/home/kingbase/cluster/R6HA/KHA/kingbase/data/sys_replslot/repmgr_slot_1.rewind' and all the file/dir in it.
sys_rewind: we will remove the dir '/home/kingbase/cluster/R6HA/KHA/kingbase/data/base/syssql_tmp.rewind' and all the file/dir in it.
sys_rewind: rewind start wal location 1/F2000060 (file 0000000400000001000000F2), end wal location 1/F2031590 (file 0000000500000001000000F2). time from 2021-03-01 14:06:28.539405 CST to 2021-03-01 14:06:44.221603 CST, in "15.682198" seconds.
sys_rewind: Done!
NOTICE: 0 files copied to /home/kingbase/cluster/R6HA/KHA/kingbase/data
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2021-03-01 14:06:44.800564
NOTICE: starting server using "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/R6HA/KHA/kingbase/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile start"
NOTICE: start server finish at 2021-03-01 14:06:46.217825
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
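
The WAL positions in the messages above (for example `1/F20000D8`) can be converted to byte offsets to estimate how far the original primary had moved past the divergence point. The sketch below assumes that KingbaseES uses the PostgreSQL LSN notation, in which the part before the slash is the high 32 bits and the part after it is the low 32 bits, both in hexadecimal. The function name `lsn_to_int` is hypothetical.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '1/F20000D8' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


if __name__ == "__main__":
    # Values taken from the rejoin and sys_rewind messages above.
    diverged_at = lsn_to_int("1/F20000D8")      # where the timelines split
    old_primary_at = lsn_to_int("1/F2002B70")   # recovery point on timeline 4
    # WAL written on the original primary after the split; sys_rewind
    # discards these changes when it aligns the node with the new primary.
    print(old_primary_at - diverged_at, "bytes")
```

For this case the difference is 10904 bytes, which is why the first rejoin attempt is refused and `--force-rewind` is required.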

Check the cluster node status:


[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+--------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | standby |   running | node248  | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

Check the cluster node status on the new primary:


[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+--------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | standby |   running | node248  | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

Check the primary/standby streaming replication status:

[kingbase@node1 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_s
tart         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_la
g | replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+-------
 22853 |    16384 | esrep   | node249          | 192.168.7.249 |                 |       38638 | 2021-03-01 14:07:
24.293687+08 |              | streaming | 1/F20357A8 | 1/F20357A8 | 1/F20357A8 | 1/F20357A8 |           |         
  |            |             0 | async      | 2021-03-01 14:07:57.851500+08
(1 row)
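
The row above can be read as a health check: the standby is attached when `state` is `streaming`, and it is caught up when `replay_lsn` equals `sent_lsn`. The following is a minimal sketch of that reading, assuming the PostgreSQL LSN notation (high and low 32 bits in hexadecimal). The function names and the dict layout are hypothetical, and only the columns needed for the check are copied from the output.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '1/F20357A8' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_gap_bytes(row):
    """Bytes of WAL sent to the standby but not yet replayed there."""
    return lsn_to_int(row["sent_lsn"]) - lsn_to_int(row["replay_lsn"])


def is_caught_up(row):
    """True when the standby is streaming and has replayed everything sent."""
    return row["state"] == "streaming" and replay_gap_bytes(row) == 0


# Selected columns of the sys_stat_replication row shown above.
ROW = {
    "application_name": "node249",
    "state": "streaming",
    "sent_lsn": "1/F20357A8",
    "replay_lsn": "1/F20357A8",
}

if __name__ == "__main__":
    print(ROW["application_name"], "caught up:", is_caught_up(ROW))
```

A nonzero gap on a busy system is normal; what matters after a rejoin is that the row exists and the state is `streaming`.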

IV. Restart the Cluster to Verify


[kingbase@node1 bin]$ ./sys_monitor.sh restart
2021-03-01 14:09:05 Ready to stop all DB ...
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
2021-03-01 14:09:10 begin to stop repmgrd on "[192.168.7.248]".
2021-03-01 14:09:11 repmgrd on "[192.168.7.248]" already stopped.
2021-03-01 14:09:11 begin to stop repmgrd on "[192.168.7.249]".
2021-03-01 14:09:11 repmgrd on "[192.168.7.249]" already stopped.
2021-03-01 14:09:11 begin to stop DB on "[192.168.7.249]".
waiting for server to shut down.... done
server stopped
2021-03-01 14:09:13 DB on "[192.168.7.249]" stop success.
2021-03-01 14:09:13 begin to stop DB on "[192.168.7.248]".
waiting for server to shut down...... done
server stopped
2021-03-01 14:09:16 DB on "[192.168.7.248]" stop success.
2021-03-01 14:09:16 Done.
2021-03-01 14:09:16 Ready to start all DB ...
2021-03-01 14:09:16 begin to start DB on "[192.168.7.248]".
waiting for server to start.... done
server started
2021-03-01 14:09:17 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 14:09:19 DB on "[192.168.7.248]" start success.
2021-03-01 14:09:19 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 14:09:21 Try to ping trusted_servers on host 192.168.7.249 ...
2021-03-01 14:09:24 begin to start DB on "[192.168.7.249]".
waiting for server to start.... done
server started
2021-03-01 14:09:25 execute to start DB on "[192.168.7.249]" success, connect to check it.
2021-03-01 14:09:26 DB on "[192.168.7.249]" start success.
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+-------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | standby |   running | node248  | default  | 100      | 5        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2021-03-01 14:09:26 The primary DB is started.
2021-03-01 14:09:31 Success to load virtual ip [192.168.7.240/24] on primary host [192.168.7.248].
2021-03-01 14:09:31 Try to ping vip on host 192.168.7.248 ...
2021-03-01 14:09:33 Try to ping vip on host 192.168.7.249 ...
2021-03-01 14:09:36 begin to start repmgrd on "[192.168.7.248]".
[2021-03-01 14:09:37] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 14:09:37] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"

2021-03-01 14:09:37 repmgrd on "[192.168.7.248]" start success.
2021-03-01 14:09:37 begin to start repmgrd on "[192.168.7.249]".
[2021-03-01 14:09:00] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 14:09:00] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"

2021-03-01 14:09:38 repmgrd on "[192.168.7.249]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node248 | primary | * running |          | running | 24725 | no      | n/a                
 2  | node249 | standby |   running | node248  | running | 23587 | no      | n/a                
2021-03-01 14:09:46 Done.

Check the cluster node status:


[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+--------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | standby |   running | node248  | default  | 100      | 5        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

Check the primary/standby streaming replication status:

[kingbase@node1 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_s
tart         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_la
g | replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+-------
 24269 |    16384 | esrep   | node249          | 192.168.7.249 |                 |       38644 | 2021-03-01 14:09:
25.712281+08 |              | streaming | 1/F2036C10 | 1/F2036C10 | 1/F2036C10 | 1/F2036C10 |           |         
  |            |             1 | quorum     | 2021-03-01 14:09:30.237826+08
(1 row)

Check the database processes (primary):

[kingbase@node1 bin]$ ps -ef |grep kingbase

kingbase 23993     1  0 14:09 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbase/data
kingbase 24012 23993  0 14:09 ?        00:00:00 kingbase: logger   
kingbase 24014 23993  0 14:09 ?        00:00:00 kingbase: checkpointer   
kingbase 24015 23993  0 14:09 ?        00:00:00 kingbase: background writer   
kingbase 24016 23993  0 14:09 ?        00:00:00 kingbase: walwriter   
kingbase 24017 23993  0 14:09 ?        00:00:00 kingbase: autovacuum launcher   
kingbase 24018 23993  0 14:09 ?        00:00:00 kingbase: archiver   
kingbase 24019 23993  0 14:09 ?        00:00:00 kingbase: stats collector   
kingbase 24020 23993  0 14:09 ?        00:00:00 kingbase: ksh writer   
kingbase 24021 23993  0 14:09 ?        00:00:00 kingbase: ksh collector   
kingbase 24022 23993  0 14:09 ?        00:00:00 kingbase: sys_kwr collector   
kingbase 24023 23993  0 14:09 ?        00:00:00 kingbase: logical replication launcher   
kingbase 24269 23993  0 14:09 ?        00:00:00 kingbase: walsender esrep 192.168.7.249(38644) streaming 1/F2036CF8
kingbase 24719 23993  0 14:09 ?        00:00:02 kingbase: esrep esrep 192.168.7.248(43596) idle

Check the database processes (standby):


[kingbase@node2 bin]$ ps -ef |grep kingbase

kingbase 23173     1  0 14:08 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbase/data
kingbase 23185 23173  0 14:08 ?        00:00:00 kingbase: logger   
kingbase 23186 23173  0 14:08 ?        00:00:00 kingbase: startup   recovering 0000000500000001000000F2
kingbase 23195 23173  0 14:08 ?        00:00:00 kingbase: checkpointer   
kingbase 23196 23173  0 14:08 ?        00:00:00 kingbase: background writer   
kingbase 23197 23173  0 14:08 ?        00:00:00 kingbase: stats collector   
kingbase 23198 23173  0 14:08 ?        00:00:01 kingbase: walreceiver   streaming 1/F2036CF8
kingbase 23561 23173  0 14:09 ?        00:00:00 kingbase: esrep esrep 192.168.7.249(22306) idle

The output above shows that the cluster's "dual-primary" problem has been resolved.
