KingbaseES R3 Cluster: Dropping the test Database Prevents Primary/Standby Failover

阿新 • Published: 2022-01-07

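Case description:
In a KingbaseES R3 cluster, the kingbasecluster process connects to the backend database service through the test database to check that the service is reachable. If the test database is dropped, these connections fail. During a primary/standby failover the scripts then cannot reach the backend database service, so the failover fails. Changing the relevant parameter in the cluster's HAmodule.conf configuration file resolves the failover failure caused by the dropped test database.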

Test database version:

prod=# select version();
                                                         version                                                         
-------------------------------------------------------------------------------------------------------------------------
 Kingbase V008R003C002B0270 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

1. Check which database the cluster is configured to use for test connections

[kingbase@node1 etc]$ cat HAmodule.conf |grep -i test
#database instance built-in database.example:KB_DATANAME="TEST"
KB_DATANAME="TEST"
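
Because the file also contains a commented example line, it can help to read the effective value instead of relying on the grep output. The following is a minimal sketch, assuming HAmodule.conf uses plain shell-style `KEY="value"` assignments as in the output above. The function name `get_kb_dataname` is hypothetical and is not part of KingbaseES.

```shell
# Print the effective KB_DATANAME value from an HAmodule.conf-style file.
# Commented lines (such as the built-in example) are ignored; if the key is
# assigned more than once, the last assignment is printed.
get_kb_dataname() {
    sed -n 's/^[[:space:]]*KB_DATANAME="\{0,1\}\([^"#[:space:]]*\)"\{0,1\}.*$/\1/p' "$1" | tail -n 1
}
```

It could be called as `get_kb_dataname /home/kingbase/cluster/kha/kingbasecluster/etc/HAmodule.conf` on each node to compare the values.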

2. Check how kingbase_monitor.sh accesses the test database

= The trace of kingbase_monitor.sh starting up shows the accesses to the test database =

[kingbase@node1 bin]$ sh -x kingbase_monitor.sh restart > ~/kmon.txt

[kingbase@node1 ~]$ cat kmon.txt |grep -i test
+ param='KB_DATANAME="TEST"'
+ paramValue='"TEST"'
+ '[' -z '"TEST"' ']'
+ eval 'KB_DATANAME="TEST"'
++ KB_DATANAME=TEST
++ /home/kingbase/cluster/kha/db/bin/ksql 'host=192.168.7.248 port=54321 user=SUPERMANAGER_V8ADMIN password=KINGBASEADMIN dbname=TEST connect_timeout=10' -Aqtc 'select count(*)=1 from sys_stat_replication;'
++ /home/kingbase/cluster/kha/db/bin/ksql 'host=192.168.7.248 port=54321 user=SUPERMANAGER_V8ADMIN password=KINGBASEADMIN dbname=TEST connect_timeout=10' -Aqtc 'select sys_xlog_location_diff(sys_current_xlog_flush_location(), write_location)<=16777216 from sys_stat_replication;'
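
To see at a glance which databases the scripts connect to, the trace file can be reduced to its distinct `dbname=` values. This is a small sketch that only post-processes the `sh -x` output captured above. The helper name `list_probe_dbnames` is an assumption for illustration.

```shell
# List the distinct database names that appear as dbname=... in a trace file
# produced by `sh -x` (for example ~/kmon.txt from the step above).
list_probe_dbnames() {
    grep -o 'dbname=[A-Za-z0-9_]*' "$1" | sed 's/^dbname=//' | sort -u
}
```

Running `list_probe_dbnames ~/kmon.txt` on a trace like the one above would be expected to print only the database named in KB_DATANAME.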

2. Drop the test database in the cluster (on the primary)

# List the databases in the cluster
[kingbase@node3 bin]$ ./ksql -U system -W 123456 prod
ksql (V008R003C002B0270)
Type "help" for help.

prod=# \l
                               List of databases
   Name    | Owner  | Encoding |   Collate   |    Ctype    | Access privileges  
-----------+--------+----------+-------------+-------------+--------------------
 prod      | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SAMPLES   | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SECURITY  | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 TEMPLATE0 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/system         +
           |        |          |             |             | system=CTcb/system
 TEMPLATE1 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/system         +
           |        |          |             |             | system=CTcb/system
 TEMPLATE2 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/system        +
           |        |          |             |             | system=CTcb/system
 TEST      | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
(7 rows)


# Confirm that this node is the primary
prod=# select sys_is_in_recovery();
 sys_is_in_recovery 
--------------------
 f
(1 row)

# Check the streaming replication status
prod=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |  
 state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+--

 25795 |       10 | system  | node243          | 192.168.7.243 |                 |       45418 | 2021-03-01 12:49:12.263710+08 |              | s
treaming | 0/E0001B0     | 0/E0001B0      | 0/E0001B0      | 0/E000178       |             0 | async
(1 row)


# Drop the test database on the primary:
prod=# drop database test;
DROP DATABASE
prod=# \l
                               List of databases
   Name    | Owner  | Encoding |   Collate   |    Ctype    | Access privileges  
-----------+--------+----------+-------------+-------------+--------------------
 prod      | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SAMPLES   | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SECURITY  | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 TEMPLATE0 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/system         +
           |        |          |             |             | system=CTcb/system
 TEMPLATE1 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/system         +
           |        |          |             |             | system=CTcb/system
 TEMPLATE2 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/system        +
           |        |          |             |             | system=CTcb/system
(6 rows)


# Check on the standby:
[kingbase@node3 bin]$ ./ksql -U system -W 123456 prod
ksql (V008R003C002B0270)
Type "help" for help.

prod=# \l
                               List of databases
   Name    | Owner  | Encoding |   Collate   |    Ctype    | Access privileges  
-----------+--------+----------+-------------+-------------+--------------------
 prod      | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SAMPLES   | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 SECURITY  | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 TEMPLATE0 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/system         +
           |        |          |             |             | system=CTcb/system
 TEMPLATE1 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/system         +
           |        |          |             |             | system=CTcb/system
 TEMPLATE2 | system | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/system        +
           |        |          |             |             | system=CTcb/system
(6 rows)
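
A simple guard can reduce the chance of dropping the probe database by accident. The sketch below compares a candidate database name with the KB_DATANAME value. It ignores case, on the assumption (based only on the transcript above, where `drop database test` removed the database listed as `TEST`) that unquoted names are matched case-insensitively. The function name `is_probe_database` is hypothetical.

```shell
# Return 0 if the database about to be dropped is the HA probe database.
# $1 = candidate database name, $2 = value of KB_DATANAME.
# The comparison ignores case (an assumption drawn from the transcript,
# where `drop database test` removed the database listed as TEST).
is_probe_database() {
    candidate=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    probe=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    [ "$candidate" = "$probe" ]
}
```

A wrapper script could call `is_probe_database "$dbname" "$KB_DATANAME"` and refuse to issue the `drop database` statement when it returns 0.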

3. Primary/standby failover test

1) Stop the database service on the primary

[kingbase@node1 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down....... done
server stopped

2) Check the logs

Standby cluster.log (the log shows that the standby failed to reach the primary's backend database service and has already initiated a failover):

......
2021-03-01 12:41:16: pid 14431: LOG:  health checking retry count 1
2021-03-01 12:41:16: pid 14431: LOG:  failed to connect to kingbase server on "192.168.7.248:54321", getsockopt() detected error "Connection refused"
2021-03-01 12:41:16: pid 14431: ERROR:  failed to make persistent db connection
2021-03-01 12:41:16: pid 14431: DETAIL:  connection to host:"192.168.7.248:54321" failed
2021-03-01 12:41:18: pid 14644: LOG:  watchdog checking if kingbasecluster is alive using heartbeat
2021-03-01 12:41:18: pid 14644: DETAIL:  the last heartbeat from "192.168.7.248:9999" received 0 seconds ago
......

2021-03-01 12:41:22: pid 14473: LOG:  received the failover command lock request from local kingbasecluster on IPC interface
2021-03-01 12:41:22: pid 14473: LOG:  local kingbasecluster node "192.168.7.243:9999 Linux node3" is requesting to become a lock holder for failover ID: 69
2021-03-01 12:41:22: pid 14473: LOG:  local kingbasecluster node "192.168.7.243:9999 Linux node3" is the lock holder
2021-03-01 12:41:22: pid 14431: LOG:  starting degeneration. shutdown host 192.168.7.248(54321)
2021-03-01 12:41:22: pid 14431: LOG:  Restart all children
2021-03-01 12:41:22: pid 14431: LOG:  execute command: /home/kingbase/cluster/kha/kingbasecluster/bin/failover_stream.sh 192.168.7.243 1 1 192.168.7.248 192.168.7.248 0 0 /home/kingbase/cluster/kha/db/data
......

3) Check the database service on the standby (failover failed)

= The standby's database service is still running in recovery, so it is still a standby =

[kingbase@node3 log]$ ps -ef |grep kingbase

kingbase 13764     1  0 12:31 ?        00:00:01 /home/kingbase/cluster/kha/db/bin/kingbase -D /home/kingbase/cluster/kha/db/data
kingbase 13781 13764  0 12:31 ?        00:00:00 kingbase: logger process   
kingbase 13782 13764  0 12:31 ?        00:00:00 kingbase: startup process   recovering 00000001000000000000000F
kingbase 13786 13764  0 12:31 ?        00:00:00 kingbase: checkpointer process   
kingbase 13787 13764  0 12:31 ?        00:00:00 kingbase: writer process   
kingbase 13788 13764  0 12:31 ?        00:00:00 kingbase: stats collector process   
.......
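
The same check can be scripted. This sketch reads `ps -ef` output and reports whether the startup process is still replaying WAL, which is how the listing above shows that the node was not promoted. The function name `still_in_recovery` is an assumption for illustration.

```shell
# Read `ps -ef` output on stdin; return 0 if the kingbase startup process is
# still in recovery, which means the instance has not been promoted.
# The [k] bracket keeps this grep from matching its own command line.
still_in_recovery() {
    grep -q '[k]ingbase: startup process.*recovering'
}
```

For example, `ps -ef | still_in_recovery && echo "still a standby"` prints the message only while the node remains in recovery.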

Check failover.log on the standby:

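=== The log below shows that the failover needs to connect to the test database; because that database was dropped, the connection fails and the failover does not complete ===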

-----------------2021-03-01 12:41:22 failover beging---------------------------------------
----failover-stats is %H = hostname of the new master node [192.168.7.243], %P = old primary node id [1], %d = node id[1], %h = host name [192.168.7.248], %O = old primary host[192.168.7.248] %m = new master node id [0], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/kha/db/data].
----ping trust ip 
ping trust ip 192.168.7.1 success 
----determine whether the faulty db is master or standby 
master down, let 192.168.7.243 become new primary.....
 2021-03-01 12:41:24 del old primary VIP on 192.168.7.248
ssh connect host:192.168.7.248 success, will stop old primary db and del the vip
stop the old primary db
DEL VIP NOW AT 2021-03-01 12:58:53 ON enp0s3
execute: [/sbin/ip addr del 192.168.7.245/24 dev enp0s3]
Oprate del ip cmd end.
2021-03-01 12:41:24 add VIP on 192.168.7.243
sys_ctl: PID file "/home/kingbase/cluster/kha/db/data/kingbase.pid" does not exist
Is server running?
ADD VIP NOW AT 2021-03-01 12:41:25 ON enp0s3
execute: [/sbin/ip addr add 192.168.7.245/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/kha/db/bin/arping -U 192.168.7.245 -I enp0s3 -w 1
ARPING 192.168.7.245 from 192.168.7.245 enp0s3
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
2021-03-01 12:41:26 promote begin...let 192.168.7.243 become master
check db if is alive 
ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEST connect_timeout=10"  -c "select 33333;" 
ksql: FATAL:  database "TEST" does not exist
kingbase is down,retry check db is if alive,retry times:[1/3]
before promote query detail[] , try again!
ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEST connect_timeout=10"  -c "select 33333;" 
ksql: FATAL:  database "TEST" does not exist
kingbase is down,retry check db is if alive,retry times:[2/3]
before promote query detail[] , try again!
ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEST connect_timeout=10"  -c "select 33333;" 
ksql: FATAL:  database "TEST" does not exist
kingbase is down,retry check db is if alive,retry times:[3/3]
before promote query detail[] , try again!
kingbase is down,after retry 3 times ,cannot do promote, will exit
 execute kingbase_promote.sh failed ,will exit script with error
"ssh -o StrictHostKeyChecking=no -l kingbase -T 192.168.7.243 "/home/kingbase/cluster/kha/db/bin/kingbase_promote.sh /home/kingbase/cluster/kha/db/bin  SUPERMANAGER_V8ADMIN TEST 54321 /home/kingbase/cluster/kha/db/data 3 3 2>&1"" execute failed, error num=[66]

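When triaging a failed failover, the missing database can be pulled straight out of failover.log. The sketch below relies only on the `ksql: FATAL:  database "..." does not exist` lines shown in the log above. The helper name `missing_probe_dbs` is hypothetical.

```shell
# Print each database name that a failover.log reports as missing,
# based on the `ksql: FATAL:  database "X" does not exist` lines.
missing_probe_dbs() {
    sed -n 's/^ksql: FATAL: *database "\([^"]*\)" does not exist.*$/\1/p' "$1" | sort -u
}
```

On the log above, `missing_probe_dbs failover.log` would print the name of the database that the promote check could not open.
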
4. Change the HAmodule.conf parameter (on all nodes)

= Change the probe database from test to template2 =

[kingbase@node1 etc]$ pwd
/home/kingbase/cluster/kha/kingbasecluster/etc
[kingbase@node1 etc]$ cat HAmodule.conf |grep -i temp
KB_DATANAME="TEMPLATE2"

[kingbase@node1 etc]$ cd ../../db/etc/
[kingbase@node1 etc]$ cat HAmodule.conf |grep -i temp
KB_DATANAME="TEMPLATE2"
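
The article shows only the result of the edit. One way to apply it consistently is sketched below, assuming the file uses a single uncommented `KB_DATANAME="..."` line as shown above. The function name `set_kb_dataname` and the `.bak` backup suffix are assumptions, and the new name must not contain `/` or `&` because it is substituted into a sed expression.

```shell
# Rewrite the uncommented KB_DATANAME assignment in an HAmodule.conf-style
# file. $1 = path to the file, $2 = new database name.
set_kb_dataname() {
    conf_file=$1
    new_name=$2
    tmp_file="${conf_file}.tmp.$$"
    # Keep a copy of the original so the change can be rolled back.
    cp -p "$conf_file" "${conf_file}.bak" || return 1
    # Only lines that start with KB_DATANAME= are changed; the commented
    # example line is left as it is.
    sed "s/^\([[:space:]]*KB_DATANAME=\).*\$/\1\"${new_name}\"/" "$conf_file" > "$tmp_file" || return 1
    # Write back in place so ownership and permissions are unchanged.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}
```

As in the transcript above, both copies of the file (under `kingbasecluster/etc` and `db/etc`) would need the same change on every node.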

5. Restart the cluster

1) Restart the cluster

[kingbase@node1 bin]$ ./kingbase_monitor.sh restart
-----------------------------------------------------------------------
2021-03-01 13:18:35 KingbaseES automation beging...
2021-03-01 13:18:35 stop kingbasecluster [192.168.7.243] ...
remove status file  /home/kingbase/cluster/kha/run/kingbasecluster/kingbasecluster_status
DEL VIP NOW AT 2021-03-01 13:01:11 ON enp0s3
No VIP on my dev, nothing to do.
2021-03-01 13:18:41 Done...
2021-03-01 13:18:41 stop kingbasecluster [192.168.7.248] ...
remove status file  /home/kingbase/cluster/kha/run/kingbasecluster/kingbasecluster_status
DEL VIP NOW AT 2021-03-01 13:18:46 ON enp0s3
No VIP on my dev, nothing to do.
2021-03-01 13:18:47 Done...
2021-03-01 13:18:47 stop kingbase [192.168.7.243] ...
set /home/kingbase/cluster/kha/db/data down now...
2021-03-01 13:18:50 Done...
2021-03-01 13:18:51 Del kingbase VIP [192.168.7.245/24] ...
DEL VIP NOW AT 2021-03-01 13:01:22 ON enp0s3
execute: [/sbin/ip addr del 192.168.7.245/24 dev enp0s3]
Oprate del ip cmd end.
2021-03-01 13:18:51 Done...
2021-03-01 13:18:51 stop kingbase [192.168.7.248] ...
set /home/kingbase/cluster/kha/db/data down now...
2021-03-01 13:18:56 Done...
2021-03-01 13:18:57 Del kingbase VIP [192.168.7.245/24] ...
DEL VIP NOW AT 2021-03-01 13:18:57 ON enp0s3
No VIP on my dev, nothing to do.
2021-03-01 13:18:57 Done...
......................
all stop..
ping trust ip 192.168.7.1 success ping times :[3], success times:[2]
ping trust ip 192.168.7.1 success ping times :[3], success times:[2]
start crontab kingbase position : [3]
Redirecting to /bin/systemctl restart  crond.service
start crontab kingbase position : [2]
Redirecting to /bin/systemctl restart  crond.service
ADD VIP NOW AT 2021-03-01 13:19:10 ON enp0s3
execute: [/sbin/ip addr add 192.168.7.245/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/kha/db/bin/arping -U 192.168.7.245 -I enp0s3 -w 1
ARPING 192.168.7.245 from 192.168.7.245 enp0s3
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
ping vip 192.168.7.245 success ping times :[3], success times:[2]
ping vip 192.168.7.245 success ping times :[3], success times:[2]
wait kingbase recovery 5 sec...
start crontab kingbasecluster line number: [6]
Redirecting to /bin/systemctl restart  crond.service
start crontab kingbasecluster line number: [3]
Redirecting to /bin/systemctl restart  crond.service
......................
all started..
...
now we check again
=======================================================================
|             ip |                       program|              [status] 
[  192.168.7.243]|             [kingbasecluster]|              [active]
[  192.168.7.248]|             [kingbasecluster]|              [active]
[  192.168.7.243]|                    [kingbase]|              [active]
[  192.168.7.248]|                    [kingbase]|              [active]
=======================================================================

2) Check the cluster node status

[kingbase@node1 bin]$ ./ksql -U SYSTEM -W 123456 prod -p 9999
ksql (V008R003C002B0270)
Type "help" for help.

prod=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.7.243 | 54321 | up     | 0.500000  | standby | 0          | true              | 0
 1       | 192.168.7.248 | 54321 | up     | 0.500000  | primary | 0          | false             | 0
(2 rows)

prod=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   
state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+---
--------+---------------+----------------+----------------+-----------------+---------------+------------
 5539 |       10 | system  | node243          | 192.168.7.243 |                 |       47872 | 2021-03-01 13:19:09.653137+08 |              | st
reaming | 0/100000D0    | 0/100000D0     | 0/100000D0     | 0/100000D0      |             0 | async
(1 row)


[kingbase@node1 bin]$ ./ksql -U SYSTEM -W 123456 prod 
ksql (V008R003C002B0270)
Type "help" for help.

prod=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   
state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+---

 5539 |       10 | system  | node243          | 192.168.7.243 |                 |       47872 | 2021-03-01 13:19:09.653137+08 |              | st
reaming | 0/100000D0    | 0/100000D0     | 0/100000D0     | 0/100000D0      |             0 | async
(1 row)
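
The `show pool_nodes` output can also be checked mechanically before the failover test is repeated. This sketch parses the aligned table format shown above and succeeds only when every node is `up` and exactly one node has the `primary` role. The function name `check_pool_nodes` and the pass criteria are assumptions for illustration.

```shell
# Read `show pool_nodes` output (aligned table format) on stdin.
# Return 0 only if at least one node is listed, all nodes are "up",
# and exactly one node has the role "primary".
check_pool_nodes() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Data rows start with a numeric node_id; header and rule lines do not.
        NF >= 6 && trim($1) ~ /^[0-9]+$/ {
            nodes++
            if (trim($4) != "up") down++
            if (trim($6) == "primary") primaries++
        }
        END { exit !(nodes > 0 && down == 0 && primaries == 1) }
    '
}
```

The table could be piped in from a ksql session on port 9999, as in the transcript above.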

6. Run the failover test again

1) Stop the database service on the primary

[kingbase@node1 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down....... done
server stopped

2) Check the database service processes on the standby (failover succeeded)

[kingbase@node3 bin]$ ps -ef|grep kingbase

kingbase 22541     1  0 13:01 ?        00:00:00 /home/kingbase/cluster/kha/db/bin/kingbase -D /home/kingbase/cluster/kha/db/data
kingbase 22552 22541  0 13:01 ?        00:00:00 kingbase: logger process   
kingbase 22563 22541  0 13:01 ?        00:00:00 kingbase: checkpointer process   
kingbase 22564 22541  0 13:01 ?        00:00:00 kingbase: writer process   
kingbase 22565 22541  0 13:01 ?        00:00:00 kingbase: stats collector process   
root     23208     1  0 13:01 ?        00:00:00 ./kingbasecluster -n
root     23255 23208  0 13:01 ?        00:00:00 kingbasecluster: watchdog
root     23421 23208  0 13:02 ?        00:00:00 kingbasecluster: lifecheck
root     23423 23421  0 13:02 ?        00:00:00 kingbasecluster: heartbeat receiver
root     23424 23421  0 13:02 ?        00:00:00 kingbasecluster: heartbeat sender
kingbase 24571 22541  0 13:05 ?        00:00:00 kingbase: wal writer process   
kingbase 24572 22541  0 13:05 ?        00:00:00 kingbase: autovacuum launcher process   
kingbase 24573 22541  0 13:05 ?        00:00:00 kingbase: archiver process   last was 00000002.history
kingbase 24574 22541  0 13:05 ?        00:00:00 kingbase: bgworker: syslogical supervisor

Check failover.log:

[kingbase@node3 log]$ tail -100 failover.log 


-----------------2021-03-01 13:05:12 failover beging---------------------------------------
----failover-stats is %H = hostname of the new master node [192.168.7.243], %P = old primary node id [1], %d = node id[1], %h = host name [192.168.7.248], %O = old primary host[192.168.7.248] %m = new master node id [0], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/kha/db/data].
----ping trust ip 
ping trust ip 192.168.7.1 success 
----determine whether the faulty db is master or standby 
master down, let 192.168.7.243 become new primary.....
 2021-03-01 13:05:14 del old primary VIP on 192.168.7.248
ssh connect host:192.168.7.248 success, will stop old primary db and del the vip
stop the old primary db
sys_ctl: PID file "/home/kingbase/cluster/kha/db/data/kingbase.pid" does not exist
Is server running?
DEL VIP NOW AT 2021-03-01 13:22:43 ON enp0s3
execute: [/sbin/ip addr del 192.168.7.245/24 dev enp0s3]
Oprate del ip cmd end.
2021-03-01 13:05:14 add VIP on 192.168.7.243
ADD VIP NOW AT 2021-03-01 13:05:15 ON enp0s3
execute: [/sbin/ip addr add 192.168.7.245/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/kha/db/bin/arping -U 192.168.7.245 -I enp0s3 -w 1
ARPING 192.168.7.245 from 192.168.7.245 enp0s3
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
2021-03-01 13:05:16 promote begin...let 192.168.7.243 become master
check db if is alive 
ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEMPLATE2 connect_timeout=10"  -c "select 33333;" 
2021-03-01 13:05:17 kingbase is ok , to prepare execute promote
execute promote
server promoting
check db if is alive after promote 
ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEMPLATE2 connect_timeout=10"  -c "select 33333;" 
2021-03-01 13:05:17 after execute promote , kingbase status is ok.
after execute promote, kingbase is ok.
2021-03-01 13:05:17 sync to async
ALTER SYSTEM
 sys_reload_conf 
-----------------
 t
(1 row)

2021-03-01 13:05:17 make checkpoint
check the db to see if it is alive
ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEMPLATE2 connect_timeout=10"  -c "select 33333;"
2021-03-01 13:05:18 kingbase is ok , to prepare execute checkpoint
execute checkpoint
CHECKPOINT
check the db to see if it is alive after execute checkpoint 
ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEMPLATE2 connect_timeout=10"  -c "select 33333;" 
2021-03-01 13:05:18 after execute checkpoint, kingbase is ok.
after execute checkpoint, kingbase is ok.

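7. Restore the original primary as a standby

1) Create the replication slots on the new primary.

2) Create a recovery.conf file on the original primary, then start the database service manually with sys_ctl.

3) Check the primary/standby streaming replication status.

4) Restart the cluster and test.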
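
The article does not show the recovery.conf used in step 2. The sketch below generates one under the assumption that KingbaseES R3 accepts the same parameter names as a PostgreSQL 9.x recovery.conf (`standby_mode`, `primary_conninfo`, `recovery_target_timeline`, `primary_slot_name`). These names, and the function name `print_recovery_conf`, are not confirmed by the source and should be checked against the R3 documentation. No password is included; authentication is assumed to be handled separately.

```shell
# Print a recovery.conf for a standby on stdout.
# ASSUMPTION: parameter names follow the PostgreSQL 9.x recovery.conf format.
# $1 = primary host, $2 = port, $3 = replication user,
# $4 = application_name, $5 = replication slot name on the primary
print_recovery_conf() {
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$1 port=$2 user=$3 application_name=$4'
recovery_target_timeline = 'latest'
primary_slot_name = '$5'
EOF
}
```

With the values that appear elsewhere in this article, a call might look like `print_recovery_conf 192.168.7.243 54321 system node248 slot_node248`, with the output redirected to recovery.conf in the original primary's data directory.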

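8. Summary

In a KingbaseES R3 cluster, the test database is used mainly by kingbasecluster and the HA scripts for health-check connections to the backend database service. Do not drop it casually.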

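9. Appendix

Case: the original primary's database service was started without first creating a recovery.conf file in its data directory. When recovery.conf was created afterwards and the original primary was started again as a standby, it failed to join streaming replication because its timeline no longer matched the new primary's. The sys_rewind tool was used to bring the original primary back into the cluster.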

1) Check the sys_log log on the original primary

= As shown below, the database service fails to start as a standby because its timeline does not match the new primary's =

[kingbase@node1 sys_log]$ tail -100f kingbase-2021-03-01_132943.log
LOG:  database system was shut down in recovery at 2021-03-01 13:29:42 CST
LOG:  entering standby mode
FATAL:  requested timeline 2 is not a child of this server's history
DETAIL:  Latest checkpoint is at 0/12000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/11000098.
LOG:  startup process (PID 10918) exited with exit code 1
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down
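
This error can be detected from a script before deciding whether a rewind is needed. The sketch below looks only for the FATAL message shown in the log above. The function name `has_timeline_divergence` is hypothetical.

```shell
# Return 0 if the given sys_log file contains the timeline-divergence error,
# which indicates the node cannot simply be restarted as a standby.
has_timeline_divergence() {
    grep -q "requested timeline [0-9][0-9]* is not a child of this server's history" "$1"
}
```

For example, `has_timeline_divergence kingbase-2021-03-01_132943.log && echo "rewind required"`.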

2) Run sys_rewind on the original primary to rejoin the cluster

[kingbase@node1 bin]$ ./sys_rewind -D /home/kingbase/cluster/kha/db/data --source-server='host=192.168.7.243 port=54321 user=system dbname=PROD' -P -n
connected to server
datadir_source = /home/kingbase/cluster/kha/db/data
rewinding from last common checkpoint at 0/10000028 on timeline 1
find last common checkpoint start time from 2021-03-01 13:36:24.675116 CST to 2021-03-01 13:36:24.702727 CST, in "0.027611" seconds.
reading source file list
reading target file list
reading WAL in target
need to copy 298 MB (total source directory size is 363 MB)
Rewind datadir file from source
Get archive xlog list from source
Rewind archive log from source
 59462/305222 kB (19%) copied
creating backup label and updating control file
syncing target data directory
rewind start wal location 0/10000028 (file 000000010000000000000010), end wal location 0/11070290 (file 000000020000000000000011). time from 2021-03-01 13:36:25.675116 CST to 2021-03-01 13:36:25.379386 CST, in "0.704270" seconds.
Done!

3) Start the standby database service

[kingbase@node1 bin]$ ./sys_ctl start -D ../data
server starting
[kingbase@node1 bin]$ LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "/data/kingbase/cluster/r3/data/sys_log".


[kingbase@node1 bin]$ ps -ef|grep kingbase
.......
kingbase 11983 13899  0 13:31 pts/1    00:00:00 tail -100f kingbase-2021-03-01_132943.log
kingbase 13611     1  0 13:36 pts/0    00:00:00 /home/kingbase/cluster/kha/db/bin/kingbase -D ../data
kingbase 13614 13611  0 13:36 ?        00:00:00 kingbase: logger process   
kingbase 13615 13611  0 13:36 ?        00:00:00 kingbase: startup process   recovering 000000020000000000000011
kingbase 13619 13611  0 13:36 ?        00:00:00 kingbase: checkpointer process   
kingbase 13620 13611  0 13:36 ?        00:00:00 kingbase: writer process   
kingbase 13621 13611  0 13:36 ?        00:00:00 kingbase: wal receiver process   streaming 0/11070EB8
kingbase 13622 13611  0 13:36 ?        00:00:00 kingbase: stats collector process
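
The presence of a `wal receiver process ... streaming` entry in the listing above is what shows that the rewound node has reconnected to the new primary. A minimal sketch of the same check follows. The function name `standby_is_streaming` is an assumption for illustration.

```shell
# Read `ps -ef` output on stdin; return 0 if a kingbase WAL receiver process
# is present and reports that it is streaming.
# The [k] bracket keeps this grep from matching its own command line.
standby_is_streaming() {
    grep -q '[k]ingbase: wal receiver process.*streaming'
}
```

It could be used as `ps -ef | standby_is_streaming || echo "standby is not streaming"`.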

4) Query the cluster node status

# Query on the primary:

prod=#  select * from sys_replication_slots;      
  slot_name   | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
--------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 slot_node248 |        | physical  |        |          | t      |      27104 | 2110 |              | 0/11071688  | 
 slot_node243 |        | physical  |        |          | f      |            |      |              |             | 
(2 rows)

[kingbase@node3 bin]$ ./ksql -U SYSTEM -W 123456 prod -p 9999
ksql (V008R003C002B0270)
Type "help" for help.

prod=# select * from sys_stat_replication ;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |  
 state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+--

 27104 |       10 | system  | node248          | 192.168.7.248 |                 |       22355 | 2021-03-01 13:19:11.376063+08 |              | s
treaming | 0/11070FD0    | 0/11070FD0     | 0/11070FD0     | 0/11070FD0      |             0 | async
(1 row)

prod=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.7.243 | 54321 | up     | 0.500000  | primary | 1          | true              | 0
 1       | 192.168.7.248 | 54321 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

KINGBASE Research Institute
