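KingbaseES V8R6: Cleaning Up WAL Logs in a Cluster Environment
Case description:
1. In a cluster, WAL logs are needed for more than recovery on the standby. During a primary/standby role change (switchover or failover), sys_rewind also reads the WAL logs to bring the database back to a consistent state.
2. Testing shows that, in theory, every WAL file older than the one containing the latest checkpoint can be removed from the primary and the standby. That is the ideal case. In production, keep three days to one week of WAL logs so that replication lag between primary and standby does not cause a cluster switch to fail for lack of WAL.
3. On a KingbaseES V8R6 cluster, if backups were created on the primary and standby with the sys_backup.sh tool, the archived logs are backed up automatically and should also be removed automatically when old backups are purged. If a node has no sys_backup.sh backup, the archived logs can be cleaned with the sys_archivecleanup tool. The same rule applies: in production, keep three days to one week of archived logs.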
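The retention rule above can be checked with a dry-run listing before anything is removed. The following is a minimal sketch, assuming bash and GNU find; the helper name list_expired_wal and the 7-day default are illustrative and not part of KingbaseES. It only prints candidates. Whatever the age of a file, the WAL file containing the latest checkpoint and all later files must still be kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list (never delete) WAL segment files in a directory
# whose modification time is older than a retention window in days.
list_expired_wal() {
    local dir="$1" keep_days="${2:-7}"
    # WAL segment names are exactly 24 hexadecimal characters, so .history
    # files and subdirectories such as archive_status are never listed.
    find "$dir" -maxdepth 1 -type f -mtime +"$keep_days" -print \
        | grep -E '/[0-9A-F]{24}$' \
        | sort
}
```

A call such as `list_expired_wal /path/to/archive 7` (the path is a placeholder) prints the full paths of segment files last modified more than seven days ago, so the list can be reviewed first.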
Database version:
test=# select version();
                                                       version
------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0023 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)
Cluster node information:
[kingbase@node1 bin]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.8.200 node1 # cluster node node200
192.168.8.201 node2 # cluster node node201

 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node200 | primary | * running |          | running | 29303 | no      | n/a
 2  | node201 | standby |   running | node200  | running | 29748 | no      | 1 second(s) ago
I. Cluster switchover test
1. Check the control file information on the primary and standby
1) Primary control file
[kingbase@node1 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in production
sys_control last modified: Tue 10 May 2022 12:33:09 PM CST
Latest checkpoint location: 1/29001768
Latest checkpoint's REDO location: 1/29001738
Latest checkpoint's REDO WAL file: 000000030000000100000029
Latest checkpoint's TimeLineID: 3
2) Standby control file
[kingbase@node2 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in archive recovery
sys_control last modified: Thu 19 May 2022 12:05:06 PM CST
Latest checkpoint location: 1/29001768
Latest checkpoint's REDO location: 1/29001738
Latest checkpoint's REDO WAL file: 000000030000000100000029
Latest checkpoint's TimeLineID: 3
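The "REDO WAL file" value in the control data follows from the timeline ID and the REDO location. A minimal sketch of that arithmetic, assuming bash and the default 16 MB WAL segment size (consistent with the 16M segment files in this test); wal_file_name is a hypothetical helper, not a KingbaseES tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a WAL segment file name from a timeline ID
# and an LSN written as HIGH/LOW in hexadecimal.
# Assumption: 16 MB WAL segments (the default size).
wal_file_name() {
    local tli="$1" lsn="$2"
    local hi="${lsn%%/*}" lo="${lsn##*/}"
    local seg_size=$((16 * 1024 * 1024))
    # Name = timeline, high 32 bits of the LSN, then the segment number
    # within that 4 GB range, each printed as 8 uppercase hex digits.
    printf '%08X%08X%08X\n' "$tli" "$((16#$hi))" "$((16#$lo / seg_size))"
}

wal_file_name 3 1/29001738   # prints 000000030000000100000029
```

The result matches the REDO WAL file reported by sys_controldata for timeline 3 and REDO location 1/29001738.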
2. Clean up the WAL logs (on both primary and standby, keep only the WAL file that contains the checkpoint and every later file)
# WAL logs kept on the primary
[kingbase@node1 sys_wal]$ ls -lh
total 49M
-rw-------. 1 kingbase kingbase 16M May 10 13:19 000000030000000100000029
-rw-------. 1 kingbase kingbase 16M May 10 13:19 00000003000000010000002A
-rw-------. 1 kingbase kingbase 16M May 10 13:23 00000003000000010000002B
-rw-------. 1 kingbase kingbase 85 May 18 11:28 00000003.history
drwx------. 2 kingbase kingbase 24K May 10 13:19 archive_status
drwxrwxr-x. 2 kingbase kingbase 4.0K May 19 12:58 log_bk
# WAL logs kept on the standby
[kingbase@node2 sys_wal]$ ls -lh
total 49M
-rw------- 1 kingbase kingbase 16M May 19 12:51 000000030000000100000029
-rw------- 1 kingbase kingbase 16M May 19 12:51 00000003000000010000002A
-rw------- 1 kingbase kingbase 16M May 19 12:55 00000003000000010000002B
-rw------- 1 kingbase kingbase 85 May 18 11:28 00000003.history
drwx------ 2 kingbase kingbase 12K May 19 12:51 archive_status
drwxrwxr-x 2 kingbase kingbase 4.0K May 19 13:00 log_bk
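The cleanup rule used in this step (keep the checkpoint's WAL file and everything after it) can be previewed as a dry run. The sketch below assumes bash; list_wal_before is a hypothetical helper that only prints file names and deletes nothing. It compares the log/segment part of the name and ignores the timeline prefix, which is a design choice of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the WAL segments in a directory that precede
# the segment to keep. Nothing is deleted.
list_wal_before() {
    local dir="$1" keep="$2" path name
    for path in "$dir"/*; do
        name="${path##*/}"
        # Consider only regular files named like WAL segments (24 hex
        # characters); .history files and directories are skipped.
        [[ -f "$path" && "$name" =~ ^[0-9A-F]{24}$ ]] || continue
        # Compare characters 9-24 (log and segment number), ignoring the
        # 8-character timeline prefix.
        if [[ "${name:8}" < "${keep:8}" ]]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `list_wal_before ../data/sys_wal 000000030000000100000029` would list the segments older than the checkpoint's REDO WAL file shown in the control data above; the directory path here is an assumption based on the prompts in this transcript.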
3. Run repmgr standby switchover
1) Check the current cluster status
[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | primary | * running | | default | 100 | 3 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | standby | running | node200 | default | 100 | 3 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2) Run the switchover
[kingbase@node2 bin]$ ./repmgr standby switchover -h 192.168.8.200 -U esrep -d esrep
WARNING: following problems with command line parameters detected:
......
INFO: unpause node "node201" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully
[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | standby | running | node201 | default | 100 | 3 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | primary | * running | | default | 100 | 4 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
Switch-back test:
[kingbase@node1 bin]$ ./repmgr standby switchover -h 192.168.8.201 -U esrep -d esrep
WARNING: following problems with command line parameters detected:
INFO: unpause node "node201" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully
[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | primary | * running | | default | 100 | 5 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | standby | running | node200 | default | 100 | 4 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
=== As shown above, the switchover succeeded. ===
II. Cluster failover test
1. Check the current cluster status
[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | standby | running | node201 | default | 100 | 5 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | primary | * running | | default | 100 | 6 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2. Check the control file information on the primary and standby
# Primary:
[kingbase@node2 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in production
sys_control last modified: Thu 19 May 2022 01:26:08 PM CST
Latest checkpoint location: 1/409BA150
Latest checkpoint's REDO location: 1/3EADD130
Latest checkpoint's REDO WAL file: 00000006000000010000003E
# Standby:
[kingbase@node1 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in archive recovery
sys_control last modified: Thu 19 May 2022 01:22:19 PM CST
Latest checkpoint location: 1/37000028
Latest checkpoint's REDO location: 1/37000028
Latest checkpoint's REDO WAL file: 000000050000000100000037
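Because the file to keep differs between the two nodes here (00000006000000010000003E on the primary, 000000050000000100000037 on the standby), it helps to read the value from each node's own control data rather than copy it by hand. A minimal sketch, assuming bash and awk; redo_wal_file is a hypothetical helper that reads control-data text on standard input:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the "Latest checkpoint's REDO WAL file"
# value from sys_controldata output supplied on stdin.
redo_wal_file() {
    # Split on the first colon plus any padding spaces, then print the
    # value from the first matching line only.
    awk -F': *' "/^Latest checkpoint's REDO WAL file:/ { print \$2; exit }"
}

# Usage on a node, with the same invocation as in this test:
#   ./sys_controldata -D ../data | redo_wal_file
```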
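3. Clean up the WAL logs on the primary and standby (on both, keep only the WAL file that contains the checkpoint and every later file)
# WAL logs kept on the primary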
[kingbase@node2 sys_wal]$ ls -lh
total 65M
-rw------- 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003E
-rw------- 1 kingbase kingbase 16M May 19 13:26 00000006000000010000003F
-rw------- 1 kingbase kingbase 16M May 19 13:26 000000060000000100000040
-rw------- 1 kingbase kingbase 16M May 19 13:26 000000060000000100000041
-rw------- 1 kingbase kingbase 214 May 19 13:18 00000006.history
drwx------ 2 kingbase kingbase 16K May 19 13:26 archive_status
drwxrwxr-x 2 kingbase kingbase 4.0K May 19 13:30 log_bk
# WAL logs kept on the standby
[kingbase@node1 sys_wal]$ ls -lh
total 193M
-rw-------. 1 kingbase kingbase 16M May 19 13:17 000000050000000100000037
-rw-------. 1 kingbase kingbase 171 May 19 13:03 00000005.history
-rw-------. 1 kingbase kingbase 16M May 19 13:23 000000060000000100000037
-rw-------. 1 kingbase kingbase 16M May 19 13:24 000000060000000100000038
-rw-------. 1 kingbase kingbase 16M May 19 13:24 000000060000000100000039
-rw-------. 1 kingbase kingbase 16M May 19 13:24 00000006000000010000003A
-rw-------. 1 kingbase kingbase 16M May 19 13:24 00000006000000010000003B
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003C
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003D
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003E
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003F
-rw-------. 1 kingbase kingbase 16M May 19 13:26 000000060000000100000040
-rw-------. 1 kingbase kingbase 16M May 19 13:26 000000060000000100000041
-rw-------. 1 kingbase kingbase 214 May 19 13:21 00000006.history
drwx------. 2 kingbase kingbase 24K May 19 13:26 archive_status
drwxrwxr-x. 2 kingbase kingbase 4.0K May 19 13:28 log_bk
4. Run the failover test
1) Stop the database service on the primary
[kingbase@node2 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down....... done
server stopped
2) Check the failover result
[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node200 | primary | * running | | default | 100 | 7 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | standby | running | | default | 100 | 6 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
=== As shown above, the failover succeeded. ===
III. Summary
For manual WAL cleanup, see "KingbaseES: WAL (xlog) log cleanup case for a single-instance environment":
https://www.cnblogs.com/kingbase/p/16263467.html