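KingbaseES V8R6 Single-Instance sys_backup.sh External Backup: A Failure Case
阿新 • Published: 2022-06-01
Case description:
In a KingbaseES V8R6 single-instance environment with an external backup server configured, a physical backup run with sys_backup.sh fails with the error "WAL segment xxx was not archived before the 60000ms timeout". For the setup steps, see "KingbaseES V8R6 Single-Instance sys_backup.sh External Backup Case": https://note.youdao.com/s/Oxf3c7Ee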
Symptom:
[Figure: screenshot of the archive timeout error, not preserved in this copy]
Failure analysis:
1. Check the sys_log logs on the database node
2022-05-30 23:18:57.147 CST [21529] LOG: archive command failed with exit code 55
2022-05-30 23:18:57.147 CST [21529] DETAIL: The failed archive command was: export TZ=Asia/Shanghai;/opt/Kingbase/ES/V8R6_054/Server/bin/sys_rman --config /home/kingbase/kbbr1_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000001000000000000001C
2022-05-30 23:18:57.147 CST [21529] WARNING: archiving write-ahead log file "00000001000000000000001C" failed too many times, will try again later
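To pull the failed command out of the log for a manual run, something like the following may help. It is a minimal sketch, assuming the log file path is passed as an argument and that each log entry sits on its own line; the function name `last_failed_archive_cmd` is hypothetical and not part of KingbaseES.

```shell
#!/usr/bin/env bash
# Print the most recent failed archive command recorded in a server log.
# $1: path to a sys_log file (the caller supplies the actual file name).
last_failed_archive_cmd() {
    local log_file="$1"
    # Keep only the text that follows the DETAIL marker, then take the last hit.
    grep -o 'The failed archive command was: .*' "$log_file" \
        | sed 's/^The failed archive command was: //' \
        | tail -n 1
}
```

The printed command can then be run by hand from the data directory, since the WAL path in it (sys_wal/...) is relative.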
2. Take the archive command from the log and run it manually
[kingbase@node1 data]$ /opt/Kingbase/ES/V8R6_054/Server/bin/sys_rman --config /home/kingbase/kbbr1_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000001000000000000001C
2022-05-30 23:23:32.207 P00 INFO: archive-push command begin 2.27: [sys_wal/00000001000000000000001C] --config=/home/kingbase/kbbr1_repo/sys_rman.conf --exec-id=31480-4e35a388 --kb1-path=/data/kingbase/v8r6_054/data --log-level-console=info --log-level-file=info --log-path=/opt/Kingbase/ES/V8R6_054/Server/log --log-subprocess --repo1-host=192.168.8.100 --repo1-host-config=/home/kingbase/kbbr1_repo/sys_rman.conf --repo1-host-user=kingbase --repo1-path=/home/kingbase/kbbr1_repo --stanza=kingbase
ERROR: [103]: unable to find a valid repository: repo1: [UnknownError] remote-0 process on '192.168.8.100' terminated unexpectedly [127]: bash: /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/sys_rman: No such file or directory
2022-05-30 23:23:32.461 P00 INFO: archive-push command end: aborted with exception [103]
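The key detail in this output is the path that the remote shell could not find. A small filter can isolate it from longer output. This is only a sketch that assumes the error keeps the "bash: <path>: No such file or directory" wording seen above; `missing_remote_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read sys_rman output on stdin and print the path reported as missing
# by the remote shell, if any line contains such a report.
missing_remote_path() {
    sed -n 's/.*bash: \(.*\): No such file or directory.*/\1/p'
}
```

The extracted path can then be checked directly on the repo host, for example with `ls -l` run there as the kingbase user.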
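3. Check the path information on the backup server
On the backup server, the original path is "/opt/Kingbase/ES/V8R6_054/Server". In this version, the database software is installed under a path that uses a symbolic link (as shown in the figure below), so the path configured on the repo node did not match the actual storage path on the database node. The error in step 2 shows the effect: the repo node was asked for /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/sys_rman, which does not exist there, and reading the file failed.
File storage path on the data node: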
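[Figure: listing of the data node storage path showing the symbolic link, not preserved in this copy]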
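One way to confirm this kind of mismatch is to compare a path with its fully resolved form on each node. The sketch below assumes GNU `readlink -f` is available and that the path is absolute with no trailing slash; `resolves_elsewhere` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report whether a path resolves to a different physical location.
# Prints "path -> resolved" and returns 0 when a symbolic link is involved,
# returns 1 when the path is already physical, 2 if it cannot be resolved.
resolves_elsewhere() {
    local path="$1" real
    real=$(readlink -f -- "$path") || return 2
    if [ "$real" != "$path" ]; then
        printf '%s -> %s\n' "$path" "$real"
        return 0
    fi
    return 1
}
```

Running `resolves_elsewhere /opt/Kingbase/ES/V8R6_054/Server` on the database node and on the repo node would show whether both nodes end up at the same physical directory.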
Resolution:
1. Recreate the storage directory on the repo node
[kingbase@srv01 bin]$ mkdir -p /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server
[kingbase@srv01 V8R6_054]$ cd /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/
[kingbase@srv01 Server]$ ls -lh
total 338M
drwxr-xr-x 2 kingbase kingbase 4.0K May 31 14:20 bin
-rw------- 1 kingbase kingbase 338M May 31 14:14 db.zip
drwxrwxr-x 5 kingbase kingbase 8.0K Apr 7 16:17 lib
drwx------ 2 kingbase kingbase 100 May 31 14:23 log
drwxrwxr-x 8 kingbase kingbase 4.0K Apr 7 16:17 share
2. Edit the sys_backup.conf file
3. Rerun the initialization
4. Rerun the manual archive (archiving succeeds)
[kingbase@node1 data]$ /opt/Kingbase/ES/V8R6_054/Server/bin/sys_rman --config /home/kingbase/kbbr1_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000001000000000000001C
2022-05-30 23:26:17.480 P00 INFO: archive-push command begin 2.27: [sys_wal/00000001000000000000001C] --config=/home/kingbase/kbbr1_repo/sys_rman.conf --exec-id=32254-8f3f05e4 --kb1-path=/data/kingbase/v8r6_054/data --log-level-console=info --log-level-file=info --log-path=/opt/Kingbase/ES/V8R6_054/Server/log --log-subprocess --repo1-host=192.168.8.100 --repo1-host-config=/home/kingbase/kbbr1_repo/sys_rman.conf --repo1-host-user=kingbase --repo1-path=/home/kingbase/kbbr1_repo --stanza=kingbase
WARN: WAL file '00000001000000000000001C' already exists in the repo1 archive with the same checksum
HINT: this is valid in some recovery scenarios but may also indicate a problem.
2022-05-30 23:26:17.924 P00 INFO: pushed WAL file '00000001000000000000001C' to the archive
2022-05-30 23:26:18.026 P00 INFO: archive-push command end: completed successfully (549ms)
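Summary
When sys_backup.sh runs a physical backup, many failures surface as an "archive timeout". The following approach helps track them down:
1. Check the sys_log logs on the data node for entries that record an archive failure.
2. If such entries exist, extract the archive command from them, run it manually, and read the error it reports.
3. Following the error message, check the relevant configuration files and verify the archive-related settings, such as storage paths, directory permissions, and missing WAL files, then fix the cause of the archive failure.
4. Backing up during peak business load can cause I/O congestion and make archiving fail. Raising the archive_timeout threshold can help, but it is better not to run backups during peak hours.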