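KingbaseES V8R6 Single Instance: sys_backup.sh External Backup Failure Case

Case description:

In a KingbaseES V8R6 single-instance environment with an external backup server configured, a physical backup run through sys_backup.sh fails with the error "WAL segment xxx was not archived before the 60000ms timeout". For the setup procedure, see "KingbaseES V8R6 Single Instance: sys_backup.sh External Backup Case": https://note.youdao.com/s/Oxf3c7Ee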

  • Symptom:

  • Fault analysis:

1. Check the sys_log logs on the database node

2022-05-30 23:18:57.147 CST [21529] LOG:  archive command failed with exit code 55
2022-05-30 23:18:57.147 CST [21529] DETAIL:  The failed archive command was: export TZ=Asia/Shanghai;/opt/Kingbase/ES/V8R6_054/Server/bin/sys_rman --config /home/kingbase/kbbr1_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000001000000000000001C
2022-05-30 23:18:57.147 CST [21529] WARNING:  archiving write-ahead log file "00000001000000000000001C" failed too many times, will try again later
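
As a rough sketch of this step, the archive-related lines can be filtered out of the log directory with grep. The function name `show_archive_failures` is hypothetical, and the default directory is an assumption derived from the data path seen in this case (`/data/kingbase/v8r6_054/data`); adjust it to the actual sys_log location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent archive failure lines from sys_log.
# The default directory is an assumption based on the data path in this case.
show_archive_failures() {
    local log_dir="${1:-/data/kingbase/v8r6_054/data/sys_log}"
    # -h drops file names so only the log lines themselves are printed
    grep -h -E 'archive command failed|failed archive command was|archiving write-ahead log file' \
        "$log_dir"/* 2>/dev/null | tail -n 20
}

# Example call (path is an assumption):
# show_archive_failures /data/kingbase/v8r6_054/data/sys_log
```

An empty result suggests that the archiver itself is not reporting failures, in which case the timeout probably has a different cause.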

2. Extract the archive command from the log and run it manually

[kingbase@node1 data]$ /opt/Kingbase/ES/V8R6_054/Server/bin/sys_rman --config /home/kingbase/kbbr1_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000001000000000000001C

2022-05-30 23:23:32.207 P00   INFO: archive-push command begin 2.27: [sys_wal/00000001000000000000001C] --config=/home/kingbase/kbbr1_repo/sys_rman.conf --exec-id=31480-4e35a388 --kb1-path=/data/kingbase/v8r6_054/data --log-level-console=info --log-level-file=info --log-path=/opt/Kingbase/ES/V8R6_054/Server/log --log-subprocess --repo1-host=192.168.8.100 --repo1-host-config=/home/kingbase/kbbr1_repo/sys_rman.conf --repo1-host-user=kingbase --repo1-path=/home/kingbase/kbbr1_repo --stanza=kingbase
ERROR: [103]: unable to find a valid repository:
       repo1: [UnknownError] remote-0 process on '192.168.8.100' terminated unexpectedly [127]: bash: /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/sys_rman: No such file or directory
2022-05-30 23:23:32.461 P00   INFO: archive-push command end: aborted with exception [103]
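
The command that was run by hand above comes from the DETAIL line in the log. A minimal sketch for pulling it out automatically follows; the function name `extract_failed_archive_command` is hypothetical, and the log file passed in is whichever sys_log file contains the failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last failed archive command recorded in a log file.
extract_failed_archive_command() {
    local log_file="$1"
    # Keep only the text after the fixed prefix, then take the newest match
    sed -n 's/^.*The failed archive command was: //p' "$log_file" | tail -n 1
}

# Example call (file name is an assumption):
# extract_failed_archive_command /data/kingbase/v8r6_054/data/sys_log/kingbase.log
```

The extracted command references `sys_wal/...` as a relative path, so it should be run from the data directory, as in the session above.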

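3. Check the path information on the backup server

On the backup server, the software was found under the original path "/opt/Kingbase/ES/V8R6_054/Server". In this version, the installation path on the database node is a symbolic link (see the figure below), so the path configured on the repo node does not match the actual storage path on the database node. As the error above shows, sys_rman is looked up on the repo node under the resolved path "/opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/", which does not exist there, so the remote process fails.

File storage path on the data node: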
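
One way to see the resolved path on the data node is `readlink -f`, which follows symbolic links to the final target. The sketch below wraps it in a hypothetical function, `resolve_install_path`, and assumes GNU coreutils and an absolute path without a trailing slash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a path resolves to a different real location.
# Expects an absolute path without a trailing slash.
resolve_install_path() {
    local path="$1"
    local real
    real=$(readlink -f "$path") || return 1
    if [ "$real" != "$path" ]; then
        echo "symlink: $path -> $real"
    else
        echo "plain: $path"
    fi
}

# Example call on the data node:
# resolve_install_path /opt/Kingbase/ES/V8R6_054/Server
```

Running the same check on both nodes and comparing the output would make a path mismatch like the one in this case visible.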

  • Resolution:
    1. Recreate the storage directory on the repo node
[kingbase@srv01 bin]$ mkdir -p /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server

[kingbase@srv01 V8R6_054]$ cd /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/
[kingbase@srv01 Server]$ ls -lh
total 338M
drwxr-xr-x 2 kingbase kingbase 4.0K May 31 14:20 bin
-rw------- 1 kingbase kingbase 338M May 31 14:14 db.zip
drwxrwxr-x 5 kingbase kingbase 8.0K Apr  7 16:17 lib
drwx------ 2 kingbase kingbase  100 May 31 14:23 log
drwxrwxr-x 8 kingbase kingbase 4.0K Apr  7 16:17 share
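
Before retrying the archive, it may help to confirm from the database node that the binary is now reachable under the resolved path on the repo host. This is a sketch only: the function name `check_remote_binary` is hypothetical, and it assumes passwordless SSH from the database node to the repo host is already configured and that the path contains no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check over SSH that a binary exists and is executable.
# Assumes key-based SSH access and a path without spaces.
check_remote_binary() {
    local target="$1"   # user@host of the repo node
    local binary="$2"   # absolute path expected on the repo node
    if ssh "$target" test -x "$binary"; then
        echo "ok: $binary is executable on $target"
    else
        echo "missing: $binary not found or not executable on $target"
        return 1
    fi
}

# Example call with the values from this case:
# check_remote_binary kingbase@192.168.8.100 \
#     /opt/Kingbase/ES/V8R6_054/KESRealPro/V008R006C005B0054/Server/bin/sys_rman
```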

2. Edit the sys_backup.conf file

3. Re-run the initialization

4. Re-run the manual archive (it now succeeds)

[kingbase@node1 data]$ /opt/Kingbase/ES/V8R6_054/Server/bin/sys_rman --config /home/kingbase/kbbr1_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000001000000000000001C

2022-05-30 23:26:17.480 P00   INFO: archive-push command begin 2.27: [sys_wal/00000001000000000000001C] --config=/home/kingbase/kbbr1_repo/sys_rman.conf --exec-id=32254-8f3f05e4 --kb1-path=/data/kingbase/v8r6_054/data --log-level-console=info --log-level-file=info --log-path=/opt/Kingbase/ES/V8R6_054/Server/log --log-subprocess --repo1-host=192.168.8.100 --repo1-host-config=/home/kingbase/kbbr1_repo/sys_rman.conf --repo1-host-user=kingbase --repo1-path=/home/kingbase/kbbr1_repo --stanza=kingbase

WARN: WAL file '00000001000000000000001C' already exists in the repo1 archive with the same checksum
      HINT: this is valid in some recovery scenarios but may also indicate a problem.
2022-05-30 23:26:17.924 P00   INFO: pushed WAL file '00000001000000000000001C' to the archive
2022-05-30 23:26:18.026 P00   INFO: archive-push command end: completed successfully (549ms)
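  • Summary

         When sys_backup.sh runs a physical backup, many failures surface as an "archive timeout". The following approach can be used to track them down:
    
        1. Check the sys_log logs on the data node for messages about failed archiving.
        2. If such messages exist, extract the archive command from them, run it manually, and read the error it reports.
        3. Following that error, check the relevant configuration files and the archive-related settings, such as storage paths, directory permissions, and missing WAL files, and fix the cause of the archive failure.
        4. Backing up during peak business load can cause I/O congestion and archive failures. The archive_timeout threshold can be raised, but it is better not to run backups during peak hours.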
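
Step 3 can be partly mechanized by matching the error text from the manual run against common operating system messages. The sketch below is illustrative only: `classify_archive_error` is a hypothetical name, and the two patterns are generic shell and OS error strings, not a list of sys_rman error codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map error text from a failed manual archive run to a first hint.
classify_archive_error() {
    # $1: text captured from the failed command
    case "$1" in
        *"No such file or directory"*)
            echo "path: check the storage and binary paths on both nodes" ;;
        *"Permission denied"*)
            echo "permission: check directory ownership and modes" ;;
        *)
            echo "other: read the full error text and the related configuration" ;;
    esac
}

# Example call with the message from this case:
# classify_archive_error "bash: .../Server/bin/sys_rman: No such file or directory"
```

In this case the message falls into the first branch, which matches the path mismatch found on the repo node.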