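KingbaseES single-instance environment: a WAL (xlog) log cleanup failure case
Case description:
While WAL logs were being cleaned up manually with the sys_archivecleanup tool, the control file showed that the WAL file containing the checkpoint was "000000010000000000000008". The cleanup, however, was mistakenly run with "000000010000000000000009" as the cutoff, which removed every WAL file older than that one, including the checkpoint's file. At startup the database could not read the WAL file that holds the checkpoint, so it failed to start.
Database version: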
test=# select version();
                                                     version
------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0054 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
The WAL cleanup was performed as follows:
1) View the current control file information
2) View the WAL files and clean them up
Before cleanup:
[kingbase@node1 sys_wal]$ ls -lh
total 80M
-rw------- 1 kingbase kingbase 16M May 11 13:26 000000010000000000000006
-rw------- 1 kingbase kingbase 16M May 11 13:26 000000010000000000000007
-rw------- 1 kingbase kingbase 16M May 11 13:26 000000010000000000000008
-rw------- 1 kingbase kingbase 16M May 11 13:00 000000010000000000000009
-rw------- 1 kingbase kingbase 16M May 11 13:02 00000001000000000000000A
drwx------ 2 kingbase kingbase 78 May 11 13:49 archive_status
Log cleanup:
[kingbase@node1 bin]$ ./sys_archivecleanup /data/kingbase/v8r6_054/data/sys_wal 000000010000000000000009
After cleanup:
[kingbase@node1 sys_wal]$ ls -lh
total 32M
-rw------- 1 kingbase kingbase 16M May 11 13:00 000000010000000000000009
-rw------- 1 kingbase kingbase 16M May 11 13:02 00000001000000000000000A
drwx------ 2 kingbase kingbase 78 May 11 13:49 archive_status
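The effect of the cutoff argument can be modeled in a few lines of Python. This is only a sketch: it assumes sys_archivecleanup behaves like upstream PostgreSQL's pg_archivecleanup, which removes every segment that sorts before the named file (ignoring the timeline prefix) and keeps the named file itself. The function name `files_removed` is hypothetical, and `.partial` and `.backup` files are not modeled.

```python
import re

# A plain WAL segment name is 24 uppercase hex digits.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def files_removed(wal_files, cutoff):
    """Return the WAL segment names that sort before `cutoff`.

    Assumption: as in upstream pg_archivecleanup, the first 8 hex digits
    (the timeline) are ignored and the cutoff file itself is kept.
    """
    return sorted(
        name for name in wal_files
        if WAL_NAME.match(name) and name[8:] < cutoff[8:]
    )


# Directory listing taken from the "before cleanup" output.
listing = [
    "000000010000000000000006",
    "000000010000000000000007",
    "000000010000000000000008",
    "000000010000000000000009",
    "00000001000000000000000A",
    "archive_status",
]

# Cutoff used in this case: segment 08 (the checkpoint's file) is removed.
print(files_removed(listing, "000000010000000000000009"))
# Cutoff that matches the control file: segment 08 is kept.
print(files_removed(listing, "000000010000000000000008"))
```

With the cutoff used in this case the result contains segments 06, 07 and 08. With "000000010000000000000008" as the cutoff it contains only 06 and 07.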
I. Database startup fails
1. Start the database service
[kingbase@node1 bin]$ ./sys_ctl start -D /data/kingbase/v8r6_054/data/
......
2022-05-12 15:29:34.641 CST [25993] HINT: Future log output will appear in directory "sys_log".
...... stopped waiting
sys_ctl: could not start server
Examine the log output.
2. Check the database sys_log log
2022-05-12 15:29:35.309 CST [26003] LOG: invalid primary checkpoint record
2022-05-12 15:29:35.309 CST [26003] PANIC: could not locate a valid checkpoint record
2022-05-12 15:29:35.309 CST [26003] LOG: kingbase ran into a problem it couldn't handle,it needs to be shutdown to prevent damage to your data
2022-05-12 15:29:35.346 CST [26003] WARNING:
ERROR: -----------------------stack error start-----------------------
ERROR: TIME: 2022-05-12 15:29:35.309749+08
ERROR: 1 26003 0x7fc2aa18ef6b debug_backtrace (backtrace.so)
ERROR: 2 26003 0x7fc2aa18f53a <symbol not found> (backtrace.so)
ERROR: 3 26003 0x7fc2b390a670 <symbol not found> (libc.so.6)
ERROR: 4 26003 0x7fc2b390a5f7 gsignal (libc.so.6)
ERROR: 5 26003 0x7fc2b390bce8 abort (libc.so.6)
ERROR: 6 26003 0x9148dc errfinish + 0x4d008d3c
ERROR: 7 26003 0x54011c StartupXLOG + 0x4cc3457c
ERROR: 8 26003 0x774f51 StartupProcessMain + 0x4ce693b1
ERROR: 9 26003 0x550550 AuxiliaryProcessMain + 0x4cc449b0
ERROR: 10 26003 0x76f5c7 StartChildProcess + 0x4ce63a27
ERROR: 11 26003 0x77350d PostmasterMain + 0x4ce6796d
ERROR: 12 26003 0x6cb0af main + 0x4cdbf50f
ERROR: 13 26003 0x7fc2b38f6b15 __libc_start_main (libc.so.6)
ERROR: 14 26003 0x4a1659 _start + 0x4cbaac39
2022-05-12 15:29:40.654 CST [25993] LOG: startup process (PID 26003) was terminated by signal 6: Aborted
2022-05-12 15:29:40.654 CST [25993] LOG: aborting startup due to startup process failure
2022-05-12 15:29:40.728 CST [25993] LOG: database system is shut down
As shown above, the database could not read the checkpoint record from the WAL at startup, so startup failed.
II. Read the database control file information
[kingbase@node1 bin]$ ./sys_controldata -D /data/kingbase/v8r6_054/data
sys_control version number: 1201
Catalog version number: 202202151
Database system identifier: 7096019857358041449
Database cluster state: in production
sys_control last modified: Wed 11 May 2022 01:26:44 PM CST
Latest checkpoint location: 0/8000058
Latest checkpoint's REDO location: 0/8000028
Latest checkpoint's REDO WAL file: 000000010000000000000008
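The relationship between the checkpoint location and the WAL file name in this output can be reproduced by hand. The sketch below assumes the PostgreSQL-style naming scheme (8 hex digits each for the timeline, the high part of the segment number, and the low part) and a 16 MB segment size, which matches the 16M files in the listings. The helper name `wal_file_for_lsn` is hypothetical.

```python
# Assumed segment size: 16 MB, matching the file sizes shown by ls -lh.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_for_lsn(lsn, timeline=1, segment_size=WAL_SEGMENT_SIZE):
    """Map an LSN such as '0/8000058' to the WAL file that contains it."""
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)  # absolute byte position
    segno = position // segment_size                 # linear segment number
    per_xlogid = 0x100000000 // segment_size         # segments per 4 GB
    return "%08X%08X%08X" % (timeline, segno // per_xlogid, segno % per_xlogid)


print(wal_file_for_lsn("0/8000058"))  # checkpoint location from the control file
print(wal_file_for_lsn("0/8000028"))  # REDO location from the control file
```

Both locations fall in "000000010000000000000008", which is the file that must still exist for the database to start.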
III. View the current WAL files
As shown below, the WAL file "000000010000000000000008" that contains the checkpoint is missing.
[kingbase@node1 sys_wal]$ ls -lh
total 32M
-rw------- 1 kingbase kingbase 16M May 11 13:00 000000010000000000000009
-rw------- 1 kingbase kingbase 16M May 11 13:02 00000001000000000000000A
drwx------ 2 kingbase kingbase 78 May 11 13:49 archive_status
Tips:
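Because the WAL file containing the database checkpoint is missing, the database cannot determine its consistency state at startup, so startup fails. In this situation the database can be restored from a physical backup to an earlier point in time and then started. If no physical backup exists, the control file can be rebuilt so that the database starts. Both methods lose data, so before cleaning up WAL logs, always confirm that the WAL file you specify is the correct one.
IV. Rebuild the control file
1. Rebuild the control file with sys_resetwal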
[kingbase@node1 bin]$ ./sys_resetwal -l 00000001000000000000000A -D /data/kingbase/v8r6_054/data
The database server was not shut down cleanly.
Resetting the write-ahead log might cause data to be lost.
If you want to proceed anyway, use -f to force reset.
[kingbase@node1 bin]$ ./sys_resetwal -l 00000001000000000000000A -D /data/kingbase/v8r6_054/data -f
Write-ahead log reset
2. View the WAL files after rebuilding the control file
[kingbase@node1 sys_wal]$ ls -lh
total 16M
-rw------- 1 kingbase kingbase 16M May 12 15:46 00000001000000000000000B
drwx------ 2 kingbase kingbase 6 May 12 15:46 archive_status
3. View the control file information
[kingbase@node1 bin]$ ./sys_controldata -D /data/kingbase/v8r6_054/data
sys_control version number: 1201
Catalog version number: 202202151
Database system identifier: 7096019857358041449
Database cluster state: shut down
sys_control last modified: Thu 12 May 2022 03:46:38 PM CST
Latest checkpoint location: 0/B000028
Latest checkpoint's REDO location: 0/B000028
Latest checkpoint's REDO WAL file: 00000001000000000000000B
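Note that the reset was requested with -l 00000001000000000000000A, yet the new WAL file is 00000001000000000000000B. One plausible explanation, assuming sys_resetwal follows upstream pg_resetwal, is that the tool starts one segment past the highest WAL file still present in the directory and uses the -l value only if it is larger. The simplified model below illustrates that rule. The function names are hypothetical, and the sketch assumes 16 MB segments and ignores the control file's own checkpoint position.

```python
# 256 segments per 4 GB "xlogid" when segments are 16 MB (assumed).
SEGMENTS_PER_XLOGID = 0x100000000 // (16 * 1024 * 1024)


def segment_number(name):
    """Convert a 24-character WAL file name to a linear segment number."""
    return int(name[8:16], 16) * SEGMENTS_PER_XLOGID + int(name[16:24], 16)


def segment_name(segno, timeline=1):
    """Convert a linear segment number back to a WAL file name."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def first_segment_after_reset(existing_files, requested):
    """Simplified model: max(highest existing segment + 1, requested)."""
    highest_existing = max(segment_number(name) for name in existing_files)
    return segment_name(max(highest_existing + 1, segment_number(requested)))


# Files left after the faulty cleanup, and the -l value used in this case.
print(first_segment_after_reset(
    ["000000010000000000000009", "00000001000000000000000A"],
    "00000001000000000000000A",
))
```

Under these assumptions the model yields 00000001000000000000000B, consistent with the directory listing and control file output above.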
V. Start the database instance and verify
1. Start the database
[kingbase@node1 bin]$ ./sys_ctl start -D /data/kingbase/v8r6_054/data/
waiting for server to start....2022-05-12 15:54:53.731 CST [30496] LOG: sepapower extension initialized
.....
done
server started
2. Check the sys_log log (the database started normally)
[kingbase@node1 sys_log]$ tail -100 kingbase-2022-05-12_155453.log
2022-05-12 15:54:53.919 CST [30498] LOG: database system was shut down at 2022-05-12 15:46:38 CST
2022-05-12 15:54:54.132 CST [30496] LOG: database system is ready to accept connections
3. Access the database
[kingbase@node1 bin]$ ./ksql -U system -W test -p 54322
Password:
ksql (V8.0)
Type "help" for help.
test=# \d prod
Did not find any relation named "prod".
test=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------------+-------+--------
public | sys_stat_statements | view | system
public | t1 | table | system
(2 rows)
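VI. Summary
1. WAL logs can be cleaned up with the sys_archivecleanup tool. First check the control file (the "Latest checkpoint's REDO WAL file" field) to determine the oldest WAL file that must be kept.
2. Before running the cleanup, always confirm that the WAL file chosen to be kept is the correct one.
3. When this operation is performed in a production environment, it is best to have two people confirm that it is correct.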
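The check described in the summary can also be automated before the cleanup command is run. The following is a minimal sketch, not a complete safeguard: it parses the text printed by sys_controldata and rejects any cutoff that sorts after the "Latest checkpoint's REDO WAL file". The function names are hypothetical. The sketch only covers what the instance itself needs for crash recovery; WAL required by standbys, archiving, or backups is not considered.

```python
import re


def redo_wal_file(controldata_output):
    """Extract the REDO WAL file name from sys_controldata output text."""
    match = re.search(r"REDO WAL file:\s*([0-9A-F]{24})", controldata_output)
    if match is None:
        raise ValueError("REDO WAL file not found in control data output")
    return match.group(1)


def is_safe_cutoff(cutoff, controldata_output):
    """True if cleaning up to `cutoff` keeps the checkpoint's REDO WAL file.

    The cutoff file itself is kept, so a cutoff equal to the REDO WAL file
    is still safe. The timeline prefix (first 8 digits) is ignored.
    """
    return cutoff[8:] <= redo_wal_file(controldata_output)[8:]


# Excerpt of the sys_controldata output from this case.
sample = """Latest checkpoint location: 0/8000058
Latest checkpoint's REDO location: 0/8000028
Latest checkpoint's REDO WAL file: 000000010000000000000008
"""

print(is_safe_cutoff("000000010000000000000008", sample))  # cutoff keeps segment 08
print(is_safe_cutoff("000000010000000000000009", sample))  # cutoff used in this case
```

The first call reports the cutoff as safe. The second, which is the value actually used in this case, is rejected.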