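How One Storage Link Flap Played Out Differently on AIX and HP-UX Because of Different I/O Timeouts (Reposted)
This incident happened last year and has not recurred since, so now that I have some time, here is a short write-up. It hit around the midday nap: alerts suddenly reported that 8 instances across 4 databases were unavailable at the same time. A failure this widespread usually has a common cause. The database alert logs all showed I/O write failures, and we later confirmed that all 8 instances used storage-level synchronous replication for disaster recovery, on storage from the same vendor, Hitachi.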
```
2017-01-22 13:02:14.213000 +08:00
KCF: read, write or open error, block=0x1ad85 online=1
        file=443 '/dev/anbob_oravg01/ranbob_lv15_062'
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 32768'
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_dbw7_17700.trc:
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_lgwr_17702.trc:
ORA-00345: redo log write error block 95667 count 10
ORA-00312: online log 4 thread 1: '/dev/anbob_oravg02/ranbob_redo04'
ORA-27063: number of bytes read/written is incorrect
HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 10240
KCF: read, write or open error, block=0x5c699 online=1
KCF: read, write or open error, block=0x168297 online=1
        file=29 '/dev/anbob_oravg01/ranbob_lv15_024'
        file=142 '/dev/anbob_oravg04/ranbob_lv30_273'
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: -1
Additional information: 8192'
Additional information: 8192'
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_dbw1_17688.trc:
```
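Looking back at the environment: these databases used synchronous remote replication for disaster recovery, meaning every application I/O above the storage layer has to be written twice, locally and remotely, and counts as complete only when both writes succeed. The application here is the Oracle database. This was a common DR technique in the past; storage arrays usually also offer asynchronous replication, which requires a more expensive license. In these environments, jitter on the link to the remote site caused I/O write failures, and the databases on HP-UX restarted as a result.
Interestingly, though, other databases with the same remote DR setup did not restart: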
OS | Storage | DB restarted |
---|---|---|
AIX | EMC | NO |
AIX | HDS | NO |
HPUX | EMC | NO |
HPUX | HDS | YES |
Note:
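Only the HP-UX + HDS combination restarted the database. On the storage side, the EMC engineer said at the time that their logs showed errors and a path switchover, while the HDS engineer said no errors were found in the logs, but pointed out that when the Hitachi array detects a link problem, its switchover timeout is 30 + 10 seconds. Back at the OS layer, the I/O timeout on the HP-UX hosts is 30 seconds and on the AIX hosts 60 seconds. So it is possible that HP-UX had already timed out the I/O and returned an I/O error before the Hitachi array finished switching paths. The outage may also have lasted just over 30 s but under 60 s, so the storage had recovered before AIX timed out, and AIX carried on without a restart.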
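The timing argument above can be checked with a tiny model. This is only a sketch built from the numbers stated in the article (a 30 + 10 = 40 s Hitachi switchover, a 30 s HP-UX I/O timeout, a 60 s AIX I/O timeout); the rule that an I/O simply fails when the host timeout expires before the storage recovers is a simplifying assumption, and the function names are hypothetical, not real LVM or driver logic.

```python
# Hypothetical timing model (not the real LVM/driver logic): an outstanding
# write fails if the host gives up on it before the storage path recovers.

# Switchover time quoted by the HDS engineer: 30 + 10 seconds.
HDS_SWITCHOVER_S = 30 + 10

# Host I/O timeouts as stated in the article.
OS_IO_TIMEOUT_S = {"HP-UX": 30, "AIX": 60}


def io_fails(os_timeout_s: float, storage_stall_s: float) -> bool:
    """True if the OS times out the I/O before the storage stall ends."""
    return os_timeout_s < storage_stall_s


def outcome_table(storage_stall_s: float) -> dict[str, bool]:
    """Map each OS to whether the database would receive an I/O error."""
    return {
        os_name: io_fails(timeout_s, storage_stall_s)
        for os_name, timeout_s in OS_IO_TIMEOUT_S.items()
    }


if __name__ == "__main__":
    # 40 s is the HDS switchover; 25 s and 75 s are arbitrary comparison points.
    for stall_s in (HDS_SWITCHOVER_S, 25, 75):
        print(f"stall={stall_s}s -> {outcome_table(stall_s)}")
```

For the 40 s case the model reports an I/O error for HP-UX only, which is consistent with the HP-UX + HDS row of the table. It says nothing about why HP-UX + EMC survived; that would depend on EMC's own switchover time, which the article does not give.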
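Of course, assuming all of the above holds: when a link becomes unavailable, is it better to bring the database down quickly to keep it consistent, or to allow more time for retries so the storage has a chance to recover? That is a trade-off about how long to wait, and it leads to how COMMIT behaves in PL/SQL.
Durability, the D in ACID, requires that a committed transaction be persisted and never lost. That is why a COMMIT issued in SQL forces the redo to be written to disk before the session can continue: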
When a session issues a commit, it generates the redo describing how to update its transaction table slot in the undo segment header block, puts this redo into the log buffer, applies it to the undo segment header block, calls the log writer to flush the log buffer to disk, and then goes into a log file sync wait until the log writer lets it know that its entry in the log buffer has been copied to disk.
It is this commit/rollback mechanism that makes transactions durable (the D of ACID).
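COMMIT in PL/SQL, however, is optimized. To keep COMMITs inside a LOOP fast, a commit only sends LGWR a message and does not wait for LGWR to finish writing to disk before moving on to the next transaction. This differs from how a COMMIT in SQL is usually understood. A short PL/SQL test shows it: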
```
[[email protected] ~]$ sqlplus anbob/[email protected]/pdbanbob.com

SQL*Plus: Release 12.2.0.0.0 Beta on Tue Feb 7 15:00:27 2017
Copyright (c) 1982, 2015, Oracle.  All rights reserved.
Last Successful login time: Tue Feb 07 2017 14:57:44 +08:00

Connected to:
Oracle Database 12c EE Extreme Perf Release 12.2.0.1.0 - 64bit Production

SQL> create table anbob.t(id int,a date);

Table created.

SQL> @statn commit

     STAT# HEX#      OFFSET NAME                                          VALUE
---------- ----- ---------- ---------------------------------------- ----------
         6 6             48 user commits                                      1
       219 DB          1752 commit cleanouts                                  3
       220 DC          1760 commit cleanouts successfully completed           3
       647 287         5176 IMU commits                                       1
...
45 rows selected.

SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                          VALUE
---------- ----- ---------- ---------------------------------------- ----------
       338 152         2704 redo synch time                                   9
...
       346 15A         2768 redo synch writes                                 2
...
17 rows selected.

declare
  i int:=0;
begin
  while i<100 loop
    insert into t values(i,sysdate);
    commit;
    dbms_lock.sleep(1);
    i:=i+1;
  end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> @statn commit

     STAT# HEX#      OFFSET NAME                                          VALUE
---------- ----- ---------- ---------------------------------------- ----------
         6 6             48 user commits                                    101
       201 C9          1608 BPS commit wait                                   0
...
       219 DB          1752 commit cleanouts                                103
       220 DC          1760 commit cleanouts successfully completed         103
       647 287         5176 IMU commits                                     101

45 rows selected.

SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                          VALUE
---------- ----- ---------- ---------------------------------------- ----------
       338 152         2704 redo synch time                                   9
...
       346 15A         2768 redo synch writes                                 3

17 rows selected.
```
Note:
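The user commits value went up by exactly the number of COMMITs executed in the PL/SQL block (100), but redo synch writes went up by only one. Note that this no longer holds if the PL/SQL uses a DBLINK. A COMMIT issued from SQL looks like this: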
```
SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                          VALUE
---------- ----- ---------- ---------------------------------------- ----------
       338 152         2704 redo synch time                                   9
       346 15A         2768 redo synch writes                                 4

17 rows selected.

SQL> insert into t values(200,sysdate);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into t values(200,sysdate);

1 row created.

SQL> commit;

Commit complete.

SQL> @statn sync

     STAT# HEX#      OFFSET NAME                                          VALUE
---------- ----- ---------- ---------------------------------------- ----------
       338 152         2704 redo synch time                                  10
       346 15A         2768 redo synch writes                                 6
```
Note:
Every SQL-level COMMIT increments redo synch writes.
The statistic redo synch writes counts the number of times a session has sent a message (statistic messages sent) to lgwr on a commit. This is an approximation; in fact, "sending a message" may not involve a real message.
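The two tests above can be summarized as a toy counter model. It is a simplification of the observed statistics, not Oracle internals: it assumes a PL/SQL call without a database link waits for LGWR (one redo synch write) only once, when the call ends, while every SQL-level COMMIT, and every PL/SQL commit when a DBLINK is involved, waits each time. The class and function names are hypothetical, and the counters start at zero rather than at the session's real starting values.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Hypothetical per-session counters, named after the v$ statistics."""
    user_commits: int = 0
    redo_synch_writes: int = 0


def run_plsql_loop(stats: SessionStats, n_commits: int,
                   uses_dblink: bool = False) -> None:
    """Model one PL/SQL call that commits n_commits times inside a loop."""
    for _ in range(n_commits):
        stats.user_commits += 1
        if uses_dblink:
            # Across a database link the optimization is not used,
            # so the session waits for LGWR on every commit.
            stats.redo_synch_writes += 1
    if n_commits > 0 and not uses_dblink:
        # Otherwise the session waits for LGWR only once, as the call ends.
        stats.redo_synch_writes += 1


def run_sql_commit(stats: SessionStats) -> None:
    """Model a COMMIT sent as its own SQL statement: it always waits."""
    stats.user_commits += 1
    stats.redo_synch_writes += 1


if __name__ == "__main__":
    s = SessionStats()
    run_plsql_loop(s, 100)  # like the 100-iteration PL/SQL test
    print(s)
    run_sql_commit(s)       # two separate SQL-level commits
    run_sql_commit(s)
    print(s)
```

Starting from zero, the first print shows 100 user commits with 1 redo synch write and the second 102 with 3, matching the deltas in the SQL*Plus output (+100/+1 for the loop, +2 redo synch writes for the two SQL commits).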
Clearly the user session is not behaving as expected: it has posted lgwr to write a few times, but it has only incremented redo synch writes once, which suggests it didn't stop and wait for lgwr to wake it up again. The user's session is breaching the durability requirement; if the instance crashed somewhere in the middle of this loop, it's entirely possible that a transaction that had been committed would not be recovered. If we saw this output we could interpret it as 25 cycles of the following sequence:
• User session issues a commit
• User session posts lgwr and increments redo synch writes
• User session goes into a wait (log file sync) waiting to be posted by lgwr
• Lgwr gets woken up
• Lgwr writes the log buffer to disk, waiting a short time on each write
This strategy does not get used if the code is doing updates across database links, so there have been occasions in the past where I've used a totally redundant loopback database link to ensure that some PL/SQL code would wait for a log file sync on every commit.
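So if PL/SQL commits inside a LOOP as above and the storage never recovers, transactions that were reported as committed can be lost.
To keep HP-UX and AIX consistent, all of these database environments use asynchronous I/O (AIO) on shared storage built from RAW devices. The database simply hands each I/O request to the OS, so the timeout is determined mostly by the OS and storage layers. On HP-UX, the main parameters related to I/O timeout are PV timeout, LV timeout, ESD_SECS, and asyncdsk_io_timeout.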
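The following changes were made on HP-UX:
1. PV timeout defaults to 30 s (AIX reportedly uses 60 s); it was raised to 60 s.
2. LV timeout defaults to following the PV timeout. The recommended value is LV timeout = (# of paths × PV timeout) + 10 seconds.
3. ESD_SECS=120: the esd_secs attribute determines the timeout of I/O operations to block devices, and defaults to 30 s. A MOS case adjusted this parameter, but because RAW devices are used here, it was not changed this time.
4. asyncdsk_io_timeout defaults to 30 s; it was raised to 120 s.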
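The LV timeout recommendation in item 2 is plain arithmetic; a small helper makes its dependence on the number of paths explicit. The function name and the path counts below are illustrative assumptions, not values taken from the affected hosts.

```python
def recommended_lv_timeout(num_paths: int, pv_timeout_s: int) -> int:
    """LV timeout = (# of paths * PV timeout) + 10 seconds (item 2 above)."""
    if num_paths < 1:
        raise ValueError("a volume needs at least one path")
    return num_paths * pv_timeout_s + 10


if __name__ == "__main__":
    # Path counts are illustrative; use the real number of paths per LUN.
    for paths in (1, 2, 4):
        print(f"{paths} path(s), PV timeout 60s -> LV timeout "
              f"{recommended_lv_timeout(paths, 60)}s")
```

With the PV timeout raised to 60 s, a dual-path LUN would get an LV timeout of 130 s under this formula; confirm the actual path count and the final values with the OS and storage vendors.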
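After these adjustments, we manually cut the remote link and restored it within 120 s, and the database did not crash.
Note: this case is for reference only; consult your OS and storage vendors before making any specific changes.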