Manually fixing an Oracle bug (r6 notes, day 59)

阿新 • Published: 2022-05-04

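Last Friday, during a routine inspection, I meant to extend a few tablespaces, but it backfired. In a one-primary, two-standby configuration, a drop datafile operation caused the MRP on both standbys to throw ORA-600 errors. After trying several approaches and searching MOS, I had found no quicker fix than rebuilding the standbys. Since this was a 10.2.0.4.0 environment, I restored service first by taking an RMAN backup on the primary and running a duplicate on the standby to restore and recover it. One standby was rebuilt that way. I left the other one alone for observation, to see whether another approach existed, planning to rebuild it too if nothing turned up. While trying things out I did find a glimmer of hope, and with ideas from friends in a WeChat group, a workable solution emerged. The original problem is described at http://blog.itpub.net/23718752/viewspace-1797653/

The idea behind the fix: the datafile configuration on the primary is correct, so take a control file backup on the primary and restore it on the standby. Once the restore succeeds, starting MRP will certainly still fail, because one file is inconsistent. At that point the Data Guard side has to be told about the inconsistency: running alter database datafile ... offline drop against that file takes it offline in the standby's control file, so recovery stops checking it. This is similar to alter tablespace xxx drop datafile, but because alter tablespace ... drop datafile can only be run while the database is open, which a mounted physical standby is not, this route is a way to get the same effect.

The steps were as follows. Start the standby in nomount mode and restore the control file: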

$ rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Mon Sep 14 17:43:03 2015
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
connected to target database (not started)
RMAN> startup nomount
RMAN> restore controlfile from '/U01/backup_stage/ctl_oaqgu616_1_1';
Starting restore at 14-SEP-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=2984 devtype=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:02
output filename=/U01/app/oracle/oradata/test/control01.ctl
output filename=/U01/app/oracle/oradata/test/control02.ctl
output filename=/U01/app/oracle/oradata/test/control03.ctl
Finished restore at 14-SEP-15

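After the restore, mount the database:

RMAN> alter database mount;
database mounted
released channel: ORA_DISK_1
RMAN> exit

Next, try to start log apply, that is, start the MRP process (ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION). The alert log shows the following: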


ALTER DATABASE RECOVER  managed standby database disconnect from session  
Mon Sep 14 17:45:04 2015
Attempt to start background Managed Standby Recovery process (p)
MRP0 started with pid=16, OS id=27255
Mon Sep 14 17:45:04 2015
MRP0: Background Managed Standby Recovery process started (peak)
Managed Standby Recovery not using Real Time Apply
MRP0: Background Media Recovery terminated with error 1110
Mon Sep 14 17:45:09 2015
Errors in file /U01/app/oracle/admin/peak/bdump/test_mrp0_27255.trc:
ORA-01110: data file 21: '/U01/app/oracle/oradata/test/test_new_index04.dbf'
ORA-01122: database file 21 failed verification check
ORA-01110: data file 21: '/U01/app/oracle/oradata/test/test_new_index04.dbf'
ORA-01203: wrong incarnation of this file - wrong creation SCN
Mon Sep 14 17:45:09 2015
Errors in file /U01/app/oracle/admin/peak/bdump/test_mrp0_27255.trc:
ORA-01110: data file 21: '/U01/app/oracle/oradata/test/test_new_index04.dbf'
ORA-01122: database file 21 failed verification check
ORA-01110: data file 21: '/U01/app/oracle/oradata/test/test_new_index04.dbf'
ORA-01203: wrong incarnation of this file - wrong creation SCN
Mon Sep 14 17:45:09 2015
MRP0: Background Media Recovery process shutdown (test)
Mon Sep 14 17:45:10 2015
Completed: ALTER DATABASE RECOVER  managed standby database disconnect from session  
Mon Sep 14 17:46:21 2015
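The key lines in this stack are ORA-01110, which names the datafile, and ORA-01203, which says the file's creation SCN does not match what the control file expects. In a long alert log, a minimal Python sketch like the one below can pick out the affected file. It assumes the message text appears exactly as in the excerpt above and that an error stack ends at the first line not starting with "ORA-"; the function name is hypothetical, not an Oracle utility.

```python
import re

# ORA-01110 names the datafile; ORA-01203 reports a wrong creation SCN.
ORA_01110 = re.compile(r"ORA-01110: data file (\d+): '([^']+)'")
ORA_01203 = re.compile(r"ORA-01203:")


def files_with_wrong_incarnation(alert_text: str) -> dict[int, str]:
    """Return {file#: path} for datafiles named in an error stack that
    also contains ORA-01203 (hypothetical helper, not an Oracle tool)."""
    result: dict[int, str] = {}
    pending: dict[int, str] = {}  # files named so far in the current stack
    for line in alert_text.splitlines():
        m = ORA_01110.search(line)
        if m:
            pending[int(m.group(1))] = m.group(2)
        elif ORA_01203.search(line):
            result.update(pending)
            pending.clear()
        elif not line.startswith("ORA-"):
            pending.clear()  # assumption: a non-ORA line ends the stack
    return result


SAMPLE = """\
Errors in file /U01/app/oracle/admin/peak/bdump/test_mrp0_27255.trc:
ORA-01110: data file 21: '/U01/app/oracle/oradata/test/test_new_index04.dbf'
ORA-01122: database file 21 failed verification check
ORA-01110: data file 21: '/U01/app/oracle/oradata/test/test_new_index04.dbf'
ORA-01203: wrong incarnation of this file - wrong creation SCN
"""

if __name__ == "__main__":
    print(files_with_wrong_incarnation(SAMPLE))
    # prints {21: '/U01/app/oracle/oradata/test/test_new_index04.dbf'}
```

On this excerpt the sketch singles out file 21, the same file that is taken offline in the next step.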

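As expected, MRP still fails, but this time the error is no longer ORA-600. Since this file is inconsistent, and we know it is exactly the file that should have been dropped, we can simply drop it on the standby:

idle> alter database datafile '/U01/app/oracle/oradata/test/test_new_index04.dbf' offline drop;
Database altered.

Try to cancel log apply:

idle> recover managed standby database cancel;
ORA-16136: Managed Standby Recovery not active

This confirms that the earlier MRP start had failed. Start MRP again:

idle> ALTER DATABASE RECOVER managed standby database disconnect from session;
Database altered.

This time the alert log shows a turnaround: the standby is now essentially in line with the primary, but an archive gap remains.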

alter database datafile '/U01/app/oracle/oradata/test/test_new_index04.dbf' offline drop
Mon Sep 14 17:46:21 2015
Completed: alter database datafile '/U01/app/oracle/oradata/test/test_new_index04.dbf' offline drop
Mon Sep 14 17:46:48 2015
ALTER DATABASE RECOVER  managed standby database cancel  
Mon Sep 14 17:46:48 2015
ORA-16136 signalled during: ALTER DATABASE RECOVER  managed standby database cancel  ...
Mon Sep 14 17:47:01 2015
ALTER DATABASE RECOVER  managed standby database disconnect from session 
Mon Sep 14 17:47:01 2015
Attempt to start background Managed Standby Recovery process (test)
MRP0 started with pid=16, OS id=27547
Mon Sep 14 17:47:01 2015
MRP0: Background Managed Standby Recovery process started (test)
Managed Standby Recovery not using Real Time Apply
 parallel recovery started with 15 processes
Mon Sep 14 17:47:06 2015
Waiting for all non-current ORLs to be archived...
Media Recovery Waiting for thread 1 sequence 7414
Fetching gap sequence in thread 1, gap sequence 7414-7416
Mon Sep 14 17:47:07 2015
Completed: ALTER DATABASE RECOVER  managed standby database disconnect from session 
Mon Sep 14 17:48:06 2015
FAL[client]: Failed to request gap sequence 
 GAP - thread 1 sequence 7414-7416
 DBID 1731005384 branch 680697352
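FAL reports the gap as an inclusive sequence range per thread. A minimal Python sketch, assuming only the two message formats shown above, that expands those ranges into the individual sequence numbers still to be fetched (the helper name is hypothetical):

```python
import re

# Matches both gap messages in the excerpt above:
#   "Fetching gap sequence in thread 1, gap sequence 7414-7416"
#   " GAP - thread 1 sequence 7414-7416"
GAP_RE = re.compile(r"thread (\d+),? (?:gap )?sequence (\d+)-(\d+)")


def missing_sequences(alert_text: str) -> dict[int, set[int]]:
    """Return {thread#: {sequence#, ...}} covering every reported gap range."""
    gaps: dict[int, set[int]] = {}
    for m in GAP_RE.finditer(alert_text):
        thread, low, high = (int(g) for g in m.groups())
        # Ranges include both ends, so 7414-7416 means three archived logs.
        gaps.setdefault(thread, set()).update(range(low, high + 1))
    return gaps


LOG = """\
Fetching gap sequence in thread 1, gap sequence 7414-7416
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 7414-7416
"""

if __name__ == "__main__":
    for thread, seqs in missing_sequences(LOG).items():
        print(thread, sorted(seqs))  # prints 1 [7414, 7415, 7416]
```

Collecting into a set means the same range reported twice (as it is here) is counted only once.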

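A gap has been detected, but apply has not yet restarted from the archived log at which the ORA-600 error occurred. Adding the standby to the Data Guard broker configuration and enabling it gets you there with much less effort:

DGMGRL> add database stest2 as
 connect identifier is stest2
 maintained as physical;
DGMGRL> enable database stest2;

The alert log then gets busy. The key point is that RFS starts receiving logs, beginning with the archive where apply previously failed.
Mon Sep 14 17:53:19 2015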
RFS[1]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7414_bzf68cq2_.arc'
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[2]: Assigned to RFS process 28706
RFS[2]: Identified database type as 'physical standby'
RFS[2]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7415_bzf68h9y_.arc'
RFS[2]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7416_bzf68hgr_.arc'
RFS[2]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7426_bzf68jt8_.arc'
.....
RFS[2]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7420_bzf69g71_.arc'
Mon Sep 14 17:53:51 2015
Managed Standby Recovery not using Real Time Apply
 parallel recovery started with 15 processes
Mon Sep 14 17:53:51 2015
Waiting for all non-current ORLs to be archived...
Media Recovery Log /U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7414_bzf68cq2_.arc
Mon Sep 14 17:53:52 2015
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  NODELAY
Mon Sep 14 17:53:52 2015
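To confirm that RFS has actually received every sequence in the gap, the archived log names can be checked against the range. A minimal Python sketch: it assumes the OMF naming visible in the log above, o1_mf_<thread>_<sequence>_<unique>_.arc, and hard-codes the gap 7414-7416 from the earlier FAL message. The helper names are hypothetical.

```python
import re

# Assumption: OMF archived log names as in the excerpt above,
# e.g. o1_mf_1_7414_bzf68cq2_.arc -> thread 1, sequence 7414.
OMF_ARC_RE = re.compile(r"o1_mf_(\d+)_(\d+)_[A-Za-z0-9]+_\.arc")


def received_sequences(alert_text: str) -> dict[int, set[int]]:
    """Return {thread#: {sequence#, ...}} for archived logs named in the text."""
    got: dict[int, set[int]] = {}
    for m in OMF_ARC_RE.finditer(alert_text):
        got.setdefault(int(m.group(1)), set()).add(int(m.group(2)))
    return got


def still_missing(gaps: dict[int, set[int]],
                  received: dict[int, set[int]]) -> dict[int, set[int]]:
    """Gap sequences not yet received, per thread (empty dict = gap closed)."""
    result: dict[int, set[int]] = {}
    for thread, seqs in gaps.items():
        left = seqs - received.get(thread, set())
        if left:
            result[thread] = left
    return result


RFS_LOG = """\
RFS[1]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7414_bzf68cq2_.arc'
RFS[2]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7415_bzf68h9y_.arc'
RFS[2]: Archived Log: '/U01/app/oracle/flash_recovery_area/STEST2/archivelog/2015_09_14/o1_mf_1_7416_bzf68hgr_.arc'
"""

GAP = {1: set(range(7414, 7417))}  # thread 1, sequences 7414-7416

if __name__ == "__main__":
    print(still_missing(GAP, received_sequences(RFS_LOG)))  # prints {}
```

An empty result means every gap sequence has arrived, which matches MRP then starting with sequence 7414 in the log above.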

MRP can now continue applying logs from where it last failed. A quick check with the DG broker:


DGMGRL> show configuration;
Configuration
  Name:                test
  Enabled:             YES
  Protection Mode:     MaxPerformance
  Fast-Start Failover: DISABLED
  Databases:
    test   - Primary database
    stest4 - Physical standby database
    stest2 - Physical standby database
Current status for "peak":
SUCCESS

With the problem fixed, a look at the datafiles shows that everything is now in order:

idle> select file#,df.name,df.ts#,ts.name,df.RFILE# from v$datafile df,v$tablespace ts where df.ts#=ts.ts#;
        20 /U01/app/oracle/oradata/test/test_new_data04.dbf                      9 TEST_NEW_DATA                                                        20
        21 /U01/app/oracle/oradata/test/test_new_index04.dbf                    10 TEST_NEW_INDEX                                                       21

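This case shows that when you step on a landmine, it pays not to give up. As long as the experiment does not put the overall environment at risk, form bold hypotheses from your own analysis and verify them carefully, and you may well find a way out.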