
Recovering an Oracle Database That Fails to Start with ORA-16433 After Losing Redo in NOARCHIVELOG Mode



Reposted from 惜紛飛


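Today, in the ML chat group, 女神 and travel were stuck on a recovery problem: Oracle 11.2.0.3 in NOARCHIVELOG mode. Roughly, the CURRENT redo log had been deleted with rm, and re-creating the control file and then recovering led to a chain of problems, ending in ORA-600 [2662]. That error is common, but here the database still would not open even after advancing the SCN, which seemed odd. I connected remotely to 女神's computer, but working that way was awkward, so in the end the files were compressed and sent to me, and I did the recovery in my own VMware environment.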

Because my environment differs from the original one, after unpacking the files I first renamed them, as follows:

SQL> SELECT name FROM v$datafile;

NAME
------------------------------------------------
/u01/app/oracle/oradata/travel/system01.dbf
.........
/u01/app/oracle/oradata/travel/users01.dbf

SQL> SELECT member FROM v$logfile;

MEMBER
-------------------------------------------------
/u01/app/oracle/oradata/travel/redo03.log
/u01/app/oracle/oradata/travel/redo02.log
/u01/app/oracle/oradata/travel/redo01.log

SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/system01.dbf' TO '/home/oracle/travel/travel/system01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/sysaux01.dbf' TO '/home/oracle/travel/travel/sysaux01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/undotbs01.dbf' TO '/home/oracle/travel/travel/undotbs01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/users01.dbf' TO '/home/oracle/travel/travel/users01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo01.log' TO '/home/oracle/travel/travel/redo01.log';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo02.log' TO '/home/oracle/travel/travel/redo02.log';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo03.log' TO '/home/oracle/travel/travel/redo03.log';

SQL> SELECT name FROM v$datafile;

NAME
-------------------------------------------------
/home/oracle/travel/travel/system01.dbf
/home/oracle/travel/travel/sysaux01.dbf
/home/oracle/travel/travel/undotbs01.dbf
/home/oracle/travel/travel/users01.dbf

SQL> SELECT member FROM v$Logfile;

MEMBER
-------------------------------------------------
/home/oracle/travel/travel/redo03.log
/home/oracle/travel/travel/redo02.log
/home/oracle/travel/travel/redo01.log

At this point, RECOVER fails with ORA-16433:

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.


SQL> recover database using backup controlfile until cancel;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'
The oerr utility gives a rough idea of what this error means:

[oracle@11gR2_primary ~]$ oerr ora 16433
16433, 00000, "The database must be opened in read/write mode."
// *Cause: An attempt was made to open the database in read-only mode after an
// operation that requires that the database be opened in read/write
// mode.
// *Action: Open the database in read/write mode. The database can then be
// opened in read-only mode.

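This error does tell us one thing: the database is in a state where it could be opened READ ONLY. In other words, the following queries all return the same value:
select checkpoint_change# from v$database;         -- from the control file
select checkpoint_change# from v$datafile;         -- from the control file
select checkpoint_change# from v$datafile_header;  -- from the datafile headers

Normally, as long as these values agree, the database should open directly. Here, however, it would not, not even READ ONLY; I tried.
So I began the recovery work below.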

First recovery attempt:

--- Re-create the control file
SQL> startup nomount pfile=/home/oracle/travel/travel/a.ora
ORACLE instance started.

Total System Global Area 626327552 bytes
Fixed Size 2230952 bytes
Variable Size 184550744 bytes
Database Buffers 436207616 bytes
Redo Buffers 3338240 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/home/oracle/travel/travel/redo01.log' SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/home/oracle/travel/travel/redo02.log' SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/home/oracle/travel/travel/redo03.log' SIZE 50M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
  '/home/oracle/travel/travel/system01.dbf',
  '/home/oracle/travel/travel/sysaux01.dbf',
  '/home/oracle/travel/travel/undotbs01.dbf',
  '/home/oracle/travel/travel/users01.dbf'
CHARACTER SET AL32UTF8
;

Control file created.

SQL>

Then start the recovery:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 244977 generated at 01/19/2013 01:56:54 needed for thread 1
ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_805082211.dbf
ORA-00280: change 244977 for thread 1 is in sequence #1

Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

ORA-01112: media recovery not started
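Since the database was in NOARCHIVELOG mode anyway, this incomplete-recovery step only serves to make the subsequent OPEN RESETLOGS possible. Next, shut down the database and open it with hidden parameters: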

SQL> shutdown immediate
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile=/home/oracle/travel/travel/b.ora
ORACLE instance started.

Total System Global Area 626327552 bytes
Fixed Size 2230952 bytes
Variable Size 184550744 bytes
Database Buffers 436207616 bytes
Redo Buffers 3338240 bytes

SQL> alter database mount;
Database altered.

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0],
[244998], [4194432], [], [], [], [], [], []
Process ID: 3641
Session ID: 1 Serial number: 5

The corresponding errors in the alert log:

Fri Dec 14 21:58:39 2012
SMON: enabling cache recovery
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc (incident=4937):
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_4937/travel_ora_3641_i4937.trc
Fri Dec 14 21:58:43 2012
Dumping diagnostic data in directory=[cdmp_20121214215843], requested by (instance=1, osid=3641), summary=[incident=4937].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc:
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc:
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3641): terminating the instance due to error 600
Instance terminated by USER, pid = 3641
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (3641) as a result of ORA-1092
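
To make the ORA-600 [2662] arguments easier to read, here is a minimal Python sketch. It assumes the commonly cited argument layout (current SCN wrap and base, dependent SCN wrap and base, then the data block address the dependent SCN came from) and the smallfile DBA encoding (10-bit relative file number, 22-bit block number). The names decode_2662, decode_dba and Ora2662Args are hypothetical helpers, not an Oracle API.

```python
from typing import NamedTuple

SCN_WRAP_FACTOR = 1 << 32  # assumption: SCN = wrap * 2**32 + base
BLOCK_BITS = 22            # assumption: smallfile DBA = 10-bit file#, 22-bit block#


class Ora2662Args(NamedTuple):
    current_scn: int
    dependent_scn: int
    rfile: int
    block: int


def decode_dba(dba: int) -> tuple[int, int]:
    """Split a smallfile data block address into (relative file#, block#)."""
    return dba >> BLOCK_BITS, dba & ((1 << BLOCK_BITS) - 1)


def decode_2662(args: list[int]) -> Ora2662Args:
    """Interpret ORA-600 [2662] arguments, assuming the layout
    [cur_wrap, cur_base, dep_wrap, dep_base, dba]."""
    cur_wrap, cur_base, dep_wrap, dep_base, dba = args[:5]
    rfile, block = decode_dba(dba)
    return Ora2662Args(
        current_scn=cur_wrap * SCN_WRAP_FACTOR + cur_base,
        dependent_scn=dep_wrap * SCN_WRAP_FACTOR + dep_base,
        rfile=rfile,
        block=block,
    )


if __name__ == "__main__":
    # Numeric arguments copied from the alert log above.
    info = decode_2662([0, 244985, 0, 244998, 4194432])
    print(info)
    print("dependent SCN ahead by", info.dependent_scn - info.current_scn)
```

Under those assumptions, the dependent SCN (244998) is 13 ahead of the current SCN (244985), on file 1, block 128. The same decode_dba applied to the address 0x0040589e in the kdsgrp1 trace yields file 1, block 22686, which agrees with what that trace prints.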

Even doing the following by hand did not get the database open:

SQL> alter session set events '10015 trace name ADJUST_SCN level 14';

Session altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

Finally, I dumped the control file and the CURRENT redo log file, and found that the low cache RBA was set to the maximum value:

SQL> oradebug setmypid
Statement processed.

SQL> alter system set events 'immediate trace name controlf level 4';
System altered.

SQL> alter system dump logfile '/home/oracle/travel/travel/redo01.log';

SQL> oradebug tracefile_name
/oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_4229.trc


++++++++ controlfile dump excerpt

***************************************************************************
CHECKPOINT PROGRESS RECORDS
***************************************************************************
(size = 8180, compat size = 8180, section max = 11, section in-use = 0,
last-recid= 0, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 2, numrecs = 11)
THREAD #1 - status:0x2 flags:0x0 dirty:0
low cache rba:(0xffffffff.ffffffff.ffff) on disk rba:(0x1.3.0) --- low cache rba is infinite
on disk scn: 0x0000.0003bcf7 12/14/2012 21:58:39
resetlogs scn: 0x0000.0003bcf2 12/14/2012 21:58:36
heartbeat: 802069494 mount id: 2869233386
THREAD #2 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
THREAD #3 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
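
The control file dump prints SCNs only as hex wrap.base pairs. A small Python sketch (parse_scn is a hypothetical helper, assuming SCN = wrap * 2^32 + base) converts them to the decimal form used elsewhere in this post:

```python
def parse_scn(text: str) -> int:
    """Convert a dump-style SCN such as '0x0000.0003bcf7' into a decimal SCN.

    The part before the dot is the SCN wrap and the part after it is the
    SCN base, both in hex; the full SCN is wrap * 2**32 + base.
    """
    wrap_hex, base_hex = text.lower().removeprefix("0x").split(".")
    return (int(wrap_hex, 16) << 32) + int(base_hex, 16)


if __name__ == "__main__":
    # Values copied from the THREAD #1 record of the controlfile dump.
    for label, raw in [
        ("on disk scn", "0x0000.0003bcf7"),
        ("resetlogs scn", "0x0000.0003bcf2"),
    ]:
        print(f"{label}: {raw} -> {parse_scn(raw)}")
```

With this conversion the on disk SCN 0x0000.0003bcf7 is 244983, and 0x0000.0003bcee is 244974, which matches the decimal value printed in parentheses in the redo01.log dump.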

+++++++ redo01.log dump excerpt

DUMP OF REDO FROM FILE '/home/oracle/travel/travel/redo01.log'
Opcodes *.*
RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
Times: creation thru eternity
FILE HEADER:
Compatibility Vsn = 186646528=0xb200000
Db ID=2872261344=0xab333ae0, Db Name='TRAVEL'
Activation ID=2872292516=0xab33b4a4
Control Seq=233=0xe9, File size=102400=0x19000
File Number=1, Blksiz=512, File Type=2 LOG
descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000003bcee-0xffffffffffff"
thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0
resetlogs count: 0x2ffc9463 scn: 0x0000.0003bcee (244974)
prev resetlogs count: 0x2ffc7a41 scn: 0x0000.0003bcea (244970)
Low scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
Enabled scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Thread closed scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Disk cksum: 0x3467 Calc cksum: 0x3467
Terminal recovery stop scn: 0x0000.00000000
Terminal recovery 01/01/1988 00:00:00
Most recent redo scn: 0x0000.00000000
Largest LWN: 0 blocks
End-of-redo stream : No
Unprotected mode
Miscellaneous flags: 0x800000
Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
Zero blocks: 0
Format ID is 2
redo log key is 853c461da2eec7ed4b45ce75b8c27d7
redo log key flag is 5
Enabled redo threads: 1

REDO RECORD - Thread:1 RBA: 0x000001.00000002.0010 LEN: 0x0070 VLD: 0x05
SCN: 0x0000.0003bcf2 SUBSCN: 1 01/19/2013 01:56:54
(LWN RBA: 0x000001.00000002.0010 LEN: 0001 NST: 0001 SCN: 0x0000.0003bcee) ,
CHANGE #1 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:0 OP:17.3 ENC:0
END OF REDO DUMP

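As you can see, the LWN RBA in the current redo is actually correct: 0x1.2.10, which is lower than the on disk RBA (0x1.3.0). So why does this happen?
My guess is that the writes got scrambled. At this point it was fairly clear that the problem still lay in the control file.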
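
The RBA comparison above can be checked mechanically. A minimal Python sketch, assuming an RBA is the triple (log sequence, block within the log, byte offset) printed in hex and ordered field by field; RBA and parse_rba are hypothetical names:

```python
from typing import NamedTuple


class RBA(NamedTuple):
    """Redo byte address: log sequence number, block within the log, byte offset."""
    seq: int
    block: int
    offset: int


def parse_rba(text: str) -> RBA:
    """Parse a dump-style RBA such as '0x1.3.0' or '0x000001.00000002.0010'.

    All three dot-separated fields are hex.
    """
    seq, block, offset = text.lower().removeprefix("0x").split(".")
    return RBA(int(seq, 16), int(block, 16), int(offset, 16))


# NamedTuples compare field by field, which gives RBA order:
# sequence first, then block, then byte offset.
lwn_rba = parse_rba("0x000001.00000002.0010")          # redo01.log dump
on_disk_rba = parse_rba("0x1.3.0")                     # controlfile dump
low_cache_rba = parse_rba("0xffffffff.ffffffff.ffff")  # controlfile dump

print("LWN RBA before on disk RBA:", lwn_rba < on_disk_rba)
print("low cache RBA beyond on disk RBA:", low_cache_rba > on_disk_rba)
```

Both comparisons print True: the LWN RBA (sequence 1, block 2, offset 16) sits before the on disk RBA, while every field of the low cache RBA recorded in the control file is at its maximum value.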

That being the case, I simply deleted the control files with rm and re-created them once more:

SQL> startup nomount pfile='/home/oracle/travel/travel/a.ora'
ORACLE instance started.

Total System Global Area 626327552 bytes
Fixed Size 2230952 bytes
Variable Size 184550744 bytes
Database Buffers 436207616 bytes
Redo Buffers 3338240 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/home/oracle/travel/travel/redo01.log' SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/home/oracle/travel/travel/redo02.log' SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/home/oracle/travel/travel/redo03.log' SIZE 50M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
  '/home/oracle/travel/travel/system01.dbf',
  '/home/oracle/travel/travel/sysaux01.dbf',
  '/home/oracle/travel/travel/undotbs01.dbf',
  '/home/oracle/travel/travel/users01.dbf'
CHARACTER SET AL32UTF8
;

Control file created.
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 244985 generated at 12/14/2012 23:05:06 needed for thread 1
ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf
ORA-00280: change 244985 for thread 1 is in sequence #1

Specify log: {=suggested | filename | AUTO | CANCEL}
AUTO
ORA-00308: cannot open archived log
'/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

ORA-00308: cannot open archived log
'/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

SQL>
SQL>
SQL> select file#,CHECKPOINT_CHANGE#,LAST_CHANGE# from v$datafile order by 1;

FILE# CHECKPOINT_CHANGE# LAST_CHANGE#
---------- ------------------ ------------
1 244985
2 244985
3 244985
4 244985

SQL> select file#,CHECKPOINT_CHANGE# from v$datafile_header order by 1;

FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 244985
2 244985
3 244985
4 244985

SQL> select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
0
Now shut down the database and add the hidden parameters to the pfile:
*._allow_resetlogs_corruption=TRUE
*._allow_error_simulation=TRUE

Then mount again and advance the SCN:

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

SQL>
SQL> shutdown immediate
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile='/home/oracle/travel/travel/b.ora'
ORACLE instance started.

Total System Global Area 626327552 bytes
Fixed Size 2230952 bytes
Variable Size 184550744 bytes
Database Buffers 436207616 bytes
Redo Buffers 3338240 bytes
SQL> alter database mount;

Database altered.

SQL> alter session set events '10015 trace name ADJUST_SCN level 10';
Session altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

SQL> alter database open noresetlogs;
alter database open noresetlogs
*
ERROR at line 1:
ORA-01588: must use RESETLOGS option for database open

SQL> alter database open resetlogs;
Database altered.

SQL> show parameter name

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_file_name_convert string
db_name string travel
db_unique_name string travel
global_names boolean FALSE
instance_name string travel
lock_name_space string
log_file_name_convert string
processor_group_name string
service_names string travel
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> alter system switch logfile;
System altered.

Note that my alert log also showed data dictionary inconsistencies, which is to be expected since the database was forced open:

Sun Jan 20 00:25:34 2013
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_m000_4681.trc:
ORA-25153: Temporary Tablespace is Empty
Sun Jan 20 00:25:38 2013
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_j004_4692.trc (incident=21817):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_21817/travel_j004_4692_i21817.trc
Sun Jan 20 00:25:43 2013
Dumping diagnostic data in directory=[cdmp_20130120002543], requested by (instance=1, osid=4692 (J004)), summary=[incident=21817].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Jan 20 00:25:47 2013

This ORA-600 [kdsgrp1] error is very common and is usually caused by inconsistent index information.

The trace content is as follows:
* kdsgrp1-1: *************************************************
row 0x0040589e.4b continuation at
0x0040589e.4b file# 1 block# 22686 slot 75 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 75 ..... nrows: 75
kdsgrp - dump CR block dba=0x0040589e
Block header dump: 0x0040589e
Object id on Block Y
seg/obj: 0x12 csc: 0x00.3bca9 itc: 1 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x40589f ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0001.01a.000000a1 0x00c05e97.001d.0d --U- 1 fsc 0x0053.0003bcab
bdba: 0x0040589e
data_block_dump,data header at 0x7a26a044
===============
tsiz: 0x1fb8
hsiz: 0xa8
pbl: 0x7a26a044
76543210
flag=--------
ntab=1
nrow=75
frre=-1
fsbo=0xa8
...........

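seg/obj: 0x12 is the obj$ object, one of the core bootstrap$ objects. Objects of this kind are comparatively hard to deal with; in this situation the usual advice is to export the data once the database is open and then rebuild the database. I won't go further into this ORA-600 error here; my blog has similar examples.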

Finally, a short summary:

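1. Oracle decides whether a data file needs media recovery by comparing three values: the system checkpoint SCN, the datafile checkpoint SCN and the start SCN.
2. While the redo thread is open, that is, while the database is open, the stop SCN is set to infinity; after a normal shutdown, the stop SCN equals the datafile checkpoint SCN.
Note that the stop SCN is stored in the control file. Some material online says it is stored in the datafile header, which is wrong.
3. Before opening, Oracle first checks whether media recovery is needed, and only then whether instance recovery is needed.
4. The four SCNs are stored as follows:

system checkpoint SCN: stored in the control file
datafile checkpoint SCN: stored in the control file
start SCN: stored in the datafile header
stop SCN: stored in the control file


The system checkpoint SCN, datafile checkpoint SCN and start SCN are used to decide whether a data file needs media recovery: if these three are equal, no media recovery is needed.
If all four are equal, no instance recovery is needed either; the stop SCN is what decides whether instance recovery is needed.

5. If the stop SCN is larger than the other SCNs, instance recovery is needed and the redo must be scanned. Instance recovery starts at the low cache RBA and ends at the end of the redo log.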
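
The rules in this summary can be expressed as a small decision function. This is a toy Python model of the rules as stated here, not Oracle's actual open-time logic; FileScns and recovery_needed are hypothetical names, math.inf stands in for the infinite stop SCN, and the view columns named in the comments are my assumed mapping for each SCN.

```python
import math
from dataclasses import dataclass

# Stands in for the "infinite" stop SCN of a data file whose thread is open.
INFINITE_SCN = math.inf


@dataclass
class FileScns:
    system_checkpoint: float    # control file (assumed: v$database.checkpoint_change#)
    datafile_checkpoint: float  # control file (assumed: v$datafile.checkpoint_change#)
    start: float                # datafile header (assumed: v$datafile_header.checkpoint_change#)
    stop: float                 # control file (assumed: v$datafile.last_change#, NULL while open)


def recovery_needed(f: FileScns) -> str:
    """Toy model of the summary's rules; not Oracle's real implementation."""
    # Rules 1 and 3: media recovery is checked first, using three SCNs.
    if not (f.system_checkpoint == f.datafile_checkpoint == f.start):
        return "media recovery"
    # Rules 2 and 5: a stop SCN above the others (infinite after a crash)
    # means redo must be scanned from the low cache RBA: instance recovery.
    if f.stop > f.start:
        return "instance recovery"
    return "none"


if __name__ == "__main__":
    clean = FileScns(244985, 244985, 244985, 244985)
    crashed = FileScns(244985, 244985, 244985, INFINITE_SCN)
    # After the second control file re-creation, v$database showed 0
    # and LAST_CHANGE# was empty.
    recreated = FileScns(0, 244985, 244985, INFINITE_SCN)
    for name, f in [("clean", clean), ("crashed", crashed), ("recreated", recreated)]:
        print(name, "->", recovery_needed(f))
```

Under this simplified model, a clean shutdown needs nothing, an open thread needs instance recovery, and the state seen after re-creating the control file (system checkpoint SCN 0, datafiles at 244985) is flagged for media recovery.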
