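Another Bug Hit, ORA-01148: A Datafile Suddenly Goes into RECOVER Status
In a RAC environment, a datafile's status changed to RECOVER, and the alert log showed the following errors: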
Wed Jun 26 02:31:03 2013
Thread 1 advanced to log sequence 33187
Current log# 1 seq# 33187 mem# 0: +TJDISK/tj/onlinelog/group_1.257.757797483
Wed Jun 26 10:10:03 2013
Errors in file /opt/app/diag/rdbms/tj/tj1/trace/tj1_dbw0_6145.trc:
ORA-01148: cannot refresh file size for datafile 17
ORA-01110: data file 17: '+TJDISK/tj/datafile/ntj_index03.301.757894747'
ORA-01031: insufficient privileges
Automatic datafile offline due to media error on
file 17: +TJDISK/tj/datafile/ntj_index03.301.757894747
Unexpected communication failure with ASM instance:
error 1031 (ORA-01031: insufficient privileges
)
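Analysis:
1. No errors appeared in the OS messages logs or in the ASM logs on any node.
2. The permissions on the raw devices backing the ASM disk group (DG) were also normal: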
/dev/raw/raw6
/dev/raw/raw7
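One way to confirm which raw devices actually back each disk group is to ask the ASM instance directly. The query below is a minimal sketch using the standard V$ASM_DISK and V$ASM_DISKGROUP views. The PATH values it returns are the devices whose OS ownership and permissions were checked above (for example with `ls -l` on each path).

```sql
-- Run in the ASM instance: list each disk group's member disks and their state.
-- PATH should show the raw devices (e.g. /dev/raw/raw6, /dev/raw/raw7)
-- whose OS-level ownership and permissions need to be verified.
SELECT g.name AS diskgroup,
       d.path,
       d.header_status,
       d.mode_status
  FROM v$asm_disk d
  JOIN v$asm_diskgroup g
    ON d.group_number = g.group_number
 ORDER BY g.name, d.path;
```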
3. The datafile has autoextend enabled:
SQL> select file_name,autoextensible from dba_data_files where file_name like '+TJDISK/tj/datafile/ntj_index03.301.757894747';
FILE_NAME                                         AUT
------------------------------------------------- ---
+TJDISK/tj/datafile/ntj_index03.301.757894747     YES
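Because the error concerns refreshing the file's size, the current size and autoextend settings of the file are also worth recording. Our assumption (not stated in the bug note) is that in RAC, when one instance extends an autoextend file, the other instances must later refresh their view of its size, which is where ORA-01148 would surface. A sketch of the query, using file 17 from the alert log:

```sql
-- Current size and autoextend settings of file 17.
-- INCREMENT_BY is expressed in database blocks, not bytes.
SELECT file_id,
       bytes / 1024 / 1024    AS size_mb,
       autoextensible,
       increment_by,
       maxbytes / 1024 / 1024 AS max_mb
  FROM dba_data_files
 WHERE file_id = 17;
```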
In the end, a search on Metalink suggested that we had hit an Oracle bug: Bug 16734525 or Bug 9357097 (Bug 16734525 is a duplicate of Bug 9357097).
Hdr: 16734525 10.2.0.5 RDBMS 11.1.0.7 ASM PRODID-5 PORTID-23 ORA-1148 9357097
Abstract: ORA-1148: CANNOT REFRESH FILE SIZE FOR DATAFILE
*** 04/27/13 02:21 am ***
PROBLEM:
--------
Fri Apr 26 11:31:28 EDT 2013
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
Fri Apr 26 11:44:55 EDT 2013
Errors in file /home/oracle/admin/ctopprul/bdump/ctopprul1_dbw0_20315.trc:
ORA-1148: cannot refresh file size for datafile 340
ORA-1110: data file 340: '+DATA/ctopprul_rdc/datafile/wires_data.1968.789654733'
ORA-1031: insufficient privileges
Fri Apr 26 11:44:55 EDT 2013
Automatic datafile offline due to media error on
file 340: +DATA/ctopprul_rdc/datafile/wires_data.1968.789654733
Fri Apr 26 11:44:59 EDT 2013
Unexpected communication failure with ASM instance: error 1031 (ORA-1031: insufficient privileges)
NOTE: ASMB process state dumped to trace file /home/oracle/admin/ctopprul/bdump/ctopprul1_dbw0_20315.trc
NOTE: force a map free for map id 345

DIAGNOSTIC ANALYSIS:
--------------------
1. Matches bug 9357097: SMALL BEEHIVE: FAILURE TO REFRESH FILE SIZE DUE TO SPACE OFFLINES DATAFILE.
Need to confirm with DEV, as there were no audit file space issues.
2. Role separation is not used, and the oracle executable has the correct permissions.
3. CT is not sure whether dbv or RMAN VALIDATE was run on the problematic datafile. The file went offline due to a media error:
ORA-1148: cannot refresh file size for datafile 340
ORA-1110: data file 340: '+DATA/ctopprul_rdc/datafile/wires_data.1968.789654733'
ORA-1031: insufficient privileges
Fri Apr 26 11:44:55 EDT 2013
Automatic datafile offline due to media error on >>>>>>>>>>>> Media error
4. ulimit was showing a low value for nofiles.
Hi team,
Checked whether there were any space issues on the server and found nothing (the above bug is hit when audit files cannot be written). OS Watcher logs show normal.
SUPPORTING INFORMATION:
-----------------------
Uploaded all the relevant info to the bug.
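Since the bug update states that this bug is hit when audit files cannot be written, it is worth checking where each instance writes its audit files. The sketch below reads the audit_file_dest parameter for every RAC instance through GV$PARAMETER; free space and permissions on the returned directories then have to be checked at the OS level (for example with `df`).

```sql
-- Audit file destination of every instance in the cluster.
-- Check free space and permissions on these directories at the OS level.
SELECT inst_id,
       value AS audit_file_dest
  FROM gv$parameter
 WHERE name = 'audit_file_dest'
 ORDER BY inst_id;
```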
Bug 9357097
Symptoms:
1. Error May Occur
2. ORA-1148 / ORA-372 / ORA-376
Range of versions believed to be affected   <-- any version below 12.1 may hit this bug
Versions BELOW 12.1
Versions confirmed as being affected
- 11.2.0.1
- 11.1.0.7
- 10.2.0.5
- 10.2.0.4
Platforms affected
Generic (all / most platforms affected)
Fixed:
This issue is fixed in   <-- fixed in 12.1.0.1 and 11.2.0.2
- 12.1.0.1 (Base Release)
- 11.2.0.2 (Server Patch Set)
DBWR can offline the datafile with message "Automatic datafile offline due to media error"
if file size refresh fails with error ORA-1148.
As the file is offline, subsequent attempts to read the affected file produce
error ORA-372 or ORA-376 requiring media recovery.
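To find which files DBWR has taken offline in this way, a query along the following lines can be run in the database instance. It is a sketch based on the standard V$DATAFILE and V$RECOVER_FILE views; the status values filtered on are the ones relevant to this bug.

```sql
-- Datafiles that are offline or flagged as needing media recovery.
SELECT file#, status, name
  FROM v$datafile
 WHERE status IN ('OFFLINE', 'RECOVER');

-- Recovery details: the error recorded and the SCN/time recovery starts from.
SELECT file#, online_status, error, change#, time
  FROM v$recover_file;
```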
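Solution:
The temporary workaround is to manually recover the file and bring it back online.
Oracle does not provide a dedicated patch for this bug; a permanent fix requires upgrading to a release that contains the fix (11.2.0.2).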
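A minimal sketch of the workaround in SQL*Plus, using file 17 from the alert log above. It assumes the database runs in ARCHIVELOG mode and that all redo generated since the file went offline is still available (online or archived). If SQL*Plus prompts for archived logs during recovery, answering AUTO applies the suggested ones.

```sql
-- Apply media recovery to the datafile that DBWR took offline.
RECOVER DATAFILE 17;

-- Bring the recovered file back online.
ALTER DATABASE DATAFILE 17 ONLINE;

-- Verify: STATUS should now read ONLINE.
SELECT file#, status, name
  FROM v$datafile
 WHERE file# = 17;
```

Running ALTER DATABASE DATAFILE ... ONLINE without recovering first is expected to fail with ORA-01113 (file needs media recovery), which is why the recovery step comes first.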
During diagnosis, the following script was run in the ASM instance:
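-- Spool the report as HTML; <instance#> is a placeholder for the ASM instance number.
SPOOL ASM_FIRST<instance#>.HTML
SET MARKUP HTML ON
set echo on
set pagesize 200
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
-- Report timestamp and the host this ASM instance runs on
select 'THIS ASM REPORT WAS GENERATED AT: ==)> ' , sysdate " " from dual;
select 'HOSTNAME ASSOCIATED WITH THIS ASM INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';
-- Disk groups, member disks, connected database clients and disk group attributes
select * from v$asm_diskgroup;
SELECT * FROM V$ASM_DISK ORDER BY GROUP_NUMBER,DISK_NUMBER;
SELECT * FROM V$ASM_CLIENT;
select * from V$ASM_ATTRIBUTE;
-- Long-running ASM operations (e.g. rebalance) on all nodes
select * from gv$asm_operation;
-- Software version, ASM/cluster parameters and SGA sizing
select * from v$version;
show parameter asm
show parameter cluster
show parameter instance_type
show parameter instance_name
show parameter spfile
show sga
spool off
exit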