
A Strange Problem During a Database 11g Upgrade (Day 30)

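The customer's test environment had already been upgraded from 10g to 11g. A few days later, however, the database hung and nobody could log in at all. On top of that, connecting as sys, as system, and as an ordinary user each produced a different error.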

First I used the following commands to check that the environment variable and the background process both looked normal:

 ps -ef|grep smon  
echo $ORACLE_SID

Nothing looked wrong.
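The process check above can be scripted so that it does not accidentally match the `grep` command itself. This is a minimal sketch, assuming the usual Unix/Linux naming of the background process as `ora_smon_<SID>`; the function name `smon_running` is made up for illustration. As the rest of this post shows, a live smon process is necessary but not sufficient for a usable instance.

```shell
#!/bin/sh
# Sketch: report whether the SMON background process for a given SID is alive.
# Reads `ps -ef` output on stdin so the check can also be tried on saved output.
# Assumption: the process is named ora_smon_<SID>, as on Unix/Linux.
smon_running() {
    sid=$1
    # Anchor at end of line so that "grep smon" itself, or a SID that merely
    # starts with the same letters, does not count as a match.
    grep -q "ora_smon_${sid}[[:space:]]*\$"
}

# Example (not run automatically):
#   ps -ef | smon_running "$ORACLE_SID" && echo "smon is up" || echo "smon not found"
```

The function returns success only when a line ends with the exact process name, so it can be used directly in an `if` or with `&&`.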

--Connecting as sys reports a connection to an idle instance
sqlplus / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Tue Aug 13 14:51:25 2013
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> 
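--Connecting as system reports that Oracle initialization or shutdown is in progress. This resembles what you see when connecting to a physical standby while it is applying redo, but this database does not use Data Guard at all.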
sqlplus system/xxxx@TEST
SQL*Plus: Release 11.2.0.2.0 Production on Tue Aug 13 14:49:17 2013
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
ERROR:
ORA-01033: ORACLE initialization or shutdown in progress
Process ID: 0
Session ID: 0 Serial number: 0
--Connecting as an ordinary user reports that the Oracle instance is not available
sqlplus TEST/TEST
SQL*Plus: Release 11.2.0.2.0 Production on Tue Aug 13 14:52:51 2013
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0

When I checked the alert log, I found the following errors recurring in it.

ORA-20011 ORA-29913 and ORA-29400 , KUP-XXXXX Errors
-----------------from alert log------------------------

Tue Aug 13 22:00:04 2013
XDB installed.
XDB initialized.
Tue Aug 13 22:00:17 2013
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Tue Aug 13 22:00:24 2013
DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.
Errors in file /dbccbsPT1/oracle/xxxx/oradmp/bdump/diag/rdbms/xxxx/xxxx/trace/xxxx_j000_17725.trc:
ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11010: unable to open at least one dump file for fetch
Tue Aug 13 22:25:18 2013
Errors in file /dbccbsPT1/oracle/xxxx/oradmp/bdump/diag/rdbms/xxxx/xxxx/trace/xxxx_j002_17729.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_133"
ORA-20000: ORU-10027: buffer overflow, limit of 20000 bytes
ORA-06512: at "SYS.DBMS_ADVISOR", line 201
ORA-06512: at "SYS.DBMS_SPACE", line 2465
ORA-06512: at "SYS.DBMS_SPACE", line 2538
Wed Aug 14 02:00:00 2013
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
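When an alert log is long, tallying the error codes is a quick way to see which ones keep recurring. The following is a minimal sketch, assuming a plain-text alert log; the function name `count_alert_errors` and the path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: count how often each ORA-/KUP- error code appears in an alert log.
# Prints one "CODE COUNT" pair per line, sorted by code.
count_alert_errors() {
    # -o prints every match on its own line, so a line that carries two
    # codes (e.g. ORA-20011 wrapping ORA-29913) contributes to both counts.
    grep -oE '(ORA|KUP)-[0-9]+' "$1" | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (path is hypothetical, not run automatically):
#   count_alert_errors /path/to/alert_SID.log
```

Codes that show up every night in the maintenance window, like the ORA-29913 / KUP-11010 group above, stand out immediately in the tally.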

Looking at the corresponding trace file, I found the following entries:

Starting background process VKRM
Tue Aug 13 22:00:00 2013
VKRM started with pid=31, OS id=17443 
trace file1:
*** 2013-08-13 22:00:24.222
*** SESSION ID:(6238.93) 2013-08-13 22:00:24.222
*** CLIENT ID:() 2013-08-13 22:00:24.222
*** SERVICE NAME:(SYS$USERS) 2013-08-13 22:00:24.222
*** MODULE NAME:(DBMS_SCHEDULER) 2013-08-13 22:00:24.222
*** ACTION NAME:(ORA$AT_OS_OPT_SY_132) 2013-08-13 22:00:24.222
ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11010: unable to open at least one dump file for fetch
*** 2013-08-13 22:00:24.230
DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"SYSTEM"','"TEST_TABLE_TARGET_EXT"','""', ...)
DBMS_STATS: ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11010: unable to open at least one dump file for fetch
*** 2013-08-13 22:00:24.252
DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"SYSTEM"','"TEST_TABLE_SOURCE_EXT"','""', ...)
DBMS_STATS: ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11010: unable to open at least one dump file for fetch
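The trace already names the tables on which statistics gathering failed, in the `GATHER_TABLE_STATS('"OWNER"','"TABLE"', ...)` lines. A minimal sketch for pulling them out, assuming the trace uses exactly the line format shown above; the function name `list_failed_stats_tables` is hypothetical.

```shell
#!/bin/sh
# Sketch: list OWNER.TABLE for every GATHER_TABLE_STATS call logged in a
# DBMS_STATS trace file. Assumes the format
#   ... GATHER_TABLE_STATS('"OWNER"','"TABLE"','""', ...)
list_failed_stats_tables() {
    # Splitting on double quotes puts the owner in field 2 and the table
    # name in field 4; sort -u collapses repeated nightly failures.
    awk -F'"' '/GATHER_TABLE_STATS\(/ { print $2 "." $4 }' "$1" | LC_ALL=C sort -u
}

# Example (path is hypothetical, not run automatically):
#   list_failed_stats_tables /path/to/SID_j000_12345.trc
```

For the trace shown here this would point straight at the two `SYSTEM` tables whose names end in `_EXT`.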

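A problem like this is easily tied to external causes. The first issue was the unavailable instance: I wanted to investigate, but I could not even connect. I had planned to use Hanganalyze for the analysis, but sqlplus could not get in.

Running sqlplus -prelim /nolog and then connect / as sysdba did not work either.

In the end I talked to the Unix team; they had a full backup.

I remembered that the storage seemed to have had trouble a few days earlier, so I confirmed this with them. They eventually found that it was indeed a storage problem and fixed it promptly.

The instance could be connected to again. But the alert log still showed the same ORA errors as above, which left me a bit confused. According to Metalink, this problem is most likely caused by something Data Pump related.

After ruling out any Data Pump related jobs, I focused my attention on external tables.

I used the following approach: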

SQL> spool obj.out
SQL> set linesize 200 trimspool on
SQL> set pagesize 2000
SQL> col owner format a30
SQL> col created format a25
SQL> col last_ddl_time format a25
SQL> col object_name format a30
SQL> col object_type format a25
SQL> 
SQL> select OWNER,OBJECT_NAME,OBJECT_TYPE, status,
  2  to_char(CREATED,'dd-mon-yyyy hh24:mi:ss') created
  3  ,to_char(LAST_DDL_TIME , 'dd-mon-yyyy hh24:mi:ss') last_ddl_time
  4  from dba_objects
  5  where object_name like 'ET$%'
  6  /
no rows selected
SQL> select owner, TABLE_NAME, DEFAULT_DIRECTORY_NAME, ACCESS_TYPE
  2  from dba_external_tables
  3  order by 1,2
  4  /
OWNER                          TABLE_NAME                     DEFAULT_DIRECTORY_NAME         ACCESS_
------------------------------ ------------------------------ ------------------------------ -------
SYSTEM                         TEST_TABLE_SOURCE_EXT        DATA_PUMP_DIR                  CLOB
SYSTEM                         TEST_TABLE_TARGET_EXT        DATA_PUMP_DIR                  CLOB

The query turned up the two external tables above, so the problem was essentially located.

--Connect as system and take a look at that table

SQL> conn system/xxx
Connected.
SQL> select count(*)from  TEST_TABLE_SOURCE_EXT;
select count(*)from  TEST_TABLE_SOURCE_EXT
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11010: unable to open at least one dump file for fetch

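The error is exactly the one in the alert log. Looking into it carefully, the cause was that these external tables had been created on the system default directory data_pump_dir. After the upgrade ORACLE_HOME changed, so the directory that held the original dump files was no longer usable.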
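KUP-11010 means that at least one dump file could not be opened, so once the operating-system path behind the directory object is known (the DBA_DIRECTORIES view lists it in DIRECTORY_PATH), the files can be checked from the shell. A minimal sketch; the function name `check_dump_files` and the path and file names in the usage comment are hypothetical, since the post does not give the real ones.

```shell
#!/bin/sh
# Sketch: verify that the dump files an external table depends on are
# present and readable in a given directory.
# Usage: check_dump_files DIR FILE...
# Returns 0 if every file is readable, 1 otherwise.
check_dump_files() {
    dir=$1
    shift
    missing=0
    for f in "$@"; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (directory and file names are hypothetical, not run automatically):
#   check_dump_files /path/to/new_home/rdbms/log source_ext.dmp target_ext.dmp
```

Run it as the OS user that owns the Oracle processes, because readability is evaluated for the calling user.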

Those two external tables had only been created earlier for a temporary data extract, so they could simply be dropped.

SQL> drop table TEST_TABLE_SOURCE_EXT;
Table dropped.
SQL> drop table TEST_TABLE_TARGET_EXT;
Table dropped.

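With that, the hidden risk was removed: the alert log no longer reported this error and everything was back to normal.

Metalink describes the cause of this problem as follows (Doc ID 1274653.1):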

CAUSE

The primary cause of this issue is that an external table existed at some point in time but does not now. However, the database still believes the table exists since the dictionary information about the object has not been modified to reflect the change. When DBMS_STATS is run against the table in question, it makes a call out to the external table which fails because the object is not there. There are many reasons that an external table may not exist including:

  • Temporary Datapump external tables have not been cleaned up properly. The dictionary information should have been dropped when the DataPump jobs completed.
  • An External table has been removed without clearing up the corresponding data dictionary information. For example: Oracle Demo Schema Tables such as the external table “SALES_TRANSACTIONS_EXT” may have been removed but the dictionary has not been updated to reflect this. The "SALES_TRANSACTIONS_EXT" table is an external table in the "SH" schema which is one of Demo Schema provided by Oracle.