A Case Record: Handling Hung expdp Jobs

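Environment: AIX 6.1 + Oracle 10.2.0.4
Symptom: During the XTTS migration test phase, several expdp export jobs ran for a long time without returning anything, and their log files stayed empty. Checking the job status: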

SQL> 
set lines 300
col OWNER_NAME for a10
col OPERATION for a15
col JOB_MODE for a20
col STATE for a15
select * from dba_datapump_jobs; 

OWNER_NAME JOB_NAME                       OPERATION       JOB_MODE             STATE               DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
---------- ------------------------------ --------------- -------------------- --------------- ---------- ----------------- -----------------
SYS        SYS_EXPORT_TRANSPORTABLE_01    EXPORT          TRANSPORTABLE        DEFINING                 1                 0                 1
SYS        SYS_EXPORT_TRANSPORTABLE_02    EXPORT          TRANSPORTABLE        DEFINING                 1                 1                 2
SYS        SYS_EXPORT_TRANSPORTABLE_03    EXPORT          TRANSPORTABLE        DEFINING                 1                 1                 2
SYS        SYS_EXPORT_SCHEMA_01           EXPORT          SCHEMA               DEFINING                 1                 1                 2
SYS        SYS_EXPORT_TRANSPORTABLE_04    EXPORT          TRANSPORTABLE        DEFINING                 1                 1                 2
SYS        SYS_EXPORT_SCHEMA_02           EXPORT          SCHEMA               DEFINING                 1                 1                 2

6 rows selected.

As shown, the STATE of every expdp export job is stuck at DEFINING.

1. A first attempt at clearing the abnormal jobs

First, force-kill all expdp client processes running in the background:

ps -ef|grep expdp|grep -v grep|awk '{print $2}'|xargs kill -9
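
Because `kill -9` on a pattern match is hard to undo, it can help to look at the matched PIDs before killing anything. The following is a minimal sketch, not part of the original procedure; the function name `pids_matching` is made up for illustration. It reads `ps -ef` style text on stdin and prints the PID column for matching lines.

```shell
# Print the PID (field 2 of `ps -ef` output) for every line whose text matches
# the given pattern, skipping the grep/awk helper processes themselves.
# Input comes from stdin, so the function can also be run on saved ps output.
pids_matching() {
    pattern=$1
    awk -v pat="$pattern" '
        $0 ~ pat && $0 !~ /grep/ && $0 !~ /awk/ { print $2 }
    '
}

# Usage (review the list first, then decide whether to kill):
#   ps -ef | pids_matching 'expdp'
#   ps -ef | pids_matching 'expdp' | xargs kill -9
```

Note that, like the one-liner above, this also matches wrapper scripts whose name contains `expdp`.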

Then try dropping the jobs' master tables (strictly speaking, this should only be done once the jobs are in the NOT RUNNING state):

select 'drop table '||OWNER_NAME||'.'||JOB_NAME||' purge;' from dba_datapump_jobs where STATE='NOT RUNNING';

drop table sys.SYS_EXPORT_TRANSPORTABLE_01 purge;
..

This had no effect: the query result did not change.
Even a normal shutdown immediate could not stop the database. The alert log showed active calls:

Thu Nov  1 15:14:24 2018
Active call for process 4522064 user 'oracle' program '[email protected] (DM00)'
Active call for process 4456536 user 'oracle' program '[email protected] (DM01)'
Active call for process 10027180 user 'oracle' program '[email protected] (DM02)'
Active call for process 7340140 user 'oracle' program '[email protected] (DM03)'
Active call for process 6291888 user 'oracle' program '[email protected] (DM04)'
Active call for process 8126596 user 'oracle' program '[email protected] (DM05)'
SHUTDOWN: waiting for active calls to complete.

The process IDs in these messages all correspond to ora_dm processes:

$ ps -ef|grep ora_dm
  oracle  4456536        1   0 17:00:09      -  0:00 ora_dm01_xxxxdb
  oracle  4522064        1   0 16:50:57      -  0:00 ora_dm00_xxxxdb
  oracle  7340140        1   0 14:06:07      -  0:00 ora_dm03_xxxxdb
  oracle  8126596        1   0 14:35:03      -  0:00 ora_dm05_xxxxdb
  oracle 10027180        1   0 13:55:08      -  0:00 ora_dm02_xxxxdb
  oracle  6291888        1   0 14:31:17      -  0:00 ora_dm04_xxxxdb
  oracle  7340432  8388786   0 15:22:59  pts/4  0:00 grep ora_dm
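
The match between the alert-log PIDs and the ora_dm processes was made by eye here. A rough sketch of doing the same check mechanically is shown below; the function names and the idea of saving `ps -ef` output to a file are assumptions for illustration only.

```shell
# Print every PID that follows the word "process" in
# "Active call for process <pid> ..." alert-log lines read from stdin.
active_call_pids() {
    awk '/Active call for process/ {
        for (i = 1; i < NF; i++)
            if ($i == "process") print $(i + 1)
    }'
}

# For each PID read on stdin, print "<pid> <command>" if the saved `ps -ef`
# output in file $1 shows that PID as an ora_dmNN_<sid> background process.
dm_processes_for_pids() {
    ps_file=$1
    while read -r pid; do
        awk -v pid="$pid" \
            '$2 == pid && $NF ~ /^ora_dm[0-9]+_/ { print $2, $NF }' "$ps_file"
    done
}

# Usage (alert log path is a placeholder):
#   ps -ef > /tmp/ps.out
#   active_call_pids < alert_xxxxdb.log | dm_processes_for_pids /tmp/ps.out
```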

These are in fact the Data Pump master (DMnn) processes that belong to the expdp jobs. Force-kill them:

ps -ef|grep ora_dm|grep -v grep|awk '{print $2}'|xargs kill -9

After that, the database shut down successfully:

Thu Nov  1 15:24:37 2018
All dispatchers and shared servers shutdown
Thu Nov  1 15:24:37 2018
ALTER DATABASE CLOSE NORMAL

After restarting the database, the same query shows the jobs have been cleaned up:

SQL> 
set lines 300
col OWNER_NAME for a10
col OPERATION for a15
col JOB_MODE for a20
col STATE for a15
select * from dba_datapump_jobs; 

no rows selected

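Summary: Data Pump jobs are tied to the ora_dm processes. If a Data Pump job misbehaves but does not exit, these processes must be killed as well (after which the job state changes to NOT RUNNING). Shutting down the database is not required; it is shown here only to demonstrate that a normal shutdown is blocked in this situation. This also explains why the master tables should be cleaned up only in the NOT RUNNING state.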
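
The "only clean up in NOT RUNNING state" rule can be turned into a small guard. The sketch below is an illustration under stated assumptions: it parses the text output of `select * from dba_datapump_jobs` as formatted above, it only recognizes system-generated job names (`SYS_EXPORT_...` / `SYS_IMPORT_...`), and the function name is made up.

```shell
# Read the text output of `select * from dba_datapump_jobs` on stdin.
# Print the name of every job whose row does not say NOT RUNNING and
# return 1 if any such job exists, 0 otherwise.
# Assumption: default job names starting with SYS_EXPORT_ or SYS_IMPORT_.
jobs_not_ready_for_cleanup() {
    awk '
        /^[A-Z][A-Z0-9_$#]* +SYS_(EXPORT|IMPORT)_/ && !/NOT RUNNING/ {
            print $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage: save the query output to a file, then
#   jobs_not_ready_for_cleanup < jobs.txt || echo "some jobs are still active"
```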

2. Tracing the root cause on MOS

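The steps above only cleaned up the abnormal Data Pump jobs; they did not fix the underlying problem. Running the export script in the background again reproduces the failure: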
nohup sh expdp_xtts.sh &

$ ps -ef|grep expdp
  oracle  6684914  8061208   0 15:30:07  pts/2  0:00 grep expdp
  oracle  7143482  8061208   0 15:30:03  pts/2  0:00 sh expdp_xtts.sh
  oracle  6685096  7143482   0 15:30:03  pts/2  0:00 expdp '/ as sysdba' parfile=expdp_xtts.par
$ ps -ef|grep ora_dm
  oracle  7602308  8061208   0 15:30:10  pts/2  0:00 grep ora_dm
  oracle  3997964        1   1 15:30:05      -  0:00 ora_dm00_xxxxdb
$ 

Querying dba_datapump_jobs at this point, the state still stays at DEFINING:

OWNER_NAME JOB_NAME                       OPERATION       JOB_MODE                       STATE                              DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
---------- ------------------------------ --------------- ------------------------------ ------------------------------ ---------- ----------------- -----------------
SYS        SYS_EXPORT_TRANSPORTABLE_01    EXPORT          TRANSPORTABLE                  DEFINING                                1                 1                 2

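The other export jobs behave the same way, so they are not repeated here.
To make testing easier, I wrote a simple single-table expdp export, which shows the same symptom: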

expdp \'/ as sysdba\' directory=XTTS tables=query.test dumpfile=query_test.dmp logfile=query_test.log

Based on the symptom, searching MOS with the keywords "expdp state DEFINING" matched this document:

  • DataPump Export/Import Hangs With "DEFINING" Status When Using A Directory On NFS Filesystem (Doc ID 2262196.1)

This test happened to be writing to an NFS filesystem, and MOS recommends exporting to a local filesystem instead.
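
A quick way to tell whether a dump directory sits on NFS is to look at the device column of `df -P`. The sketch below is only a heuristic, under the assumption that an NFS mount shows its source as `server:/export` in the first column; the function name is made up for illustration.

```shell
# Read `df -P <dir>` output on stdin and return 0 if the first column of the
# data row looks like "server:/export" (typical for an NFS mount), 1 otherwise.
# Heuristic only: it does not inspect the actual filesystem type.
looks_like_nfs() {
    awk 'NR == 2 { found = ($1 ~ "^[^/]+:/") }
         END { exit (found ? 0 : 1) }'
}

# Usage:
#   df -P /xtts/dmp | looks_like_nfs && echo "/xtts/dmp appears to be on NFS"
```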

Once more, kill all the expdp-related processes:

ps -ef|grep ora_dm|grep -v grep|awk '{print $2}'|xargs kill -9
ps -ef|grep expdp|grep -v grep|awk '{print $2}'|xargs kill -9

Query dba_datapump_jobs now:

OWNER_NAME JOB_NAME                       OPERATION       JOB_MODE                       STATE               DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
---------- ------------------------------ --------------- ------------------------------ --------------- ---------- ----------------- -----------------
SYS        SYS_EXPORT_TABLE_04            EXPORT          TABLE                          NOT RUNNING              0                 0                 0
SYS        SYS_EXPORT_SCHEMA_01           EXPORT          SCHEMA                         NOT RUNNING              0                 0                 0
SYS        SYS_EXPORT_TABLE_02            EXPORT          TABLE                          NOT RUNNING              0                 0                 0
SYS        SYS_EXPORT_TABLE_05            EXPORT          TABLE                          NOT RUNNING              0                 0                 0
SYS        SYS_EXPORT_TABLE_03            EXPORT          TABLE                          NOT RUNNING              0                 0                 0
SYS        SYS_EXPORT_TABLE_01            EXPORT          TABLE                          NOT RUNNING              0                 0                 0
SYS        SYS_EXPORT_TRANSPORTABLE_01    EXPORT          TRANSPORTABLE                  NOT RUNNING              0                 0                 0

7 rows selected.

Clean up the master tables of the NOT RUNNING jobs:

select 'drop table '||OWNER_NAME||'.'||JOB_NAME||' purge;' from dba_datapump_jobs where STATE='NOT RUNNING';
-- Run the generated statements, then query again; the result is now empty:
SQL> select * from dba_datapump_jobs;
no rows selected

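Following the MOS recommendation, move the export to a local filesystem:
On the AIX source side, export the XTTS source data to the local directory /hxbak/xtts_exp, then copy it to the NFS shared storage /xtts/dmp: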

mkdir /hxbak/xtts_exp
chown oracle:dba /hxbak/xtts_exp
ls -ld /hxbak/xtts_exp

select * from dba_directories;
create or replace directory XTTS as '/hxbak/xtts_exp';
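
Before pointing the XTTS directory object at a new path, it is worth confirming that the OS directory exists and is writable by the user that runs the database. A minimal sketch, meant to be run as the oracle OS user; the function name is hypothetical:

```shell
# Return 0 if the given path is an existing directory that the current
# OS user can write to; otherwise print the reason to stderr and return 1.
check_dump_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "not writable by $(id -un): $dir" >&2; return 1; }
    return 0
}

# Usage:
#   check_dump_dir /hxbak/xtts_exp && echo "directory looks usable"
```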

The test expdp job now runs normally:

$ expdp \'/ as sysdba\' directory=XTTS tables=query.test dumpfile=query_test.dmp logfile=query_test.log
Export: Release 10.2.0.4.0 - 64bit Production on Thursday, 01 November, 2018 16:03:21

Copyright (c) 2003, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TABLE_01":  '/******** AS SYSDBA' directory=XTTS tables=query.test dumpfile=query_test.dmp logfile=query_test.log 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 8 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "QUERY"."TEST"                              6.743 MB   72593 rows
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_01 is:
  /hxbak/xtts_exp/query_test.dmp
Job "SYS"."SYS_EXPORT_TABLE_01" successfully completed at 16:03:57

SQL> select * from dba_datapump_jobs;

OWNER_NAME JOB_NAME                       OPERATION       JOB_MODE                       STATE               DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
---------- ------------------------------ --------------- ------------------------------ --------------- ---------- ----------------- -----------------
SYS        SYS_EXPORT_TABLE_01            EXPORT          TABLE                          EXECUTING                1                 1                 3

Export the remaining metadata again:

#expdp_xtts.sh (about 5min)
nohup sh expdp_xtts.sh &
#expdp_xtts_other.sh(about 5min)
nohup sh expdp_xtts_other.sh &
#expdp_tmp_table
nohup sh expdp_tmp_table01.sh &
nohup sh expdp_tmp_table02.sh &
nohup sh expdp_tmp_table03.sh &
nohup sh expdp_tmp_table04.sh &
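
Since these scripts run in the background, one way to confirm that each export finished is to look for the "successfully completed" line that expdp writes at the end of a clean run (as in the test output above). The sketch below assumes you know the log file names set in each parfile; the function name and the example paths are hypothetical. A job that ends with errors does not print this line, so it would be reported here too.

```shell
# For each log file given as an argument, print its name if it does not
# contain the "successfully completed" line; return 1 if any file is listed.
unfinished_export_logs() {
    rc=0
    for log in "$@"; do
        if ! grep -q 'successfully completed' "$log" 2>/dev/null; then
            echo "$log"
            rc=1
        fi
    done
    return $rc
}

# Usage (log names are placeholders):
#   unfinished_export_logs /hxbak/xtts_exp/*.log
```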

Finally, copy these export files to /xtts/dmp/ so the target side can import them in the subsequent XTTS tests:

$ pwd
/hxbak/xtts_exp
$ cp -rp * /xtts/dmp/ 
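
Because the copy goes onto NFS, a checksum comparison is a cheap sanity check before the target side imports the files. This is an optional sketch, not something the original procedure did; the function name is made up, and it only compares regular files at the top level of the source directory.

```shell
# Compare each regular file in directory $1 with the file of the same name
# in directory $2 using cksum (CRC and size). Print the names that are
# missing or different; return 1 if any are printed, 0 otherwise.
verify_copy() {
    src=$1
    dst=$2
    rc=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        if [ ! -f "$dst/$name" ] ||
           [ "$(cksum < "$f")" != "$(cksum < "$dst/$name")" ]; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   verify_copy /hxbak/xtts_exp /xtts/dmp && echo "all files match"
```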

For the import, the target side only needs read permission on these files; the actual restore test succeeded.

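Summary: In my own Linux environment I have tested that expdp can write directly to an NFS filesystem, so AIX apparently behaves differently. The MOS recommendation is only a workaround, but it meets the need here, since the metadata export files are not large.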