Analysis and Simulation of the "wait for a undo record" Wait Event

RDBMS 11.2.0.4 RAC 

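    Yesterday the database hit a deadlock. A job calls a procedure, and the procedure in turn calls a package. The package contains a dozen or so pairs of INSERT/DELETE statements and no COMMIT; the only COMMIT is at the end of the procedure. While a developer was debugging this job, it aborted because of some column problems. At the same moment, another developer was deploying code that touched the same tables the job inserts into and deletes from. The result was that the database became extremely sluggish, and on investigation we found deadlocks. Quite a commotion, although the problem was resolved quickly.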

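Today I went through the alert log and found that the job had been debugged at least 10 times that day and had failed more than 10 times, and every one of those runs had to be rolled back... Fortunately the database is not in production use yet.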

The alert log from that time. Note the parallel query server entries:

Wed Dec 26 05:46:36 2018
Archived Log entry 1190 added for thread 2 sequence 622 ID 0x589404d6 dest 1:
Wed Dec 26 06:01:10 2018
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc  (incident=48793):
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/XXXX/XXXX2/incident/incdir_48793/XXXX2_p012_13163_i48793.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Wed Dec 26 06:01:13 2018
Dumping diagnostic data in directory=[cdmp_20181226060113], requested by (instance=2, osid=13163 (P012)), summary=[incident=48793].
Wed Dec 26 06:01:15 2018
Sweep [inc][48793]: completed
Sweep [inc2][48793]: completed
Wed Dec 26 06:06:26 2018
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc  (incident=48794):
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/XXXX/XXXX2/incident/incdir_48794/XXXX2_p012_13163_i48794.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Wed Dec 26 06:06:27 2018
Dumping diagnostic data in directory=[cdmp_20181226060627], requested by (instance=2, osid=13163 (P012)), summary=[incident=48794].
Wed Dec 26 06:06:28 2018
Sweep [inc][48794]: completed
Sweep [inc2][48794]: completed

     Today I looked at the AWR report from that period and found a wait event named wait for a undo record.

Searching MOS for this wait event, the note IF: Undo Related Wait Event - Wait for an Undo Record (Doc ID 1951704.1) gives some explanation.

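The official recommendation is to set fast_start_parallel_rollback = false. The note also lists some points to consider before changing this parameter, so read it on MOS before doing so.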
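A minimal sketch of checking and changing the parameter from SQL*Plus. It assumes the instances use an spfile (required for SCOPE = BOTH) and that the change should apply to every RAC instance (SID = '*'). Treat it as an illustration, not as a recommendation to disable parallel rollback everywhere; follow the MOS note first.

```sql
-- SQL*Plus command: show the current value (the default is LOW)
SHOW PARAMETER fast_start_parallel_rollback

-- Disable fast-start parallel rollback on all RAC instances,
-- both in memory and in the spfile (assumes an spfile is in use;
-- without one, use SCOPE = MEMORY instead).
ALTER SYSTEM SET fast_start_parallel_rollback = FALSE SCOPE = BOTH SID = '*';

-- To go back to the default later:
-- ALTER SYSTEM SET fast_start_parallel_rollback = LOW SCOPE = BOTH SID = '*';
```

Per the documentation quoted below, changing the value while transaction recovery is running stops that recovery and restarts it with the new implied degree of parallelism.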

The parameter is described in the official documentation:

https://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams091.htm#REFRN10059

FAST_START_PARALLEL_ROLLBACK specifies the degree of parallelism used when recovering terminated transactions. Terminated transactions are transactions that are active before a system failure. If a system fails when there are uncommitted parallel DML or DDL transactions, then you can speed up transaction recovery during startup by using this parameter.

Values:

  • FALSE

    Parallel rollback is disabled

  • LOW

    Limits the maximum degree of parallelism to 2 * CPU_COUNT

  • HIGH

    Limits the maximum degree of parallelism to 4 * CPU_COUNT

If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism.

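The default value of this parameter is LOW, i.e. a maximum degree of parallelism of 2 * CPU_COUNT for transaction recovery. So while a large transaction is being rolled back, a noticeable drop in overall system performance is to be expected.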
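As a worked example: on a hypothetical instance with cpu_count = 16, LOW caps the degree of parallelism at 2 * 16 = 32 and HIGH at 4 * 16 = 64. The sketch below computes the same caps from the live cpu_count; it uses gv$parameter so that each RAC instance gets its own row (cpu_count can differ between nodes).

```sql
-- Implied maximum degree of parallelism for fast-start parallel rollback,
-- per instance, using the 2 * CPU_COUNT (LOW) and 4 * CPU_COUNT (HIGH) rules.
SELECT inst_id,
       TO_NUMBER(value)     AS cpu_count,
       2 * TO_NUMBER(value) AS max_dop_low,
       4 * TO_NUMBER(value) AS max_dop_high
FROM   gv$parameter
WHERE  name = 'cpu_count'
ORDER  BY inst_id;
```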

The following reproduces this wait event.

RDBMS 12.2.0.1 

First, create a table and insert a large amount of data into it, without committing:

create table rollback as select * from dba_objects;
insert into rollback select * from rollback;
SQL> insert into rollback select * from rollback;

80801 rows created.

SQL> /

161602 rows created.

SQL> /

323204 rows created.

SQL> /

646408 rows created.

SQL> /

1292816 rows created.

SQL>

Find the OS process ID (SPID) of the current session, then kill that process at the OS level:

SQL> select spid from v$process where addr in (select paddr from v$session where sid in (select sid from v$mystat where rownum=1));

SPID
------------------------
17514

SQL>
kill -9 17514

At this point, query v$fast_start_transactions:
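A sketch of a query for watching the dead transaction being recovered. The percentage column is a derived value (UNDOBLOCKSDONE / UNDOBLOCKSTOTAL), not something the view reports directly; XID is the transaction ID that is later looked up in ASH.

```sql
-- Recovery progress of transactions left behind by the killed process.
SELECT xid,
       state,                 -- e.g. RECOVERING, RECOVERED
       undoblocksdone,
       undoblockstotal,
       ROUND(100 * undoblocksdone / NULLIF(undoblockstotal, 0), 1) AS pct_done,
       rcvservers             -- number of servers working on this transaction
FROM   v$fast_start_transactions;
```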

Check the session wait events:
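One way to list the sessions currently waiting on this event is a filter on v$session. PROGRAM is included so you can see which processes are doing the waiting; the exact set of processes will depend on the system.

```sql
-- Sessions currently waiting on "wait for a undo record"
SELECT sid,
       serial#,
       program,
       event,
       state,
       seconds_in_wait
FROM   v$session
WHERE  event = 'wait for a undo record';
```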

Once the rollback completes, the sessions no longer show this wait event. Querying v$fast_start_transactions again:

Using the XID from the v$fast_start_transactions output, find which SQL statement caused it:

SQL> select distinct sql_id from V$ACTIVE_SESSION_HISTORY where xid=hextoraw('01000C0006510200');

SQL_ID
-------------
2ux4jwjr3g52b

SQL> select sql_id,sql_text from v$sql where sql_id='2ux4jwjr3g52b';

SQL_ID
-------------
SQL_TEXT
--------------------------------------------------------------------------------
2ux4jwjr3g52b
insert into rollback select * from rollback


SQL>
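The two lookups above can also be combined into a single query, sketched below. This assumes the recovered transaction is still listed in v$fast_start_transactions, as it was in this test, and that the ASH samples for it have not yet aged out. Note that v$active_session_history belongs to the Diagnostics Pack, which is separately licensed.

```sql
-- XID of the recovered transaction -> SQL_ID sampled by ASH -> SQL text
SELECT DISTINCT t.xid,
       t.state,
       h.sql_id,
       s.sql_text
FROM   v$fast_start_transactions t
JOIN   v$active_session_history h
       ON h.xid = t.xid
LEFT JOIN v$sql s
       ON s.sql_id = h.sql_id;
```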

Finally, check the AWR report: the wait event wait for a undo record shows up.
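If you only need the numbers for this one event rather than a full AWR report, a sketch along these lines reads them from the AWR history (DBA_HIST views are also part of the separately licensed Diagnostics Pack). TOTAL_WAITS and TIME_WAITED_MICRO are cumulative since instance startup, so LAG() turns them into per-snapshot deltas; the first row per instance is NULL, and deltas across an instance restart will be meaningless.

```sql
-- Per-snapshot waits and wait time for the event, per instance.
SELECT snap_id,
       instance_number,
       total_waits
         - LAG(total_waits) OVER (PARTITION BY dbid, instance_number
                                  ORDER BY snap_id)            AS waits_delta,
       ROUND((time_waited_micro
         - LAG(time_waited_micro) OVER (PARTITION BY dbid, instance_number
                                        ORDER BY snap_id)) / 1000000, 1) AS seconds_delta
FROM   dba_hist_system_event
WHERE  event_name = 'wait for a undo record'
ORDER  BY instance_number, snap_id;
```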

With that, the cause of the wait event is clear.

END