A bug triggered by an Oracle query (r4 notes, day 59)
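No software is perfect, and Oracle is no exception: every so often I get an email from Oracle recommending which security patches to apply. Newly released products start at release 1, such as 10gR1, while the stable versions are at 10gR2. Do not underestimate how much changes between versions. What stuck with me is that the 10g 10.2.0.1 installation package is a little over 600 MB, yet the 10.2.0.2.0 patch set is larger than the installer itself. That shows how many changes were made within the product line to make the database more and more stable.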
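Yesterday afternoon, while looking into another issue, I noticed that the database log was reporting ORA-600 errors. For an error with symptoms this vague, the only real option is to turn to Metalink.
The error log looked roughly like this: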
Thu Feb 26 11:06:35 2015
Archived Log entry 60642 added for thread 1 sequence 60576 ID 0xb8c6d509 dest 1:
Thu Feb 26 11:07:20 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/cust01/CUST01/trace/CUST01_p019_23657.trc (incident=2100684):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/cust01/CUST01/incident/incdir_2100684/CUST01_p019_23657_i2100684.trc
Thu Feb 26 11:07:57 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Feb 26 11:07:58 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/cust01/CUST01/trace/CUST01_ora_27252.trc (incident=2119548):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/cust01/CUST01/incident/incdir_2119548/CUST01_ora_27252_i2119548.trc
Thu Feb 26 11:07:58 2015
Sweep [inc][2100684]: completed
Sweep [inc][2119548]: completed
Sweep [inc2][2100684]: completed
Thu Feb 26 11:07:58 2015
Dumping diagnostic data in directory=[cdmp_20150226110758], requested by (instance=1, osid=23657 (P019)), summary=[incident=2100684].
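Spotting these entries by eye is easy in a short excerpt, but less so in a large alert log. The following is a minimal Python sketch, not part of the original investigation, that pulls the incident number, trace file, and first ORA-00600 argument out of alert log text. The function name `find_ora600_incidents` and the `SAMPLE` constant are made up for illustration, and the sketch assumes the layout seen in the excerpt above, where the ORA-00600 line directly follows the "Errors in file" line.

```python
import re

# "Errors in file <path> (incident=<n>):" as it appears in the excerpt above.
ERRORS_IN_FILE = re.compile(
    r"^Errors in file (?P<trace>\S+)\s+\(incident=(?P<incident>\d+)\):"
)
# The ORA-00600 line; only the first bracketed argument is captured.
ORA_600 = re.compile(
    r"^ORA-00600: internal error code, arguments: \[(?P<arg>[^\]]*)\]"
)

# Two lines copied from the alert log excerpt, used as test input.
SAMPLE = """\
Thu Feb 26 11:07:20 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/cust01/CUST01/trace/CUST01_p019_23657.trc (incident=2100684):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
"""


def find_ora600_incidents(alert_log_text):
    """Return one dict per ORA-00600 entry found in the alert log text.

    Assumption: the ORA-00600 line is the line immediately after the
    "Errors in file" line, as in the excerpt. Other layouts are ignored.
    """
    lines = [line.strip() for line in alert_log_text.splitlines()]
    incidents = []
    # Walk over adjacent line pairs: (header line, line that follows it).
    for header, following in zip(lines, lines[1:]):
        header_match = ERRORS_IN_FILE.match(header)
        error_match = ORA_600.match(following)
        if header_match and error_match:
            incidents.append(
                {
                    "incident": int(header_match.group("incident")),
                    "trace": header_match.group("trace"),
                    "argument": error_match.group("arg"),
                }
            )
    return incidents


if __name__ == "__main__":
    for entry in find_ora600_incidents(SAMPLE):
        print(entry["incident"], entry["argument"], entry["trace"])
```

The first argument (here `srsnext_3`) is the useful search key, since that is what the support notes are indexed by.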
*** 2015-02-26 11:07:20.965
*** SESSION ID:(4404.677) 2015-02-26 11:07:20.965
*** CLIENT ID:() 2015-02-26 11:07:20.965
*** SERVICE NAME:(CUST01) 2015-02-26 11:07:20.965
*** MODULE NAME:(PL/SQL Developer) 2015-02-26 11:07:20.965
*** ACTION NAME:(SQL Window - select /*+ PARALLEL(csm,4) PARALLEL(crd,4) PARALLEL) 2015-02-26 11:07:20.965
Dump continued from file: /opt/app/oracle/dbccbspr1/diag/rdbms/cust01/CUST01/trace/CUST01_p019_23657.trc
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
========= Dump for incident 2100684 (ORA 600 [srsnext_3]) ========
*** 2015-02-26 11:07:20.969
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=17a5yw0f09u66) -----
select /*+ PARALLEL(csm,4) PARALLEL(crd,4) PARALLEL(rater,4) */
       csm.customer_id, csm.ban, csm.coll_status, csm.l9_crd_status, csm.l9_col_status,
       crd.tot_obligation_pct, rater.tot_obligation_pct, rater.file_id,
       rater.sys_creation_date, rater.extract_status, rater.waiver_ind, rater.waiver_exp_date
  from csm_account csm,
       cl9_crd_mntr_fa crd,
       (SELECT *
          FROM (SELECT cl9_rater_input.*,
                       ROW_NUMBER () OVER (PARTITION BY account_id
                                           ORDER BY sys_creation_date desc, notification_timestamp desc) AS RANK
                  FROM cl9_rater_input)
         WHERE RANK = 1) rater
 where csm.ban = crd.account_id
   and csm.customer_id = crd.customer_id
   and csm.l9_crd_status = 'PSUS'
   and csm.customer_id = rater.customer_id
   and csm.ban = rater.account_id
   --and payment.sys_creation_date > rater.sys_creation_date
   and crd.tot_obligation_pct != rater.tot_obligation_pct
   and rater.tot_obligation_pct < 101
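Looking at the statement alone, I really could not see anything wrong with it. It was run from a client tool, so its execution frequency should be very low, and the statement shows obvious signs of having been edited by hand. So I gave up on tuning it any further and went straight to see what Metalink had to say.
In the end I found a related note: Query Fails with ORA-00600: Internal Error Code, Arguments: [srsnext_3] (Doc ID 1589589.1). The version in which this problem occurs is 11.2.0.2, which matches the production environment where we hit it.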
> sqlplus -v
SQL*Plus: Release 11.2.0.2.0 Production
Oracle's answer is as follows:
CAUSE
Bug 11852469 : TS11.2.0.3V3 - TRC - SRSNEXT.
Rediscovery information:
If the srsnext_3 internal error is raised and the query involves statistical functions or other aggregates that are treated as distinct aggregates then you may be encountering this problem.
SOLUTION
Apply patch 11852469 if it exists for your version/platform
or
Apply patchset 11.2.0.3 where the fix is included
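The version condition in the SOLUTION section can be checked mechanically. Below is a small Python sketch, added here purely as an illustration, that parses the `sqlplus -v` banner and tests whether the release predates the 11.2.0.3 patch set named in the note. The names `parse_release` and `predates_patchset` are hypothetical, and the sketch assumes that SQL*Plus is run from the same Oracle home as the database, so that its banner reflects the database version. It says nothing about whether the one-off patch 11852469 has already been applied.

```python
import re

# Patch set that, according to the support note, includes the fix.
FIXED_IN = (11, 2, 0, 3)


def parse_release(banner):
    """Extract the dotted release number from a `sqlplus -v` banner as a tuple of ints."""
    match = re.search(r"Release\s+(\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError("no release number found in: %r" % banner)
    return tuple(int(part) for part in match.group(1).split("."))


def predates_patchset(version, fixed_in=FIXED_IN):
    """Return True if `version` is older than the patch set that carries the fix.

    Only as many components as `fixed_in` names are compared, so
    11.2.0.3.0 counts as "at the fix level". The comparison is only
    meaningful within the 11.2 release line discussed in the note.
    """
    return version[: len(fixed_in)] < fixed_in


if __name__ == "__main__":
    banner = "SQL*Plus: Release 11.2.0.2.0 Production"
    release = parse_release(banner)
    print(release, predates_patchset(release))
```

A release that predates the patch set is only a candidate for the bug. As the note's rediscovery information says, the query also has to involve statistical functions or aggregates treated as distinct aggregates.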
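As things stand, applying a patch set to the database is something that still has to be evaluated, and it needs coordination among several parties to get done. In this case the statement is executed very rarely and is just a simple query run from a client tool, so on balance the impact of the problem is extremely small. I also ran the same statement on the standby database and could not reproduce the error there, so it evidently takes a particular environment and timing to trigger it.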