A database crash “caused” by a single SQL statement: problem and analysis (Day 38)

阿新 • Published: 2022-05-04

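The test environment recently needed some changes. It was to be split into two separate environments, and its storage had to be shrunk and reorganized. The Unix team had already started repartitioning the disk space, and on top of that we needed to rebuild one of the environments. Some data files were using very little of their allocated space, so I planned to shrink them. I ran the SQL statement below, and after a dozen or so seconds it threw the following error.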

SQL> set linesize 200
SQL> col name for a40
SQL> col resizecmd for a80
SQL> select a.file#,a.name,a.bytes/1024/1024 CurrentMB,
  2         ceil(HWM * a.block_size)/1024/1024 ResizeTo,
  3         (a.bytes - HWM * a.block_size)/1024/1024 ReleaseMB,
  4         'alter database datafile '''||a.name||''' resize '||
  5         ceil(HWM * a.block_size/1024/1024) || 'M;' ResizeCMD
  6  from v$datafile a,
  7       (select file_id,max(block_id+blocks-1) HWM
  8         from dba_extents
  9         group by file_id) b
 10  where a.file# = b.file_id(+)
 11  and (a.bytes - HWM *block_size)>0
 12  order by 5;
order by 5
             *
ERROR at line 12:
ORA-01116: error in opening database file 4
ORA-01110: data file 4: '/testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
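The resize query finds the highest block in use for each data file (the high-water mark, HWM) as `max(block_id + blocks - 1)` over that file's extents in `dba_extents`. It then converts `HWM * block_size` to MB. Below is a minimal Python sketch of the same arithmetic. It assumes hypothetical in-memory rows standing in for `v$datafile` and `dba_extents`; the function name `resize_plan` and the tuple layouts are illustrative only. It also mirrors two quirks of the SQL as written:

- Files with no extents get a NULL HWM from the outer join. The filter `(a.bytes - HWM*block_size) > 0` is then not true, so those files are dropped and the `(+)` has no practical effect.
- The `ResizeTo` column applies `ceil()` to a whole byte count, so it is not rounded. Only the generated command rounds up to whole megabytes.

```python
import math

MB = 1024 * 1024


def resize_plan(datafiles, extents):
    """Mirror the resize query on in-memory rows (hypothetical layout).

    datafiles: iterable of (file_no, name, size_bytes, block_size)  ~ v$datafile
    extents:   iterable of (file_id, block_id, blocks)              ~ dba_extents
    Returns (file_no, name, current_mb, resize_to_mb, release_mb, cmd) tuples,
    sorted by release_mb ascending, like "order by 5".
    """
    # HWM per file: the last block used by any extent, max(block_id + blocks - 1).
    hwm = {}
    for file_id, block_id, blocks in extents:
        hwm[file_id] = max(hwm.get(file_id, 0), block_id + blocks - 1)

    rows = []
    for file_no, name, size_bytes, block_size in datafiles:
        if file_no not in hwm:
            # In the SQL, HWM is NULL for a file without extents, so the
            # filter (bytes - HWM*block_size) > 0 is not true and the row is dropped.
            continue
        used_bytes = hwm[file_no] * block_size
        release_bytes = size_bytes - used_bytes
        if release_bytes <= 0:
            continue  # nothing to release, filtered out as in the SQL
        # ResizeTo column: ceil() acts on a whole byte count, so no rounding.
        resize_to_mb = used_bytes / MB
        # Generated command: rounded up to whole megabytes.
        cmd = f"alter database datafile '{name}' resize {math.ceil(used_bytes / MB)}M;"
        rows.append((file_no, name, size_bytes / MB, resize_to_mb,
                     release_bytes / MB, cmd))
    rows.sort(key=lambda row: row[4])
    return rows
```

Consider a 100 MB file with 8 KB blocks whose highest extent ends at block 2279. It uses 17.8046875 MB, so it gets `resize 18M` and would release about 82.2 MB.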

When I then checked the database processes, they were all gone; the database had apparently crashed. I thought this SQL statement must be really something, so I looked at what the alert log said:

Tue Mar 25 22:04:19 2014
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Wed Mar 26 02:00:00 2014
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Mon Mar 31 10:14:55 2014
USER (ospid: 21846): terminating the instance due to error 472
Instance terminated by USER, pid = 21846

I first restarted the database to see whether anything was wrong.
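In the alert-log excerpt, each message belongs to the timestamp line that precedes it. That is how the termination can be dated to Mon Mar 31 10:14:55, days after the previous entries. Error 472 here is, as far as I know, ORA-00472 ("PMON process terminated with error"). The following is a minimal sketch for pulling such termination events out of alert-log text. It assumes the 11g plain-text layout shown here: a timestamp line such as `Mon Mar 31 10:14:55 2014` followed by message lines. The function name `find_terminations` is hypothetical.

```python
import re

# Assumed 11g-style alert log timestamp line, e.g. "Mon Mar 31 10:14:55 2014".
TIMESTAMP = re.compile(
    r"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)
# Message written when a process brings the instance down.
TERMINATION = re.compile(
    r"\(ospid: (\d+)\): terminating the instance due to error (\d+)"
)


def find_terminations(alert_text):
    """Return (timestamp, ospid, error_number) for each termination message."""
    events = []
    current_ts = None  # timestamp of the block we are currently inside
    for raw in alert_text.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            current_ts = line
            continue
        match = TERMINATION.search(line)
        if match:
            events.append((current_ts, int(match.group(1)), int(match.group(2))))
    return events
```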

SQL> startup nomount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 1.2827E+10 bytes
Fixed Size                  2253880 bytes
Variable Size            4211084232 bytes
Database Buffers         8589934592 bytes
Redo Buffers               24096768 bytes
SQL> alter database mount;

Database altered.

When I went to open the database, it reported the following error, saying a data file was missing.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf'

I checked the backup status to see whether an operation such as a hot backup had been left unfinished. The output shows that this file was indeed either damaged or considered deleted.

SQL> select *from v$backup;

     FILE# STATUS                CHANGE# TIME
---------- ------------------ ---------- ---------
         1 NOT ACTIVE         1.0583E+13 22-JAN-13
         2 NOT ACTIVE         1.0583E+13 22-JAN-13
         3 NOT ACTIVE         1.0583E+13 22-JAN-13
         4 FILE NOT FOUND              0
...
        40 NOT ACTIVE         1.0583E+13 22-JAN-13

Next, the file's path:

  1* select file#,name from v$datafile
         4 /testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf

I confirmed that this data file was used by GoldenGate. GoldenGate is not yet set up in the test environment, so the file could be dropped. Acting on impulse, I tried to drop it right away and got an error:

SQL> drop tablespace GGS_DATA including contents and datafiles cascade constraint;
drop tablespace GGS_DATA including contents and datafiles cascade constraint
*
ERROR at line 1:
ORA-01109: database not open

Only then did I realize the database was still in the MOUNT state, so I first took the data file offline:

SQL> alter database datafile '/testPT1/oracle/TEST01/oratmp01/temp/ggs_data01.dbf' offline;

Database altered.

Querying v$backup again, that row was gone:

     FILE# STATUS                CHANGE# TIME
---------- ------------------ ---------- ---------
         1 NOT ACTIVE         1.0583E+13 22-JAN-13
         2 NOT ACTIVE         1.0583E+13 22-JAN-13
         3 NOT ACTIVE         1.0583E+13 22-JAN-13
         5 NOT ACTIVE         1.0583E+13 22-JAN-13
         6 NOT ACTIVE         1.0583E+13 22-JAN-13
         7 NOT ACTIVE         1.0583E+13 22-JAN-13
         8 NOT ACTIVE         1.0583E+13 22-JAN-13
         9 NOT ACTIVE         1.0583E+13 22-JAN-13
        10 NOT ACTIVE         1.0583E+13 22-JAN-13
        11 NOT ACTIVE         1.0583E+13 22-JAN-13
        12 NOT ACTIVE         1.0583E+13 22-JAN-13
...
39 rows selected.

Then I opened the database:

SQL> alter database open;

Database altered.
Then I dropped the tablespace. At this stage, the drop really only released the file handles and updated the data dictionary.

SQL> drop tablespace ggs_data including contents and datafiles cascade constraint;

Tablespace dropped.

I checked the database processes again and confirmed that the database was back to normal:

test01@ccbdbpt4:/opt/app/oracle/TEST01> ps -ef|grep smon
test01    8420  5954  0 10:39 pts/2    00:00:00 grep smon
test01   10295     1  0 10:18 ?        00:00:01 ora_smon_TEST01
test01@ccbdbpt4:/opt/app/oracle/TEST01> sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Mon Mar 31 10:40:08 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> show user
USER is "SYS"
SQL> select status from v$instance;

STATUS
------------
OPEN

I ran into a similar problem in another environment:

test01@ccbdbpt4:/opt/app/oracle/TEST01/test> sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Thu Apr 3 17:26:43 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> select dbms_metadata.get_ddl('USER',u.username) from dba_users u WHERE USERNAME in('TEST');
ERROR:
ORA-03113: end-of-file on communication channel
Process ID: 4056
Session ID: 4159 Serial number: 15

no rows selected

The alert log showed a similar error:

Mon Mar 31 11:47:47 2014
USER (ospid: 26945): terminating the instance due to error 472
Instance terminated by USER, pid = 26945

To summarize: the crash was not caused by the SQL statement. The Unix team's file operations broke the database processes. The database had in fact already stopped, yet we could still run simple operations because the operating system still held the file handles. The Unix team confirmed that the missing data file was the result of human error: they repartitioned the storage without shutting down the database.
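The `ps -ef|grep smon` check used above has a small trap: grep's own process also matches "smon", so at least one line always comes back. SMON is a mandatory background process, so an `ora_smon_<SID>` entry is a quick sign that the instance is up. The following is a minimal sketch that looks only at the command column. It assumes Linux `ps -ef` output with the usual eight leading columns (UID PID PPID C STIME TTY TIME CMD) and an STIME value without spaces. The function name `smon_running` is hypothetical.

```python
def smon_running(ps_output, sid):
    """Return True if `ps -ef` output contains an ora_smon_<sid> process.

    Assumes the columns UID PID PPID C STIME TTY TIME CMD, so the command
    starts at the eighth whitespace-separated field.
    """
    target = f"ora_smon_{sid}"
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # header fragment or malformed line
        # Compare the command itself, so "grep smon" never counts as a match.
        if fields[7] == target:
            return True
    return False
```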
