Oracle SMON Background Process Features You May Not Know About
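SMON (system monitor process) is the system monitoring background process. It is sometimes also called the system cleanup process, because it is responsible for many cleanup tasks. Anyone who has studied Oracle fundamentals will know at least something about what this background process does.
The SMON we know is a diligent worker that takes care of a series of system-level tasks. Unlike the PMON (Process Monitor) background process, SMON handles more work related to the system as a whole, which leaves it doing some obscure "grunt work". When the system generates these "garbage tasks" frequently, SMON may be unable to keep up. For that reason, in 10g SMON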
SMON Features You May Not Know About (1): Cleaning Up Temporary Segments
Trigger scenarios
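Many people misunderstand the temporary segments meant here, assuming that they are only the sort segments in a temporary tablespace.
Temporary segments also exist in permanent tablespaces. For example, when we create a table or index in a permanent tablespace with a DDL command such as create table/index, the server process first allocates enough extents in that permanent tablespace. These extents remain temporary (Temporary Extents) until the command finishes, and only once the table or index has been fully built is the temporary segment converted into a permanent segment.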
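For temporary segments in permanent tablespaces, SMON cleans up once every three minutes (provided it has been posted). If SMON is too busy, temporary segments may go uncleaned for a long time. A typical problem this causes: after a rebuild index online fails, subsequent rebuild index commands require that the temporary segment left by the earlier attempt has already been cleaned up, and they keep waiting until that cleanup completes. In 10gR2 we can use dbms_repair.online_index_clean to manually clean up what a failed online index rebuild leaves behind: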
The dbms_repair.online_index_clean function has been created to cleanup online index rebuilds.
Use the dbms_repair.online_index_clean function to resolve the issue.
Please note if you are unable to run the dbms_repair.online_index_clean function it is due to the fact
that you have not installed the patch for Bug 3805539 or are not running on a release that includes this fix.
The fix for this bug is a new function in the dbms_repair package called dbms_repair.online_index_clean,
which has been created to cleanup online index [[sub]partition] [re]builds.
New functionality is not allowed in patchsets;
therefore, this is not available in a patchset but is available in 10gR2.
Check your patch list to verify the database is patched for Bug 3805539
using the following command and patch for the bug if it is not listed:
opatch lsinventory -detail
Cleanup after a failed online index [re]build can be slow to occur, preventing subsequent such operations
until the cleanup has occurred.
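As a rough illustration of the note above, the following PL/SQL block is a minimal sketch of a manual cleanup call. It assumes a 10gR2 or later database where DBMS_REPAIR.ONLINE_INDEX_CLEAN is available and a session with sufficient privileges (typically SYS). The variable name cleaned and the output messages are illustrative only.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- TRUE when every targeted index was cleaned up, FALSE otherwise
  cleaned BOOLEAN;
BEGIN
  -- ALL_INDEX_ID targets every index needing cleanup;
  -- LOCK_WAIT makes the call wait for the underlying object lock
  cleaned := DBMS_REPAIR.ONLINE_INDEX_CLEAN(
               object_id     => DBMS_REPAIR.ALL_INDEX_ID,
               wait_for_lock => DBMS_REPAIR.LOCK_WAIT);

  IF cleaned THEN
    DBMS_OUTPUT.PUT_LINE('online index cleanup completed');
  ELSE
    DBMS_OUTPUT.PUT_LINE('some indexes could not be cleaned up, retry later');
  END IF;
END;
/
```

To clean a single index, pass its object ID (for example from DBA_OBJECTS) as object_id instead of ALL_INDEX_ID.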
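Next, let us see in practice how SMON cleans up temporary segments in a permanent tablespace:
Set event 10500 to trace the SMON process. This diagnostic event is described later.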
SQL> alter system set events '10500 trace name context forever,level 10';
System altered.
In the first session, run a create table command, which produces a number of Temporary Extents
SQL> create table smon as select * from ymon;
In another session, query the DBA_EXTENTS view to see how many temporary extents have been created
SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
COUNT(*)
----------
117
Terminate the session running the create table above, wait a while, and then look at the SMON background process trace file, which shows the following:
*** 2011-06-07 21:18:39.817
SMON: system monitor process posted msgflag:0x0200 (-/-/-/-/TMPSDROP/-/-)
*** 2011-06-07 21:18:39.818
SMON: Posted, but not for trans recovery, so skip it.
*** 2011-06-07 21:18:39.818
SMON: clean up temp segments in slave
SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
COUNT(*)
----------
0
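We can see that SMON cleaned up the temporary segment through a slave process.
Unlike temporary segments in permanent tablespaces, extents in temporary tablespaces are, for performance reasons, not released and returned as soon as an operation completes. Instead, these Temporary Extents are marked as available so that the next sort operation can reuse them. SMON still cleans up these temporary segments, but only at instance startup: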
For performance issues, extents in TEMPORARY tablespaces are not released or deallocated
once the operation is complete. Instead, the extent is simply marked as available for the next sort operation.
SMON cleans up the segments at startup.
A sort segment is created by the first statement that used a TEMPORARY tablespace for sorting, after startup.
A sort segment created in a TEMPORARY tablespace is only released at shutdown.
The large number of EXTENTS is caused when the STORAGE clause has been incorrectly calculated.
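To see the reuse behaviour described in the note, one option is to look at V$SORT_SEGMENT, which reports the sort segment of each temporary tablespace. This is a minimal sketch, assuming the session can read V$ views. After a sort finishes, USED_EXTENTS should fall while TOTAL_EXTENTS stays roughly the same, because the extents are marked free rather than deallocated.

```sql
-- One row per sort segment created since instance startup
SELECT tablespace_name,
       total_extents,   -- extents currently allocated to the sort segment
       used_extents,    -- extents in use by active sorts
       free_extents     -- extents marked as available for reuse
  FROM v$sort_segment;
```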
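Symptoms
The following query returns the total number of Temporary Extents in the database. Compare the total over a period of time: if it is decreasing, SMON is cleaning up temporary segments.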
SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
The "SMON posted for dropping temp segment" statistic in the v$sysstat view also shows how often SMON has been posted to perform this cleanup:
SQL> select name,value from v$sysstat where name like '%SMON%';
NAME VALUE
---------------------------------------------------------------- ----------
total number of times SMON posted 8
SMON posted for undo segment recovery 0
SMON posted for txn recovery for other instances 0
SMON posted for instance recovery 0
SMON posted for undo segment shrink 0
SMON posted for dropping temp segment 1
In addition, SMON holds the Space Transaction (ST) enqueue for a long time during cleanup. Other sessions may time out waiting for the ST enqueue and fail with ORA-01575:
01575, 00000, "timeout waiting for space management resource"
// *Cause: failed to acquire necessary resource to do space management.
// *Action: Retry the operation.
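When ORA-01575 shows up, it can help to check who currently holds or waits for the ST enqueue. The query below is a sketch, assuming access to V$LOCK and V$SESSION. A row with LMODE = 6 is an exclusive holder, and rows with a nonzero REQUEST are waiters.

```sql
-- Holders and waiters of the Space Transaction (ST) enqueue
SELECT l.sid,
       s.program,   -- shows (SMON) when the holder is the SMON process
       l.lmode,     -- 6 = held in exclusive mode
       l.request,   -- nonzero = session is waiting for the enqueue
       l.ctime      -- seconds in the current mode
  FROM v$lock l
  JOIN v$session s ON s.sid = l.sid
 WHERE l.type = 'ST';
```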
How to stop SMON from cleaning up temporary segments
Setting the diagnostic event event='10061 trace name context forever, level 10' disables SMON cleanup of temporary segments (disable SMON from cleaning temp segments).
alter system set events '10061 trace name context forever, level 10';
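Related diagnostic events
SMON Features You May Not Know About (2): Coalescing Free Extents
SMON is also responsible for coalescing free extents (coalesces free extent).
Trigger scenarios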
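Early Oracle releases used dictionary-managed tablespaces (DMT). Unlike today's locally managed tablespaces (LMT), DMT manages extents through recursive operations on two dictionary base tables, FET$ and UET$. Every 5 minutes SMON wakes up on its own (SMON wakes itself every 5 minutes and checks for tablespaces with default pctincrease != 0) and looks for dictionary-managed tablespaces whose default storage parameter pctincrease is not 0. Note that this work applies only to DMT; LMT needs no coalescing. In these DMT tablespaces SMON coalesces contiguous, adjacent free extents into one larger free extent, which also means that SMON has to maintain the FET$ dictionary base table.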
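The tablespaces that qualify for this automatic coalescing can be listed from DBA_TABLESPACES. The following is a minimal sketch based on the two conditions stated above (dictionary managed, default pctincrease not 0). It assumes the session has access to the DBA views.

```sql
-- Dictionary-managed tablespaces that SMON would consider for coalescing
SELECT tablespace_name,
       pct_increase
  FROM dba_tablespaces
 WHERE extent_management = 'DICTIONARY'
   AND pct_increase <> 0;
```

An empty result is expected on databases that use only locally managed tablespaces.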
Symptoms
The following query returns the total number of free extents in the database. If the total keeps decreasing, SMON is coalescing free space:
SELECT COUNT(*) FROM DBA_FREE_SPACE;
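While coalescing extents, SMON has to hold the ST (Space Transaction) enqueue in exclusive mode. Other sessions may time out waiting for the ST enqueue and fail with ORA-01575. SMON may also consume 100% of a CPU during a lengthy coalesce operation.
How to stop SMON from coalescing free extents
Setting the diagnostic event event='10269 trace name context forever, level 10' disables SMON coalescing of free extents (Don't do coalesces of free space in SMON).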
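To judge how much coalescing work remains, and to do it by hand at a quiet time rather than leaving it to SMON, something along these lines may be useful. It is a sketch only: DBA_FREE_SPACE_COALESCED and ALTER TABLESPACE ... COALESCE are standard, but the tablespace name users is a placeholder chosen for illustration.

```sql
-- Per-tablespace view of how fragmented the free space still is
SELECT tablespace_name,
       total_extents,              -- free extents in the tablespace
       extents_coalesced,          -- free extents already coalesced
       percent_extents_coalesced
  FROM dba_free_space_coalesced
 ORDER BY percent_extents_coalesced;

-- Manually coalesce adjacent free extents in one tablespace
-- (users is a hypothetical tablespace name)
ALTER TABLESPACE users COALESCE;
```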
10269, 00000, "Don't do coalesces of free space in SMON"
// *Cause: setting this event prevents SMON from doing free space coalesces
alter system set events '10269 trace name context forever, level 10';
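SMON Features You May Not Know About (3): Cleaning Up the obj$ Base Table
SMON is also responsible for cleaning up the obj$ data dictionary base table (cleanup obj$).
The OBJ$ dictionary base table is one of the key objects in Oracle's bootstrap process: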
SQL> set linesize 80 ;
SQL> select sql_text from bootstrap$ where sql_text like 'CREATE TABLE OBJ$%';
SQL_TEXT
--------------------------------------------------------------------------------
CREATE TABLE OBJ$("OBJ#" NUMBER NOT NULL,"DATAOBJ#" NUMBER,"OWNER#" NUMBER NOT N
ULL,"NAME" VARCHAR2(30) NOT NULL,"NAMESPACE" NUMBER NOT NULL,"SUBNAME" VARCHAR2(
30),"TYPE#" NUMBER NOT NULL,"CTIME" DATE NOT NULL,"MTIME" DATE NOT NULL,"STIME"
DATE NOT NULL,"STATUS" NUMBER NOT NULL,"REMOTEOWNER" VARCHAR2(30),"LINKNAME" VAR
CHAR2(128),"FLAGS" NUMBER,"OID$" RAW(16),"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3
" NUMBER,"SPARE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) PCTFREE
10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE ( INITIAL 16K NEXT 1024K MINEXTEN
TS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 18 EXTENTS (FILE 1 BLOCK 121))
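Trigger scenarios
The OBJ$ base table is a low-level data dictionary table that holds one row for almost every object in the database (tables, indexes, packages, views and so on). In many cases these entries represent objects that do not exist (non-existent). One possible cause is that the object itself has been dropped from the database, but its entry is kept to support the negative dependency mechanism. Because such entries make OBJ$ grow continuously, SMON has to delete the rows that are no longer needed. SMON runs this cleanup at instance startup and then every 12 hours afterwards (the cleanup is scheduled to run after startup and then every 12 hours).
The following demonstration shows how SMON cleans up obj$: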
SQL> BEGIN
       FOR i IN 1 .. 5000 LOOP
         execute immediate ('create synonym gustav' || i || ' for
       perfstat.sometable');
         execute immediate ('drop synonym gustav' || i );
       END LOOP;
     END;
     /
PL/SQL procedure successfully completed.
SQL> startup force;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size 2089336 bytes
Variable Size 486542984 bytes
Database Buffers 570425344 bytes
Redo Buffers 6295552 bytes
Database mounted.
Database opened.
SQL> select count(*) from user$ u, obj$ o
     where u.user# (+)=o.owner# and o.type#=10 and not exists
     (select p_obj# from dependency$ where p_obj# = o.obj#);
COUNT(*)
----------
5000
SQL> /
COUNT(*)
----------
5000
SQL> /
COUNT(*)
----------
4951
SQL> oradebug setospid 18457;
Oracle pid: 8, Unix process pid: 18457, image: [email protected] (SMON)
SQL> oradebug event 10046 trace name context forever ,level 1;
Statement processed.
SQL> oradebug tracefile_name;
/s01/admin/G10R2/bdump/g10r2_smon_18457.trc
select o.owner#, o.obj#, decode(o.linkname, null, decode(u.name, null, 'SYS', u.name), o.remoteowner),
       o.name, o.linkname, o.namespace, o.subname
  from user$ u, obj$ o
 where u.user#(+) = o.owner# and o.type# = :1
   and not exists (select p_obj# from dependency$ where p_obj# = o.obj#)
 order by o.obj# for update