Oracle Memory in Detail, Part 4: Buffer Cache (the Data Buffer Cache)
1. Official Documentation
Memory Architecture
The database buffer cache is the portion of the SGA that holds copies of data blocks read from datafiles. All users concurrently connected to the instance share access to the database buffer cache.
This section includes the following topics:
(1)Organization of the Database Buffer Cache
The buffers in the cache are organized in two lists: the write list and the least recently used (LRU) list. The write list holds dirty buffers, which contain data that has been modified but has not yet been written to disk. The LRU list holds free buffers, pinned buffers, and dirty buffers that have not yet been moved to the write list. Free buffers do not contain any useful data and are available for use. Pinned buffers are currently being accessed.
When an Oracle Database process accesses a buffer, the process moves the buffer to the most recently used (MRU) end of the LRU list. As more buffers are continually moved to the MRU end of the LRU list, dirty buffers age toward the LRU end of the LRU list.
The first time an Oracle Database user process requires a particular piece of data, it searches for the data in the database buffer cache. If the process finds the data already in the cache (a cache hit), it can read the data directly from memory. If the process cannot find the data in the cache (a cache miss), it must copy the data block from a datafile on disk into a buffer in the cache before accessing the data. Accessing data through a cache hit is faster than data access through a cache miss.
Before reading a data block into the cache, the process must first find a free buffer. The process searches the LRU list, starting at the least recently used end of the list. The process searches either until it finds a free buffer or until it has searched the threshold limit of buffers.
If the user process finds a dirty buffer as it searches the LRU list, it moves that buffer to the write list and continues to search. When the process finds a free buffer, it reads the data block from disk into the buffer and moves the buffer to the MRU end of the LRU list.
If an Oracle Database user process searches the threshold limit of buffers without finding a free buffer, the process stops searching the LRU list and signals the DBW0 background process to write some of the dirty buffers to disk.
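The free-buffer search described above can be sketched as a small simulation. This is only an illustrative model, not Oracle's implementation: the list representation, the `state` strings, and the function name `find_free_buffer` are assumptions made for the example.

```python
def find_free_buffer(lru, write_list, threshold):
    """Scan from the LRU end (index 0) for a free buffer.

    Dirty buffers met on the way are moved to the write list.
    Returns the free buffer, or None when the scan threshold is
    reached (the point at which DBW0 would be signalled).
    """
    scanned = 0
    i = 0
    while i < len(lru) and scanned < threshold:
        buf = lru[i]
        scanned += 1
        if buf["state"] == "dirty":
            write_list.append(lru.pop(i))  # move to write list, keep scanning
            continue
        if buf["state"] == "free":
            return lru.pop(i)              # caller re-inserts it at the MRU end
        i += 1                             # pinned buffer: skip it
    return None


lru = [
    {"id": 1, "state": "dirty"},
    {"id": 2, "state": "pinned"},
    {"id": 3, "state": "free"},
    {"id": 4, "state": "dirty"},
]
write_list = []
found = find_free_buffer(lru, write_list, threshold=3)

# The block read from disk goes into the buffer, which moves to the MRU end.
found["state"] = "pinned"
lru.append(found)

# With a threshold of 1 the scan gives up after moving one dirty buffer.
gave_up = find_free_buffer([{"id": 9, "state": "dirty"}], [], threshold=1)
```

In the first call the dirty buffer 1 is moved to the write list, the pinned buffer 2 is skipped, and buffer 3 is returned. In the second call the threshold is reached before any free buffer is found, so the function returns `None`.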
(2)The LRU Algorithm and Full Table Scans
When the user process is performing a full table scan, it reads the blocks of the table into buffers and puts them on the LRU end (instead of the MRU end) of the LRU list. This is because a fully scanned table usually is needed only briefly, so the blocks should be moved out quickly to leave more frequently used blocks in the cache.
You can control this default behavior of blocks involved in table scans on a table-by-table basis. To specify that blocks of the table are to be placed at the MRU end of the list during a full table scan, use the CACHE clause when creating or altering a table or cluster. You can specify this behavior for small lookup tables or large static historical tables to avoid I/O on subsequent accesses of the table.
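As a rough illustration of the placement rule, the sketch below models the LRU list as a Python list whose first element is the LRU end. The function name `place_block` and its flags are hypothetical; they only mirror the behavior described in the text, including the effect of the CACHE clause.

```python
def place_block(lru, block, full_scan=False, cache_clause=False):
    """lru[0] is the LRU end, lru[-1] is the MRU end."""
    if full_scan and not cache_clause:
        lru.insert(0, block)   # full table scan: LRU end, ages out first
    else:
        lru.append(block)      # normal access, or table created with CACHE
    return lru


lru = ["a", "b"]
place_block(lru, "fts_block", full_scan=True)
place_block(lru, "lookup_block", full_scan=True, cache_clause=True)
```

After both calls the scanned block sits at the LRU end and the block from the CACHE table sits at the MRU end.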
2. About the Buffer Cache
The purpose of the buffer cache is to minimize physical I/O. When Oracle reads a block, it places the block into the buffer cache, because there is a chance that the block will be needed again. Reading a block from the buffer cache is less costly (in terms of time) than reading it from disk.
2.1 MRU and LRU blocks
Blocks within the buffer cache are ordered from MRU (most recently used) blocks to LRU (least recently used) blocks. Whenever a block is accessed, the block goes to the MRU end of the list, thereby shifting the other blocks down towards the LRU end. When a block is read from disk and when there is no buffer available in the db buffer cache, one block in the buffer cache has to "leave". It will be the block on the LRU end in the list.
However, blocks read during a full table scan (multiblock reads) are placed on the LRU side of the list instead of on the MRU side.
The v$bh dynamic view has an entry for each block in the buffer cache. The time a block was most recently touched is recorded in the tim column of x$bh.
2.2 Touch count
Each buffer has an associated touch count. This touch count might be increased if a buffer is accessed (although it needs not always be). It is valid to claim that the higher the touch count, the more important (more used) the buffer. Therefore, buffers with a high touch count should stay in the buffer cache while buffers with a low touch count should age out in order to make room for other buffers.
The touch count can be incremented only once within a time period controlled by the parameter _db_aging_touch_time (default: 3 seconds).
The touch count is recorded in the tch column of x$bh.
By the way, it looks like Oracle doesn't protect manipulations of the touch count in a buffer with a latch. This is interesting because all other manipulations on the LRU list are protected by latches. A side effect of the lack of latch protection is that the touch count is not incremented if another process updates the buffer header.
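The once-per-interval rule for the touch count can be sketched as follows. The class is a toy model: the attribute names `tch` and `tim` are borrowed from x$bh, while the exact comparison (`>=`) and the use of plain seconds are assumptions.

```python
class BufferHeader:
    """Toy buffer header holding a touch count (tch) and last touch time (tim)."""

    def __init__(self):
        self.tch = 0
        self.tim = None

    def touch(self, now, aging_touch_time=3):
        # Count the access only if the aging interval has elapsed since the
        # last counted touch (assumed semantics of _db_aging_touch_time).
        if self.tim is None or now - self.tim >= aging_touch_time:
            self.tch += 1
            self.tim = now
        return self.tch


hdr = BufferHeader()
counts = [hdr.touch(t) for t in (0, 1, 2, 3, 4, 6)]
```

Accesses at seconds 1, 2, and 4 fall inside the 3-second window of a previously counted touch, so they do not raise the count.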
2.3 x$bh
Information on buffer headers. Contains a record (the buffer header) for each block in the buffer cache.
This select statement lists how many blocks are Available, Free and Being Used.
/* Formatted on 2011/6/28 14:34:08 (QP5 v5.163.1008.3004) */
SELECT COUNT (*), State
FROM (SELECT DECODE (state,
0, 'Free',
1, DECODE (lrba_seq, 0, 'Available', 'Being Used'),
3, 'Being Used',
state)
State
FROM x$bh)
GROUP BY state
Notes on several x$bh columns
(1)state:
Value | Name | Meaning
0 | FREE | no valid block image
1 | XCUR | a current mode block, exclusive to this instance
2 | SCUR | a current mode block, shared with other instances
3 | CR | a consistent read (stale) block image
4 | READ | buffer is reserved for a block being read from disk
5 | MREC | a block in media recovery mode
6 | IREC | a block in instance (crash) recovery mode
(2)tch:
tch is the touch count. A high touch count indicates that the buffer is used often. Therefore, it will probably be near the MRU end of the list.
(3)tim: touch time.
(4)class: represents a value designated for the use of the block.
(5)lru_flag
(6)set_ds : maps to addr on x$kcbwds.
(7)le_addr: can be outer joined on x$le.le_addr.
(8)flag: a bit array.
Bit | If set
0 | Block is dirty
4 | temporary block
9 or 10 | ping
14 | stale
16 | direct
524288 (=0x80000) |
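Reading a bit array like this one amounts to testing individual bit positions. The sketch below decodes only the bits whose meaning is listed in the table above; the dictionary `FLAG_BITS` and the function `decode_flag` are illustrative names, and the mask 0x80000 is left out because the table gives no meaning for it.

```python
# Bit position -> meaning, taken from the table above.
FLAG_BITS = {
    0: "dirty",
    4: "temporary",
    9: "ping",
    10: "ping",
    14: "stale",
    16: "direct",
}


def decode_flag(flag):
    """Return the meanings of the documented bits that are set in flag."""
    names = []
    for bit in sorted(FLAG_BITS):
        name = FLAG_BITS[bit]
        if flag & (1 << bit) and name not in names:
            names.append(name)
    return names


dirty_and_stale = decode_flag((1 << 0) | (1 << 14))
ping_only = decode_flag((1 << 9) | (1 << 10))
nothing_set = decode_flag(0)
```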
2.4 Different pools within the cache
The cache actually consists of three buffer pools for different purposes.
2.4.1 Keep pool
The keep pool's purpose is to hold small objects that should always be cached, for example lookup tables. See db_keep_cache_size.
2.4.2 Recycle pool
2.4.3 Default pool
The default pool is for everything else. See also x$kcbwbpd
2.4.4 x$kcbwbpd
Buffer pool descriptor, the base table for v$buffer_pool.
It shows how the buffer cache is split between the default, the recycle, and the keep buffer pools.
2.5 Cold and hot area
Each pool's LRU list is divided into a hot area and a cold area. Accordingly, buffers within the hot area are called hot buffers, and buffers in the cold area are called cold buffers.
By default, 50% of the buffers belong to the cold area and the other 50% belong to the hot area. This behaviour can be changed with _db_percent_hot_default (for the default pool), _db_percent_hot_recycle (for the recycle pool), and _db_percent_hot_keep (for the keep pool).
A newly read db block is inserted between the cold and the hot area such that it belongs to the hot area. This is called midpoint insertion. However, this is only true for single block reads; multiblock reads are placed at the LRU end.
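Midpoint insertion can be pictured with a list whose first element is the LRU (cold) end and whose last element is the MRU (hot) end. This is a simplified sketch: the function name `insert_block` and the `percent_hot` argument are stand-ins for the behavior and parameters described above, not Oracle internals.

```python
def insert_block(lru, block, multiblock=False, percent_hot=50):
    """lru[0] is the LRU (cold) end, lru[-1] is the MRU (hot) end."""
    if multiblock:
        lru.insert(0, block)                  # multiblock read: LRU end
    else:
        cold_count = len(lru) * (100 - percent_hot) // 100
        lru.insert(cold_count, block)         # first position of the hot area
    return lru


lru = ["cold1", "cold2", "hot1", "hot2"]
insert_block(lru, "single_read")
after_single = list(lru)
insert_block(lru, "scan_read", multiblock=True)
```

The single block read lands just past the cold area, so it starts life as the coldest of the hot buffers; the multiblock read lands at the LRU end.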
2.6 Flushing the cache
With Oracle 10g it is possible to flush the buffer cache with:
10g:
SQL>alter system flush buffer_cache;
9i had an undocumented command to flush the buffer cache:
SQL>alter session set events = 'immediate trace name flush_cache';
2.7 Optimal Size
Some common wisdom says that the larger the buffer cache is, the better the performance of the database becomes. However, this claim is not always true.
To begin with, the cache needs to be managed. The bigger the cache, the larger the LRU and dirty lists become. That results in longer search times for a free buffer (buffer busy waits).
Also, the bigger the cache, the greater the burden on the DBWn process.
-- A bigger DB cache is not always better. An oversized DB cache leads to a large LRU list and dirty list, which increases the time needed to scan the lists. A large cache also adds to the burden on the DBWn process.
2.8 Management Structures in the Buffer Cache
The buffer cache is part of the SGA. Oracle uses it to manage data blocks, and its ultimate purpose is to reduce disk I/O as much as possible.
Three main structures are used to manage the buffer cache:
(1)Hash Bucket and Hash Chain List: used to locate data blocks quickly.
(2)LRU List: holds pointers to free buffers, pinned buffers, and dirty buffers that have not yet been moved to the write list. A free buffer is a buffer that contains no data; a pinned buffer is a buffer that is currently being accessed.
(3)Write (Dirty) List: holds pointers to dirty blocks. A dirty block is a block that has been modified in the buffer cache but has not yet been written to disk.
2.8.1 Hash Bucket and Hash Chain List
Oracle runs every buffer in the buffer cache through an internal hash algorithm and places the buffers into different hash buckets. Each hash bucket has a hash chain list that links together the blocks in that bucket.
The makeup of a hash chain list can be seen in the x$bh fixed table.
SQL> desc x$bh
Name Null? Type
----------------------- - ----------------
ADDR RAW(8) --- address of the block in the buffer cache
HLADDR RAW(8) --- address of the cache buffers chains latch
NXT_HASH RAW(8) --- next block on the same hash chain list
PRV_HASH RAW(8) --- previous block on the same hash chain list
NXT_REPL RAW(8) --- next block on the LRU list
PRV_REPL RAW(8) --- previous block on the LRU list
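A hash chain list is a doubly linked list formed by the NXT_HASH and PRV_HASH pointers in x$bh, as illustrated below:
Through the NXT_HASH and PRV_HASH pointers, the blocks on the same hash chain list are linked together.
With hash buckets and hash chain lists understood, let's look at a diagram of how they manage the buffer cache.
This diagram is somewhat similar to the one for the shared pool. It shows that one cache buffers chains latch (x$bh.hladdr) can protect multiple hash buckets; in other words, to access a block, a process must first obtain this latch. Each hash bucket corresponds to one hash chain list, and that hash chain list holds one or more buffer headers.
The number of hash buckets is controlled by the hidden parameter _db_block_hash_buckets.
The number of cache buffers chains latches is controlled by the hidden parameter _db_block_hash_latches.
These hidden parameters can be viewed with the following query:
SQL> select * from v$version where rownum=1;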
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM x$ksppi x, x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE '%_db_block_hash%'
7 /
NAME VALUE DESCRIB
------------------------- ---------- -------------------------------------------
_db_block_hash_buckets 65536 Number of database block hash buckets
_db_block_hash_latches 2048 Number of database block hash latches
_db_block_hash_buckets: before 8i, this hidden parameter was the next prime number after db_block_buffers/4; in 8i it became db_block_buffers*2; from 9i on, it is the prime number that is smaller than and closest to db_block_buffers*2.
_db_block_hash_latches: this hidden parameter is the number of cache buffers chains latches.
The number of cache buffers chains latches can be obtained with the following query:
SQL> select count(*) from v$latch_children a,v$latchname b where a.latch#=b.latch# and b.name='cache buffers chains';
COUNT(*)
----------
2048
The number of cache buffers chains latches can also be obtained with the following query:
SQL> select count(distinct hladdr) from x$bh ;
COUNT(DISTINCTHLADDR)
---------------------
2048
Based on the earlier query results:
_db_block_hash_buckets (65536) / _db_block_hash_latches (2048) = 32
On average, one cache buffers chains latch manages 32 hash buckets. Now let's pick a latch to verify the structure diagram described above.
SQL> select * from (select hladdr,count(*) from x$bh group by hladdr) where rownum<=5;
HLADDR COUNT(*)
-------- ----------
2F619B90 9
2F619D0C 12
2F619E88 4
2F61A004 8
2F61A180 4
Query the data blocks protected by the latch at address 2F619B90:
SQL> select hladdr,obj,dbarfil,dbablk, nxt_hash,prv_hash from x$bh where hladdr='2F619B90' order by obj;
HLADDR OBJ DBARFIL DBABLK NXT_HASH PRV_HASH
-------- ---------- ---------- ---------- -------- --------
2F619B90 18 1 38576 2F619C84 22BD92F4
2F619B90 122 1 51823 2F619CCC 2F619CCC
2F619B90 122 1 60499 2F619CAC 2F619CAC
2F619B90 181 1 47252 2F619C64 2F619C64
2F619B90 5127 3 31908 2F619C04 2F619C04
2F619B90 54768 1 73280 2F619C0C 2F619C0C
2F619B90 54768 1 256874 2F619C4C 2F619C4C
2F619B90 54769 1 73746 2F619CFC 2F619CFC
2F619B90 54769 1 73513 28BE54B4 2F619C84
9 rows selected.
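Note DBA(1, 51823): its NXT_HASH and PRV_HASH are identical, which means DBA(1, 51823) hangs on a hash chain that contains only one data block.
This query returns 9 rows, which matches the count(*) figure above. If it had returned fewer rows, that would mean the missing N blocks had been flushed to disk in the meantime.
When a user process wants to access Block(1,38576), the steps are as follows:
(1)Apply the hash algorithm to the block to obtain its hash value.
(2)Obtain the cache buffers chains latch.
(3)Search the corresponding hash bucket for the matching buffer header.
(4)If the buffer header is found, check the buffer's state to see whether a CR block must be constructed or whether the buffer is pinned, and then read it.
(5)If it is not found, read the block from disk into the buffer cache.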
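The five steps above can be sketched as a toy cache. Everything here is a simplification under stated assumptions: the hash function `(file + block) mod buckets` is made up, `threading.Lock` stands in for a cache buffers chains latch, and step (4) omits CR block construction and pin handling entirely.

```python
import threading


class ToyBufferCache:
    def __init__(self, n_buckets=8, buckets_per_latch=4):
        self.buckets = [[] for _ in range(n_buckets)]   # hash chain lists
        self.buckets_per_latch = buckets_per_latch
        self.latches = [threading.Lock()
                        for _ in range(n_buckets // buckets_per_latch)]
        self.physical_reads = 0

    def get(self, file_no, block_no):
        # (1) hash the block address (toy hash, not Oracle's)
        bucket_no = (file_no + block_no) % len(self.buckets)
        # (2) obtain the latch that protects this bucket
        latch = self.latches[bucket_no // self.buckets_per_latch]
        with latch:
            # (3) walk the hash chain looking for the buffer header
            for header in self.buckets[bucket_no]:
                if header["dba"] == (file_no, block_no):
                    return header          # (4) found: cache hit
            # (5) not found: simulate a read from disk into the cache
            self.physical_reads += 1
            header = {"dba": (file_no, block_no)}
            self.buckets[bucket_no].append(header)
            return header


cache = ToyBufferCache()
first = cache.get(1, 38576)    # miss: one physical read
second = cache.get(1, 38576)   # hit: served from the hash chain
```

One latch covers several buckets here, just as the query results above show one latch covering 32 buckets on average.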
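Before Oracle9i, if another user process already held the latch, a new process had to wait until that process finished its search (the latch is released once the search completes).
Starting with Oracle9i, the cache buffers chains latch can be shared for read-only access. For example, user process A accesses Block(1,73746) in read-only mode (select) and obtains the latch; at the same time, user process B accesses Block(1,73513), also in read-only mode. Because both accesses are read-only, user process B can obtain the latch as well. However, if user process B wants to access Block(1,73513) in exclusive mode, it must wait for user process A to release the latch, and Oracle then records a latch: cache buffers chains wait event for user process B.
Generally, there are three causes of latch: cache buffers chains waits:
1. Poorly optimized SQL
SQL statements that perform a large number of logical reads can cause very severe latch: cache buffers chains waits. The latch must be obtained every time a block is accessed, so a large number of logical reads raises the probability of contention for it.
(1)For a SQL statement that is currently running and causing severe latch: cache buffers chains contention, use the following SQL to view its execution plan and try to optimize the statement.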
SQL>select * from table(dbms_xplan.display_cursor('sql_id',sql_child_number));
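(2)If the SQL has already finished, look at SQL Statistics -> SQL ordered by Gets -> Gets per Exec in the AWR report and try to optimize those statements.
2. Hot block contention
(1)The following query finds the top 5 contended latch addresses: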
/* Formatted on 2011/6/28 17:28:30 (QP5 v5.163.1008.3004) */
SELECT *
FROM ( SELECT CHILD#,
ADDR,
GETS,
MISSES,
SLEEPS
FROM v$latch_children
WHERE name = 'cache buffers chains' AND misses > 0 AND sleeps > 0
ORDER BY 5 DESC,
1,
2,
3)
WHERE ROWNUM < 6;
(2)Then use the following query to find the hot blocks:
/* Formatted on 2011/6/28 17:29:09 (QP5 v5.163.1008.3004) */
SELECT /*+ RULE */
e.owner || '.' || e.segment_name segment_name,
e.extent_id extent#,
x.dbablk - e.block_id + 1 block#,
x.tch, /* sometimes tch=0,we need to see tim */
x.tim,
l.child#
FROM v$latch_children l, x$bh x, dba_extents e
WHERE x.hladdr = '&ADDR'
AND e.file_id = x.file#
AND x.hladdr = l.addr
AND x.dbablk BETWEEN e.block_id AND e.block_id + e.blocks - 1
ORDER BY x.tch DESC;
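3. Too few hash buckets
This requires changing the hidden parameter _db_block_hash_buckets. In practice, from Oracle9i on this problem is essentially never encountered unless you hit a bug, so changing the parameter is not recommended. Remember to consult Oracle Support before modifying any Oracle hidden parameter.
2.8.2 LRU List and Write List
As mentioned earlier, if a user process finds that a block is not in the buffer cache, it reads the block from disk into the buffer cache.
Before reading the block into the buffer cache, the process must first look for a free buffer on the LRU list. If it finds a dirty buffer during the search, it moves that buffer to the LRU write list.
If the dirty queue exceeds its threshold of 25% (as shown by the query below), DBWn writes dirty buffers to disk.
SQL> select kvittag,kvitval,kvitdsc from x$kvit where kvittag in('kcbldq','kcbfsp');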
KVITTAG KVITVAL KVITDSC
---------- ---------- ----------------------------------------------------------
kcbldq 25 large dirty queue if kcbclw reaches this
kcbfsp 40 Max percentage of LRU list foreground can scan for free
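The query above also tells us that when a user process has scanned more than 40% of the LRU list without finding a free buffer, it stops scanning the LRU list and signals DBWn to write dirty buffers to disk; the user process also records a free buffer waits wait event. If free buffer waits shows up frequently, consider enlarging the buffer cache.
Starting with Oracle8i, auxiliary lists (AUX lists) were added to both the LRU list and the dirty list. Oracle refers to an LRU list together with its LRU write list as a working set (WS). Each WS contains several lists with different functions, and each WS is protected by one cache buffers lru chain latch (source: DSI405). If the database is configured with multiple DBWR processes, there are multiple working sets; if multiple buffer pools (default, keep, recycle) are enabled in the buffer cache, each pool has its own working sets. Let's run some queries to verify this:
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ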
2 FROM x$ksppi x, x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE '%db_block_lru_latches%'
7 /
NAME VALUE DESCRIB
------------------------ ---------- --------------------------------------------
_db_block_lru_latches 32 number of lru latches
SQL> show parameter db_writer
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_writer_processes integer 2
SQL> show parameter cpu_count
NAME TYPE VALUE
------------------------------------ -------------------- ------------
cpu_count integer 8
We find 32 cache buffers lru chain latches. From Oracle9i on, _db_block_lru_latches is 4 times CPU_COUNT when DB_WRITER_PROCESSES is less than 4. I do not know what happens when DB_WRITER_PROCESSES is greater than 4, and I have never seen a database with DB_WRITER_PROCESSES set above 4.
Query the number of working sets:
SQL> select count(*) from x$kcbwds;
COUNT(*)
----------
32
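Finding 32 working sets does not mean that the database actually uses all 32. Some are preallocated so that the database does not need to be restarted when the keep pool or recycle pool is enabled.
Here only 4 working sets are actually in use:
SQL> select count(*) from x$kcbwds where CNUM_REPL>0;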
COUNT(*)
----------
4
Main columns of the X$KCBWDS fixed table:
ADDR RAW(4) --address
INST_ID NUMBER --instance number
SET_ID NUMBER --working set id
DBWR_NUM NUMBER --DBWR number
BLK_SIZE NUMBER --block size of the working set
CKPT_LATCH RAW(4) --checkpoint latch
SET_LATCH RAW(4)
NXT_REPL RAW(4) --next replacement chain
PRV_REPL RAW(4) --prv replacement chain
NXT_REPLAX RAW(4) --replacement aux chain
PRV_REPLAX RAW(4)
CNUM_REPL NUMBER --number of blocks on the replacement chain
ANUM_REPL NUMBER --number of blocks on the aux chain
COLD_HD RAW(4) --address of the cold head
HBMAX NUMBER --maximum number of buffers at the hot end
HBUFS NUMBER --current number of buffers at the hot end
NXT_WRITE RAW(4) --LRU-W chain
PRV_WRITE RAW(4) --LRU-W chain
NXT_WRITEAX RAW(4) --LRU-W aux chain
PRV_WRITEAX RAW(4) --LRU-W aux chain
CNUM_WRITE NUMBER --number of buffers on LRU-W
ANUM_WRITE NUMBER --number of buffers on LRU-W aux
NXT_XOBJ RAW(4) --reuse object chain (used by operations such as truncate and drop)
PRV_XOBJ RAW(4) --reuse object chain
NXT_XOBJAX RAW(4) --reuse object aux chain
NXT_XRNG RAW(4) --reuse range chain (used by operations such as tablespace offline)
NXT_XRNGAX RAW(4) --reuse range aux chain
Note the columns highlighted in red in the original: together with the NXT_REPL and PRV_REPL columns of x$bh mentioned earlier, they form the LRU list and the LRU write list.
The following diagram shows the structure of the LRU list.
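Purpose of the added auxiliary (AUX) lists:
After the database starts, buffers are first placed on the LRU AUX list, and user processes search for free buffers starting from the tail (cold end) of the LRU AUX list. When such a block is modified, or when a user process needs to construct a CR block (needing a CR block means the block does not satisfy read consistency, that is, it is dirty), the buffer on the LRU AUX list is moved to the middle of the LRU main list; note that it is the middle, not the head and not the tail. DBWR can then search for dirty buffers starting from the LRU main list (note: DBWR scans the LRU main list because of incremental checkpoints). If DBWR finds cold, reusable buffers while scanning the LRU main list, it moves them to the LRU AUX list. As a result, the buffers found when scanning the LRU main list are mostly dirty buffers, which improves search efficiency.
DBWR moves the dirty buffers it finds to the LRUW main list. When a dirty buffer needs to be written out, it is moved to the LRUW AUX list, so DBWR can perform its writes from the LRUW AUX list. This is in effect an asynchronous write mechanism. (From Metalink: 157868.1)
As described above, a user process must obtain the cache buffers lru chain latch when it reads a block from disk into the buffer cache, and DBWR must obtain it when it scans the LRU main list.
So when the cache buffers lru chain latch ranks high in an AWR report, the following measures can be taken:
(1)Enlarge the buffer cache. A buffer cache that is too small causes heavy disk I/O, which inevitably leads to contention for the cache buffers lru chain latch.
(2)Optimize SQL that performs many full table scans and heavy disk I/O. Inefficient SQL with many full table scans, or scans of unselective indexes, causes this problem.
(3)Use multiple buffer pools and keep the hot segments. Hot segment information can be obtained from the Segments Statistics section of the AWR report.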
3. Tuning Oracle's Buffer Cache
Oracle maintains its own buffer cache inside the system global area (SGA) for each instance. A properly sized buffer cache can usually yield a cache hit ratio over 90%, meaning that nine requests out of ten are satisfied without going to disk.
If a buffer cache is too small, the cache hit ratio will be small and more physical disk I/O will result. If a buffer cache is too big, then parts of the buffer cache will be under-utilized and memory resources will be wasted.
3.1 Checking The Cache Hit Ratio
Oracle maintains statistics of buffer cache hits and misses. The following query will show you the overall buffer cache hit ratio for the entire instance since it was started:
/* Formatted on 2011/6/28 19:18:29 (QP5 v5.163.1008.3004) */
SELECT (P1.VALUE + P2.VALUE - P3.VALUE) / (P1.VALUE + P2.VALUE)
FROM v$sysstat P1, v$sysstat P2, v$sysstat P3
WHERE P1.name = 'db block gets'
AND P2.name = 'consistent gets'
AND P3.name = 'physical reads'
You can also see the buffer cache hit ratio for one specific session since that session started:
/* Formatted on 2011/6/28 19:19:53 (QP5 v5.163.1008.3004) */
SELECT (P1.VALUE + P2.VALUE - P3.VALUE) / (P1.VALUE + P2.VALUE)
FROM v$sesstat P1,
v$statname N1,
v$sesstat P2,
v$statname N2,
v$sesstat P3,
v$statname N3
WHERE N1.name = 'db block gets'
AND P1.statistic# = N1.statistic#
AND P1.sid = <enter SID of session here>
AND N2.name = 'consistent gets'
AND P2.statistic# = N2.statistic#
AND P2.sid = P1.sid
AND N3.name = 'physical reads'
AND P3.statistic# = N3.statistic#
AND P3.sid = P1.sid
You can also measure the buffer cache hit ratio between time X and time Y by collecting statistics at times X and Y and computing the deltas.
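The ratio used in both queries, and the delta approach for an interval, can be written out as a short calculation. The snapshot values below are invented for illustration; in practice they would be the 'db block gets', 'consistent gets', and 'physical reads' statistics captured at times X and Y.

```python
def hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """(logical reads - physical reads) / logical reads, as in the queries above."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return None  # no activity, the ratio is undefined
    return (logical_reads - physical_reads) / logical_reads


def interval_hit_ratio(snap_x, snap_y):
    """Hit ratio for the activity between two statistic snapshots."""
    names = ("db block gets", "consistent gets", "physical reads")
    deltas = [snap_y[n] - snap_x[n] for n in names]
    return hit_ratio(*deltas)


# Hypothetical snapshots taken at time X and time Y.
snap_x = {"db block gets": 1000, "consistent gets": 9000, "physical reads": 500}
snap_y = {"db block gets": 2000, "consistent gets": 19000, "physical reads": 800}

since_startup = hit_ratio(1000, 9000, 500)
between_x_and_y = interval_hit_ratio(snap_x, snap_y)
```

With these made-up numbers the interval saw 11,000 logical reads and 300 physical reads, so the interval ratio is higher than the ratio since startup.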
3.2 Adjusting The Size Of The Buffer Cache
The db_block_buffers parameter in the parameter file determines the size of the buffer cache for the instance. The size of the buffer cache (in bytes) is equal to the value of the db_block_buffers parameter multiplied by the data block size.
You can change the size of the buffer cache by editing the db_block_buffers parameter in the parameter file and restarting the instance.
3.3 Determining If The Buffer Cache Should Be Enlarged
If you set the db_block_lru_extended_statistics parameter to a positive number in the parameter file for an instance and restart the instance, Oracle will populate a dynamic performance view called v$recent_bucket. This view will contain the same number of rows as the setting of the db_block_lru_extended_statistics parameter. Each row will indicate how many additional buffer cache hits there might have been if the buffer cache were that much bigger.
For example, if you set db_block_lru_extended_statistics to 1000 and restart the instance, you can see how the buffer cache hit ratio would have improved if the buffer cache were one buffer bigger, two buffers bigger, and so on up to 1000 buffers bigger than its current size. Following is a query you can use, along with a sample result:
/* Formatted on 2011/6/28 19:23:11 (QP5 v5.163.1008.3004) */
SELECT 250 * TRUNC (ROWNUM / 250)
+ 1
|| ' to '
|| 250 * (TRUNC (ROWNUM / 250) + 1)
"Interval",
SUM (COUNT) "Buffer Cache Hits"
FROM v$recent_bucket
GROUP BY TRUNC (ROWNUM / 250)
Interval Buffer Cache Hits
--------------- --------------------
1 to 250 16083
251 to 500 11422
501 to 750 683
751 to 1000 177
This result set shows that enlarging the buffer cache by 250 buffers would have resulted in 16,083 more hits. If there were about 30,000 hits in the buffer cache at the time this query was performed, then it would appear that adding 500 buffers to the buffer cache might be worthwhile. Adding more than 500 buffers might lead to under-utilized buffers and therefore wasted memory.
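One way to read the sample result is to look at the extra hits gained per added buffer in each interval. The short calculation below uses only the four figures from the sample output; the function name is illustrative.

```python
INTERVAL = 250  # buffers per row of the sample result
extra_hits = [16083, 11422, 683, 177]


def marginal_hits_per_buffer(hits, interval):
    """Average additional cache hits contributed by each added buffer."""
    return [h / interval for h in hits]


per_buffer = marginal_hits_per_buffer(extra_hits, INTERVAL)
for i, rate in enumerate(per_buffer):
    low, high = i * INTERVAL + 1, (i + 1) * INTERVAL
    print(f"{low:>4} to {high:<4}: {rate:6.1f} extra hits per added buffer")
```

The rate drops sharply after the second interval, which is the pattern behind the suggestion that growth beyond 500 buffers adds little.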
There is overhead involved in collecting extended LRU statistics. Therefore you should set the db_block_lru_extended_statistics parameter back to zero as soon as your analysis is complete.
In Oracle7, the v$recent_bucket view was named X$KCBRBH. Only the SYS user can query X$KCBRBH. Also note that in X$KCBRBH the columns are called indx and count, instead of rownum and count.
3.4 Determining If The Buffer Cache Is Bigger Than Necessary
If you set the db_block_lru_statistics parameter to true in the parameter file for an instance and restart the instance, Oracle will populate a dynamic performance view called v$current_bucket. This view will contain one row for each buffer in the buffer cache, and each row will indicate how many of the overall cache hits have been attributable to that particular buffer.
By querying v$current_bucket with a GROUP BY clause, you can get an idea of how well the buffer cache would perform if it were smaller. Following is a query you can use, along with a sample result:
SELECT 1000 * TRUNC (rownum / 1000) + 1 || ' to ' ||
1000 * (TRUNC (rownum / 1000) + 1) "Interval",
SUM (count) "Buffer Cache Hits"
FROM v$current_bucket
WHERE rownum > 0
GROUP BY TRUNC (rownum / 1000)
Interval Buffer Cache Hits
------------ -----------------
1 to 1000 668415
1001 to 2000 281760
2001 to 3000 166940
3001 to 4000 14770
4001 to 5000 7030
5001 to 6000 959
This result set shows that the first 3000 buffers are responsible for over 98% of the hits in the buffer cache. This suggests that the buffer cache would be almost as effective if it were half the size; memory is being wasted on an oversized buffer cache.
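The 98% figure can be checked by accumulating the hit counts from the sample output. This is plain arithmetic on the numbers shown above; only the variable and function names are introduced here.

```python
from itertools import accumulate

# "Buffer Cache Hits" per 1000-buffer interval, from the sample output.
hits = [668415, 281760, 166940, 14770, 7030, 959]


def cumulative_share(values):
    """Fraction of all hits covered by the first 1, 2, ... intervals."""
    total = sum(values)
    return [running / total for running in accumulate(values)]


shares = cumulative_share(hits)
first_3000_share = shares[2]
```

The first three intervals account for 1,117,115 of 1,139,874 hits, which is just over 98%.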
There is overhead involved in collecting LRU statistics. Therefore you should set the db_block_lru_statistics parameter back to false as soon as your analysis is complete.
In Oracle7, the v$current_bucket view was named X$KCBCBH. Only the SYS user can query X$KCBCBH. Also note that in X$KCBCBH the columns are called indx and count, instead of rownum and count.
3.5 Full Table Scans
When Oracle performs a full table scan of a large table, the blocks are read into the buffer cache but placed at the least recently used end of the LRU list. This causes the blocks to be aged out quickly, and prevents one large full table scan from wiping out the entire buffer cache.
Full table scans of large tables usually result in physical disk reads and a lower buffer cache hit ratio. You can get an idea of full table scan activity at the data file level by querying v$filestat and joining to SYS.dba_data_files. Following is a query you can use and sample results:
/* Formatted on 2011/6/28 19:27:26 (QP5 v5.163.1008.3004) */
SELECT A.file_name, B.phyrds, B.phyblkrd
FROM SYS.dba_data_files A, v$filestat B
WHERE B.file# = A.file_id
ORDER BY A.file_id
FILE_NAME PHYRDS PHYBLKRD
-------------------------------- ---------- ----------
/u01/oradata/PROD/system01.dbf 92832 130721
/u02/oradata/PROD/temp01.dbf 1136 7825
/u01/oradata/PROD/tools01.dbf 7994 8002
/u01/oradata/PROD/users01.dbf 214 214
/u03/oradata/PROD/rbs01.dbf 20518 20518
/u04/oradata/PROD/data01.dbf 593336 9441037
/u05/oradata/PROD/data02.dbf 4638037 4703454
/u06/oradata/PROD/index01.dbf 1007638 1007638
/u07/oradata/PROD/index02.dbf 1408270 1408270
PHYRDS shows the number of reads from the data file since the instance was started.
PHYBLKRD shows the actual number of data blocks read. Usually blocks are requested one at a time. However, Oracle requests blocks in batches when performing full table scans. (The db_file_multiblock_read_count parameter controls this batch size.)
In the sample result set above, there appears to be quite a bit of full table scan activity in the data01.dbf data file, since 593,336 read requests have resulted in 9,441,037 actual blocks read.
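The same reasoning can be applied to every file by dividing PHYBLKRD by PHYRDS. The figures below are copied from the sample result (file names shortened); the cut-off of 2 blocks per read is an arbitrary assumption chosen for illustration.

```python
# (file, PHYRDS, PHYBLKRD) from the sample result set above.
filestat = [
    ("system01.dbf", 92832, 130721),
    ("temp01.dbf", 1136, 7825),
    ("tools01.dbf", 7994, 8002),
    ("users01.dbf", 214, 214),
    ("rbs01.dbf", 20518, 20518),
    ("data01.dbf", 593336, 9441037),
    ("data02.dbf", 4638037, 4703454),
    ("index01.dbf", 1007638, 1007638),
    ("index02.dbf", 1408270, 1408270),
]


def blocks_per_read(rows):
    """Average number of blocks returned per read request, per file."""
    return {name: phyblkrd / phyrds for name, phyrds, phyblkrd in rows}


ratios = blocks_per_read(filestat)
THRESHOLD = 2.0  # assumed cut-off for "mostly multiblock reads"
multiblock_heavy = [name for name, r in ratios.items() if r > THRESHOLD]
```

A ratio close to 1 means single block reads dominate. data01.dbf averages nearly 16 blocks per read, consistent with heavy full table scan activity; temp01.dbf also exceeds the assumed cut-off.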
3.6 Spotting I/O Intensive SQL Statements
The v$sqlarea dynamic performance view contains one row for each SQL statement currently in the shared SQL area of the SGA for the instance. v$sqlarea shows the first 1000 bytes of each SQL statement, along with various statistics. Following is a query you can use:
/* Formatted on 2011/6/28 19:31:34 (QP5 v5.163.1008.3004) */
SELECT executions,
buffer_gets,
disk_reads,
first_load_time,
sql_text
FROM v$sqlarea
ORDER BY disk_reads
EXECUTIONS indicates the number of times the SQL statement has been executed since it entered the shared SQL area.
BUFFER_GETS indicates the collective number of logical reads issued by all executions of the statement.
DISK_READS shows the collective number of physical reads issued by all executions of the statement. (A logical read is a read that resulted in a cache hit or a physical disk read. A physical read is a read that resulted in a physical disk read.)
You can review the results of this query to find SQL statements that perform lots of reads, both logical and physical. Consider how many times a SQL statement has been executed when evaluating the number of reads.
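Now let's look at a second, more detailed explanation.
Buffer Cache Principles
When monitoring wait events or reading AWR and ASH reports, we often see latch: cache buffers chains, and possibly latch: cache buffers lru chain as well. The cache buffers chains wait event is probably the one that gives people the most headaches; without a deep understanding of the buffer cache, you will be at a loss when you run into these waits. The purpose of this article is to explain how the buffer cache works so that you can handle these latch contentions with confidence.
Buffer Cache Overview
The buffer cache is part of the SGA. Oracle uses it to manage data blocks, and its ultimate purpose is to reduce disk I/O as much as possible. Three main structures are used to manage the buffer cache.
Hash Bucket & Hash Chain List: used to locate data blocks quickly.
LRU List: holds pointers to free buffers, pinned buffers, and dirty buffers that have not yet been moved to the write list. A free buffer is a buffer that contains no data; a pinned buffer is a buffer that is currently being accessed.
Write (Dirty) List: holds pointers to dirty blocks. A dirty block is a block that has been modified in the buffer cache but has not yet been written to disk.
Hash Bucket and Hash Chain List
Oracle runs every buffer in the buffer cache through an internal hash algorithm and places the buffers into different hash buckets. Each hash bucket has a hash chain list that links together the blocks in that bucket.
Here is a simple example to introduce the hash algorithm. Oracle's hash algorithm is certainly not this simple; the actual algorithm is known only to Oracle.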
A simple mod function; we take mod 4:
  1 mod 4 = 1
  2 mod 4 = 2
  3 mod 4 = 3
  4 mod 4 = 0
  5 mod 4 = 1
  6 mod 4 = 2
  7 mod 4 = 3
  8 mod 4 = 0
  (remaining values omitted)
This is equivalent to creating 4 hash buckets.
Suppose we have the following blocks:
block: DBA(1,1) ------> (1+1) mod 4 = 2
block: DBA(1,2) ------> (1+2) mod 4 = 3
block: DBA(1,3) ------> (1+3) mod 4 = 0
block: DBA(1,4) ------> (1+4) mod 4 = 1
block: DBA(1,5) ------> (1+5) mod 4 = 2
(remaining blocks omitted)
For example, to access block (1,5), I run it through the hash function and then look in hash bucket 2. Hash bucket 2 currently holds 2 blocks, and these 2 blocks hang on its hash chain list.
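The mod 4 example can be reproduced in a few lines. As the text says, the hash function is a deliberately simple stand-in and not Oracle's algorithm; `bucket_for` is a name invented for this sketch.

```python
from collections import defaultdict


def bucket_for(file_no, block_no, n_buckets=4):
    """Toy hash: (file number + block number) mod number of buckets."""
    return (file_no + block_no) % n_buckets


# Hang blocks DBA(1,1) .. DBA(1,5) on their hash chains.
hash_chains = defaultdict(list)
for block_no in range(1, 6):
    hash_chains[bucket_for(1, block_no)].append((1, block_no))

# To find block (1,5), hash it and search only that bucket's chain.
target_bucket = bucket_for(1, 5)
found = (1, 5) in hash_chains[target_bucket]
```

Bucket 2 ends up holding two blocks, DBA(1,1) and DBA(1,5), so a lookup for block (1,5) only has to walk a chain of length two.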
How exactly is a hash chain list made up? This brings us to the x$bh fixed table.
SQL> desc x$bh
Name Null? Type
----------------------- - ----------------
ADDR RAW(8) --- address of the block in the buffer cache
INDX NUMBER
INST_ID NUMBER
HLADDR RAW(8) --- address of the cache buffers chains latch
BLSIZ NUMBER
NXT_HASH RAW(8) --- next block on the same hash chain list
PRV_HASH RAW(8) --- previous block on the same hash chain list
NXT_REPL RAW(8) --- next block on the LRU list
PRV_REPL RAW(8) --- previous block on the LRU list
(remaining columns omitted)
A hash chain list is a doubly linked list formed by the NXT_HASH and PRV_HASH pointers in x$bh, as illustrated below:
Through the NXT_HASH and PRV_HASH pointers, the blocks on the same hash chain list are linked together.
With hash buckets and hash chain lists understood, let's look at a diagram of how they manage the buffer cache.
The diagram shows that one cache buffers chains latch (x$bh.hladdr) can protect multiple hash buckets; in other words, to access a block, a process must first obtain this latch. Each hash bucket corresponds to one hash chain list, and that hash chain list holds one or more buffer headers.
The number of hash buckets is controlled by the hidden parameter _db_block_hash_buckets.
The number of cache buffers chains latches is controlled by the hidden parameter _db_block_hash_latches.
These hidden parameters can be viewed with the following query:
SQL> select * from v$version;
BANNER
------------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM x$ksppi x, x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE '%_db_block_hash%'
7 /
NAME VALUE DESCRIB
------------------------- --------------- --------------------------------------
_db_block_hash_buckets 524288 Number of database block hash buckets
_db_block_hash_latches 16384 Number of database block hash latches
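_db_block_hash_buckets: before 8i, this hidden parameter was the next prime number after db_block_buffers/4;
in 8i it became db_block_buffers*2;
from 9i on, it is the prime number that is smaller than and closest to db_block_buffers*2.
_db_block_hash_latches: this hidden parameter is the number of cache buffers chains latches; how it is calculated need not concern us here.
As you can see, from 8i on the number of hash buckets is 8 times what it used to be.
The number of cache buffers chains latches can be computed with the following query: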
SQL> select count(*) from v$latch_children a,v$latchname b where a.latch#=b.latch# and b.name='cache buffers chains';
COUNT(*)
----------
16384
The number of cache buffers chains latches can also be computed with the following query:
SQL> select count(distinct hladdr) from x$bh ;
COUNT(DISTINCTHLADDR)
---------------------
16384
According to our queries, one cache buffers chains latch manages 32 hash buckets on average. Now let's pick a latch at random to verify the structure diagram described above.
SQL> select * from (select hladdr,count(*) from x$bh group by hladdr) where rownum<=5;
HLADDR COUNT(*)
---------------- ----------
C000000469F08828 15
C000000469F088F0 14
C000000469F089B8 15
C000000469F08A80 24
C000000469F08B48 17
Query the data blocks protected by the latch at address C000000469F08828:
SQL> select hladdr,obj,dbarfil,dbablk, nxt_hash,prv_hash from x$bh where hladdr='C000000469F08828' order by obj;
HLADDR OBJ DBARFIL DBABLK NXT_HASH PRV_HASH
---------------- ---------- ---------- ---------- ---------------- ----------------
C000000469F08828 2 388 322034 C0000004686ECBD0 C00000017EF8D658
C000000469F08828 2 388 396246 C0000004686ECA60 C0000004686ECA60
C000000469F08828 18 411 674831 C0000004686ECC00 C0000004686ECC00
C000000469F08828 216 411 438948 C0000004686ECBB0 C0000004686ECBB0
C000000469F08828 216 220 100217 C0000004686ECAA0 C0000004686ECAA0
C000000469F08828 216 220 60942 C000000151FB5DD8 C0000004686ECBD0
C000000469F08828 569 411 67655 C00000011FF81668 C0000001E8FB7AC0
C000000469F08828 569 280 1294 C0000004686ECB60 C000000177F9F078
C000000469F08828 58744570 210 332639 C000000177F9F078 C0000004686ECB60
C000000469F08828 65178270 254 408901 C0000004686ECBF0 C0000004686ECBF0
C000000469F08828 65347592 84 615093 C0000004686ECB90 C0000004686ECB90
C000000469F08828 65349200 765 1259399 C0000004686ECA70 C0000004686ECA70
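Look at DBA(388,396246): its NXT_HASH and PRV_HASH are the same, which means DBA(388,396246) sits on a hash chain that contains only one data block.
Also note that count(*) reported 15 blocks for this latch, but the later query returned only 12, which suggests that 3 buffers had left the buffer cache (aged out or reused) between the two queries.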
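When a user process wants to access Block(569,411):
● It applies the hash algorithm to the block to obtain a hash value.
● It acquires the cache buffers chains latch.
● It searches the corresponding hash bucket for the matching buffer header.
● If the buffer header is found, it checks the state of the buffer, for example whether a CR block must be constructed or whether the buffer is pinned, and then reads it.
● If the buffer header is not found, the block is read from disk into the buffer cache.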
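The lookup steps above can be sketched as a small Python simulation. This is a minimal sketch under stated assumptions, not Oracle's implementation: the hash function is a placeholder (Oracle's real hash is internal), the class and method names are hypothetical, each latch is modelled as a plain lock covering a fixed group of buckets, and the CR block and pin checks are omitted.

```python
import threading


class BufferCache:
    """Toy model: hash buckets hold chains of buffer headers, latches cover groups of buckets."""

    def __init__(self, n_buckets=64, n_latches=8):
        # Assumption: n_buckets is a multiple of n_latches.
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]
        self.latches = [threading.Lock() for _ in range(n_latches)]
        self.buckets_per_latch = n_buckets // n_latches

    def _bucket_of(self, file_no, block_no):
        # Placeholder hash, NOT Oracle's algorithm.
        return hash((file_no, block_no)) % self.n_buckets

    def get(self, file_no, block_no, read_from_disk):
        """Return (buffer_header, 'hit' or 'miss')."""
        bucket_no = self._bucket_of(file_no, block_no)            # step 1: hash the block address
        latch = self.latches[bucket_no // self.buckets_per_latch]
        with latch:                                                # step 2: get the latch
            chain = self.buckets[bucket_no]
            for header in chain:                                   # step 3: walk the hash chain
                if header["dba"] == (file_no, block_no):
                    return header, "hit"                           # step 4: found in cache
            # step 5: not found, so read the block from disk and link it to the chain
            header = {"dba": (file_no, block_no),
                      "data": read_from_disk(file_no, block_no)}
            chain.append(header)
            return header, "miss"


if __name__ == "__main__":
    cache = BufferCache()
    fake_disk = lambda f, b: "contents of block (%d,%d)" % (f, b)
    print(cache.get(411, 67655, fake_disk)[1])
    print(cache.get(411, 67655, fake_disk)[1])
```

Because every lookup has to take the latch before walking the chain, a statement that performs many logical reads takes the latch many times, which is why heavy logical I/O shows up as contention on it.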
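Before Oracle9i, if another user process already held this latch, a new process had to wait until that process finished its search (the latch is released once the search completes). Starting with Oracle9i, the cache buffers chains latch can be shared for read-only access. For example, if user process A accesses Block(84,615093) in read-only mode (a select) and acquires the latch, and user process B wants to access Block(765,1259399), also in read-only mode, then process B can acquire the latch as well, because both accesses are read-only. However, if process B needs exclusive access to Block(765,1259399), it must wait for process A to release the latch, and Oracle records a latch: cache buffers chains wait event against process B.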
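The shared versus exclusive behaviour described above can be illustrated with a small compatibility model. This is a hypothetical sketch, not Oracle's latch code: the class name, the mode labels "S" and "X", and the non-blocking try_get method are all assumptions made for illustration, where a return value of False stands for "the process would have to wait".

```python
class SharedLatch:
    """Toy latch that can be held by many readers or by one exclusive holder."""

    def __init__(self):
        self.shared_holders = 0
        self.exclusive_held = False

    def try_get(self, mode):
        """Return True if the latch is granted, False if the caller would wait."""
        if mode == "S":
            if self.exclusive_held:
                return False
            self.shared_holders += 1
            return True
        if mode == "X":
            if self.exclusive_held or self.shared_holders > 0:
                return False
            self.exclusive_held = True
            return True
        raise ValueError("mode must be 'S' or 'X'")

    def release(self, mode):
        """Release a previously granted hold of the given mode."""
        if mode == "S":
            self.shared_holders -= 1
        elif mode == "X":
            self.exclusive_held = False
        else:
            raise ValueError("mode must be 'S' or 'X'")


if __name__ == "__main__":
    latch = SharedLatch()
    print("A shared:", latch.try_get("S"))
    print("B shared:", latch.try_get("S"))
    print("B exclusive while A reads:", latch.try_get("X"))
```

In this model, process B's exclusive request is refused while any reader still holds the latch, which corresponds to the situation where Oracle records the wait event.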
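What should we do when we run into latch: cache buffers chains waits?
● Poorly optimized SQL. SQL statements that perform a large number of logical reads can cause severe latch: cache buffers chains waits, because the latch must be acquired every time a block is accessed, so heavy logical reads increase the chance of contention on it.
➢ For a SQL statement that is still running and causing severe latch: cache buffers chains contention, use the following SQL to view its execution plan and try to tune the statement.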
select * from table(dbms_xplan.display_cursor('sql_id',sql_child_number));
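➢ If the SQL has already finished, look at SQL Statistics -> SQL ordered by Gets -> Gets per Exec in the AWR report and try to tune those statements.
● Hot block contention.
➢ The following query returns the top 5 latch addresses with the most contention.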
select * from( select CHILD#,ADDR,GETS ,MISSES,SLEEPS from v$latch_children where name = 'cache buffers chains' and misses>0 and sleeps>0 order by 5 desc, 1, 2, 3) where rownum<6;
➢ Then use the following query to find the hot blocks.
select /*+ RULE */
e.owner ||'.'|| e.segment_name segment_name,
e.extent_id extent#,
x.dbablk - e.block_id + 1 block#,
x.tch, /* sometimes tch=0,we need to see tim */
x.tim ,
l.child#
from
v$latch_children l,
x$bh x,
dba_extents e
where
x.hladdr = '&ADDR' and
e.file_id = x.file# and
x.hladdr = l.addr and
x.dbablk between e.block_id and e.block_id + e.blocks -1
order by x.tch desc ;
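● Too few hash buckets, which would require changing the hidden parameter _db_block_hash_buckets. In practice we rarely hit this problem from Oracle9i onward unless there is a bug, so this change is not recommended. Remember: always consult Oracle Support before modifying any Oracle hidden parameter.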
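LRU List and LRU Write List
As mentioned earlier, if a user process finds that a block is not in the buffer cache, it reads the block from disk into the buffer cache. Before doing so, it must first find a free buffer on the LRU list; any dirty buffer it encounters during the search is moved to the LRU write list. If the dirty queue exceeds the 25% threshold (shown in the query below), DBWn writes the dirty buffers to disk.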
SQL> select kvittag,kvitval,kvitdsc from x$kvit where kvittag in('kcbldq','kcbfsp');
KVITTAG KVITVAL KVITDSC
-------------------- ---------- -------------------------------------------------------
kcbldq 25 large dirty queue if kcbclw reaches this
kcbfsp 40 Max percentage of LRU list foreground can scan for free
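The query above also tells us that when a user process has scanned more than 40% of the LRU list without finding a free buffer, it stops scanning, signals DBWn to write dirty buffers to disk, and records a free buffer waits wait event. If we see free buffer waits frequently, we should consider increasing the size of the buffer cache.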
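The free-buffer search described above can be sketched as a short simulation. This is a simplified model under stated assumptions: the two constants mirror kcbfsp (40) and kcbldq (25) as this article interprets them, both percentages are taken relative to the total number of buffers, and the class, field, and event names are hypothetical rather than Oracle internals.

```python
from dataclasses import dataclass, field

SCAN_LIMIT_PCT = 40    # kcbfsp: max percentage of the LRU list a foreground may scan
DIRTY_QUEUE_PCT = 25   # kcbldq, read by the article as a 25% dirty-queue threshold


@dataclass
class Buffer:
    name: str
    state: str  # "free", "dirty", or "pinned"


@dataclass
class WorkingSet:
    lru: list                                   # index 0 is the LRU (cold) end
    write_list: list = field(default_factory=list)
    events: list = field(default_factory=list)

    def find_free_buffer(self):
        """Scan from the LRU end; return a free buffer, or None after hitting the scan limit."""
        total = len(self.lru) + len(self.write_list)
        max_scan = total * SCAN_LIMIT_PCT // 100
        scanned = 0
        i = 0
        while i < len(self.lru) and scanned < max_scan:
            buf = self.lru[i]
            scanned += 1
            if buf.state == "free":
                return buf
            if buf.state == "dirty":
                # Move the dirty buffer to the write list and keep searching.
                self.write_list.append(self.lru.pop(i))
                if len(self.write_list) * 100 > total * DIRTY_QUEUE_PCT:
                    self.events.append("post DBWn: dirty queue over threshold")
                continue  # after pop(), the next buffer is already at index i
            i += 1  # pinned buffer: skip it
        # Scan limit reached without finding a free buffer.
        self.events.append("post DBWn: no free buffer within scan limit")
        self.events.append("free buffer waits")
        return None


if __name__ == "__main__":
    ws = WorkingSet(lru=[Buffer("b%d" % n, "dirty") for n in range(10)])
    print(ws.find_free_buffer(), ws.events)
```

With ten dirty buffers, the scan stops after four of them (40%), which is the situation in which the user process ends up waiting on free buffer waits.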
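Starting with Oracle8i, both the LRU list and the dirty list gained auxiliary lists, and Oracle refers to the LRU list and the LRU write list collectively as a Working Set (WS). Each WS contains several lists with different functions, and each WS is protected by one cache buffers lru chain latch (source: DSI405). If the database is configured with multiple DBWR processes, there will be multiple WSs, and if multiple buffer pools (default, keep, recycle) are enabled in the buffer cache, each pool has its own WS. Let's run some queries to verify this: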
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM x$ksppi x, x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE '%db_block_lru_latches%'
7 /
NAME VALUE DESCRIB
------------------------------ ---------- -----------------------------
_db_block_lru_latches 32 number of lru latches
SQL> show parameter db_writer
NAME TYPE VALUE
----------------------------- -------------- ------
db_writer_processes integer 2
SQL> show parameter cpu_count
NAME TYPE VALUE
------------------------------------ -------------------- ------------
cpu_count integer 8
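We find 32 cache buffers lru chain latches. Starting with Oracle9i, _db_block_lru_latches is 4 times CPU_COUNT when DB_WRITER_PROCESSES is less than 4 (here 4 × 8 = 32). I do not know what the rule is when DB_WRITER_PROCESSES is greater than 4, and I have never seen a database configured that way. Now let's query how many Working Sets there are: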
SQL> select count(*) from x$kcbwds;
COUNT(*)
----------