Linux Disk Scheduling Policies
There are many disk scheduling algorithms, including First Come, First Served (FCFS), Shortest Seek Time First (SSTF), and the SCAN (elevator) algorithm.
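To make the difference between these classic algorithms concrete, here is a small sketch that computes the total head movement each one produces. The request queue and the starting cylinder (53) are an illustrative textbook-style example, not measurements. `scan_down` assumes that SCAN first moves toward cylinder 0 and then sweeps back up.

```python
def total_seek(head, order):
    """Sum of absolute head movements when visiting cylinders in `order`."""
    distance, pos = 0, head
    for cyl in order:
        distance += abs(cyl - pos)
        pos = cyl
    return distance


def fcfs(head, requests):
    """First Come, First Served: visit requests in arrival order."""
    return list(requests)


def sstf(head, requests):
    """Shortest Seek Time First: always move to the closest pending cylinder."""
    pending, order, pos = list(requests), [], head
    while pending:
        nearest = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order


def scan_down(head, requests):
    """SCAN moving toward cylinder 0 first, then sweeping back up.

    Cylinder 0 appears in the path because SCAN travels to the end of the
    disk before reversing; it is a turnaround point, not a request.
    """
    lower = sorted((c for c in requests if c <= head), reverse=True)
    upper = sorted(c for c in requests if c > head)
    return lower + [0] + upper


requests = [98, 183, 37, 122, 14, 124, 65, 67]
for name, policy in [("FCFS", fcfs), ("SSTF", sstf), ("SCAN", scan_down)]:
    print(name, total_seek(53, policy(53, requests)))
```

For this queue it prints 640 for FCFS and 236 for both SSTF and SCAN: reordering by position cuts head movement substantially.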
This article introduces the four disk scheduling algorithms supported by Linux.
The Schedulers
There are currently 4 available:
- Noop Scheduler
- Anticipatory IO Scheduler ("as scheduler")
- Deadline Scheduler
- Complete Fair Queueing Scheduler ("cfq scheduler")
Noop Scheduler
This scheduler only implements request merging.
In Linux 2.4 and earlier, the kernel offered only a single I/O scheduling algorithm.
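NOOP stands for No Operation. It implements the simplest possible FIFO queue: I/O requests are handled roughly in the order they arrive. "Roughly" because, on top of the FIFO, NOOP also merges adjacent I/O requests, so it does not serve requests in strictly first-in, first-out order. NOOP assumes that the I/O requests have already been optimized or reordered by the driver or the device (as an intelligent controller would do). In some SAN environments, this may be the best choice.
Noop does not worry much about I/O: it handles every I/O request through a FIFO queue and assumes by default that I/O will not be a performance problem. This also spares the CPU work. Of course, for more complex application types, using this scheduler leaves the user with plenty to worry about.
NOOP is the best choice for flash memory devices, RAM-backed devices, and embedded systems.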
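A minimal sketch of the FIFO-plus-merging behaviour described above, assuming a request is just a `(start_sector, sector_count)` pair. `NoopQueue` is a hypothetical name, and for simplicity it only back-merges a new request into the request at the tail of the queue; the kernel's merge logic is more general.

```python
from collections import deque


class NoopQueue:
    """Toy NOOP: a FIFO queue that merges a request contiguous with the tail."""

    def __init__(self):
        self.queue = deque()  # entries: (start_sector, sector_count)

    def add(self, start, count):
        if self.queue:
            last_start, last_count = self.queue[-1]
            if last_start + last_count == start:
                # New request continues the tail request: merge instead of queueing.
                self.queue[-1] = (last_start, last_count + count)
                return
        self.queue.append((start, count))

    def dispatch(self):
        """Hand the oldest request to the device, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


q = NoopQueue()
q.add(100, 8)
q.add(108, 8)   # contiguous with (100, 8), so merged
q.add(50, 4)    # not contiguous, queued behind; no reordering by sector
print(q.dispatch(), q.dispatch(), q.dispatch())  # (100, 16) (50, 4) None
```

Note that the request at sector 50 is not moved ahead of the one at sector 100: apart from merging, arrival order is preserved.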
Anticipatory IO Scheduler ("as scheduler")
The anticipatory scheduler is the default scheduler in older 2.6 kernels: if you’ve not specified one, this is the one that will be loaded. It implements request merging, a one-way elevator, and read and write request batching, and it attempts some anticipatory reads by holding off briefly after a read batch if it thinks the same process is going to ask for more data. It tries to optimise for physical disks by avoiding head movements where possible; one downside is that it probably gives highly erratic performance on database or storage systems.
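CFQ and DEADLINE focus on serving scattered I/O requests and do nothing special for sequential I/O such as sequential reads. To handle workloads that mix random and sequential I/O, Linux also supports the ANTICIPATORY scheduling algorithm. ANTICIPATORY builds on DEADLINE and adds a 6 ms waiting window after each read I/O. If the OS receives a read request for an adjacent location within those 6 ms, that request can be served immediately.
The Anticipatory scheduler (as) was for a time the default I/O scheduler of the Linux 2.6 kernel. "Anticipatory" means "expected" or "foreseen", and the name captures how the algorithm works: put simply, after an I/O completes, if a process is issuing I/O, the scheduler waits for a default anticipation period of 6 ms to guess what that process's next I/O request will be. This can add considerable latency for random reads, which is bad for database applications, but it works well for workloads such as web servers. The algorithm can also be understood as targeting slow disks, because the real purpose of the "guessing" is to reduce head movement time.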
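A toy model of the 6 ms anticipation window described above, assuming times in milliseconds measured from when the last read completed, and positions given as sector numbers. `NEAR_SECTORS` (what counts as "adjacent") and the function `anticipate` are hypothetical; the real scheduler also keeps per-process statistics to decide whether waiting is worthwhile at all.

```python
ANTIC_WINDOW_MS = 6.0   # anticipation window taken from the text
NEAR_SECTORS = 1024     # hypothetical threshold for an "adjacent" read


def anticipate(last_read_end, arrivals, window_ms=ANTIC_WINDOW_MS):
    """Decide whether idling the disk paid off.

    last_read_end: sector where the just-completed read ended.
    arrivals: (arrival_ms, sector) reads arriving after that read completed.
    Returns the sector of the first nearby read that arrives inside the
    window (served immediately), or None if the window expires and the
    scheduler should go back to serving other requests.
    """
    for arrival_ms, sector in sorted(arrivals):
        if arrival_ms > window_ms:
            break  # window expired without a nearby read
        if abs(sector - last_read_end) <= NEAR_SECTORS:
            return sector
    return None


print(anticipate(1000, [(2.0, 1008)]))  # 1008: nearby read arrived in time
print(anticipate(1000, [(7.0, 1008)]))  # None: it arrived after 6 ms
```

The trade-off in the text shows up directly: a sequential reader gets its next block without a seek, while a random reader just costs up to 6 ms of idle disk time.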
Deadline Scheduler
The deadline scheduler implements request merging and a one-way elevator, and imposes a deadline on all operations to prevent resource starvation. Because writes return immediately in Linux, with the actual data held in cache, the deadline scheduler also prefers reads, as long as no write request has passed its deadline. The kernel docs suggest this is the preferred scheduler for database systems, especially if you have TCQ-aware disks, or for any system with high disk performance.
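DEADLINE keeps an address-sorted request queue (the same kind of ordering described for CFQ below) and adds protection against the extreme case of I/O request starvation. Besides the sorted queue, DEADLINE maintains separate FIFO queues for reads and for writes. A read may wait in its FIFO queue for at most 500 ms, a write for at most 5 s. Once a request in a FIFO queue reaches its deadline, it is served ahead of the sorted queue, and expired reads are served ahead of expired writes.
The priority can be written as:
FIFO(Read) > FIFO(Write) > sorted queue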
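The deadline algorithm aims to bound the latency of every I/O request; from this point of view, it should be well suited to DSS (decision support system) applications.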
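A much simplified sketch of the dispatch order described above, assuming time is measured in milliseconds and each request is identified only by its direction and sector. `DeadlineSketch` and its methods are hypothetical; the real deadline scheduler also dispatches in batches (`fifo_batch`) and limits how long writes can be starved (`writes_starved`), which this sketch omits.

```python
import bisect
from collections import deque

READ_EXPIRE_MS = 500    # read FIFO deadline from the text
WRITE_EXPIRE_MS = 5000  # write FIFO deadline from the text


class DeadlineSketch:
    def __init__(self):
        # Per-direction FIFOs of (expire_ms, sector, kind), oldest first.
        self.fifos = {"read": deque(), "write": deque()}
        self.sorted = []  # (sector, kind) in ascending sector order
        self.head = 0     # sector of the last dispatched request

    def add(self, kind, sector, now_ms):
        ttl = READ_EXPIRE_MS if kind == "read" else WRITE_EXPIRE_MS
        self.fifos[kind].append((now_ms + ttl, sector, kind))
        bisect.insort(self.sorted, (sector, kind))

    def _take(self, sector, kind):
        """Remove a request from both the sorted queue and its FIFO."""
        self.sorted.remove((sector, kind))
        fifo = self.fifos[kind]
        for i, (_, s, _k) in enumerate(fifo):
            if s == sector:
                del fifo[i]
                break
        self.head = sector
        return kind, sector

    def dispatch(self, now_ms):
        # 1. Expired reads, then expired writes: FIFO(Read) > FIFO(Write).
        for kind in ("read", "write"):
            fifo = self.fifos[kind]
            if fifo and fifo[0][0] <= now_ms:
                return self._take(fifo[0][1], kind)
        if not self.sorted:
            return None
        # 2. Otherwise one-way elevator: first request at or above the head,
        #    wrapping to the lowest sector ("" sorts before any kind name).
        i = bisect.bisect_left(self.sorted, (self.head, ""))
        sector, kind = self.sorted[i] if i < len(self.sorted) else self.sorted[0]
        return self._take(sector, kind)


d = DeadlineSketch()
d.add("write", 10, now_ms=0)
d.add("read", 900, now_ms=0)
d.add("read", 500, now_ms=0)
print(d.dispatch(now_ms=100))  # ('write', 10): nothing expired, elevator order
print(d.dispatch(now_ms=600))  # ('read', 900): oldest read has expired
```

At 600 ms both reads are past their 500 ms deadline, so they are served in FIFO (arrival) order rather than sector order.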
Complete Fair Queueing Scheduler ("cfq scheduler")
The complete fair queueing scheduler implements both request merging and the elevator, and attempts to give all users of a particular device the same number of I/O requests over a particular time interval. This should make it more efficient for multiuser systems. It seems that Novell SLES sets cfq as the default scheduler, as does the latest Ubuntu release. As of the 2.6.18 kernel, this is the default scheduler in kernel.org releases.
CFQ stands for Completely Fair Queuing. A key feature of the algorithm is that it orders I/O requests by address rather than serving them strictly in arrival order.
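On traditional SAS disks, seeking accounts for most of the I/O response time. CFQ's starting point is to sort I/O by address so that as many I/O requests as possible are served with as little seeking and rotation as possible. Under CFQ, the throughput of SAS disks improves considerably. Compared with NOOP, however, the drawback is that a request that arrives first is not guaranteed to be served promptly and may even starve.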
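The starvation drawback can be illustrated with a greedy "nearest address first" policy (essentially SSTF), used here as a stand-in for address-ordered dispatch rather than as CFQ's actual algorithm. The function `serve_nearest` and the arrival pattern are hypothetical: after every dispatch, a new request arrives just past the head, so the distant request at sector 10000 keeps being passed over.

```python
def serve_nearest(head, pending, new_arrivals, steps):
    """Dispatch the pending request closest to the head, `steps` times.

    new_arrivals(head) returns requests that arrive right after each dispatch.
    Returns (served_in_order, still_pending).
    """
    pending = list(pending)
    served = []
    for _ in range(steps):
        if not pending:
            break
        nxt = min(pending, key=lambda s: abs(s - head))
        pending.remove(nxt)
        served.append(nxt)
        head = nxt
        pending.extend(new_arrivals(head))
    return served, pending


served, left = serve_nearest(100, [105, 10000], lambda h: [h + 5], steps=10)
print(served)  # [105, 110, 115, 120, 125, 130, 135, 140, 145, 150]
print(left)    # [10000, 155]
```

Seek distance per request stays tiny, which is exactly why throughput rises, but the request at sector 10000 would wait indefinitely as long as nearby requests keep arriving.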
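Completely Fair Queuing (cfq) replaced the Anticipatory scheduler as the default Linux kernel I/O scheduler in 2.6.18. cfq maintains an I/O queue for each process and serves the I/O requests from the different processes in round-robin fashion, so every process gets a fair share of I/O. This makes cfq well suited to applications with scattered reads (e.g. OLTP databases). Among the enterprise Linux distributions I know of, SuSE Linux seems to have been the first to use cfq by default.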
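A minimal sketch of the per-process round-robin idea, assuming each request is just a `(pid, sector)` pair and that every process gets `quantum` requests per round. `CfqSketch` is a hypothetical name; real CFQ works with time slices and I/O priorities, which are omitted here. Within each process's queue, requests are kept sorted by sector.

```python
import bisect


class CfqSketch:
    """Toy model: one sector-sorted queue per process, served round-robin."""

    def __init__(self, quantum=1):
        self.queues = {}        # pid -> sorted list of sectors (dict keeps insertion order)
        self.quantum = quantum  # requests taken from each process per round

    def add(self, pid, sector):
        bisect.insort(self.queues.setdefault(pid, []), sector)

    def dispatch_round(self):
        """Serve up to `quantum` requests from every process, taking turns."""
        served = []
        for pid in list(self.queues):
            queue = self.queues[pid]
            for _ in range(min(self.quantum, len(queue))):
                served.append((pid, queue.pop(0)))
            if not queue:
                del self.queues[pid]  # drop processes with nothing left
        return served


cfq = CfqSketch()
for sector in (300, 100, 200):
    cfq.add(pid=1, sector=sector)  # process 1 issues many requests
cfq.add(pid=2, sector=50)          # process 2 issues just one
print(cfq.dispatch_round())  # [(1, 100), (2, 50)]
```

Process 2 is served in the first round even though process 1 queued more requests, which is the fairness property the text describes.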
Changing Schedulers
The most reliable way to change schedulers is to set the kernel option ‘elevator’ at boot time, for example by adding elevator=deadline to the kernel command line. You can set it to one of "as", "cfq", "deadline" or "noop".
Under more recent 2.6 kernels (2.6.11, possibly earlier), it seems you can change the scheduler at runtime by echoing the scheduler's name into /sys/block/<devicename>/queue/scheduler, where devicename is the base name of the block device, e.g. sda for /dev/sda.
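Reading that sysfs file lists the available schedulers with the active one in square brackets (for example `noop anticipatory deadline [cfq]`). The sketch below parses that line and wraps the runtime switch; the helper names are hypothetical, the exact scheduler names depend on the kernel, and writing the file requires root.

```python
from pathlib import Path


def scheduler_path(device):
    """Path of the scheduler file, e.g. device="sda" for /dev/sda."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def parse_scheduler_line(line):
    """Return (active, available) from text like 'noop deadline [cfq]'."""
    active, available = None, []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token  # the bracketed name is the one in use
        available.append(token)
    return active, available


def get_scheduler(device):
    return parse_scheduler_line(scheduler_path(device).read_text())


def set_scheduler(device, name):
    # Same effect as: echo <name> > /sys/block/<device>/queue/scheduler (needs root)
    scheduler_path(device).write_text(name)


print(parse_scheduler_line("noop anticipatory deadline [cfq]\n"))
# ('cfq', ['noop', 'anticipatory', 'deadline', 'cfq'])
```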
Which one should I use?
I’ve not personally done any testing on this, so I can’t speak from experience yet. However, the anticipatory scheduler was made the default in older 2.6 kernels for a reason: it is optimised for the common case. If you’ve only got single-disk systems (i.e., no RAID, hardware or software), then this scheduler is probably the right one for you. If it’s a multiuser system, you will probably find cfq or deadline providing better performance, and the numbers seem to back deadline as giving the best performance for database systems.
Tuning the IO schedulers
The schedulers may have parameters that can be tuned at runtime. Read the Linux documentation on the schedulers listed in the References section below.
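On kernels that expose them, a scheduler's runtime tunables appear as files under /sys/block/<devicename>/queue/iosched/ (for deadline, files such as read_expire and write_expire). A small sketch that dumps them, assuming that layout; `read_tunables` is a hypothetical helper, and the `sysfs_root` parameter exists only so it can be tried against a scratch directory.

```python
from pathlib import Path


def read_tunables(device, sysfs_root="/sys/block"):
    """Map each file in <sysfs_root>/<device>/queue/iosched to its stripped contents."""
    iosched = Path(sysfs_root) / device / "queue" / "iosched"
    if not iosched.is_dir():
        return {}  # scheduler has no tunables, or the device does not exist
    return {p.name: p.read_text().strip()
            for p in sorted(iosched.iterdir()) if p.is_file()}


def write_tunable(device, name, value, sysfs_root="/sys/block"):
    # Like: echo <value> > /sys/block/<device>/queue/iosched/<name> (needs root)
    (Path(sysfs_root) / device / "queue" / "iosched" / name).write_text(str(value))
```

Which names appear, and what units they use, depends on the active scheduler and kernel version, so check the kernel documentation before changing anything.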
More information
Read the documents mentioned in the References section below, especially the Linux kernel documentation on the anticipatory and deadline schedulers.