Configuring dm-cache on CentOS
dm-cache concepts:
dm-cache was the first SSD cache target to be merged into the mainline kernel, landing in version 3.9, and it is currently the most stable option. Kernel 4.2 added the new smq eviction policy, which reduces memory consumption and improves performance.
Like flashcache, dm-cache is built on the device-mapper framework.
In the kernel 2.6.x era, the out-of-tree dm-cache used the same set-associative hash design as flashcache.
dm-cache configuration:
To provide a cache in front of the slow backend HDD, the SSD is carved up for caching and virtual devices are created on top of it.
1. Check the size of the caching SSD
blockdev --getsize64 /dev/sdb1
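For the 50 GiB SSD partition assumed throughout this walkthrough, this returns 53687091200 (bytes). To reuse the value in the calculations below, you can capture it in a shell variable, for example:

SSD_BYTES=$(blockdev --getsize64 /dev/sdb1)    # 53687091200 on the 50 GiB SSD assumed here
echo $SSD_BYTES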
2. Calculate the metadata size from the SSD size
The metadata device needs 4 MiB plus 16 bytes per cache block. With the 512-sector (262144-byte) cache blocks used in step 5, the 50 GiB SSD holds 53687091200 / 262144 = 204800 blocks:
4194304 (4 MiB) + (16 * 204800) = 7471104 bytes
7471104 / 512 = 14592 sectors
So 14592 sectors must be allocated from the SSD for the metadata device.
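The same arithmetic can be scripted. A minimal sketch, assuming the 50 GiB SSD above and the 512-sector (256 KiB) cache block size used in the create command of step 5:

SSD_BYTES=53687091200                      # from blockdev --getsize64 /dev/sdb1
BLOCK_BYTES=$((512 * 512))                 # cache block size: 512 sectors * 512 bytes = 262144
NR_BLOCKS=$((SSD_BYTES / BLOCK_BYTES))     # 204800 cache blocks
META_BYTES=$((4194304 + 16 * NR_BLOCKS))   # 4 MiB + 16 bytes per block = 7471104
META_SECTORS=$((META_BYTES / 512))         # 14592 sectors
echo $META_SECTORS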
3. First create the ssd-metadata device (on the SSD)
dmsetup create ssd-metadata --table '0 14592 linear /dev/sdb1 0'
After creating it, zero out the new ssd-metadata device:
dd if=/dev/zero of=/dev/mapper/ssd-metadata
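Run without a count, dd stops with a 'No space left on device' error once the 14592-sector device is full; that error is expected and harmless here. If you prefer an explicitly bounded write, this equivalent form uses the size computed above:

dd if=/dev/zero of=/dev/mapper/ssd-metadata bs=512 count=14592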
4. Create ssd-cache and verify it
Calculate the number of sectors left on the SSD after the metadata allocation:
(53687091200 /512) - 14592 = 104843008
dmsetup create ssd-cache --table '0 104843008 linear /dev/sdb1 14592'
After creation, use lsblk to check that ssd-metadata and ssd-cache exist.
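dmsetup itself can also confirm that both linear mappings exist and cover the expected ranges. A quick check (the 8:17 major:minor pair stands for /dev/sdb1 and will differ on other systems):

dmsetup table ssd-metadata    # expected: 0 14592 linear 8:17 0
dmsetup table ssd-cache       # expected: 0 104843008 linear 8:17 14592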
5. Get the sector count of the mechanical disk
blockdev --getsz /dev/sda1
Link the cache to the backend disk, creating the virtual device hdd-origin:
dmsetup create hdd-origin --table '0 1048576000 cache /dev/mapper/ssd-metadata /dev/mapper/ssd-cache /dev/sda1 512 1 writeback default 0'
When creating the device you must supply the policy, the cache device, the metadata device and other parameters. Below is the reference material I found on the create command, with the table fields annotated:
sudo blockdev --getsz /dev/sdd #The data device
In this example the command returns 8388608.
The following command sets up the cache:
sudo dmsetup create device_name --table '0 8388608 cache /dev/sde1 /dev/sde2 /dev/sdd 512 1 writeback default 0'
                                         │ │             │         │         │        │   │ │         │       └─ Number of policy arguments
                                         │ │             │         │         │        │   │ │         └─ Caching policy used
                                         │ │             │         │         │        │   │ └─ Feature argument: write cache mode (writeback or writethrough)
                                         │ │             │         │         │        │   └─ Number of feature arguments
                                         │ │             │         │         │        └─ Block size in sectors (256 KiB = 512 * 512 bytes)
                                         │ │             │         │         └─ Data device
                                         │ │             │         └─ Cache device
                                         │ │             └─ Metadata device
                                         │ └─ Device size in sectors
                                         └─ First sector of the device
The device created this way is then available under /dev/mapper/device_name and can be mounted.
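To actually use the cached device from this walkthrough, put a filesystem on hdd-origin and mount it like any ordinary block device. A minimal sketch (the filesystem type and mount point are arbitrary choices; skip mkfs if /dev/sda1 already carries a filesystem you want to keep):

mkfs.ext4 /dev/mapper/hdd-origin    # destroys existing data on the origin
mkdir -p /mnt/cached
mount /dev/mapper/hdd-origin /mnt/cached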
6. Verify that the dm-cache device was created successfully
ls -l /dev/mapper/hdd-origin
dmsetup status /dev/mapper/hdd-origin
The meaning of each field in the status output:
The device mapper reports the current status of the cache via dmsetup status.
sudo dmsetup status /dev/mapper/dmcache_test
Example output:
0 8388608 cache 34/1280 1781667 761103 200117 225373 0 0 3567 1 1 writethrough 2 migration_threshold 2048 4 random_threshold 4 sequential_threshold 512
                │       │       │      │      │      │ │ │    │ │ │            │ │                        │ └─<policy args>*
                │       │       │      │      │      │ │ │    │ │ │            │ │                        └─<#policy args>
                │       │       │      │      │      │ │ │    │ │ │            │ └─<core args>*
                │       │       │      │      │      │ │ │    │ │ │            └─<#core args>
                │       │       │      │      │      │ │ │    │ │ └─<feature args>*
                │       │       │      │      │      │ │ │    │ └─<#feature args>
                │       │       │      │      │      │ │ │    └─<#dirty>
                │       │       │      │      │      │ │ └─<#blocks in cache>
                │       │       │      │      │      │ └─<#promotions>
                │       │       │      │      │      └─<#demotions>
                │       │       │      │      └─<#write misses>
                │       │       │      └─<#write hits>
                │       │       └─<#read misses>
                │       └─<#read hits>
                └─<#used metadata blocks>/<#total metadata blocks>
The kernel documentation briefly explains each of these variables:[3]
#used metadata blocks : Number of metadata blocks used
#total metadata blocks : Total number of metadata blocks
#read hits : Number of times a READ bio has been mapped to the cache
#read misses : Number of times a READ bio has been mapped to the origin
#write hits : Number of times a WRITE bio has been mapped to the cache
#write misses : Number of times a WRITE bio has been mapped to the origin
#demotions : Number of times a block has been removed from the cache
#promotions : Number of times a block has been moved to the cache
#blocks in cache : Number of blocks resident in the cache
#dirty : Number of blocks in the cache that differ from the origin
#feature args : Number of feature args to follow
feature args : 'writethrough' (optional)
#core args : Number of core arguments (must be even)
core args : Key/value pairs for tuning the core e.g. migration_threshold
#policy args : Number of policy arguments to follow (must be even)
policy args : Key/value pairs e.g. 'sequential_threshold 1024'
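These counters make it easy to watch cache behaviour from a script. A sketch that pulls the read hit/miss and dirty counts out of the status line with awk; the field positions match the status format shown above, and shift on newer kernels that prepend the metadata and cache block-size fields:

dmsetup status hdd-origin | awk '{print "read hits:", $5, "read misses:", $6, "dirty blocks:", $12}'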
7. Disable and remove the dm-cache
umount /dev/mapper/hdd-origin
dmsetup remove hdd-origin
dmsetup remove ssd-metadata
dmsetup remove ssd-cache
Reference material:
Stopping the cache
When shutting the cache down, all data still held in the cache must be written back to the data device. The following commands are used:
dmsetup suspend device_name
dmsetup reload device_name --table '0 8388608 cache /dev/sde1 /dev/sde2 /dev/sdd 512 0 cleaner 0'
dmsetup resume device_name
dmsetup wait device_name
After that, all data has been flushed from the cache to the data device, and the cache can be removed:
dmsetup suspend device_name
dmsetup clear device_name
dmsetup remove device_name
Finally, lsblk shows that all the dm-cache devices created above are gone and /dev/mapper/ no longer contains hdd-origin. If in doubt, reboot and run lsblk again to confirm everything was removed cleanly.
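Applied to the devices created in this walkthrough, a teardown that flushes dirty writeback data first might look like this (a sketch following the sequence above; 1048576000 is the hdd-origin length from step 5):

umount /dev/mapper/hdd-origin
dmsetup suspend hdd-origin
dmsetup reload hdd-origin --table '0 1048576000 cache /dev/mapper/ssd-metadata /dev/mapper/ssd-cache /dev/sda1 512 0 cleaner 0'
dmsetup resume hdd-origin
dmsetup wait hdd-origin
dmsetup suspend hdd-origin
dmsetup clear hdd-origin
dmsetup remove hdd-origin
dmsetup remove ssd-metadata
dmsetup remove ssd-cache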
The write policy can also be switched on a live device; the method is as follows:
We can see the current write mode with 'dmsetup status' on the top level object, although the output is what they call somewhat hard to interpret:
# dmsetup status testing-test
0 10485760 cache 8 259/128000 128 4412/81920 16168 6650 39569 143311 0 0 0 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 512 discard_promote_adjustment 1 read_promote_adjustment 4 write_promote_adjustment 8
The important bit is the 'writeback' after the feature-argument count. This says that the cache is in the potentially dangerous writeback mode. To change the writeback mode we must redefine the cache, which is a little bit alarming and also requires temporarily suspending and unsuspending the device; the latter may have impacts if, for example, you are actually using it for a mounted filesystem at the time.
DM devices are defined through what the dmsetup manpage describes as 'a table that specifies a target for each sector in the logical device'. Fortunately the tables involved are relatively simple and better yet, we can get dmsetup to give us a starting point:
# dmsetup table testing-test
0 10485760 cache 253:3 253:2 253:4 128 0 default 0
To change the cache mode, we reload an altered table and then suspend and resume the device to activate our newly loaded table. For now I am going to just present the new table; the change is the '1 writethrough' that replaces the old '0' in the feature section:
# dmsetup reload --table '0 10485760 cache 253:3 253:2 253:4 128 1 writethrough default 0' testing-test
# dmsetup suspend testing-test
# dmsetup resume testing-test
At this point you can rerun 'dmsetup status' to see that the cache device has changed to writethrough.
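The whole switch can be scripted by taking the current table from dmsetup and editing its tail. A rough sketch, assuming the table ends in the '0 default 0' form shown above (inspect the dmsetup table output before relying on the sed pattern):

DEV=testing-test
TABLE=$(dmsetup table $DEV)          # e.g. 0 10485760 cache 253:3 253:2 253:4 128 0 default 0
NEWTABLE=$(echo "$TABLE" | sed 's/0 default 0$/1 writethrough default 0/')
dmsetup reload --table "$NEWTABLE" $DEV
dmsetup suspend $DEV
dmsetup resume $DEV
dmsetup status $DEV                  # should now report writethrough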
So let's talk about the DM table we (re)created here. The first two numbers are the logical sector range and the rest of it describes the target specification for that range. The format of this specification is, to quote the big comment in the kernel's drivers/md/dm-cache-target.c:
cache <metadata dev> <cache dev> <origin dev> <block size> <#feature args> [<feature arg>]* <policy> <#policy args> [<policy arg>]*
The original table's ending of '0 default 0' thus meant 'no feature arguments, default policy, no policy arguments'. Our new version of '1 writethrough default 0' is a change to '1 feature argument of writethrough, still the default policy, no policy arguments'. Also, if you're changing from writethrough back to writeback you don't end the table with '1 writeback default 0' because it turns out that writeback isn't a feature, it's just the default state. So you write the end of the table as '0 default 0' (as it was initially here).
Now it's time for the important disclaimers. The first disclaimer is that I'm not sure what happens to any dirty blocks on your cache device if you switch from writeback to writethrough mode. I assume that they still get flushed back to your real device and that this happens reasonably fast, but I can't prove it from reading the kernel source or my own tests and I can't find any documentation. At the moment I would call this caveat emptor until I know more. In fact I'm not truly confident what happens if you switch between writethrough and writeback in general.
(I do see some indications that there is a flush and it is rapid, but I feel very nervous about saying anything definite until I'm sure of things.)
The second disclaimer is that at the moment the Fedora 20 LVM cannot set up writethrough cache LVs. You can tell it to do this and it will appear to have succeeded, but the actual cache device as created at the DM layer will be writeback. This issue is what prompted my whole investigation of this at the DM level. I have filed Fedora bug #1135639 about this, although I expect it's an upstream issue.
The third disclaimer is that all of this is as of Fedora 20 and its 3.15.10-200.fc20 kernel (on 64-bit x86, in specific). All of this may change over time and probably will, as I doubt that the kernel people consider very much of this to be a stable interface.
Given all of the uncertainties involved, I don't plan to consider using LVM caching until LVM can properly create writethrough caches. Apart from the hassle involved, I'm just not happy with converting live dm-cache setups from one mode to the other right now, not unless someone who really knows this system can tell us more about what's really going on and so on.
(A great deal of my basic understanding of dmsetup usage comes from Kyle Manna's 'SSD caching using dm-cache' tutorial.)
Sidebar: forcing a flush of the cache
In theory, if you want to be more sure that the cache is clean in a switch between writeback and writethrough you can explicitly force a cache clean by switching to the cleaner policy first and waiting for it to stabilize.
# dmsetup reload --table '0 10485760 cache 253:3 253:2 253:4 128 0 cleaner 0' testing-test
# dmsetup wait testing-test
I don't know how long you have to wait for this to be safe. If the cache LV (or other dm-cache device) is quiescent at the user level, I assume that you should be okay when IO to the actual devices goes quiet. But, as before, caveat emptor applies; this is not really well documented.
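One way to make 'waiting for it to stabilize' concrete is to poll the dirty count in the status output until it reaches zero. A sketch; on the kernel used here the dirty count is the 14th field of the status line (as in the output quoted earlier), but older status formats without the block-size fields put it elsewhere:

DEV=testing-test
while true; do
    DIRTY=$(dmsetup status $DEV | awk '{print $14}')    # <#dirty>
    [ "$DIRTY" = "0" ] && break
    sleep 1
done
echo "cache is clean"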
Sidebar: the output of 'dmsetup status'
The output of 'dmsetup status' is mostly explained in a comment in front of the cache_status() function in drivers/md/dm-cache-target.c. To save people looking for it, I will quote it here:
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty>
<#features> <features>*
<#core args> <core args>
<policy name> <#policy args> <policy args>*
Unlike with the definition table, 'writeback' is considered a feature here. By cross-referencing this with the earlier 'dmsetup status' output we can discover that mq is the default policy and it actually has a number of arguments, the exact meanings of which I haven't researched (but see here and here).