[Docker] Using the OverlayFS storage driver

Reference: https://docs.docker.com/storage/storagedriver/overlayfs-driver/
The following content is taken from the official documentation.

Environment

  1. VirtualBox 6.1
  2. CentOS 7.8
  3. Docker 19.03

OverlayFS is a modern union filesystem that is similar to AUFS, but faster and with a simpler implementation. Docker provides two storage drivers for OverlayFS: the original overlay, and the newer and more stable overlay2.

This topic refers to the Linux kernel driver as OverlayFS and to the Docker storage driver as overlay or overlay2.

Note: If you use OverlayFS, use the overlay2 driver rather than the overlay driver, because it is more efficient in terms of inode utilization. To use the new driver, you need version 4.0 or higher of the Linux kernel, or RHEL or CentOS using kernel version 3.10.0-514 or above.

Prerequisites

OverlayFS is the recommended storage driver, and supported if you meet the following prerequisites:

  • Version 4.0 or higher of the Linux kernel, or RHEL or CentOS using version 3.10.0-514 of the kernel or higher. If you use an older kernel, you need to use the overlay driver, which is not recommended.

  • The overlay and overlay2 drivers are supported on xfs backing filesystems, but only with d_type=true enabled.

    Use xfs_info to verify that the ftype option is set to 1. To format an xfs filesystem correctly, use the flag -n ftype=1.

Warning: Running on XFS without d_type support now causes Docker to skip the attempt to use the overlay or overlay2 driver. Existing installs will continue to run, but produce an error. This is to allow users to migrate their data. In a future version, this will be a fatal error, which will prevent Docker from starting.

  • Changing the storage driver makes existing containers and images inaccessible on the local system. Before changing the storage driver, use docker save to save any images you have built, or push them to Docker Hub or a private registry, so that you do not need to re-create them later.
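
The kernel and xfs prerequisites above can be checked mechanically. The following is a minimal sketch, not an official Docker check: the function names and the rhel_like flag are hypothetical, the kernel string is assumed to look like the output of uname -r, and the xfs check simply searches xfs_info output for ftype=1.

```python
import re


def kernel_supports_overlay2(release, rhel_like=False):
    """Return True if a kernel release string meets the overlay2 prerequisite.

    `release` is a string such as the output of `uname -r`.
    `rhel_like` is a hypothetical flag the caller sets on RHEL or CentOS hosts.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch, build = (int(g) if g else 0 for g in match.groups())
    if (major, minor) >= (4, 0):
        return True
    # The text above gives 3.10.0-514 as the minimum for RHEL and CentOS.
    return rhel_like and (major, minor, patch, build) >= (3, 10, 0, 514)


def xfs_has_ftype(xfs_info_output):
    """Return True if `xfs_info` output reports ftype=1 (d_type support)."""
    return re.search(r"\bftype=1\b", xfs_info_output) is not None


# Example release strings, chosen for illustration only.
print(kernel_supports_overlay2("3.10.0-1127.el7.x86_64", rhel_like=True))
print(kernel_supports_overlay2("3.10.0-327.el7.x86_64", rhel_like=True))
```

On a real host the release string would come from platform.release() and the xfs_info text from running xfs_info against the mount point that backs /var/lib/docker.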

How the overlay2 driver works

If you are still using the overlay driver rather than overlay2, see How the overlay driver works instead.

OverlayFS layers two directories on a single Linux host and presents them as a single directory. These directories are called layers and the unification process is referred to as a union mount. OverlayFS refers to the lower directory as lowerdir and the upper directory as upperdir. The unified view is exposed through its own directory called merged.

The overlay2 driver natively supports up to 128 lower OverlayFS layers. This capability provides better performance for layer-related Docker commands such as docker build and docker commit, and consumes fewer inodes on the backing filesystem.

Image and container layers on disk

After downloading an image using docker pull ubuntu, you can see one directory per image layer, plus the l directory, under /var/lib/docker/overlay2. The image pulled below has three layers, so three layer directories are listed.

Warning: Do not directly manipulate any files or directories within /var/lib/docker/. These files and directories are managed by Docker.

$ docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
6a5697faee43: Pull complete 
ba13d3bc422b: Pull complete 
a254829d9e55: Pull complete 
Digest: sha256:fff16eea1a8ae92867721d90c59a75652ea66d29c05294e6e2f898704bdb8cf1
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest

$ ls -l /var/lib/docker/overlay2
total 0
drwx------    4 root     root            55 Nov 24 10:51 57bb3984fedd79355d16d090824068eb25a262897bd03f4cc903458a270fb135
drwx------    4 root     root            72 Nov 24 10:51 63492365c5b50b0b87988f7d7b90c817f87d9a4a88be4efd604ece8ce987e932
drwx------    3 root     root            47 Nov 24 10:51 654a21568916b79f8ce2b0b9795710980ef429d3b4d136a1cd6fdefabe8a4bed
brw-------    1 root     root        8,  16 Nov 24 10:50 backingFsBlockDev
drwx------    2 root     root           108 Nov 24 10:51 l

The new l (lowercase L) directory contains shortened layer identifiers as symbolic links. These identifiers are used to avoid hitting the page size limitation on arguments to the mount command.

$ ls -l l
total 0
lrwxrwxrwx    1 root     root            72 Nov 24 10:51 6ZP27KCNIOAMZOSGMQJXZPXB5J -> ../654a21568916b79f8ce2b0b9795710980ef429d3b4d136a1cd6fdefabe8a4bed/diff
lrwxrwxrwx    1 root     root            72 Nov 24 10:51 OOHUGSX2SX7GFNVJQ6TQL4ZWW7 -> ../63492365c5b50b0b87988f7d7b90c817f87d9a4a88be4efd604ece8ce987e932/diff
lrwxrwxrwx    1 root     root            72 Nov 24 10:51 TKENHGOKTAE4UDAMIQPWY65JBU -> ../57bb3984fedd79355d16d090824068eb25a262897bd03f4cc903458a270fb135/diff
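
To see why the shortened identifiers matter, the rough arithmetic below compares the length of a lowerdir value built from full layer paths with one built from the short links, for the 128-layer maximum mentioned earlier. It is only a sketch: the 4096-byte page size and the use of short links relative to the overlay2 directory are assumptions, and the same two identifiers from the listing above are simply repeated for every layer.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
LAYERS = 128       # maximum number of lower layers cited earlier
ROOT = "/var/lib/docker/overlay2"

# Identifiers copied from the listing above, reused for every layer.
full_id = "654a21568916b79f8ce2b0b9795710980ef429d3b4d136a1cd6fdefabe8a4bed"
short_id = "6ZP27KCNIOAMZOSGMQJXZPXB5J"


def lowerdir_length(paths):
    """Length of a colon-separated lowerdir value."""
    return len(":".join(paths))


full_paths = [f"{ROOT}/{full_id}/diff"] * LAYERS
short_paths = [f"l/{short_id}"] * LAYERS  # relative to ROOT (assumption)

full_len = lowerdir_length(full_paths)
short_len = lowerdir_length(short_paths)
print("full paths: ", full_len, "bytes, fits in a page:", full_len <= PAGE_SIZE)
print("short links:", short_len, "bytes, fits in a page:", short_len <= PAGE_SIZE)
```

The real mount options also carry upperdir and workdir, so the figures are a lower bound, but they show the order of magnitude involved.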

The lowest layer contains a file called link, which contains the name of the shortened identifier, and a directory called diff which contains the layer’s contents.

$ cat 654a21568916b79f8ce2b0b9795710980ef429d3b4d136a1cd6fdefabe8a4bed/link
6ZP27KCNIOAMZOSGMQJXZPXB5J

$ ls 57bb3984fedd79355d16d090824068eb25a262897bd03f4cc903458a270fb135/diff/
run

$ ls 63492365c5b50b0b87988f7d7b90c817f87d9a4a88be4efd604ece8ce987e932/diff
etc  usr  var

$ ls 654a21568916b79f8ce2b0b9795710980ef429d3b4d136a1cd6fdefabe8a4bed/diff
bin     dev     home    lib32   libx32  mnt     proc    run     srv     tmp     var
boot    etc     lib     lib64   media   opt     root    sbin    sys     usr

The second-lowest layer, and each higher layer, contains a file called lower, which denotes its parent, and a directory called diff which contains its contents. It also contains a merged directory, which contains the unified contents of its parent layer and itself, and a work directory which is used internally by OverlayFS.

How container reads and writes work with overlay

Reading files

Consider three scenarios where a container opens a file for read access with overlay.

  • The file does not exist in the container layer: If a container opens a file for read access and the file does not already exist in the container (upperdir), it is read from the image (lowerdir). This incurs very little performance overhead.

  • The file only exists in the container layer: If a container opens a file for read access and the file exists in the container (upperdir) and not in the image (lowerdir), it is read directly from the container.

  • The file exists in both the container layer and the image layer: If a container opens a file for read access and the file exists in the image layer and the container layer, the file’s version in the container layer is read. Files in the container layer (upperdir) obscure files with the same name in the image layer (lowerdir).
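
The three read scenarios reduce to one rule: check upperdir first, then fall back to lowerdir. The sketch below illustrates that rule with plain dictionaries standing in for the two layers. The function name and the sample paths are made up for illustration; this is not how the kernel implements lookups.

```python
def resolve_read(path, upperdir, lowerdir):
    """Return (layer, contents) for a read, checking upperdir first."""
    if path in upperdir:
        return "upperdir", upperdir[path]
    if path in lowerdir:
        return "lowerdir", lowerdir[path]
    raise FileNotFoundError(path)


# Hypothetical layer contents: path -> file contents.
image = {"/etc/hostname": "from-image", "/etc/motd": "image motd"}
container = {"/etc/motd": "container motd", "/tmp/scratch": "container only"}

for name in ("/etc/hostname", "/tmp/scratch", "/etc/motd"):
    print(name, resolve_read(name, container, image))
```

The last path exists in both layers, and the container's copy is the one returned, which is the obscuring behaviour described in the third scenario.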

Modifying files or directories

Consider some scenarios where files in a container are modified.

  • Writing to a file for the first time: The first time a container writes to an existing file, that file does not exist in the container (upperdir). The overlay/overlay2 driver performs a copy_up operation to copy the file from the image (lowerdir) to the container (upperdir). The container then writes the changes to the new copy of the file in the container layer.

    However, OverlayFS works at the file level rather than the block level. This means that all OverlayFS copy_up operations copy the entire file, even if the file is very large and only a small part of it is being modified. This can have a noticeable impact on container write performance. However, two things are worth noting:

    • The copy_up operation only occurs the first time a given file is written to. Subsequent writes to the same file operate against the copy of the file already copied up to the container.

    • OverlayFS only works with two layers. This means that performance should be better than AUFS, which can suffer noticeable latencies when searching for files in images with many layers. This advantage applies to both overlay and overlay2 drivers. The overlay2 driver is slightly less performant than overlay on initial read, because it must look through more layers, but it caches the results so this is only a small penalty.

  • Deleting files and directories:

    • When a file is deleted within a container, a whiteout file is created in the container (upperdir). The version of the file in the image layer (lowerdir) is not deleted (because the lowerdir is read-only). However, the whiteout file prevents it from being available to the container.

    • When a directory is deleted within a container, an opaque directory is created within the container (upperdir). This works in the same way as a whiteout file and effectively prevents the directory from being accessed, even though it still exists in the image (lowerdir).

  • Renaming directories: Calling rename(2) for a directory is allowed only when both the source and the destination path are on the top layer. Otherwise, it returns an EXDEV error (“cross-device link not permitted”). Your application needs to be designed to handle EXDEV and fall back to a “copy and unlink” strategy.
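
The modification cases above can be sketched in two parts. ToyOverlay is a hypothetical in-memory model (dictionaries of bytes, one lower layer) that shows copy_up on first write and a whiteout on delete; it does not model directories or multiple lower layers. move_dir shows one way an application might handle EXDEV on a real filesystem; its rename parameter is an assumption added only so the fallback can be exercised without a real overlay mount.

```python
import errno
import os
import shutil

WHITEOUT = object()  # stand-in for a whiteout entry in upperdir


class ToyOverlay:
    """In-memory toy: one read-only lowerdir and one writable upperdir."""

    def __init__(self, lowerdir):
        self.lowerdir = dict(lowerdir)  # image layer, never modified
        self.upperdir = {}              # container layer
        self.bytes_copied_up = 0        # total size of all copy_up operations

    def read(self, path):
        entry = self.upperdir.get(path, self.lowerdir.get(path))
        if entry is None or entry is WHITEOUT:
            raise FileNotFoundError(path)
        return entry

    def append(self, path, data):
        current = self.upperdir.get(path)
        if current is None and path in self.lowerdir:
            # First write: copy_up copies the whole file, however small the change.
            self.upperdir[path] = self.lowerdir[path]
            self.bytes_copied_up += len(self.lowerdir[path])
        elif current is None or current is WHITEOUT:
            self.upperdir[path] = b""   # new file, nothing to copy up
        self.upperdir[path] += data

    def delete(self, path):
        self.read(path)                 # raises if the file is not visible
        if path in self.lowerdir:
            self.upperdir[path] = WHITEOUT  # hide the read-only lower copy
        else:
            del self.upperdir[path]


def move_dir(src, dst, rename=os.rename):
    """Rename a directory, falling back to copy and unlink on EXDEV."""
    try:
        rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        shutil.copytree(src, dst, symlinks=True)
        shutil.rmtree(src)
```

Appending one byte to a 1000-byte file in the toy model copies all 1000 bytes up once, and later appends copy nothing further. For real code, Python's shutil.move already performs a similar copy-then-remove fallback when os.rename fails, so it is often the simpler choice.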

OverlayFS and Docker Performance

Both overlay2 and overlay drivers are more performant than aufs and devicemapper. In certain circumstances, overlay2 may perform better than btrfs as well. However, be aware of the following details.

  • Page Caching. OverlayFS supports page cache sharing. Multiple containers accessing the same file share a single page cache entry for that file. This makes the overlay and overlay2 drivers efficient with memory and a good option for high-density use cases such as PaaS.

  • copy_up. As with AUFS, OverlayFS performs copy-up operations whenever a container writes to a file for the first time. This can add latency into the write operation, especially for large files. However, once the file has been copied up, all subsequent writes to that file occur in the upper layer, without the need for further copy-up operations.

    The OverlayFS copy_up operation is faster than the same operation with AUFS, because AUFS supports more layers than OverlayFS and it is possible to incur far larger latencies if searching through many AUFS layers. overlay2 supports multiple layers as well, but mitigates any performance hit with caching.

  • Inode limits. Use of the legacy overlay storage driver can cause excessive inode consumption. This is especially true in the presence of a large number of images and containers on the Docker host. The only way to increase the number of inodes available to a filesystem is to reformat it. To avoid running into this issue, it is highly recommended that you use overlay2 if at all possible.
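
Inode pressure can be watched before it becomes a problem. The snippet below is a small sketch that uses os.statvfs (available on Unix) to report the fraction of inodes in use on the filesystem holding a path. The function name is hypothetical, and "/" is used only so the example runs anywhere; on a Docker host the path of interest would be the Docker data directory, /var/lib/docker.

```python
import os


def inode_usage(path):
    """Return the fraction of inodes in use on the filesystem holding `path`.

    Returns None when the filesystem does not report a fixed inode count.
    """
    stats = os.statvfs(path)
    if stats.f_files == 0:
        return None
    return (stats.f_files - stats.f_ffree) / stats.f_files


usage = inode_usage("/")
print("inode usage on /:", "n/a" if usage is None else f"{usage:.1%}")
```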

Performance best practices

The following generic performance best practices also apply to OverlayFS.

  • Use fast storage: Solid-state drives (SSDs) provide faster reads and writes than spinning disks.

  • Use volumes for write-heavy workloads: Volumes provide the best and most predictable performance for write-heavy workloads. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. Volumes have other benefits, such as allowing you to share data among containers and persisting your data even if no running container is using them.

Summary

This post covered how the overlay drivers are implemented and how files are created, deleted, modified, and read inside a container.