The Kubernetes Deployment Rolling Update Mechanism You May Have Misunderstood
Author: [email protected]
Abstract: The Kubernetes Deployment rolling update mechanism is different from the ReplicationController rolling update. A Deployment rollout also provides rollout progress queries, rollout history, rollback, and more, making it the obvious first choice for rolling releases of applications on Kubernetes. This post walks through the features that are easily overlooked or misunderstood.
Rolling-update-related fields in a Deployment definition
Take the frontend Deployment below as an example, paying particular attention to .spec.minReadySeconds, .spec.strategy.rollingUpdate.maxSurge, and .spec.strategy.rollingUpdate.maxUnavailable.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontend
spec:
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 2
  replicas: 25
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
          # If your cluster config does not include a dns service, then to
          # instead access environment variables to find service host
          # info, comment out the 'value: dns' line above, and uncomment the
          # line below:
          # value: env
        ports:
        - containerPort: 80
- .spec.minReadySeconds: a newly created Pod must stay in the Ready state for at least .spec.minReadySeconds before it is considered Available (Ready).
- .spec.strategy.rollingUpdate.maxSurge: specifies the maximum number of Pods that can be created over the desired number of Pods. The value cannot be 0 if maxUnavailable is 0. It can be an absolute number or a percentage, defaulting to 25% of the desired Pod count. When scaling up the new ReplicaSet, the allowed maxSurge is computed from the percentage and rounded up (e.g. 3.4 becomes 4).
- .spec.strategy.rollingUpdate.maxUnavailable: specifies the maximum number of Pods that can be unavailable during the update process. The value cannot be 0 if maxSurge is 0. It can be an absolute number or a percentage, defaulting to 25% of the desired Pod count. When scaling down the old ReplicaSet, the allowed maxUnavailable is computed from the percentage and rounded down (e.g. 3.6 becomes 3).
Therefore, during a Deployment rollout, the number of Available (Ready) Pods must never drop below desired pods number - maxUnavailable, and the total number of Pods must never exceed desired pods number + maxSurge.
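The rounding rules above can be made concrete with a short sketch (plain Python, no Kubernetes dependency; the helper names are illustrative, not part of any Kubernetes API):

```python
import math

def resolve_surge(value, desired):
    """maxSurge: percentage values round UP (ceil)."""
    if isinstance(value, str) and value.endswith("%"):
        return math.ceil(desired * int(value[:-1]) / 100)
    return int(value)

def resolve_unavailable(value, desired):
    """maxUnavailable: percentage values round DOWN (floor)."""
    if isinstance(value, str) and value.endswith("%"):
        return math.floor(desired * int(value[:-1]) / 100)
    return int(value)

desired = 25
# The frontend Deployment above uses absolute values:
print(resolve_surge(3, desired), resolve_unavailable(2, desired))         # 3 2
# With the 25% defaults, 25 * 0.25 = 6.25 -> surge 7 (up), unavailable 6 (down):
print(resolve_surge("25%", desired), resolve_unavailable("25%", desired))  # 7 6
```

Note how the asymmetric rounding guarantees that a percentage setting never makes both the surge room and the unavailability budget collapse to zero at the same time.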
The rolling update flow
Note: A Deployment’s rollout is triggered if and only if the Deployment’s pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.
Continuing with the Deployment above, consider the most common case: updating the image (releasing a new version):
kubectl set image deploy frontend php-redis=gcr.io/google-samples/gb-frontend:v3 --record
The set image command changes the Deployment's Pod template, which triggers a rollout. We only consider the RollingUpdate strategy here (Kubernetes also supports the Recreate strategy). Watch the ReplicaSets change with kubectl get rs -w.
[root@master03 ~]# kubectl get rs -w
NAME DESIRED CURRENT READY AGE
frontend-3114648124 25 25 25 14m
frontend-3099797709 0 0 0 1h
frontend-3099797709 0 0 0 1h
frontend-3099797709 3 0 0 1h
frontend-3114648124 23 25 25 17m
frontend-3099797709 5 0 0 1h
frontend-3114648124 23 25 25 17m
frontend-3114648124 23 23 23 17m
frontend-3099797709 5 0 0 1h
frontend-3099797709 5 3 0 1h
frontend-3099797709 5 5 0 1h
frontend-3099797709 5 5 1 1h
frontend-3114648124 22 23 23 17m
frontend-3099797709 5 5 2 1h
frontend-3114648124 22 23 23 17m
frontend-3114648124 22 22 22 17m
frontend-3099797709 6 5 2 1h
frontend-3114648124 21 22 22 17m
frontend-3099797709 6 5 2 1h
frontend-3114648124 21 22 22 17m
frontend-3099797709 7 5 2 1h
frontend-3099797709 7 6 2 1h
frontend-3114648124 21 21 21 17m
frontend-3099797709 7 6 2 1h
frontend-3099797709 7 7 2 1h
frontend-3099797709 7 7 2 1h
frontend-3099797709 7 7 3 1h
frontend-3099797709 7 7 4 1h
frontend-3114648124 20 21 21 17m
frontend-3099797709 8 7 4 1h
frontend-3114648124 20 21 21 17m
frontend-3114648124 20 20 20 17m
frontend-3099797709 8 7 4 1h
frontend-3099797709 8 8 4 1h
frontend-3099797709 8 8 5 1h
frontend-3114648124 19 20 20 17m
frontend-3099797709 9 8 5 1h
frontend-3114648124 19 20 20 17m
frontend-3099797709 9 8 5 1h
frontend-3099797709 9 9 5 1h
frontend-3114648124 19 19 19 17m
frontend-3099797709 9 9 5 1h
frontend-3114648124 18 19 19 18m
frontend-3099797709 10 9 5 1h
frontend-3114648124 18 19 19 18m
frontend-3099797709 10 9 5 1h
frontend-3114648124 18 18 18 18m
frontend-3099797709 10 10 5 1h
frontend-3099797709 10 10 5 1h
frontend-3114648124 18 18 18 18m
frontend-3099797709 10 10 6 1h
frontend-3099797709 10 10 6 1h
frontend-3114648124 17 18 18 18m
frontend-3114648124 17 18 18 18m
frontend-3099797709 11 10 6 1h
frontend-3099797709 11 10 6 1h
frontend-3114648124 17 17 17 18m
frontend-3099797709 11 11 6 1h
Notes:
1. frontend-3114648124 is the original ReplicaSet (the old RS); frontend-3099797709 is the newly created one (the new RS; it could also be a pre-existing RS, if the exact same Pod template was rolled out before).
2. With maxSurge=3, maxUnavailable=2, desired replicas=25:
- The new RS creates maxSurge (3) Pods, reaching the upper bound on total Pods: desired replicas + maxSurge (28).
- Without waiting for those new Pods to become Ready, maxUnavailable (2) Pods of the old RS are deleted immediately; even in the worst case, the number of Ready Pods stays at desired replicas - maxUnavailable (23).
- From here the exact sequence is not fixed: whenever some new Pods turn Ready, the same number of old Pods can be deleted; and whenever deletions complete, more new Pods are created, as long as the total Pod count stays below the upper bound desired replicas + maxSurge.
- The rollout proceeds this way until the number of new Pods reaches desired replicas and all of them are Ready; then all remaining old Pods are deleted, and the rollout is complete.
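The flow above can be condensed into a toy simulation (plain Python, not the controller's real code; the "two Pods become Ready per tick" pacing is an invented stand-in for readiness-probe and minReadySeconds timing) that checks the two invariants at every step:

```python
desired, max_surge, max_unavailable = 25, 3, 2
max_total = desired + max_surge        # 28: hard ceiling on total Pods
min_ready = desired - max_unavailable  # 23: hard floor on Ready Pods

old = desired   # old Pods, all Ready at the start
new = 0         # new Pods created so far
new_ready = 0   # new Pods that have become Ready

steps = 0
while old > 0 or new_ready < desired:
    # Scale up the new RS as far as the total-Pods ceiling allows.
    new = min(desired, new + (max_total - (old + new)))
    # Scale down the old RS only as far as the Ready floor allows.
    old = max(0, min(old, min_ready - new_ready))
    # Pretend two new Pods pass their readiness checks each tick.
    new_ready = min(new, new_ready + 2)
    # Both invariants hold on every iteration:
    assert old + new <= max_total
    assert old + new_ready >= min_ready
    steps += 1

print(f"rollout finished after {steps} steps")
```

Changing the readiness pacing changes how long the rollout takes, but never lets the simulation violate either bound, which is exactly the behavior visible in the watch output above.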
What happens when rolling updates are triggered back to back on the same Deployment?
Consider this case: a user triggers a rolling update and, without waiting for it to finish, triggers another one. What does the controller do in the background? Does it all descend into chaos?
Let's continue with the same example:
# deploy frontend is running stably at v2 (frontend-888714875):
[[email protected] ~]# kubectl get rs -w
NAME DESIRED CURRENT READY AGE
==== Run kubectl set image deploy frontend php-redis=gcr.io/google-samples/gb-frontend:v3 --record ====
---- Note: v3 --> frontend-776431694 ----
frontend-776431694 0 0 0 6h
frontend-776431694 0 0 0 6h
frontend-776431694 3 0 0 6h
frontend-888714875 23 25 25 5h
frontend-776431694 5 0 0 6h
frontend-888714875 23 25 25 5h
frontend-888714875 23 23 23 5h
frontend-776431694 5 0 0 6h
frontend-776431694 5 3 0 6h
frontend-776431694 5 5 0 6h
frontend-776431694 5 5 1 6h
frontend-776431694 5 5 2 6h
frontend-776431694 5 5 3 6h
frontend-776431694 5 5 4 6h
frontend-776431694 5 5 4 6h
frontend-888714875 22 23 23 5h
frontend-776431694 6 5 4 6h
frontend-888714875 22 23 23 5h
frontend-888714875 22 22 22 5h
frontend-776431694 6 5 4 6h
frontend-776431694 6 6 4 6h
frontend-776431694 6 6 4 6h
frontend-888714875 19 22 22 5h
frontend-776431694 9 6 4 6h
frontend-888714875 19 22 22 5h
frontend-776431694 9 6 4 6h
frontend-888714875 19 19 19 5h
frontend-776431694 9 9 4 6h
frontend-888714875 19 19 19 5h
==== Run kubectl set image deploy frontend php-redis=gcr.io/google-samples/gb-frontend:v4 --record ====
---- Note: v4 --> frontend-3099797709 ----
frontend-3099797709 0 0 0 6h
frontend-3099797709 0 0 0 6h
frontend-776431694 4 9 4 6h
frontend-3099797709 5 0 0 6h
frontend-3099797709 5 0 0 6h
frontend-3099797709 5 5 0 6h
frontend-776431694 4 9 4 6h
frontend-776431694 4 4 4 6h
frontend-3099797709 5 5 0 6h
frontend-3099797709 5 5 1 6h
frontend-3099797709 5 5 2 6h
frontend-3099797709 5 5 3 6h
frontend-3099797709 5 5 4 6h
frontend-3099797709 5 5 4 6h
frontend-776431694 2 4 4 6h
frontend-3099797709 7 5 4 6h
frontend-776431694 2 4 4 6h
frontend-776431694 2 2 2 6h
frontend-776431694 2 2 2 6h
frontend-3099797709 7 5 4 6h
frontend-776431694 0 2 2 6h
frontend-3099797709 7 7 4 6h
frontend-776431694 0 2 2 6h
frontend-3099797709 9 7 4 6h
frontend-776431694 0 0 0 6h
frontend-3099797709 9 7 4 6h
frontend-3099797709 9 9 4 6h
frontend-776431694 0 0 0 6h
frontend-3099797709 9 9 4 6h
frontend-3099797709 9 9 5 6h
frontend-3099797709 9 9 6 6h
frontend-3099797709 9 9 7 6h
frontend-888714875 17 19 19 5h
frontend-3099797709 11 9 7 6h
frontend-888714875 17 19 19 5h
frontend-888714875 17 17 17 5h
frontend-3099797709 11 9 7 6h
frontend-888714875 16 17 17 5h
frontend-3099797709 11 11 7 6h
frontend-3099797709 12 11 7 6h
frontend-888714875 16 17 17 5h
frontend-888714875 16 16 16 5h
frontend-3099797709 12 11 7 6h
frontend-3099797709 12 12 7 6h
frontend-3099797709 12 12 8 6h
frontend-3099797709 12 12 8 6h
frontend-888714875 15 16 16 5h
frontend-3099797709 13 12 8 6h
frontend-888714875 15 16 16 5h
frontend-888714875 15 15 15 5h
frontend-3099797709 13 12 8 6h
frontend-3099797709 13 13 8 6h
frontend-3099797709 13 13 8 6h
frontend-3099797709 13 13 9 6h
frontend-3099797709 13 13 10 6h
frontend-888714875 14 15 15 5h
frontend-3099797709 14 13 10 6h
frontend-888714875 14 15 15 5h
frontend-888714875 14 14 14 5h
frontend-3099797709 14 13 10 6h
frontend-888714875 14 14 14 5h
frontend-3099797709 14 14 11 6h
frontend-3099797709 14 14 12 6h
frontend-3099797709 14 14 12 6h
frontend-3099797709 14 14 12 6h
frontend-888714875 11 14 14 5h
frontend-3099797709 17 14 12 6h
frontend-888714875 11 14 14 5h
frontend-3099797709 17 14 12 6h
frontend-888714875 11 11 11 5h
frontend-3099797709 17 17 12 6h
frontend-888714875 11 11 11 5h
frontend-3099797709 17 17 12 6h
frontend-3099797709 17 17 13 6h
frontend-3099797709 17 17 14 6h
frontend-3099797709 17 17 14 6h
frontend-888714875 10 11 11 5h
frontend-3099797709 18 17 14 6h
frontend-888714875 10 11 11 5h
frontend-888714875 10 10 10 5h
frontend-3099797709 18 17 14 6h
frontend-3099797709 18 18 14 6h
frontend-3099797709 18 18 15 6h
frontend-888714875 9 10 10 5h
frontend-3099797709 18 18 16 6h
frontend-888714875 9 10 10 5h
frontend-3099797709 19 18 16 6h
frontend-3099797709 19 18 16 6h
frontend-888714875 9 9 9 5h
frontend-888714875 7 9 9 5h
frontend-3099797709 19 18 16 6h
frontend-888714875 7 9 9 5h
frontend-3099797709 21 18 16 6h
frontend-888714875 7 9 9 5h
frontend-3099797709 21 19 16 6h
frontend-888714875 7 7 7 5h
frontend-3099797709 21 21 16 6h
frontend-888714875 7 7 7 5h
frontend-3099797709 21 21 17 6h
frontend-3099797709 21 21 18 6h
frontend-3099797709 21 21 18 6h
frontend-888714875 5 7 7 5h
frontend-888714875 5 7 7 5h
frontend-3099797709 23 21 18 6h
frontend-888714875 5 5 5 5h
frontend-3099797709 23 21 18 6h
frontend-3099797709 23 23 18 6h
frontend-3099797709 23 23 18 6h
frontend-3099797709 23 23 19 6h
frontend-3099797709 23 23 20 6h
frontend-3099797709 23 23 20 6h
frontend-888714875 3 5 5 5h
frontend-3099797709 25 23 20 6h
frontend-888714875 3 5 5 5h
frontend-888714875 3 3 3 5h
frontend-3099797709 25 23 20 6h
frontend-888714875 3 3 3 5h
frontend-3099797709 25 25 20 6h
frontend-3099797709 25 25 21 6h
frontend-3099797709 25 25 22 6h
frontend-3099797709 25 25 22 6h
frontend-888714875 2 3 3 5h
frontend-888714875 2 3 3 5h
frontend-888714875 2 2 2 5h
frontend-888714875 2 2 2 5h
frontend-3099797709 25 25 23 6h
frontend-888714875 1 2 2 5h
frontend-888714875 1 2 2 5h
frontend-888714875 1 1 1 5h
frontend-3099797709 25 25 23 6h
frontend-888714875 0 1 1 5h
frontend-888714875 0 1 1 5h
frontend-888714875 0 0 0 5h
frontend-3099797709 25 25 24 6h
frontend-3099797709 25 25 25 6h
frontend-3099797709 25 25 25 6h
Notes:
The Deployment frontend was running stably at v2 (RS frontend-888714875). Running kubectl set image triggered a rolling update to v3 (RS frontend-776431694). When the v3 RS had been scaled up to a desired count of 9 with 4 Pods Ready, the user ran kubectl set image again, triggering a rolling update to v4 (RS frontend-3099797709). Note that in my setup the v4 RS object had actually been created first (in an earlier experiment), then the v3 RS, then the v2 RS; so sorted by creation time from newest to oldest, the RSes are v2 -> v3 -> v4.
- The v2-to-v3 rollout follows the flow described in the previous section.
- When the new rollout is triggered, the old RSes are ordered by creation time; the newest of them (v2, excluding the new v4 RS) is initially left alone and is not scaled down further.
- The v4 rollout first replaces, via the same rolling mechanism, the 9 Pods that the older v3 RS had already scaled up, upgrading all v3 Pods to v4.
- The v4 rollout then continues rolling until all remaining old v2 Pods are upgraded to v4 as well.
- Throughout the entire rollout, the maxSurge and maxUnavailable constraints are respected at all times, without exception.
Now imagine an even messier scenario: while v4 is rolling over the half-finished v3 RS as above, the user triggers yet another rolling update to v5. What happens then?
Don't worry, the principle is the same: a Deployment rolling update always rolls away the oldest RS first, then works through the newer ones in turn, until the most recent old RS is scaled down to 0, at which point the rollout is done.
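The "oldest RS first" ordering can be sketched like this (illustrative Python; the creation timestamps are made up to match the scenario above, where the v3 RS object happens to be older than the v2 RS object):

```python
from datetime import datetime

# Hypothetical old ReplicaSets at the moment the v4 rollout starts.
old_replica_sets = [
    {"name": "frontend-888714875", "tag": "v2", "created": datetime(2017, 7, 1, 12, 0)},
    {"name": "frontend-776431694", "tag": "v3", "created": datetime(2017, 7, 1, 6, 0)},
]

# The old ReplicaSets are drained starting from the oldest one:
drain_order = sorted(old_replica_sets, key=lambda rs: rs["created"])
print([rs["tag"] for rs in drain_order])  # v3 is replaced first, then v2
```

This matches the watch output above, where frontend-776431694 (v3) is scaled to 0 well before frontend-888714875 (v2) starts its final descent.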
Understanding rollout pause and resume
Many people to this day probably still believe that once the user runs kubectl rollout pause deploy/frontend in the middle of a rolling update, the in-flight rollout stops immediately, and that kubectl rollout resume deploy/frontend then continues the unfinished rollout. If so, you are badly mistaken!
kubectl rollout pause only prevents the next rollout from being triggered. What does that mean? In the scenario just described, the in-flight rollout does not stop; it keeps rolling normally until it completes. The next time the user triggers a rollout, however, the Deployment does not actually start executing it; the rollout only truly begins once the user runs kubectl rollout resume.
The relationship between ReplicaSets and rollout history
First, you need to know about --record:
Setting the kubectl flag --record to true allows you to record current command in the annotations of the resources being created or updated.
By default, everything run via kubectl xxxx --record is recorded by Kubernetes and persisted to etcd, which of course consumes resources. Worse, over time kubectl get rs will return hundreds of stale ReplicaSets, at which point good luck making sense of the list.
In production, it is best to set the Deployment's .spec.revisionHistoryLimit to cap the number of retained revisions, say at 15; when rolling back, you usually only ever need the most recent few versions anyway.
Run the following command to list all recorded revisions of a Deployment:
$ kubectl rollout history deployment/nginx-deployment
deployments "nginx-deployment"
REVISION CHANGE-CAUSE
1 kubectl create -f docs/user-guide/nginx-deployment.yaml --record
2 kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
3 kubectl set image deployment/nginx-deployment nginx=nginx:1.91
Then run the rollout undo command to roll back to the revision specified by --to-revision.
kubectl rollout undo deployment/nginx-deployment --to-revision=2
deployment "nginx-deployment" rolled back
In fact, each revision recorded in rollout history corresponds one-to-one with a ReplicaSet. If you manually delete a ReplicaSet, the corresponding rollout history entry is deleted as well; in other words, you can no longer roll back to that revision.
The mapping between rollout history and ReplicaSets can be read from the revision field shown by kubectl describe rs $RSNAME; that revision matches the REVISION column returned by rollout history.
How a rollback proceeds
By running rollout undo with --to-revision, the user can roll the Deployment back to the specified revision.
kubectl rollout undo deploy frontend --to-revision=7
Watching the ReplicaSet data on the backend shows that a rollback, too, follows the same rolling mechanism and must honor the same maxSurge and maxUnavailable constraints. It does not delete all Pods in one shot and then recreate the new ones in one shot.
[root@master01 ~]# kubectl get rs -w
NAME DESIRED CURRENT READY AGE
frontend-888714875 3 0 0 23h
frontend-776431694 8 10 10 23h
frontend-888714875 5 0 0 23h
frontend-776431694 8 10 10 23h
frontend-776431694 8 8 8 23h
frontend-888714875 5 0 0 23h
frontend-888714875 5 3 0 23h
frontend-888714875 5 5 0 23h
frontend-888714875 5 5 1 23h
frontend-888714875 5 5 2 23h
frontend-888714875 5 5 4 23h
frontend-776431694 6 8 8 23h
frontend-888714875 5 5 4 23h
frontend-888714875 5 5 5 23h
frontend-776431694 6 8 8 23h
frontend-888714875 7 5 5 23h
frontend-776431694 6 6 6 23h
frontend-776431694 3 6 6 23h
frontend-888714875 10 5 5 23h
frontend-776431694 3 6 6 23h
frontend-776431694 3 3 3 23h
frontend-888714875 10 5 5 23h
frontend-776431694 3 3 3 23h
frontend-888714875 10 7 5 23h
frontend-888714875 10 10 5 23h
frontend-888714875 10 10 6 23h
frontend-888714875 10 10 7 23h
frontend-888714875 10 10 8 23h
frontend-888714875 10 10 8 23h
frontend-888714875 10 10 9 23h
frontend-888714875 10 10 9 23h
frontend-888714875 10 10 9 23h
frontend-776431694 0 3 3 23h
frontend-776431694 0 3 3 23h
frontend-776431694 0 0 0 23h
frontend-888714875 10 10 10 23h
frontend-888714875 10 10 10 23h
Summary
This post covered the aspects of Deployment rolling updates that are easily overlooked or misunderstood. If your reaction after reading is "well, obviously, that's exactly how it works!", then you already know the Deployment controller very well.
- Introduced the rolling-update-related fields of a Deployment;
- Walked through the rolling update flow;
- Explained what happens when rolling updates are triggered back to back on the same Deployment;
- Clarified how rollout pause and resume really behave;
- Showed the internal relationship between ReplicaSets and rollout history;
- Noted that rollback uses the same rolling mechanism as a rolling update.