Kubernetes Rolling Upgrade
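Background
Kubernetes is a capable tool for managing clusters of containerized applications. With objects such as ReplicaSet, which automatically maintain an application's lifecycle, it takes container application management a long way. Among its many features, the one that best demonstrates Kubernetes' strength in managing clustered applications is the rolling upgrade.
The essence of a rolling upgrade is that the service stays continuously available while the upgrade is in progress, so the outside world does not notice it. The process passes through three states: all old instances, a mix of old and new instances, and all new instances. The number of old instances gradually decreases while the number of new instances gradually increases, until there are zero old instances and the new instances reach the desired count.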
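Rolling Upgrades in Kubernetes
Kubernetes uses a ReplicaSet (RS for short) to manage Pod instances. If the number of Pod instances currently in the cluster is below the target, the RS starts new Pods; if it is above the target, the RS deletes the surplus Pods according to its policy. A Deployment builds on exactly this behavior: it performs an upgrade by controlling the Pods in two RSs.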
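The reconcile rule described above can be sketched in a few lines of Go. This is only an illustration of the idea, not the controller's actual code; the `reconcile` function and its sign convention are hypothetical.

```go
package main

import "fmt"

// reconcile returns how many Pods a ReplicaSet-like controller would need
// to create (positive) or delete (negative) to move from current to desired.
// Hypothetical helper for illustration; it is not part of Kubernetes.
func reconcile(current, desired int) int {
    return desired - current
}

func main() {
    const desired = 3
    for _, current := range []int{1, 3, 5} {
        diff := reconcile(current, desired)
        switch {
        case diff > 0:
            fmt.Printf("current=%d: create %d pod(s)\n", current, diff)
        case diff < 0:
            fmt.Printf("current=%d: delete %d pod(s)\n", current, -diff)
        default:
            fmt.Printf("current=%d: nothing to do\n", current)
        }
    }
}
```

A Deployment can drive a rolling upgrade with nothing more than this rule, by lowering the target of the old RS while raising the target of the new one.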
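A rolling upgrade is a smooth, transitional upgrade during which the service remains available. This is a key step for Kubernetes in managing applications as services. Services should be everywhere and available on demand; that was the original intent of cloud computing. For a PaaS platform, abstracting applications into services that span the whole cluster and are available anytime, anywhere is the ultimate mission.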
1. ReplicaSet
The concept of an RS should already be familiar, so let's look at how it appears in the Kubernetes source code.
type ReplicaSetController struct {
    kubeClient clientset.Interface
    podControl controller.PodControlInterface

    // internalPodInformer is used to hold a personal informer. If we're using
    // a normal shared informer, then the informer will be started for us. If
    // we have a personal informer, we must start it ourselves. If you start
    // the controller using NewReplicationManager(passing SharedInformer), this
    // will be null
    internalPodInformer framework.SharedIndexInformer

    // A ReplicaSet is temporarily suspended after creating/deleting these many replicas.
    // It resumes normal action after observing the watch events for them.
    burstReplicas int
    // To allow injection of syncReplicaSet for testing.
    syncHandler func(rsKey string) error

    // A TTLCache of pod creates/deletes each rc expects to see.
    expectations *controller.UIDTrackingControllerExpectations

    // A store of ReplicaSets, populated by the rsController
    rsStore cache.StoreToReplicaSetLister
    // Watches changes to all ReplicaSets
    rsController *framework.Controller
    // A store of pods, populated by the podController
    podStore cache.StoreToPodLister
    // Watches changes to all pods
    podController framework.ControllerInterface
    // podStoreSynced returns true if the pod store has been synced at least once.
    // Added as a member to the struct to allow injection for testing.
    podStoreSynced func() bool

    lookupCache *controller.MatchingCache

    // Controllers that need to be synced
    queue *workqueue.Type

    // garbageCollectorEnabled denotes if the garbage collector is enabled. RC
    // manager behaves differently if GC is enabled.
    garbageCollectorEnabled bool
}
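This struct lives in pkg/controller/replicaset. It shows the key objects of an RS. One of them is podControl, the object used to operate on Pods. As the name suggests, it controls the lifecycle of the Pods under the RS. Let's look at the methods its interface, PodControlInterface, provides.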
// PodControlInterface is an interface that knows how to add or delete pods
// created as an interface to allow testing.
type PodControlInterface interface {
    // CreatePods creates new pods according to the spec.
    CreatePods(namespace string, template *api.PodTemplateSpec, object runtime.Object) error
    // CreatePodsOnNode creates a new pod according to the spec on the specified node.
    CreatePodsOnNode(nodeName, namespace string, template *api.PodTemplateSpec, object runtime.Object) error
    // CreatePodsWithControllerRef creates new pods according to the spec, and sets object as the pod's controller.
    CreatePodsWithControllerRef(namespace string, template *api.PodTemplateSpec, object runtime.Object, controllerRef *api.OwnerReference) error
    // DeletePod deletes the pod identified by podID.
    DeletePod(namespace string, podID string, object runtime.Object) error
    // PatchPod patches the pod.
    PatchPod(namespace, name string, data []byte) error
}
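As we can see, an RS has full control over its Pods. There are two watchers here, rsController and podController, which watch for changes to the RSs and Pods stored in etcd (through the API server). One important member deserves a mention: syncHandler, which every controller has. Each controller uses a watch to monitor changes to its objects and uses a sync function to reconcile their state. Note that this handler is only a delegate; the real handler is assigned when the controller is created. This pattern applies not only to the RS controller but to the other controllers as well.
The following code shows the watch logic more clearly.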
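The delegate pattern behind syncHandler can be illustrated with a small, self-contained sketch. The `miniController` type and its methods are hypothetical names made up for this example; only the idea of assigning the handler at construction time (as in `dc.syncHandler = dc.syncDeployment`) and replacing it in tests comes from the source.

```go
package main

import (
    "errors"
    "fmt"
)

// miniController is a hypothetical stand-in for a Kubernetes controller.
// Its syncHandler field is only a delegate: the real function is assigned
// when the controller is constructed, and a test can swap it out.
type miniController struct {
    syncHandler func(key string) error
}

// newMiniController wires in the default handler at construction time.
func newMiniController() *miniController {
    c := &miniController{}
    c.syncHandler = c.syncObject
    return c
}

// syncObject is the default handler; in this sketch it only validates the key.
func (c *miniController) syncObject(key string) error {
    if key == "" {
        return errors.New("empty key")
    }
    return nil
}

// process is what a worker would call for each key taken from the queue.
func (c *miniController) process(key string) error {
    return c.syncHandler(key)
}

func main() {
    c := newMiniController()
    fmt.Println("default handler:", c.process("default/web"))

    // Inject a fake handler, as a unit test would.
    var seen []string
    c.syncHandler = func(key string) error {
        seen = append(seen, key)
        return nil
    }
    _ = c.process("default/api")
    fmt.Println("keys seen by fake handler:", seen)
}
```

Keeping the handler as a field rather than calling the method directly is what the "To allow injection ... for testing" comments in the structs refer to.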
rsc.rsStore.Store, rsc.rsController = framework.NewInformer(
    &cache.ListWatch{
        ListFunc: func(options api.ListOptions) (runtime.Object, error) {
            return rsc.kubeClient.Extensions().ReplicaSets(api.NamespaceAll).List(options)
        },
        WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
            return rsc.kubeClient.Extensions().ReplicaSets(api.NamespaceAll).Watch(options)
        },
    },
    &extensions.ReplicaSet{},
    // TODO: Can we have much longer period here?
    FullControllerResyncPeriod,
    framework.ResourceEventHandlerFuncs{
        AddFunc:    rsc.enqueueReplicaSet,
        UpdateFunc: rsc.updateRS,
        // This will enter the sync loop and no-op, because the replica set has been deleted from the store.
        // Note that deleting a replica set immediately after scaling it to 0 will not work. The recommended
        // way of achieving this is by performing a `stop` operation on the replica set.
        DeleteFunc: rsc.enqueueReplicaSet,
    },
)
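Whenever a watch reports a change to an object, the matching handler runs. As the code above shows, add and delete events both enqueue the ReplicaSet, and update events go through updateRS. Pods have corresponding handlers as well.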
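To make the event-to-queue flow concrete, here is a minimal sketch of handlers that only record which object needs a sync. The `keyQueue` and `handlers` types are hypothetical simplifications written for this example; the real controller uses the workqueue package and ResourceEventHandlerFuncs, and this sketch does not try to reproduce their APIs.

```go
package main

import "fmt"

// keyQueue is a tiny, single-goroutine stand-in for a controller work queue:
// it stores "namespace/name" keys and drops duplicates that are already
// waiting. Hypothetical type for illustration only.
type keyQueue struct {
    pending map[string]bool
    order   []string
}

func newKeyQueue() *keyQueue {
    return &keyQueue{pending: map[string]bool{}}
}

// add enqueues key unless it is already pending.
func (q *keyQueue) add(key string) {
    if q.pending[key] {
        return
    }
    q.pending[key] = true
    q.order = append(q.order, key)
}

// get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) get() (key string, ok bool) {
    if len(q.order) == 0 {
        return "", false
    }
    key = q.order[0]
    q.order = q.order[1:]
    delete(q.pending, key)
    return key, true
}

// handlers mirrors the shape of an event handler set: every event type
// ends up enqueueing the key of the affected object.
type handlers struct {
    onAdd, onUpdate, onDelete func(key string)
}

func main() {
    q := newKeyQueue()
    h := handlers{onAdd: q.add, onUpdate: q.add, onDelete: q.add}

    h.onAdd("default/web")
    h.onUpdate("default/web") // coalesced with the pending add
    h.onDelete("default/api")

    for key, ok := q.get(); ok; key, ok = q.get() {
        fmt.Println("sync", key)
    }
}
```

Because the queue carries keys rather than events, the sync function always works from the latest state in the local store, no matter how many events were coalesced.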
podInformer.AddEventHandler(framework.ResourceEventHandlerFuncs{
    AddFunc: rsc.addPod,
    // This invokes the ReplicaSet for every pod change, eg: host assignment. Though this might seem like
    // overkill the most frequent pod update is status, and the associated ReplicaSet will only list from
    // local storage, so it should be ok.
    UpdateFunc: rsc.updatePod,
    DeleteFunc: rsc.deletePod,
})
That covers the basics of the RS. The layer above the RS is the Deployment, which is also managed by a controller.
// DeploymentController is responsible for synchronizing Deployment objects stored
// in the system with actual running replica sets and pods.
type DeploymentController struct {
    client        clientset.Interface
    eventRecorder record.EventRecorder

    // To allow injection of syncDeployment for testing.
    syncHandler func(dKey string) error

    // A store of deployments, populated by the dController
    dStore cache.StoreToDeploymentLister
    // Watches changes to all deployments
    dController *framework.Controller
    // A store of ReplicaSets, populated by the rsController
    rsStore cache.StoreToReplicaSetLister
    // Watches changes to all ReplicaSets
    rsController *framework.Controller
    // A store of pods, populated by the podController
    podStore cache.StoreToPodLister
    // Watches changes to all pods
    podController *framework.Controller

    // dStoreSynced returns true if the Deployment store has been synced at least once.
    // Added as a member to the struct to allow injection for testing.
    dStoreSynced func() bool
    // rsStoreSynced returns true if the ReplicaSet store has been synced at least once.
    // Added as a member to the struct to allow injection for testing.
    rsStoreSynced func() bool
    // podStoreSynced returns true if the pod store has been synced at least once.
    // Added as a member to the struct to allow injection for testing.
    podStoreSynced func() bool

    // Deployments that need to be synced
    queue workqueue.RateLimitingInterface
}
The DeploymentController needs to watch Deployments, RSs, and Pods, as the controller's construction code shows.
dc.dStore.Store, dc.dController = framework.NewInformer(
    &cache.ListWatch{
        ListFunc: func(options api.ListOptions) (runtime.Object, error) {
            return dc.client.Extensions().Deployments(api.NamespaceAll).List(options)
        },
        WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
            return dc.client.Extensions().Deployments(api.NamespaceAll).Watch(options)
        },
    },
    &extensions.Deployment{},
    FullDeploymentResyncPeriod,
    framework.ResourceEventHandlerFuncs{
        AddFunc:    dc.addDeploymentNotification,
        UpdateFunc: dc.updateDeploymentNotification,
        // This will enter the sync loop and no-op, because the deployment has been deleted from the store.
        DeleteFunc: dc.deleteDeploymentNotification,
    },
)

dc.rsStore.Store, dc.rsController = framework.NewInformer(
    &cache.ListWatch{
        ListFunc: func(options api.ListOptions) (runtime.Object, error) {
            return dc.client.Extensions().ReplicaSets(api.NamespaceAll).List(options)
        },
        WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
            return dc.client.Extensions().ReplicaSets(api.NamespaceAll).Watch(options)
        },
    },
    &extensions.ReplicaSet{},
    resyncPeriod(),
    framework.ResourceEventHandlerFuncs{
        AddFunc:    dc.addReplicaSet,
        UpdateFunc: dc.updateReplicaSet,
        DeleteFunc: dc.deleteReplicaSet,
    },
)

dc.podStore.Indexer, dc.podController = framework.NewIndexerInformer(
    &cache.ListWatch{
        ListFunc: func(options api.ListOptions) (runtime.Object, error) {
            return dc.client.Core().Pods(api.NamespaceAll).List(options)
        },
        WatchFunc: func(options api.ListOptions) (watch.Interface, error) {
            return dc.client.Core().Pods(api.NamespaceAll).Watch(options)
        },
    },
    &api.Pod{},
    resyncPeriod(),
    framework.ResourceEventHandlerFuncs{
        AddFunc:    dc.addPod,
        UpdateFunc: dc.updatePod,
        DeleteFunc: dc.deletePod,
    },
    cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc},
)

dc.syncHandler = dc.syncDeployment
dc.dStoreSynced = dc.dController.HasSynced
dc.rsStoreSynced = dc.rsController.HasSynced
dc.podStoreSynced = dc.podController.HasSynced
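The core here is syncDeployment, because it contains the implementations of both rolling update and rollback. If the controller observes a Deployment whose RollbackTo field is set (not nil), it performs a rollback to the revision given by RollbackTo.Revision. The revision is a version number; note that even for a rollback, the version number recorded inside Kubernetes always increases.
You may wonder how rollback works. The principle is simple: Kubernetes keeps the PodTemplate of each revision, so rolling back just means overwriting the current template with the old PodTemplate.
Kubernetes supports two upgrade strategies: recreate and rolling update.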
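The rollback idea can be sketched with a heavily simplified model. Everything in this block is hypothetical: the `deployment` type reduces the pod template to an image string and keeps history in a map, which is not how Kubernetes stores revisions. It only illustrates that a rollback restores an old template while the revision number keeps growing.

```go
package main

import "fmt"

// deployment is a hypothetical, simplified model: the pod template is
// reduced to an image string and history maps revision -> template.
type deployment struct {
    template string
    revision int
    history  map[int]string
}

func newDeployment(image string) *deployment {
    return &deployment{
        template: image,
        revision: 1,
        history:  map[int]string{1: image},
    }
}

// rollout records a new template under the next revision number.
func (d *deployment) rollout(image string) {
    d.revision++
    d.template = image
    d.history[d.revision] = image
}

// rollbackTo restores the template stored for an old revision. The restored
// template is rolled out again, so the revision number still increases.
func (d *deployment) rollbackTo(rev int) error {
    old, ok := d.history[rev]
    if !ok {
        return fmt.Errorf("revision %d not found", rev)
    }
    d.rollout(old)
    return nil
}

func main() {
    d := newDeployment("web:v1")
    d.rollout("web:v2")
    if err := d.rollbackTo(1); err != nil {
        fmt.Println("rollback failed:", err)
        return
    }
    fmt.Println(d.template, d.revision) // prints: web:v1 3
}
```

Treating a rollback as just another rollout is what lets the same scale-up and scale-down machinery handle both directions.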
switch d.Spec.Strategy.Type {
case extensions.RecreateDeploymentStrategyType:
    return dc.rolloutRecreate(d)
case extensions.RollingUpdateDeploymentStrategyType:
    return dc.rolloutRolling(d)
}
rolloutRolling holds all the secrets, as we can see here.
func (dc *DeploymentController) rolloutRolling(deployment *extensions.Deployment) error {
    newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(deployment, true)
    if err != nil {
        return err
    }
    allRSs := append(oldRSs, newRS)

    // Scale up, if we can.
    scaledUp, err := dc.reconcileNewReplicaSet(allRSs, newRS, deployment)
    if err != nil {
        return err
    }
    if scaledUp {
        // Update DeploymentStatus
        return dc.updateDeploymentStatus(allRSs, newRS, deployment)
    }

    // Scale down, if we can.
    scaledDown, err := dc.reconcileOldReplicaSets(allRSs, controller.FilterActiveReplicaSets(oldRSs), newRS, deployment)
    if err != nil {
        return err
    }
    if scaledDown {
        // Update DeploymentStatus
        return dc.updateDeploymentStatus(allRSs, newRS, deployment)
    }

    dc.cleanupDeployment(oldRSs, deployment)

    // Sync deployment status
    return dc.syncDeploymentStatus(allRSs, newRS, deployment)
}
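It does the following:
1. Find the new RS and the old RSs, and compute the new revision (the maximum revision value);
2. Scale up the new RS;
3. Scale down the old RSs;
4. When that is done, delete the old RSs;
5. Sync the Deployment status to etcd.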
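The scale-up and scale-down steps above can be sketched as a small simulation. This is a rough illustration under stated assumptions, not the controller's algorithm: `state`, `step`, and the `surge` limit on extra Pods are hypothetical, Pods are assumed to become ready instantly, and each call to `step` stands for one sync that changes at most one replica count, mirroring how rolloutRolling returns right after a successful scale-up or scale-down.

```go
package main

import "fmt"

// state holds the replica counts of the old and new ReplicaSets.
type state struct {
    oldReplicas, newReplicas int
}

// step performs one sync in the spirit of rolloutRolling: it first tries to
// scale the new ReplicaSet up, and only if that is not possible does it
// scale the old one down. surge is an assumed limit on how many Pods may
// exist above the desired count. It reports whether anything changed.
func step(s *state, desired, surge int) bool {
    if s.newReplicas < desired && s.oldReplicas+s.newReplicas < desired+surge {
        s.newReplicas++ // scale up, if we can
        return true
    }
    if s.oldReplicas > 0 {
        s.oldReplicas-- // scale down, if we can
        return true
    }
    return false // nothing left to do: the rollout is complete
}

func main() {
    s := &state{oldReplicas: 3}
    fmt.Printf("old=%d new=%d\n", s.oldReplicas, s.newReplicas)
    for step(s, 3, 1) {
        fmt.Printf("old=%d new=%d\n", s.oldReplicas, s.newReplicas)
    }
}
```

With three desired replicas and a surge of one, the simulation alternates between adding a new Pod and removing an old one, so the total never drops below three while the old count falls to zero.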
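This is how rolling upgrades work in Kubernetes. Rolling upgrades in traditional load-balanced applications follow a very similar approach, but in a container environment we have the RS, which makes the process much more convenient.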