kube-scheduler: Learning the Scheduler and Scheduling Framework Source Code, Part 0
Published: 2021-07-02
Scheduler workflow
This post and the rest of the series are based on Kubernetes version 1.21, i.e. the release-1.21 branch of the repository.
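- kube-scheduler watches pods through the apiserver (it does not access etcd directly) and picks up the pods whose podSpec has an empty nodeName
- Each such pod enters the corresponding scheduler queue and, after going through the scheduling flow, is assigned to a suitable node: the scheduler writes the node name into the pod's nodeName field through the apiserver. Scheduling can also fail, in which case the pod goes back into the corresponding queue
- The kubelet on each node watches for pods assigned to its own node and then starts the container-related work for them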
Scheduling framework workflow
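Once a pod is inside the scheduler, the work is done mainly by the scheduling framework, which runs a pipeline-like series of steps (extension points). See the blog post listed under References for the details; they are not repeated here.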
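The core of that pipeline, filtering out infeasible nodes and then ranking the rest by a weighted sum of plugin scores, can be sketched as below. FilterPlugin, ScorePlugin, pickNode and the LeastAllocated-like score are hypothetical simplifications; the real framework interfaces take a context, a CycleState and a *v1.Pod, return a *Status, and include many more extension points (PreFilter, PostFilter, Reserve, Permit, Bind, and so on).

```go
package main

import "fmt"

// Node is a hypothetical, simplified node model (CPU in millicores).
type Node struct {
	Name     string
	Capacity int64
	Used     int64
	Tainted  bool
}

// FilterPlugin and ScorePlugin are simplified, hypothetical stand-ins for the
// framework's Filter and Score extension points.
type FilterPlugin func(podCPU int64, n Node) bool

type ScorePlugin struct {
	Name   string
	Weight int64                            // like the weight in score.enabled
	Score  func(podCPU int64, n Node) int64 // expected range 0..100
}

// pickNode keeps only nodes that pass every filter, sums weight*score over all
// score plugins, and returns the highest-scoring node (the first one wins
// ties here; the real scheduler breaks ties randomly).
func pickNode(podCPU int64, nodes []Node, filters []FilterPlugin, scorers []ScorePlugin) (string, bool) {
	best, bestScore, found := "", int64(-1), false
	for _, n := range nodes {
		feasible := true
		for _, f := range filters {
			if !f(podCPU, n) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		var total int64
		for _, s := range scorers {
			total += s.Weight * s.Score(podCPU, n)
		}
		if total > bestScore {
			best, bestScore, found = n.Name, total, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{
		{Name: "node-a", Capacity: 4000, Used: 3000},
		{Name: "node-b", Capacity: 4000, Used: 1000},
		{Name: "node-c", Capacity: 8000, Tainted: true},
	}
	filters := []FilterPlugin{
		func(podCPU int64, n Node) bool { return n.Capacity-n.Used >= podCPU }, // resources fit
		func(_ int64, n Node) bool { return !n.Tainted },                       // no blocking taint
	}
	scorers := []ScorePlugin{{
		Name:   "LeastAllocatedLike",
		Weight: 1,
		// Share of capacity left free after placing the pod, scaled to 0..100.
		Score: func(podCPU int64, n Node) int64 { return (n.Capacity - n.Used - podCPU) * 100 / n.Capacity },
	}}
	name, ok := pickNode(500, nodes, filters, scorers)
	fmt.Println(name, ok) // node-b true
}
```

node-c is dropped by the taint filter, and node-b (score 62) beats node-a (score 12). With several score plugins, the per-plugin weights decide how much each one influences the final ranking.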
Running the scheduler locally
The entry point is cmd/kube-scheduler/scheduler.go.
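The first time you start it locally in a development environment, you need to configure a kubeconfig so that it can connect to the apiserver. After adding --kubeconfig=$path-to-kubeconfig to the startup arguments, the scheduler starts successfully on the local machine.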
Exporting the default configuration
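To inspect the scheduler's default configuration (the KubeSchedulerConfiguration object in the kubescheduler.config.k8s.io/v1beta1 group), you can additionally pass the startup argument --write-config-to $path-to-default-config. The scheduler then writes the configuration it would start with to that file and exits.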
The default configuration exported this time is shown below:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  # path to the kubeconfig
  kubeconfig: /Users/xxxxx/.kube/config
  qps: 50
enableContentionProfiling: true
enableProfiling: true
healthzBindAddress: 0.0.0.0:10251
kind: KubeSchedulerConfiguration
# leader election settings
leaderElection:
  leaderElect: true
  leaseDuration: 15s
  renewDeadline: 10s
  resourceLock: leases
  resourceName: kube-scheduler
  resourceNamespace: kube-system
  retryPeriod: 2s
metricsBindAddress: 0.0.0.0:10251
parallelism: 16
percentageOfNodesToScore: 0
podInitialBackoffSeconds: 1
podMaxBackoffSeconds: 10
# the default config has a single profile; each pod is handled by the
# profile whose schedulerName matches the pod's spec.schedulerName
profiles:
# arguments passed to the scheduling framework plugins
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      kind: DefaultPreemptionArgs
      minCandidateNodesAbsolute: 100
      minCandidateNodesPercentage: 10
    name: DefaultPreemption
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      hardPodAffinityWeight: 1
      kind: InterPodAffinityArgs
    name: InterPodAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      kind: NodeAffinityArgs
    name: NodeAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      kind: NodeResourcesFitArgs
    name: NodeResourcesFit
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      kind: NodeResourcesLeastAllocatedArgs
      resources:
      - name: cpu
        weight: 1
      - name: memory
        weight: 1
    name: NodeResourcesLeastAllocated
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      defaultingType: System
      kind: PodTopologySpreadArgs
    name: PodTopologySpread
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      bindTimeoutSeconds: 600
      kind: VolumeBindingArgs
    name: VolumeBinding
  # plugins enabled in the scheduling framework
  plugins:
    bind:
      enabled:
      - name: DefaultBinder
        weight: 0
    filter:
      enabled:
      - name: NodeUnschedulable
        weight: 0
      - name: NodeName
        weight: 0
      - name: TaintToleration
        weight: 0
      - name: NodeAffinity
        weight: 0
      - name: NodePorts
        weight: 0
      - name: NodeResourcesFit
        weight: 0
      - name: VolumeRestrictions
        weight: 0
      - name: EBSLimits
        weight: 0
      - name: GCEPDLimits
        weight: 0
      - name: NodeVolumeLimits
        weight: 0
      - name: AzureDiskLimits
        weight: 0
      - name: VolumeBinding
        weight: 0
      - name: VolumeZone
        weight: 0
      - name: PodTopologySpread
        weight: 0
      - name: InterPodAffinity
        weight: 0
    permit: {}
    postBind: {}
    postFilter:
      enabled:
      - name: DefaultPreemption
        weight: 0
    preBind:
      enabled:
      - name: VolumeBinding
        weight: 0
    preFilter:
      enabled:
      - name: NodeResourcesFit
        weight: 0
      - name: NodePorts
        weight: 0
      - name: PodTopologySpread
        weight: 0
      - name: InterPodAffinity
        weight: 0
      - name: VolumeBinding
        weight: 0
      - name: NodeAffinity
        weight: 0
    preScore:
      enabled:
      - name: InterPodAffinity
        weight: 0
      - name: PodTopologySpread
        weight: 0
      - name: TaintToleration
        weight: 0
      - name: NodeAffinity
        weight: 0
    queueSort:
      enabled:
      - name: PrioritySort
        weight: 0
    reserve:
      enabled:
      - name: VolumeBinding
        weight: 0
    score:
      enabled:
      - name: NodeResourcesBalancedAllocation
        weight: 1
      - name: ImageLocality
        weight: 1
      - name: InterPodAffinity
        weight: 1
      - name: NodeResourcesLeastAllocated
        weight: 1
      - name: NodeAffinity
        weight: 1
      - name: NodePreferAvoidPods
        weight: 10000
      - name: PodTopologySpread
        weight: 2
      - name: TaintToleration
        weight: 1
  # scheduler name
  schedulerName: default-scheduler
```
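The setting percentageOfNodesToScore: 0 in the exported config does not mean "score no nodes"; 0 selects an adaptive default. The function below is my reconstruction, from memory, of numFeasibleNodesToFind in the release-1.21 scheduler, including the two constants (100 nodes, 5%). Treat it as a sketch to check against the branch rather than a verbatim copy.

```go
package main

import "fmt"

// Constants as I recall them from the 1.21 scheduler source (assumptions).
const (
	minFeasibleNodesToFind           = 100
	minFeasibleNodesPercentageToFind = 5
)

// numFeasibleNodesToFind returns how many feasible nodes the scheduler tries to
// find before it stops running filters. percentage corresponds to
// percentageOfNodesToScore; 0 (or less) selects the adaptive default.
func numFeasibleNodesToFind(numAllNodes, percentage int32) int32 {
	if numAllNodes < minFeasibleNodesToFind || percentage >= 100 {
		return numAllNodes
	}
	adaptive := percentage
	if adaptive <= 0 {
		// 50% for small clusters, shrinking as the cluster grows, never below 5%.
		adaptive = 50 - numAllNodes/125
		if adaptive < minFeasibleNodesPercentageToFind {
			adaptive = minFeasibleNodesPercentageToFind
		}
	}
	numNodes := numAllNodes * adaptive / 100
	if numNodes < minFeasibleNodesToFind {
		return minFeasibleNodesToFind
	}
	return numNodes
}

func main() {
	for _, n := range []int32{50, 500, 5000} {
		fmt.Println(n, numFeasibleNodesToFind(n, 0))
	}
	// 50 50
	// 500 230
	// 5000 500
}
```

The trade-off is latency versus placement quality: in large clusters the scheduler stops filtering once it has enough candidates instead of evaluating every node.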
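With this template in hand, you can modify it as needed and then start the scheduler with the startup argument --config=$path-to-config. Note that when --config is specified, the kubeconfig is taken from the clientConnection.kubeconfig field of that configuration file, and the --kubeconfig command-line flag no longer takes effect.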
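Before customizing the scheduling framework, it is worth knowing how the scheduler's built-in default plugins behave once the configuration file has been modified: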
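- If no extension is configured for an extension point, the scheduling framework uses the default plugins' extensions for it
- If extensions are configured and enabled for an extension point, the scheduling framework calls the default plugins' extensions first and then the configured ones
- The default plugins' extensions are always called first; after that, the extensions of the extension point are called one by one in the order they appear in the enabled list of KubeSchedulerConfiguration
- You can disable a default plugin's extension and then enable it again at a chosen position in the enabled list; this changes the order in which that default plugin's extension is called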
References
Customizing the Kubernetes Scheduler (自定義 Kubernetes 排程器), Yang Ming's blog (qikqiak.com)