
kube-scheduler and Scheduling Framework Source Code Study, Part 0


Scheduler workflow

This post and the rest of the series refer to Kubernetes source code version 1.21, which corresponds to the release-1.21 branch of the repository.

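  1. kube-scheduler watches the apiserver (which is backed by etcd) and picks up pods whose podSpec has an empty nodeName.
  2. The pod enters the corresponding scheduler queue. After going through the scheduler workflow it is assigned to a suitable node, that is, nodeName is written into its podSpec through the apiserver. Scheduling may also fail, in which case the pod goes back to the corresponding queue.
  3. The kubelet notices the pods assigned to its own node and starts the subsequent container operations.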
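
The three steps above can be modeled with a small, self-contained Go sketch. It is not the kube-scheduler implementation: `Pod`, `Node`, `scheduleOne`, and `runOnce` are hypothetical names, the apiserver watch is replaced by an in-memory slice, and the single CPU check is a placement rule assumed purely for illustration.

```go
package main

import "fmt"

// Pod and Node are simplified stand-ins for the real API objects.
type Pod struct {
	Name     string
	NodeName string // empty means "not scheduled yet"
	CPU      int    // requested CPU in arbitrary units (assumption)
}

type Node struct {
	Name    string
	FreeCPU int
}

// scheduleOne picks the first node with enough free CPU and "binds" the pod
// by setting NodeName. It returns false when no node fits.
func scheduleOne(p *Pod, nodes []*Node) bool {
	for _, n := range nodes {
		if n.FreeCPU >= p.CPU {
			n.FreeCPU -= p.CPU
			p.NodeName = n.Name
			return true
		}
	}
	return false
}

// runOnce walks the queue once. Pods that already have a NodeName are
// skipped, pods that cannot be placed are returned for a later retry.
func runOnce(queue []*Pod, nodes []*Node) (retry []*Pod) {
	for _, p := range queue {
		if p.NodeName != "" {
			continue // step 1: only pods with an empty nodeName matter
		}
		if !scheduleOne(p, nodes) {
			retry = append(retry, p) // step 2: failures go back to a queue
		}
	}
	return retry
}

func main() {
	nodes := []*Node{{Name: "node-a", FreeCPU: 2}, {Name: "node-b", FreeCPU: 4}}
	queue := []*Pod{{Name: "web", CPU: 3}, {Name: "db", CPU: 2}, {Name: "batch", CPU: 8}}

	retry := runOnce(queue, nodes)
	for _, p := range queue {
		fmt.Printf("%s -> %q\n", p.Name, p.NodeName)
	}
	fmt.Println("requeued:", len(retry))
}
```

In the real system step 3 happens outside the scheduler: the kubelet on the chosen node reacts to the updated nodeName.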

Scheduling framework workflow

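Once a pod is inside the scheduler, the scheduling framework carries it through a series of pipeline-like stages. The referenced blog post covers them in detail, so they are not repeated here.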
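
As a rough illustration of the pipeline idea, the sketch below covers only two stages, filter and score, and combines scores as a weighted sum. All names (`FilterFunc`, `ScorePlugin`, `pickNode`, `fitsCPU`, `freeCPUScore`, `imageScore`) and the scoring rules are assumptions made for this example, not the framework's interfaces. Ties go to the first node seen, which is a simplification.

```go
package main

import "fmt"

type Pod struct {
	Name string
	CPU  int
}

type Node struct {
	Name     string
	FreeCPU  int
	HasImage bool
}

// FilterFunc reports whether a node is feasible for a pod.
type FilterFunc func(p Pod, n Node) bool

// ScorePlugin pairs a scoring function (0..100) with a weight.
type ScorePlugin struct {
	Name   string
	Weight int
	Score  func(p Pod, n Node) int
}

func fitsCPU(p Pod, n Node) bool { return n.FreeCPU >= p.CPU }

// freeCPUScore favors nodes with more CPU left after placement, capped at 100.
func freeCPUScore(p Pod, n Node) int {
	s := (n.FreeCPU - p.CPU) * 10
	if s > 100 {
		s = 100
	}
	return s
}

// imageScore favors nodes that already hold the container image.
func imageScore(p Pod, n Node) int {
	if n.HasImage {
		return 100
	}
	return 0
}

// pickNode drops infeasible nodes, then returns the node with the highest
// weighted score. The bool is false when every node was filtered out.
func pickNode(p Pod, nodes []Node, filters []FilterFunc, scorers []ScorePlugin) (string, bool) {
	best, bestScore, found := "", -1, false
	for _, n := range nodes {
		feasible := true
		for _, f := range filters {
			if !f(p, n) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		total := 0
		for _, s := range scorers {
			total += s.Weight * s.Score(p, n)
		}
		if total > bestScore {
			best, bestScore, found = n.Name, total, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{
		{Name: "a", FreeCPU: 1, HasImage: true},
		{Name: "b", FreeCPU: 4, HasImage: false},
		{Name: "c", FreeCPU: 8, HasImage: true},
	}
	filters := []FilterFunc{fitsCPU}
	scorers := []ScorePlugin{
		{Name: "FreeCPU", Weight: 1, Score: freeCPUScore},
		{Name: "ImagePresent", Weight: 2, Score: imageScore},
	}
	node, ok := pickNode(Pod{Name: "web", CPU: 2}, nodes, filters, scorers)
	fmt.Println(node, ok)
}
```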

Starting the scheduler locally

The entry point is cmd/kube-scheduler/scheduler.go.

Before the first local start in a development environment, a kubeconfig must be set up so that the scheduler can connect to the apiserver.

With --kubeconfig=$path-to-kubeconfig added to the startup arguments, the scheduler starts successfully on the local machine.

Exporting the default configuration

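To inspect the scheduler's default configuration (the KubeSchedulerConfiguration object in the kubescheduler.config.k8s.io/v1beta1 group), add the startup argument --write-config-to $path-to-default-config. The scheduler then writes out the configuration file it would start with by default.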

The default configuration file exported in this run is shown below:

apiVersion: kubescheduler.config.k8s.io/v1beta1
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  # path to the kubeconfig file
  kubeconfig: /Users/xxxxx/.kube/config
  qps: 50
enableContentionProfiling: true
enableProfiling: true
healthzBindAddress: 0.0.0.0:10251
kind: KubeSchedulerConfiguration
# leader election settings
leaderElection:
  leaderElect: true
  leaseDuration: 15s
  renewDeadline: 10s
  resourceLock: leases
  resourceName: kube-scheduler
  resourceNamespace: kube-system
  retryPeriod: 2s
metricsBindAddress: 0.0.0.0:10251
parallelism: 16
percentageOfNodesToScore: 0
podInitialBackoffSeconds: 1
podMaxBackoffSeconds: 10
# the default configuration defines a single profile; extra profiles need distinct schedulerName values
profiles:
  # arguments passed to the scheduling framework plugins
  - pluginConfig:
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1beta1
          kind: DefaultPreemptionArgs
          minCandidateNodesAbsolute: 100
          minCandidateNodesPercentage: 10
        name: DefaultPreemption
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1beta1
          hardPodAffinityWeight: 1
          kind: InterPodAffinityArgs
        name: InterPodAffinity
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1beta1
          kind: NodeAffinityArgs
        name: NodeAffinity
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1beta1
          kind: NodeResourcesFitArgs
        name: NodeResourcesFit
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1beta1
          kind: NodeResourcesLeastAllocatedArgs
          resources:
            - name: cpu
              weight: 1
            - name: memory
              weight: 1
        name: NodeResourcesLeastAllocated
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1beta1
          defaultingType: System
          kind: PodTopologySpreadArgs
        name: PodTopologySpread
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1beta1
          bindTimeoutSeconds: 600
          kind: VolumeBindingArgs
        name: VolumeBinding
    # plugins enabled in the scheduling framework
    plugins:
      bind:
        enabled:
          - name: DefaultBinder
            weight: 0
      filter:
        enabled:
          - name: NodeUnschedulable
            weight: 0
          - name: NodeName
            weight: 0
          - name: TaintToleration
            weight: 0
          - name: NodeAffinity
            weight: 0
          - name: NodePorts
            weight: 0
          - name: NodeResourcesFit
            weight: 0
          - name: VolumeRestrictions
            weight: 0
          - name: EBSLimits
            weight: 0
          - name: GCEPDLimits
            weight: 0
          - name: NodeVolumeLimits
            weight: 0
          - name: AzureDiskLimits
            weight: 0
          - name: VolumeBinding
            weight: 0
          - name: VolumeZone
            weight: 0
          - name: PodTopologySpread
            weight: 0
          - name: InterPodAffinity
            weight: 0
      permit: {}
      postBind: {}
      postFilter:
        enabled:
          - name: DefaultPreemption
            weight: 0
      preBind:
        enabled:
          - name: VolumeBinding
            weight: 0
      preFilter:
        enabled:
          - name: NodeResourcesFit
            weight: 0
          - name: NodePorts
            weight: 0
          - name: PodTopologySpread
            weight: 0
          - name: InterPodAffinity
            weight: 0
          - name: VolumeBinding
            weight: 0
          - name: NodeAffinity
            weight: 0
      preScore:
        enabled:
          - name: InterPodAffinity
            weight: 0
          - name: PodTopologySpread
            weight: 0
          - name: TaintToleration
            weight: 0
          - name: NodeAffinity
            weight: 0
      queueSort:
        enabled:
          - name: PrioritySort
            weight: 0
      reserve:
        enabled:
          - name: VolumeBinding
            weight: 0
      score:
        enabled:
          - name: NodeResourcesBalancedAllocation
            weight: 1
          - name: ImageLocality
            weight: 1
          - name: InterPodAffinity
            weight: 1
          - name: NodeResourcesLeastAllocated
            weight: 1
          - name: NodeAffinity
            weight: 1
          - name: NodePreferAvoidPods
            weight: 10000
          - name: PodTopologySpread
            weight: 2
          - name: TaintToleration
            weight: 1
    # scheduler name
    schedulerName: default-scheduler

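With this file as a template, you can modify it as needed and start the scheduler with --config=$path-to-config. Note that once --config is specified, the kubeconfig is taken from the clientConnection.kubeconfig field of that file, and the --kubeconfig command-line argument no longer takes effect.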
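
The precedence rule can be written down as a few lines of Go. This is a model of the behavior described above, not the scheduler's option-handling code: `resolveKubeconfig` is a hypothetical helper, and `fileKubeconfig` stands in for the clientConnection.kubeconfig value that would be read from the file passed via --config.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveKubeconfig returns the kubeconfig path that would be used for the
// given command-line arguments. fileKubeconfig is assumed to be the value of
// clientConnection.kubeconfig in the configuration file.
func resolveKubeconfig(args []string, fileKubeconfig string) (string, error) {
	fs := flag.NewFlagSet("scheduler-sketch", flag.ContinueOnError)
	configFile := fs.String("config", "", "path to a KubeSchedulerConfiguration file")
	kubeconfig := fs.String("kubeconfig", "", "path to a kubeconfig file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *configFile != "" {
		// A configuration file was given, so its value wins and the
		// --kubeconfig flag is ignored.
		return fileKubeconfig, nil
	}
	return *kubeconfig, nil
}

func main() {
	onlyFlag, _ := resolveKubeconfig([]string{"--kubeconfig=/tmp/flag.kubeconfig"}, "/tmp/file.kubeconfig")
	both, _ := resolveKubeconfig(
		[]string{"--config=/tmp/scheduler.yaml", "--kubeconfig=/tmp/flag.kubeconfig"},
		"/tmp/file.kubeconfig",
	)
	fmt.Println("flag only:", onlyFlag)
	fmt.Println("config + flag:", both)
}
```

The paths in `main` are placeholders chosen for the example.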

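Before customizing the scheduling framework, it helps to understand how the scheduler's built-in default plugins behave once the configuration file is modified: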

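  • If no extension is configured for an extension point, the scheduling framework uses the extensions provided by the default plugins.
  • If an extension is configured and enabled for an extension point, the scheduling framework calls the default plugins' extensions first and then the configured ones.
  • The default plugins' extensions are always called first. After that, the extensions of the extension point are called one by one in the order of the enabled list in KubeSchedulerConfiguration.
  • You can disable a default plugin's extension and then enable it again at a chosen position in the enabled list. This changes the order in which that default extension is called.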
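
The ordering rules in this list can be sketched as a small merge function. `PluginSet` and `mergeOrder` are hypothetical names for this example, and `MyPlugin` is a made-up custom plugin. The sketch models only the behavior described above for a single extension point and makes no claim about how the scheduler implements it.

```go
package main

import "fmt"

// PluginSet mirrors the enabled/disabled lists of one extension point in the
// configuration file (plugin names only, weights omitted).
type PluginSet struct {
	Enabled  []string
	Disabled []string
}

// mergeOrder returns the call order for one extension point: default plugins
// that were not disabled come first, in their default order, followed by the
// configured Enabled list in the order it was written.
func mergeOrder(defaults []string, custom PluginSet) []string {
	disabled := make(map[string]bool, len(custom.Disabled))
	for _, name := range custom.Disabled {
		disabled[name] = true
	}
	var order []string
	for _, name := range defaults {
		if !disabled[name] {
			order = append(order, name)
		}
	}
	return append(order, custom.Enabled...)
}

func main() {
	defaults := []string{"NodeResourcesBalancedAllocation", "ImageLocality", "TaintToleration"}

	// Nothing configured: only the defaults run.
	fmt.Println(mergeOrder(defaults, PluginSet{}))

	// A custom plugin is enabled: defaults run first, then the custom one.
	fmt.Println(mergeOrder(defaults, PluginSet{Enabled: []string{"MyPlugin"}}))

	// A default plugin is disabled and re-enabled after the custom plugin,
	// which moves it to the end of the call order.
	fmt.Println(mergeOrder(defaults, PluginSet{
		Disabled: []string{"ImageLocality"},
		Enabled:  []string{"MyPlugin", "ImageLocality"},
	}))
}
```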

References

Customizing the Kubernetes scheduler, 陽明's blog (qikqiak.com)