
k8s Scheduler, Predicate Policies, and Scheduling Methods

I. The k8s scheduling process

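1. (Predicates) First, rule out every node that cannot meet the Pod's requirements at all.
2. (Priorities) Score the remaining nodes with a series of priority functions; if exactly one node has the highest score, select it.
3. If several nodes tie for the highest score, pick one of them at random.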
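The three steps above can be sketched in Python as follows. This is a simplified model, not the scheduler's real code: the node and Pod dicts, the `free_cpu`/`cpu` fields and all function names are hypothetical, and every priority function is summed with weight 1 (the real scheduler lets each priority function carry its own weight).

```python
import random


def schedule(pod, nodes, predicates, priorities, rng=random):
    """Return the name of the chosen node, or None if no node fits (the Pod stays Pending)."""
    # Step 1 (predicates): drop every node that fails at least one predicate.
    feasible = [node for node in nodes if all(pred(pod, node) for pred in predicates)]
    if not feasible:
        return None
    # Step 2 (priorities): score each remaining node; here every priority has weight 1.
    scores = {node["name"]: sum(prio(pod, node) for prio in priorities) for node in feasible}
    best = max(scores.values())
    top = [name for name, score in scores.items() if score == best]
    # Step 3: a unique winner is returned directly; a tie is broken at random.
    return top[0] if len(top) == 1 else rng.choice(top)


def fits_cpu(pod, node):
    # Hypothetical predicate: the node must have enough free CPU.
    return node["free_cpu"] >= pod["cpu"]


def more_free_cpu(pod, node):
    # Hypothetical priority: more free CPU means a higher score.
    return node["free_cpu"]


nodes = [
    {"name": "node1", "free_cpu": 2},
    {"name": "node2", "free_cpu": 4},
    {"name": "node3", "free_cpu": 0},
]
print(schedule({"cpu": 1}, nodes, [fits_cpu], [more_free_cpu]))  # node2
```

Here node3 is removed by the predicate, and node2 wins because it has the most free CPU.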

 

II. Scheduling methods

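1. Node selection (which nodes the Pod may run on)
2. Pod selection: run a Pod on the same node as certain other Pods (pod affinity), or keep it away from certain Pods (pod anti-affinity)
3. Taints: a Pod can be scheduled onto a node only if it tolerates the node's taints; Pods already running on the node that do not tolerate a NoExecute taint are evicted. A toleration time can be defined.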

 

III. Common predicates

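Scheduler predicate policies (partial list):

CheckNodeCondition: # checks whether the node itself is healthy (e.g. network, disk)
GeneralPredicates
	HostName: # if the Pod sets pod.spec.nodeName, checks that it matches this node's name
	PodFitsHostPorts: # the host ports the Pod requests (pods.spec.containers.ports.hostPort, which binds a container port to a port on the node) must be free on the node
	MatchNodeSelector: # checks that the node's labels satisfy the Pod's pods.spec.nodeSelector
	PodFitsResources: # checks whether the node can satisfy the Pod's resource requests
NoDiskConflict: # checks whether the volumes the Pod depends on can be satisfied (not enabled by default)
PodToleratesNodeTaints: # checks whether the taints the Pod tolerates (spec.tolerations) cover all of the node's taints
PodToleratesNodeNoExecuteTaints: # checks only the node's NoExecute taints (not enabled by default)
CheckNodeLabelPresence: # checks whether the specified labels exist on the node
CheckServiceAffinity: # tries to place Pods that belong to the same Service together (not enabled by default)
MaxEBSVolumeCount: # checks the maximum number of EBS (AWS storage) volumes on the node
MaxGCEPDVolumeCount: # maximum number of GCE persistent disks
MaxAzureDiskVolumeCount: # maximum number of Azure disks
CheckVolumeBinding: # checks whether the node can satisfy the Pod's bound and unbound PVCs
NoVolumeZoneConflict: # checks whether the zone constraints of the Pod's volumes conflict with the node
CheckNodeMemoryPressure: # checks whether the node is under memory pressure
CheckNodePIDPressure: # checks whether the node is running short of process IDs
CheckNodeDiskPressure: # checks whether the node is under disk pressure
MatchInterPodAffinity: # checks whether the node satisfies the Pod's affinity and anti-affinity rules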
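To make two of the GeneralPredicates concrete, here is a minimal Python sketch of PodFitsResources and PodFitsHostPorts. It assumes a hypothetical dict layout (CPU in millicores, memory in MiB, `allocatable`, `pods`, `host_ports`) and is simplified: it only checks CPU and memory (the real PodFitsResources also checks things such as the maximum number of Pods), and it compares host port numbers only, ignoring protocol and host IP.

```python
def pod_fits_resources(pod, node):
    """PodFitsResources (simplified): requests already on the node plus the new Pod's
    requests must not exceed the node's allocatable CPU and memory."""
    for resource in ("cpu", "memory"):
        already_requested = sum(p.get(resource, 0) for p in node["pods"])
        if already_requested + pod.get(resource, 0) > node["allocatable"][resource]:
            return False
    return True


def pod_fits_host_ports(pod, node):
    """PodFitsHostPorts (simplified): none of the Pod's hostPorts may already be taken."""
    used = {port for p in node["pods"] for port in p.get("host_ports", [])}
    return used.isdisjoint(pod.get("host_ports", []))


# Hypothetical node: 2 CPUs / 4 GiB allocatable, one Pod already using 1.5 CPUs and hostPort 8080.
node = {
    "allocatable": {"cpu": 2000, "memory": 4096},
    "pods": [{"cpu": 1500, "memory": 1024, "host_ports": [8080]}],
}
print(pod_fits_resources({"cpu": 500, "memory": 1024}, node))  # True
print(pod_fits_resources({"cpu": 600, "memory": 1024}, node))  # False: 2100m > 2000m
print(pod_fits_host_ports({"host_ports": [8080]}, node))       # False: port already used
```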

  

IV. Common priority functions

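LeastRequested: # the more free capacity a node has, the higher its score:
	score = ( (cpu_capacity - sum(cpu_requested)) * 10 / cpu_capacity
	        + (memory_capacity - sum(memory_requested)) * 10 / memory_capacity ) / 2
BalancedResourceAllocation: # nodes whose CPU and memory utilization rates are closest to each other win
NodePreferAvoidPods: # based on the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"; a node whose annotation lists the Pod's owning controller is avoided
TaintToleration: # matches the Pod's spec.tolerations against the node's taints; the more PreferNoSchedule taints the Pod does not tolerate, the lower the node's score

SelectorSpreading: # spreads Pods that share a selector: the more existing Pods on a node are selected by the same label selectors as the current Pod, the lower that node's score
InterPodAffinity: # iterates over the Pod's affinity terms; the more terms a node matches, the higher its score
NodeAffinity: # node affinity: scores nodes by how well they match the Pod's preferred node affinity terms
MostRequested: # the less free capacity, the higher the score; the opposite of LeastRequested (not enabled by default)
NodeLabel: # scores nodes by whether they carry the specified labels (not enabled by default)
ImageLocality: # scores by the total size of the images the Pod needs that are already present on the node; the larger the total, the higher the score (not enabled by default)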
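The LeastRequested formula above, together with MostRequested and BalancedResourceAllocation, can be sketched in Python as below. This is a simplified model on a 0 to 10 scale with integer arithmetic. It assumes `requested` already includes the Pod being scheduled, and the BalancedResourceAllocation formula used here, 10 * (1 - |cpu_fraction - memory_fraction|), is an assumption about the scheduler of that era rather than something stated in this article. All names are hypothetical.

```python
MAX_PRIORITY = 10  # scores range from 0 to 10


def _least(requested, capacity):
    # Unrequested share of one resource, scaled to 0..10 with integer division.
    if capacity == 0 or requested > capacity:
        return 0
    return (capacity - requested) * MAX_PRIORITY // capacity


def _most(requested, capacity):
    # Requested share of one resource, scaled to 0..10 with integer division.
    if capacity == 0 or requested > capacity:
        return 0
    return requested * MAX_PRIORITY // capacity


def least_requested(requested, capacity):
    """LeastRequested: average of the CPU and memory free-capacity scores."""
    return (_least(requested["cpu"], capacity["cpu"])
            + _least(requested["memory"], capacity["memory"])) // 2


def most_requested(requested, capacity):
    """MostRequested: the opposite of LeastRequested (favours fuller nodes)."""
    return (_most(requested["cpu"], capacity["cpu"])
            + _most(requested["memory"], capacity["memory"])) // 2


def balanced_resource_allocation(requested, capacity):
    """BalancedResourceAllocation: the closer the CPU and memory usage fractions, the higher."""
    if capacity["cpu"] == 0 or capacity["memory"] == 0:
        return 0
    cpu = requested["cpu"] / capacity["cpu"]
    mem = requested["memory"] / capacity["memory"]
    if cpu >= 1 or mem >= 1:
        return 0
    return int((1 - abs(cpu - mem)) * MAX_PRIORITY)


# Hypothetical node: 2000m CPU / 4096 MiB, with 1000m CPU and 1024 MiB requested.
req, cap = {"cpu": 1000, "memory": 1024}, {"cpu": 2000, "memory": 4096}
print(least_requested(req, cap))               # 6 = (5 + 7) // 2
print(most_requested(req, cap))                # 3 = (5 + 2) // 2
print(balanced_resource_allocation(req, cap))  # 7 = int((1 - 0.25) * 10)
```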

  

V. Advanced scheduling configuration

1. nodeSelector

# View the node labels
kubectl get nodes --show-labels
NAME      STATUS    ROLES     AGE       VERSION   LABELS
k8s-m     Ready     master    120d      v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=
node1     Ready     <none>    120d      v1.11.2   app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1

# Use nodeSelector to schedule the Pod onto a node labeled disk=ssd
cat my-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd  # if no node carries the labels in nodeSelector, the Pod stays Pending (it fails the predicates)

# Check
kubectl get pod -o wide
NAME        READY     STATUS    RESTARTS   AGE       IP            NODE      NOMINATED NODE
nginx-pod   1/1       Running   0          49s       10.244.1.92   node1     <none>
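MatchNodeSelector treats nodeSelector as an exact match on every key/value pair. The minimal Python sketch below applies that rule to the LABELS column printed by `kubectl get nodes --show-labels` earlier; the parsing helper and function names are hypothetical. (k8s-m is also excluded by its master NoSchedule taint, independently of its labels.)

```python
def parse_labels(column):
    """Turn 'k1=v1,k2=v2,...' (the LABELS column) into a dict; values may be empty."""
    return dict(item.split("=", 1) for item in column.split(",") if item)


def matches_node_selector(node_selector, node_labels):
    """Every key in nodeSelector must be present on the node with exactly the same value."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


k8s_m = parse_labels("beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                     "kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=")
node1 = parse_labels("app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                     "disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1")

selector = {"disk": "ssd"}  # spec.nodeSelector from my-pod.yaml
print(matches_node_selector(selector, k8s_m))  # False
print(matches_node_selector(selector, node1))  # True
```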

  

2. affinity

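2.1 nodeAffinity with preferredDuringSchedulingIgnoredDuringExecution (soft affinity: nodes that match more of the preferences are favored, but the Pod is still created and scheduled even if no node matches)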

# Usage
cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
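          - key: test_node1  # label key
            operator: In     # In: the label value must be one of the listed values
            values:
            - k8s-node1      # a value of the test_node1 label
            - test1          # a value of the test_node1 label
        weight: 60  # weight added to the node's score when it matches this preference term, 1-100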

## Check (no node has this label, but the Pod was still created and is running)
kubectl get pod
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             1/1       Running             0          16s
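Why did the Pod still run? Under soft affinity, a matching term only adds its weight to a node's score; it never filters nodes out. Note also that node1's label is `test_node=k8s-node1`, while the term asks for the key `test_node1`, so no node matches and every node scores 0. The minimal Python sketch below models only the `In` operator used here, with hypothetical function names; the real NodeAffinity priority is assumed to additionally rescale these raw sums to 0 to 10 across nodes.

```python
def term_matches(match_expressions, labels):
    """All expressions of one preference term must hold (logical AND); only 'In' is modelled."""
    for expr in match_expressions:
        if expr["operator"] != "In":
            raise NotImplementedError(f"operator {expr['operator']} not modelled")
        if labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


def preferred_affinity_score(preferred_terms, labels):
    """Raw score: sum of the weights of all preference terms the node satisfies."""
    return sum(term["weight"] for term in preferred_terms
               if term_matches(term["preference"]["matchExpressions"], labels))


# The preference from my-affinity-pod.yaml.
terms = [{
    "weight": 60,
    "preference": {"matchExpressions": [
        {"key": "test_node1", "operator": "In", "values": ["k8s-node1", "test1"]},
    ]},
}]
node1_labels = {"test_node": "k8s-node1", "disk": "ssd"}  # a subset of node1's labels
print(preferred_affinity_score(terms, node1_labels))            # 0: key test_node1 is absent
print(preferred_affinity_score(terms, {"test_node1": "test1"}))  # 60
```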

  

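2.2 nodeAffinity with requiredDuringSchedulingIgnoredDuringExecution (hard affinity: like nodeSelector, this is a hard requirement; the Pod is only scheduled onto nodes that satisfy it, and stays Pending if none do)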

cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test_node1  # label key
            operator: In     # In: the label value must be one of the listed values
            values:
            - k8s-node1      # a value of the test_node1 label
            - test1          # a value of the test_node1 label

			
# Check (no node has the test_node1 label, so the Pod stays Pending)
kubectl get pod
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             0/1       Pending             0          4s
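Hard node affinity is a filter rather than a score. A minimal Python sketch of the matching rules, assuming the usual semantics of nodeSelectorTerms (terms are ORed, the matchExpressions inside one term are ANDed) and the operators In, NotIn, Exists, DoesNotExist, Gt and Lt; the function names are hypothetical.

```python
def expression_matches(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        # Gt/Lt compare the label value and the single listed value as integers.
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {op}")


def required_affinity_matches(node_selector_terms, labels):
    """nodeSelectorTerms are ORed; the matchExpressions inside one term are ANDed."""
    return any(
        all(expression_matches(expr, labels) for expr in term["matchExpressions"])
        for term in node_selector_terms
    )


# The requirement from my-affinity-pod.yaml against a subset of node1's labels.
terms = [{"matchExpressions": [
    {"key": "test_node1", "operator": "In", "values": ["k8s-node1", "test1"]},
]}]
node1_labels = {"test_node": "k8s-node1", "disk": "ssd"}
print(required_affinity_matches(terms, node1_labels))  # False -> the Pod stays Pending
```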

  

VI. Pod affinity and anti-affinity

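1. podAffinity: place a Pod in the same location as certain other Pods ("the same location" does not necessarily mean the same node; it is defined by whichever label you use as the topologyKey)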

# Usage (co-locate affinity-pod with my-pod1)
cat my-affinity-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels: 
    app1: my-pod1
     
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
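          - key: app1     # label key, defined on my-pod1 above
            operator: In  # In: the label value must be one of the listed values
            values:
            - my-pod1     # value of the app1 label
        # Nodes with the same value of the topologyKey label count as the same location.
        # This Pod must be co-located (affinity) or not co-located (anti-affinity) with the
        # Pods matching labelSelector, where "co-located" means running on a node whose
        # topologyKey label has the same value as a node running one of the selected Pods.
        topologyKey: kubernetes.io/hostname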
# Check
kubectl get pod -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP            NODE      NOMINATED NODE
affinity-pod             1/1       Running             0          54s       10.244.1.98   node1     <none>
my-pod1                  1/1       Running             0          54s       10.244.1.97   node1     <none>
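The role of topologyKey can be sketched in Python: a node qualifies when some Pod matching the labelSelector runs on any node that has the same value for the topologyKey label. This is a simplified model with hypothetical names; it only handles the `In` operator and omits special cases the real predicate handles (for example namespaces, and a Pod whose affinity term matches its own labels). The second node and the `zone` label are hypothetical, added to show why "the same location" need not mean the same node.

```python
def selector_matches(match_expressions, pod_labels):
    """labelSelector with 'In' expressions only: every expression must hold."""
    return all(e["operator"] == "In" and pod_labels.get(e["key"]) in e["values"]
               for e in match_expressions)


def satisfies_pod_affinity(candidate, nodes, running_pods, term):
    """Required podAffinity: a matching Pod must run in the candidate's topology domain."""
    key = term["topologyKey"]
    domain = candidate["labels"].get(key)
    if domain is None:
        return False  # the candidate node has no value for topologyKey
    exprs = term["labelSelector"]["matchExpressions"]
    return any(
        nodes[pod["node"]]["labels"].get(key) == domain and selector_matches(exprs, pod["labels"])
        for pod in running_pods
    )


nodes = {
    "node1": {"labels": {"kubernetes.io/hostname": "node1", "zone": "a"}},
    "node2": {"labels": {"kubernetes.io/hostname": "node2", "zone": "a"}},  # hypothetical
}
running = [{"node": "node1", "labels": {"app1": "my-pod1"}}]  # my-pod1 runs on node1
term = {"topologyKey": "kubernetes.io/hostname",
        "labelSelector": {"matchExpressions": [
            {"key": "app1", "operator": "In", "values": ["my-pod1"]}]}}
print(satisfies_pod_affinity(nodes["node1"], nodes, running, term))  # True
print(satisfies_pod_affinity(nodes["node2"], nodes, running, term))  # False
zone_term = {**term, "topologyKey": "zone"}
print(satisfies_pod_affinity(nodes["node2"], nodes, running, zone_term))  # True: same zone
```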

  

2. podAntiAffinity: keep a Pod out of the location where certain other Pods run (here, the same node); the opposite of the above

cat my-affinity-pod3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels: 
    app1: my-pod1
     
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAntiAffinity:  # the only change from the previous example
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
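          - key: app1     # label key, defined on my-pod1 above
            operator: In  # In: the label value must be one of the listed values
            values:
            - my-pod1     # value of the app1 label
        topologyKey: kubernetes.io/hostname  # same hostname value = same location, so the two Pods must not share a node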

# Check (I have only one schedulable node, so the Pod stays Pending)
kubectl get pod
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             0/1       Pending             0          1m
my-pod1                  1/1       Running             0          1m
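Anti-affinity inverts the check: a candidate node is rejected if any Pod matching the labelSelector already runs in its topology domain. In the minimal Python sketch below (hypothetical names, `In` operator only, every node assumed to carry the topologyKey label, which holds for kubernetes.io/hostname), only node1 is considered because k8s-m carries the master NoSchedule taint; my-pod1 already runs on node1, so no node remains and affinity-pod stays Pending.

```python
def selector_matches(match_expressions, pod_labels):
    """labelSelector with 'In' expressions only: every expression must hold."""
    return all(e["operator"] == "In" and pod_labels.get(e["key"]) in e["values"]
               for e in match_expressions)


def violates_pod_anti_affinity(candidate, nodes, running_pods, term):
    """Required podAntiAffinity: True if a matching Pod runs in the candidate's topology domain."""
    key = term["topologyKey"]
    domain = candidate["labels"][key]  # assumes every node carries the topologyKey label
    exprs = term["labelSelector"]["matchExpressions"]
    return any(
        nodes[pod["node"]]["labels"][key] == domain and selector_matches(exprs, pod["labels"])
        for pod in running_pods
    )


nodes = {"node1": {"labels": {"kubernetes.io/hostname": "node1"}}}  # the only schedulable node
running = [{"node": "node1", "labels": {"app1": "my-pod1"}}]
term = {"topologyKey": "kubernetes.io/hostname",
        "labelSelector": {"matchExpressions": [
            {"key": "app1", "operator": "In", "values": ["my-pod1"]}]}}
feasible = [name for name, node in nodes.items()
            if not violates_pod_anti_affinity(node, nodes, running, term)]
print(feasible)  # [] -> affinity-pod stays Pending
```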

  

VII. Taint-based scheduling

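A taint's effect defines how it repels Pods:
NoSchedule: # affects only scheduling; Pods already running on the node are not affected
NoExecute: # affects both scheduling and Pods already running on the node; running Pods that do not tolerate it are evicted
PreferNoSchedule: # the scheduler tries to avoid the node, but if no other suitable node is available the Pod can still be scheduled onto it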
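The three effects can be summarised with a small Python sketch: given the effects of the node taints a Pod does not tolerate, it reports whether the Pod can be scheduled there, whether an already-running Pod would be evicted, and how many PreferNoSchedule taints count against the node. The function and field names are hypothetical, and tolerationSeconds is ignored.

```python
BLOCKING_EFFECTS = ("NoSchedule", "NoExecute")


def taint_outcome(untolerated_effects):
    """Summarise the effects of the node taints that a Pod does NOT tolerate."""
    effects = list(untolerated_effects)
    return {
        # NoSchedule and NoExecute both keep new Pods off the node.
        "can_schedule": not any(e in BLOCKING_EFFECTS for e in effects),
        # Only NoExecute also evicts Pods that are already running there.
        "evict_if_running": "NoExecute" in effects,
        # PreferNoSchedule only makes the node less attractive (a scoring penalty).
        "prefer_no_schedule_count": effects.count("PreferNoSchedule"),
    }


print(taint_outcome(["NoSchedule"]))
# {'can_schedule': False, 'evict_if_running': False, 'prefer_no_schedule_count': 0}
print(taint_outcome(["PreferNoSchedule"]))
# {'can_schedule': True, 'evict_if_running': False, 'prefer_no_schedule_count': 1}
```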

 

1. Viewing and managing taints

# View a node's taints
kubectl describe node k8s-m | grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule

kubectl describe node node1 | grep Taints
Taints:             <none>

# Manage taints (see the built-in help)
kubectl taint node -h

# Taint a node (key node-type, value PreferNoSchedule, effect NoSchedule)
kubectl taint node node1 node-type=PreferNoSchedule:NoSchedule
# Check
kubectl describe node node1 | grep Taints
Taints:             node-type=PreferNoSchedule:NoSchedule
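In `node-type=PreferNoSchedule:NoSchedule`, the value happens to be the string PreferNoSchedule, which is easy to mistake for an effect. The spec format is key=value:effect with the value optional, and only the part after the last colon is the effect. A minimal Python sketch of that split; the function name is hypothetical, and kubectl's own parser handles more cases (such as a trailing `-` to remove a taint).

```python
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_taint(spec):
    """Split 'key=value:effect' (or 'key:effect') into its three parts."""
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"invalid taint spec: {spec!r}")
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in taint spec: {spec!r}")
    return {"key": key, "value": value, "effect": effect}


print(parse_taint("node-type=PreferNoSchedule:NoSchedule"))
# {'key': 'node-type', 'value': 'PreferNoSchedule', 'effect': 'NoSchedule'}
print(parse_taint("node-role.kubernetes.io/master:NoSchedule"))
# {'key': 'node-role.kubernetes.io/master', 'value': '', 'effect': 'NoSchedule'}
```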

  

2. Using taints

# Create a Pod (without tolerations)
cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80

# Check the Pod (it is Pending)
kubectl get pod
NAME                     READY     STATUS              RESTARTS   AGE
nginx-pod                0/1       Pending             0          32s

# The Pod does not tolerate the taints
kubectl describe pod nginx-pod | tail -1
  Warning  FailedScheduling  3s (x22 over 1m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.


### Add a toleration
cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
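  tolerations:                # taints this Pod tolerates
  - key: "node-type"          # the taint key defined earlier
    operator: "Equal"         # Equal: key, value and effect must match exactly; Exists: any value of this key is tolerated
    value: "PreferNoSchedule" # taint value
    effect: "NoSchedule"      # taint effect
    #tolerationSeconds: 3600  # how long the Pod may keep running after a matching taint appears before it is evicted; only valid when effect is NoExecute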

	
# Check (the Pod has been scheduled)
kubectl get pod -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP             NODE      NOMINATED NODE
nginx-pod                1/1       Running             0          3m        10.244.1.100   node1     <none>
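The toleration check can be sketched in Python, assuming the matching rules of the Kubernetes Toleration API: an empty effect in the toleration matches every effect, `Exists` ignores the value, `Equal` (the default operator) requires the value to match, and an empty key with `Exists` tolerates every taint. The function names are hypothetical. Applied to this example, the toleration covers node1's taint but not the master taint on k8s-m, which is consistent with the Pod landing on node1.

```python
def tolerates(toleration, taint):
    """True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches all effects
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False  # an empty key (with Exists) matches all keys
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def untolerated(tolerations, taints):
    """Taints not matched by any toleration; NoSchedule/NoExecute ones block scheduling."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


node1_taint = {"key": "node-type", "value": "PreferNoSchedule", "effect": "NoSchedule"}
master_taint = {"key": "node-role.kubernetes.io/master", "value": "", "effect": "NoSchedule"}
toleration = {"key": "node-type", "operator": "Equal",
              "value": "PreferNoSchedule", "effect": "NoSchedule"}  # from mypod.yaml

print(untolerated([toleration], [node1_taint]))         # [] -> node1 is schedulable
print(len(untolerated([toleration], [master_taint])))   # 1 -> k8s-m is filtered out
```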