k8s Scheduler, Predicate Policies, and Scheduling Methods
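I. The k8s scheduling process
1. (Predicates) First, rule out every node that cannot meet the pod's requirements at all.
2. (Priorities) Score each remaining node with a series of priority functions; if one node has the highest score, it is selected directly.
3. If several nodes tie for the highest score, one of them is chosen at random.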
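The three steps above can be modeled in a few lines of Python. This is a simplified sketch, not kube-scheduler's actual code, and it rests on several assumptions:

- Nodes and pods are plain dicts.
- The node names `node2`/`node3`, the `free_cpu` field, and the predicate and priority functions (`is_ready`, `fits_cpu`, `more_free_cpu`) are hypothetical.
- The real scheduler also multiplies each priority by a configurable weight. That step is omitted here.

```python
import random
from typing import Callable, Optional

Pod = dict
Node = dict
Predicate = Callable[[Pod, Node], bool]
Priority = Callable[[Pod, Node], int]


def schedule(pod: Pod, nodes: list[Node], predicates: list[Predicate],
             priorities: list[Priority], rng: random.Random) -> Optional[str]:
    """Return the name of the chosen node, or None if the pod stays Pending."""
    # Step 1 (predicates): keep only the nodes that pass every predicate.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None

    # Step 2 (priorities): add up the scores from every priority function.
    scores = {n["name"]: sum(f(pod, n) for f in priorities) for n in feasible}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]

    # Step 3: a single top scorer wins outright; a tie is broken at random.
    return winners[0] if len(winners) == 1 else rng.choice(winners)


# Hypothetical predicate and priority functions for the demo below.
def is_ready(pod: Pod, node: Node) -> bool:
    return node["ready"]


def fits_cpu(pod: Pod, node: Node) -> bool:
    return node["free_cpu"] >= pod["cpu"]


def more_free_cpu(pod: Pod, node: Node) -> int:
    return node["free_cpu"] // 1000  # one point per free core


nodes = [
    {"name": "node1", "ready": True, "free_cpu": 4000},
    {"name": "node2", "ready": True, "free_cpu": 2000},
    {"name": "node3", "ready": False, "free_cpu": 8000},
]
print(schedule({"cpu": 500}, nodes, [is_ready, fits_cpu], [more_free_cpu], random.Random(0)))
# prints node1: node3 is filtered out, and node1 outscores node2 (4 vs 2)
```

The random generator is passed in explicitly so that tie-breaking can be made reproducible when experimenting.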
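II. Scheduling methods
1. Node selection (which nodes the pod may run on)
2. Pod selection (run a pod on the same node as a given pod, i.e. pod affinity, or keep it away from a given pod, i.e. pod anti-affinity)
3. Taints (a pod can be scheduled onto a node only if it tolerates the node's taints; with a NoExecute taint, pods already running there that do not tolerate it are evicted). A toleration period can also be defined.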
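III. Common predicates
Scheduler predicates (partial list):
- CheckNodeCondition: checks whether the node is healthy (e.g. network/IP, disk).
- GeneralPredicates: a group made up of the following checks:
  - HostName: if the Pod object sets spec.nodeName, checks that it matches this node's name.
  - PodFitsHostPorts: checks that the host ports the pod requests (pods.spec.containers.ports.hostPort, which binds a container port to a port on the node) are still free on the node.
  - MatchNodeSelector: checks that the node's labels match the pod's pods.spec.nodeSelector.
  - PodFitsResources: checks whether the node can satisfy the pod's resource requests.
- NoDiskConflict: checks whether the volumes the pod uses conflict with volumes already in use on the node (not enabled by default).
- PodToleratesNodeTaints: checks whether the tolerations in the pod's spec.tolerations cover all of the node's taints.
- PodToleratesNodeNoExecuteTaints: the same check, but only for taints with the NoExecute effect (not enabled by default).
- CheckNodeLabelPresence: checks whether the specified labels exist on the node.
- CheckServiceAffinity: tries to place pods that belong to the same Service together (not enabled by default).
- MaxEBSVolumeCount: checks the maximum number of AWS EBS volumes on the node.
- MaxGCEPDVolumeCount: checks the maximum number of GCE persistent disks on the node.
- MaxAzureDiskVolumeCount: checks the maximum number of Azure Disk volumes on the node.
- CheckVolumeBinding: checks the node against the pod's bound and unbound PVCs.
- NoVolumeZoneConflict: checks whether the zone constraints of the pod's volumes conflict with the node's zone.
- CheckNodeMemoryPressure: checks whether the node is under memory pressure.
- CheckNodePIDPressure: checks whether the node is running low on PIDs.
- CheckNodeDiskPressure: checks whether the node is under disk pressure (low available disk space).
- MatchInterPodAffinity: checks whether the node satisfies the pod's affinity or anti-affinity rules.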
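Some of the GeneralPredicates can be sketched as plain Python functions. The dict layout is a hypothetical simplification, not the real Node/Pod API objects. This applies to `allocatable`, `requested` and `used_host_ports`. The real PodFitsResources also checks the node's pod-count limit and other resource types, which this sketch skips.

```python
def pod_requests(pod: dict) -> dict[str, int]:
    """Sum the resource requests of all containers in the pod."""
    total: dict[str, int] = {}
    for container in pod["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        for resource, quantity in requests.items():
            total[resource] = total.get(resource, 0) + quantity
    return total


def pod_fits_resources(pod: dict, node: dict) -> bool:
    """PodFitsResources: already-requested + new requests must fit in allocatable."""
    return all(node["requested"].get(res, 0) + qty <= node["allocatable"].get(res, 0)
               for res, qty in pod_requests(pod).items())


def pod_fits_host_ports(pod: dict, node: dict) -> bool:
    """PodFitsHostPorts: every hostPort the pod asks for must be free on the node."""
    wanted = {port["hostPort"]
              for container in pod["spec"]["containers"]
              for port in container.get("ports", [])
              if port.get("hostPort")}
    return wanted.isdisjoint(node["used_host_ports"])


def pod_fits_host(pod: dict, node: dict) -> bool:
    """HostName: if spec.nodeName is set, it must equal this node's name."""
    node_name = pod["spec"].get("nodeName")
    return not node_name or node_name == node["name"]


# Hypothetical node1 state: CPU in millicores, memory in bytes.
node1 = {
    "name": "node1",
    "allocatable": {"cpu": 2000, "memory": 4 * 1024**3},
    "requested": {"cpu": 1500, "memory": 1 * 1024**3},
    "used_host_ports": {8080},
}
pod = {"spec": {"containers": [{
    "name": "my-pod",
    "resources": {"requests": {"cpu": 250, "memory": 256 * 1024**2}},
    "ports": [{"containerPort": 80, "hostPort": 80}],
}]}}
print(pod_fits_resources(pod, node1), pod_fits_host_ports(pod, node1), pod_fits_host(pod, node1))
# prints True True True
```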
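IV. Common priority functions
- LeastRequested: the more free capacity a node has, the higher its score:
  score = (cpu_score + memory_score) / 2, where each resource's score = (capacity - sum(requested)) * 10 / capacity
- BalancedResourceAllocation: nodes whose CPU and memory utilization ratios are closest to each other win.
- NodePreferAvoidPods: scores by the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods".
- TaintToleration: checks the Pod object's spec.tolerations list against the node's taints; the more of the node's PreferNoSchedule taints the pod does not tolerate, the lower the score.
- SelectorSpreading: label-selector spreading; the more existing pods on a node are selected by the same selectors (Service, ReplicaSet, etc.) as the current pod, the lower the score.
- InterPodAffinity: iterates over the pod's affinity terms; the more terms a node matches, the higher its score.
- NodeAffinity: node affinity; scores nodes by the preferred node-affinity terms they match.
- MostRequested: the less free capacity, the higher the score; the opposite of LeastRequested (not enabled by default).
- NodeLabel: scores by whether the node has the specified labels (not enabled by default).
- ImageLocality: scores by the total size of the images the pod needs that are already present on the node (not enabled by default).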
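The LeastRequested formula can be checked with small numbers, together with its MostRequested mirror image and BalancedResourceAllocation. This sketch makes a few assumptions:

- "Requested" already includes the pod being scheduled.
- Integer division is used to mimic the scheduler's integer math. The exact rounding is an assumption.
- Scores use the classic 0 to 10 range.
- The BalancedResourceAllocation formula, 10 - |cpu_fraction - memory_fraction| * 10, is the one older kube-scheduler releases used. Treat it as an assumption, not something stated in the text above.

```python
MAX_SCORE = 10  # per-node scores range from 0 to 10


def _free_score(requested: int, capacity: int) -> int:
    """(capacity - requested) * 10 / capacity for one resource."""
    if capacity == 0 or requested > capacity:
        return 0
    return (capacity - requested) * MAX_SCORE // capacity


def _used_score(requested: int, capacity: int) -> int:
    """requested * 10 / capacity for one resource."""
    if capacity == 0 or requested > capacity:
        return 0
    return requested * MAX_SCORE // capacity


def least_requested(cpu_req: int, cpu_cap: int, mem_req: int, mem_cap: int) -> int:
    """LeastRequested: more free CPU and memory -> higher score."""
    return (_free_score(cpu_req, cpu_cap) + _free_score(mem_req, mem_cap)) // 2


def most_requested(cpu_req: int, cpu_cap: int, mem_req: int, mem_cap: int) -> int:
    """MostRequested: less free CPU and memory -> higher score."""
    return (_used_score(cpu_req, cpu_cap) + _used_score(mem_req, mem_cap)) // 2


def balanced_resource_allocation(cpu_req: int, cpu_cap: int, mem_req: int, mem_cap: int) -> int:
    """BalancedResourceAllocation: the closer the CPU and memory usage ratios, the higher."""
    if cpu_cap == 0 or mem_cap == 0:
        return 0
    cpu_frac, mem_frac = cpu_req / cpu_cap, mem_req / mem_cap
    if cpu_frac >= 1 or mem_frac >= 1:
        return 0  # a fully used or over-committed node gets the lowest score
    return int((1 - abs(cpu_frac - mem_frac)) * MAX_SCORE)


# 4 cores (4000m) with 2000m requested; 8 GiB memory with 2 GiB requested (in GiB here).
print(least_requested(2000, 4000, 2, 8))               # (5 + 7) // 2 = 6
print(most_requested(2000, 4000, 2, 8))                # (5 + 2) // 2 = 3
print(balanced_resource_allocation(2000, 4000, 2, 8))  # int((1 - 0.25) * 10) = 7
```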
V. Advanced scheduling configuration
1. nodeSelector
View the node labels:
```shell
$ kubectl get nodes --show-labels
NAME    STATUS   ROLES    AGE    VERSION   LABELS
k8s-m   Ready    master   120d   v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=
node1   Ready    <none>   120d   v1.11.2   app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1
```
Use nodeSelector to pick a node labeled disk=ssd (my-pod.yaml):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd
```
Check where the pod was scheduled:
```shell
$ kubectl get pod -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE
nginx-pod   1/1     Running   0          49s   10.244.1.92   node1   <none>
```
If no node carries the labels specified in nodeSelector, the pod stays Pending (it fails the predicate phase).
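The nodeSelector check (the MatchNodeSelector predicate) is an exact key/value match on every listed label. Below is a minimal sketch using the label strings from the `kubectl get nodes --show-labels` output above. `parse_labels` and `match_node_selector` are hypothetical helper names.

```python
def parse_labels(label_column: str) -> dict[str, str]:
    """Turn the LABELS column of `kubectl get nodes --show-labels` into a dict."""
    labels = {}
    for item in label_column.split(","):
        key, _, value = item.partition("=")
        labels[key] = value
    return labels


def match_node_selector(node_selector: dict[str, str], node_labels: dict[str, str]) -> bool:
    """Every nodeSelector key must exist on the node with exactly the same value."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


nodes = {
    "k8s-m": parse_labels("beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                          "kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master="),
    "node1": parse_labels("app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                          "disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1"),
}
node_selector = {"disk": "ssd"}  # from my-pod.yaml
eligible = [name for name, labels in nodes.items() if match_node_selector(node_selector, labels)]
print(eligible)  # ['node1']; an empty list here would leave the pod Pending
```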
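2. affinity
2.1 nodeAffinity: preferredDuringSchedulingIgnoredDuringExecution (soft affinity: nodes that match more of the preferences are favored, but even if no node satisfies them, the pod is still created)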
Example (my-affinity-pod.yaml):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: test_node1   # label key
            operator: In      # In: the label value must be one of the listed values
            values:
            - k8s-node1       # accepted value of test_node1
            - test1           # accepted value of test_node1
        weight: 60            # weight (1-100) associated with this preference term
```
Check (no node has this label, but the pod was still created and is running):
```shell
$ kubectl get pod
NAME           READY   STATUS    RESTARTS   AGE
affinity-pod   1/1     Running   0          16s
```
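A soft preference only adds points; it never removes a node from the candidate list. Below is a minimal sketch of that scoring. It assumes only the `In` operator, and the function names are hypothetical. Each matching term contributes its weight, and the real NodeAffinity priority then normalizes the totals across nodes. A node that scores 0 is still eligible, which is why affinity-pod above still ran.

```python
def matches_in(expr: dict, labels: dict[str, str]) -> bool:
    """Evaluate an `In` expression: the label must exist and have a listed value."""
    if expr["operator"] != "In":
        raise NotImplementedError("this sketch only handles the In operator")
    return expr["key"] in labels and labels[expr["key"]] in expr["values"]


def preferred_affinity_score(terms: list[dict], labels: dict[str, str]) -> int:
    """Add up the weights of the preference terms whose expressions all match."""
    return sum(term["weight"] for term in terms
               if all(matches_in(e, labels) for e in term["preference"]["matchExpressions"]))


# The preference from my-affinity-pod.yaml.
terms = [{
    "weight": 60,
    "preference": {"matchExpressions": [
        {"key": "test_node1", "operator": "In", "values": ["k8s-node1", "test1"]},
    ]},
}]
node1_labels = {"kubernetes.io/hostname": "node1", "disk": "ssd", "test_node": "k8s-node1"}
print(preferred_affinity_score(terms, node1_labels))  # 0: node1 has test_node, not test_node1
```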
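2.2 nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution (hard affinity: like nodeSelector, it is a hard requirement; the pod is never scheduled onto a node that fails it, and if no node satisfies it, the pod stays Pending)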
my-affinity-pod.yaml, now with a hard requirement:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test_node1   # label key
            operator: In      # In: the label value must be one of the listed values
            values:
            - k8s-node1       # accepted value of test_node1
            - test1           # accepted value of test_node1
```
Check (no node has a test_node1 label, so the pod is Pending):
```shell
$ kubectl get pod
NAME           READY   STATUS    RESTARTS   AGE
affinity-pod   0/1     Pending   0          4s
```
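In the hard form, nodeSelectorTerms are ORed together, while the matchExpressions inside one term are ANDed. Below is a minimal sketch of that evaluation, using the operators the nodeAffinity API accepts: In, NotIn, Exists, DoesNotExist, Gt and Lt. The function names and label dicts are hypothetical.

```python
def match_expression(expr: dict, labels: dict[str, str]) -> bool:
    """Evaluate one node selector requirement against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        try:
            node_value, limit = int(labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False  # missing label or non-integer value never matches
        return node_value > limit if op == "Gt" else node_value < limit
    raise ValueError(f"unsupported operator: {op}")


def required_affinity_ok(node_selector_terms: list[dict], labels: dict[str, str]) -> bool:
    """Terms are ORed; the expressions inside a single term are ANDed."""
    return any(all(match_expression(e, labels) for e in term["matchExpressions"])
               for term in node_selector_terms)


# The requirement from my-affinity-pod.yaml.
terms = [{"matchExpressions": [
    {"key": "test_node1", "operator": "In", "values": ["k8s-node1", "test1"]},
]}]
node1_labels = {"kubernetes.io/hostname": "node1", "disk": "ssd", "test_node": "k8s-node1"}
print(required_affinity_ok(terms, node1_labels))  # False, so the pod stays Pending
```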
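VI. Pod affinity and anti-affinity
1. podAffinity (place a pod in the same location as another pod; "the same location" does not necessarily mean the same node, it is defined by the label you choose as topologyKey)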
Example: keep affinity-pod in the same place as my-pod1 (my-affinity-pod2.yaml):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1        # label key, defined on the pod above
            operator: In     # In: the label value must be one of the listed values
            values:
            - my-pod1        # value of the app1 label
        topologyKey: kubernetes.io/hostname  # nodes with the same kubernetes.io/hostname value count as the same location
```
Per the API description of topologyKey: this pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running.

Check:
```shell
$ kubectl get pod -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE
affinity-pod   1/1     Running   0          54s   10.244.1.98   node1   <none>
my-pod1        1/1     Running   0          54s   10.244.1.97   node1   <none>
```
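The topologyKey rule quoted above can be sketched in two steps:

1. Collect the topologyKey values of the nodes that run a matching pod.
2. Accept any node that shares one of those values.

This is a simplified model. The label selector is reduced to key -> allowed values, i.e. `In` only. `node2`, `node3` and the `zone` label are hypothetical, added to show how a coarser topologyKey widens "the same location".

```python
def selector_matches(selector: dict[str, list[str]], pod_labels: dict[str, str]) -> bool:
    """Simplified labelSelector: every key must be present with one of the listed values."""
    return all(key in pod_labels and pod_labels[key] in values
               for key, values in selector.items())


def pod_affinity_ok(node: str, node_labels: dict, running: list[dict],
                    selector: dict[str, list[str]], topology_key: str) -> bool:
    """Required podAffinity: the node must be in the same topology domain as a matching pod."""
    domains = {node_labels[p["node"]].get(topology_key)
               for p in running if selector_matches(selector, p["labels"])}
    domain = node_labels[node].get(topology_key)
    return domain is not None and domain in domains


node_labels = {
    "node1": {"kubernetes.io/hostname": "node1", "zone": "a"},
    "node2": {"kubernetes.io/hostname": "node2", "zone": "a"},  # hypothetical
    "node3": {"kubernetes.io/hostname": "node3", "zone": "b"},  # hypothetical
}
running = [{"name": "my-pod1", "labels": {"app1": "my-pod1"}, "node": "node1"}]
selector = {"app1": ["my-pod1"]}

for key in ("kubernetes.io/hostname", "zone"):
    print(key, [n for n in node_labels if pod_affinity_ok(n, node_labels, running, selector, key)])
# kubernetes.io/hostname ['node1']
# zone ['node1', 'node2']
```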
2. podAntiAffinity (keep a pod out of the location of another pod, here the same node; the opposite of the above)
my-affinity-pod3.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAntiAffinity:         # the only change from the previous example
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1        # label key, defined on the pod above
            operator: In     # In: the label value must be one of the listed values
            values:
            - my-pod1        # value of the app1 label
        topologyKey: kubernetes.io/hostname  # nodes with the same kubernetes.io/hostname value count as the same location, so the pods must not share one
```
Check (I have only one worker node, so affinity-pod is Pending):
```shell
$ kubectl get pod
NAME           READY   STATUS    RESTARTS   AGE
affinity-pod   0/1     Pending   0          1m
my-pod1        1/1     Running   0          1m
```
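Anti-affinity inverts the test: a node is rejected if any matching pod already runs in its topology domain. Below is a minimal sketch under the same plain-dict simplifications (`In`-only selector, hypothetical function names). The sketch treats a node that lacks the topologyKey label as non-conflicting, which is an assumption. With only node1 schedulable, the candidate list comes out empty, matching the Pending state above.

```python
def selector_matches(selector: dict[str, list[str]], pod_labels: dict[str, str]) -> bool:
    """Simplified labelSelector: every key must be present with one of the listed values."""
    return all(key in pod_labels and pod_labels[key] in values
               for key, values in selector.items())


def pod_anti_affinity_ok(node: str, node_labels: dict, running: list[dict],
                         selector: dict[str, list[str]], topology_key: str) -> bool:
    """Required podAntiAffinity: no matching pod may run in this node's topology domain."""
    domain = node_labels[node].get(topology_key)
    if domain is None:
        return True  # assumption: a node without the topologyKey label cannot conflict
    return not any(selector_matches(selector, p["labels"])
                   and node_labels[p["node"]].get(topology_key) == domain
                   for p in running)


node_labels = {"node1": {"kubernetes.io/hostname": "node1"}}  # the only schedulable node
running = [{"name": "my-pod1", "labels": {"app1": "my-pod1"}, "node": "node1"}]
selector = {"app1": ["my-pod1"]}
candidates = [n for n in node_labels
              if pod_anti_affinity_ok(n, node_labels, running, selector, "kubernetes.io/hostname")]
print(candidates)  # [] -> affinity-pod stays Pending
```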
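VII. Taint-based scheduling
A taint's effect defines how it repels pods:
- NoSchedule: affects only scheduling; pods already running on the node are not affected.
- NoExecute: affects both scheduling and pods already running; pods that do not tolerate the taint are evicted.
- PreferNoSchedule: the scheduler tries to avoid the node, but still places the pod there if no other suitable node is available.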
1. Viewing and managing taints
```shell
# View node taints
$ kubectl describe node k8s-m | grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
$ kubectl describe node node1 | grep Taints
Taints: <none>

# Help on managing taints
$ kubectl taint node -h

# Taint node1: key node-type, value PreferNoSchedule, effect NoSchedule
$ kubectl taint node node1 node-type=PreferNoSchedule:NoSchedule

# Verify
$ kubectl describe node node1 | grep Taints
Taints: node-type=PreferNoSchedule:NoSchedule
```
Note that "PreferNoSchedule" here is only the taint's value; the effect is the part after the colon, NoSchedule.
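The argument to `kubectl taint`, like the Taints line printed by `kubectl describe`, has the form key=value:effect, where the value is optional. A small parser makes the split explicit. `parse_taint` is a hypothetical helper, and it does not validate key or value syntax.

```python
from typing import NamedTuple

VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split `key[=value]:effect` into its parts; the effect follows the last colon."""
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"invalid taint: {spec!r}")
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


print(parse_taint("node-type=PreferNoSchedule:NoSchedule"))
# Taint(key='node-type', value='PreferNoSchedule', effect='NoSchedule')
print(parse_taint("node-role.kubernetes.io/master:NoSchedule"))
# Taint(key='node-role.kubernetes.io/master', value='', effect='NoSchedule')
```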
2. Using taints
Create a pod without any toleration (mypod.yaml):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
```
Check the pod (it is Pending because it does not tolerate the taints):
```shell
$ kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
nginx-pod   0/1     Pending   0          32s

$ kubectl describe pod nginx-pod | tail -1
  Warning  FailedScheduling  3s (x22 over 1m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
```
Add a toleration to mypod.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  tolerations:                  # taints this pod tolerates
  - key: "node-type"            # key of the taint defined earlier
    operator: "Equal"           # Equal: key and value must match exactly; Exists: tolerates any node-type taint, whatever its value
    value: "PreferNoSchedule"   # taint value
    effect: "NoSchedule"        # taint effect
    #tolerationSeconds: 3600    # how long the pod may keep running after the taint appears before it is evicted; only used when effect is NoExecute
```
Check (the pod has now been scheduled):
```shell
$ kubectl get pod -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE
nginx-pod   1/1     Running   0          3m    10.244.1.100   node1   <none>
```
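Matching a toleration against a taint, and deciding which taints actually block scheduling, can be sketched as follows. The sketch follows the toleration rules as we understand them:

- Equal compares key and value.
- Exists compares only the key.
- An empty effect or key in the toleration matches any effect or key.
- Only NoSchedule and NoExecute taints are enforced by the PodToleratesNodeTaints predicate.

The function names and dict layout are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Does a single toleration match a single taint?"""
    # An empty effect in the toleration matches taints of any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key (only valid with Exists) matches taints with any key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    operator = toleration.get("operator") or "Equal"
    if operator == "Exists":
        return True  # the value is ignored
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def tolerates_node_taints(tolerations: list[dict], taints: list[dict]) -> bool:
    """Every NoSchedule/NoExecute taint must be tolerated; PreferNoSchedule is only soft."""
    return all(any(tolerates(t, taint) for t in tolerations)
               for taint in taints if taint["effect"] in ("NoSchedule", "NoExecute"))


node_taints = {
    "k8s-m": [{"key": "node-role.kubernetes.io/master", "value": "", "effect": "NoSchedule"}],
    "node1": [{"key": "node-type", "value": "PreferNoSchedule", "effect": "NoSchedule"}],
}
toleration = {"key": "node-type", "operator": "Equal",
              "value": "PreferNoSchedule", "effect": "NoSchedule"}

print([n for n, t in node_taints.items() if tolerates_node_taints([], t)])            # []
print([n for n, t in node_taints.items() if tolerates_node_taints([toleration], t)])  # ['node1']
```

The first list is empty, which corresponds to "0/2 nodes are available" in the event above. Adding the toleration leaves exactly node1, where the pod ended up running.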