Kubernetes Resource Scheduling: Node Affinity

Pod Node Selector

The label selector given in nodeSelector filters the nodes that satisfy its conditions into the set of candidate target nodes; the final choice among them is then made by the scoring mechanism. For this reason nodeSelector is also referred to as the node selector. The user labels a particular subset of Node objects ahead of time, and can then configure a Pod with a node selector to get what is effectively hard node-affinity scheduling.

You can view the labels on each node with the following command:

[root@k8s-01 ~]# kubectl get nodes --show-labels
NAME     STATUS   ROLES                  AGE   VERSION   LABELS
k8s-01   Ready    control-plane,master   26d   v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8s-02   Ready    <none>                 26d   v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-02,kubernetes.io/os=linux,type=kong
k8s-03   Ready    <none>                 26d   v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-03,kubernetes.io/os=linux,type=kong
k8s-04   Ready    <none>                 8d    v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-04,kubernetes.io/os=linux
[root@k8s-01 ~]#

Label the node k8s-04:

[root@k8s-01 ~]# kubectl label nodes k8s-04 type=test
node/k8s-04 labeled
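
To confirm the label is in place, you can filter nodes by it (a quick sanity check; the trailing-dash removal syntax is shown for reference only):

# list only the nodes carrying the type=test label
kubectl get nodes -l type=test
# a label can later be removed with a trailing dash, e.g.:
# kubectl label nodes k8s-04 type-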

Test YAML:

echo "
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx-test
  name: nginx-test
spec:
  containers:
  - image: nginx
    name: nginx-test
  nodeSelector:
    type: test
" | kubectl apply -f -    

Check where the Pod was scheduled:

[root@k8s-01 ~]# kubectl get pods  -o wide |grep nginx-test
nginx-test                                1/1     Running   0               58s     10.244.7.81      k8s-04   <none>           <none>
[root@k8s-01 ~]#

We can also inspect the scheduling decision with the describe command; a minimal check is sketched below (the exact event wording may vary by cluster version):
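
# the Events section at the end of the output records the decision, roughly:
#   Normal  Scheduled  ...  default-scheduler  Successfully assigned default/nginx-test to k8s-04
kubectl describe pod nginx-test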

In fact, in most cases users do not need to care where exactly a Pod runs, unless the Pod depends on special resources that only some nodes can provide, such as GPUs or SSDs. Even then, statically pinning a Pod's location with .spec.nodeName should be avoided; instead, let the scheduler pick a matching worker node based on labels and label selectors. Note also that .spec.nodeSelector in a Pod spec only supports simple equality-based node selection, whereas .spec.affinity.nodeAffinity supports much more flexible node selector expressions and can implement both hard and soft affinity logic.
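
For contrast, this is what static pinning with .spec.nodeName looks like (a minimal sketch with a hypothetical Pod name; the Pod is bound to the node directly, skipping scheduler filtering and scoring, which is why this approach is discouraged):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pinned
spec:
  nodeName: k8s-04   ## bound directly to k8s-04; no label, affinity, or resource checks by the scheduler
  containers:
  - name: nginx
    image: nginx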

Node Affinity Scheduling

Node affinity (nodeAffinity) controls which nodes a Pod should, or must not, be scheduled onto. It supports simple logical combinations of conditions rather than only exact equality matches. When defining node affinity rules on a Pod there are two kinds of affinity: required and preferred, also called hard and soft affinity respectively.

  • Required (hard) affinity defines rules that must be met to schedule the Pod; if no node qualifies, the Pod is left in the Pending state until such a node appears.
  • Preferred (soft) affinity is a soft scheduling constraint: the scheduler still favors nodes of the preferred kind, but when the preference cannot be satisfied it will pick a node that does not match the rules rather than leaving the Pod Pending.

There are two key points when defining node affinity rules in a Pod spec:

  • plan and apply the appropriate labels to the nodes;
  • define a sensible label selector for the Pod object.

As the suffix IgnoredDuringExecution in the field names preferredDuringSchedulingIgnoredDuringExecution and requiredDuringSchedulingIgnoredDuringExecution implies, once a Pod has been scheduled to a node according to its node affinity rules, the scheduler will not move the Pod off that node if the node's labels later change and stop satisfying those rules. Affinity is evaluated once, at scheduling time, rather than continuously monitored afterwards.
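
The same "ignored during execution" behavior also holds for the nodeSelector example above, which makes it easy to demonstrate (a hedged illustration; restore the label afterwards):

# remove the label that nginx-test was scheduled against...
kubectl label nodes k8s-04 type-
# ...the Pod keeps running on k8s-04; rules are only evaluated at scheduling time
kubectl get pod nginx-test -o wide
# put the label back for the rest of the walkthrough
kubectl label nodes k8s-04 type=test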

View the official field documentation:

[root@k8s-01 ~]# kubectl explain pod.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
     Describes node affinity scheduling rules for the pod.

     Node affinity is a group of node affinity scheduling rules.

FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution      <[]Object>
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node matches
     the corresponding matchExpressions; the node(s) with the highest sum are
     the most preferred.

   requiredDuringSchedulingIgnoredDuringExecution       <Object>
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.

[root@k8s-01 ~]#

Required (Hard) Affinity

View the official field documentation:

[root@k8s-01 ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1

RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <Object>

DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.

     A node selector represents the union of the results of one or more label
     queries over a set of nodes; that is, it represents the OR of the selectors
     represented by the node selector terms.

FIELDS:
   nodeSelectorTerms    <[]Object> -required-
     Required. A list of node selector terms. The terms are ORed.

[root@k8s-01 ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms
KIND:     Pod
VERSION:  v1

RESOURCE: nodeSelectorTerms <[]Object>

DESCRIPTION:
     Required. A list of node selector terms. The terms are ORed.

     A null or empty node selector term matches no objects. The requirements of
     them are ANDed. The TopologySelectorTerm type implements a subset of the
     NodeSelectorTerm.

FIELDS:
   matchExpressions     <[]Object>
     A list of node selector requirements by node's labels.

   matchFields  <[]Object>
     A list of node selector requirements by node's fields.

[root@k8s-01 ~]#

  • matchExpressions: label-selector expressions that filter nodes by their labels; it may be used repeatedly to express different match conditions, and those conditions are ORed.
  • matchFields: node selectors expressed against node fields rather than labels; likewise repeatable, with the conditions ORed (see the sketch below).
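
matchFields is rarely demonstrated; in practice the supported field is metadata.name, which lets you target a node by name while still going through the scheduler (a minimal sketch, in contrast to .spec.nodeName):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values: ["k8s-04"]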

Note: each match condition may consist of one or more match rules; for example, a single matchExpressions condition can hold two expressions at once, as in the sketch that follows, and the rules within one condition are logically ANDed. In other words, a node only needs to satisfy any one of the conditions in nodeSelectorTerms, but satisfying a condition means fully matching every rule defined under it.
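
A sketch making the semantics concrete (label keys reuse the ones from this article): a node matches if it satisfies either term below, and satisfying a term means matching every expression inside it:

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:      ## term 1: disktype=ssd AND a gpu label exists
    - key: disktype
      operator: In
      values: ["ssd"]
    - key: gpu
      operator: Exists
  - matchExpressions:      ## term 2, ORed with term 1: disktype=hdd
    - key: disktype
      operator: In
      values: ["hdd"]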

Now let's test it. Label k8s-03:

[root@k8s-01 ~]# kubectl label nodes k8s-03 disktype=ssd
node/k8s-03 labeled

Test YAML:

apiVersion: v1
kind: Pod
metadata:
  name: "busy-affinity"
  labels:
    app: "busy-affinity"
spec:
  containers:
  - name: busy-affinity
    image: "busybox"
    command: ["/bin/sh","-c","sleep 600"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: ## hard requirement; applies at scheduling time, ignored during execution
        nodeSelectorTerms:
        - matchExpressions:
            - key: disktype
              values: ["ssd","hdd"]
              operator: In               

Check the Pod:

[root@k8s-01 ~]# kubectl get pods -o wide |grep busy-affinity
busy-affinity                             1/1     Running   0               76s     10.244.165.209   k8s-03   <none>           <none>
[root@k8s-01 ~]#

As the result shows, the Pod was placed on k8s-03: the hard rule demands a node whose disktype label value is in ["ssd", "hdd"], and a hard rule must be satisfied, not merely preferred. The matching logic here is "the label's value is in a given list"; Kubernetes currently provides the following operators (a combined example follows the list):

  • In: the label's value is in the given list
  • NotIn: the label's value is not in the given list
  • Gt: the label's value is greater than the given value (integer comparison)
  • Lt: the label's value is less than the given value (integer comparison)
  • Exists: the label exists (no values may be specified)
  • DoesNotExist: the label does not exist (no values may be specified)
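
The sketch below combines several of these operators in one matchExpressions list (keys reuse the labels from this article; all three requirements must hold together):

- matchExpressions:
  - key: disktype            ## the value must NOT be "hdd"
    operator: NotIn
    values: ["hdd"]
  - key: gpu                 ## the gpu label must simply exist (values must be omitted)
    operator: Exists
  - key: disk                ## integer comparison: disk > 20
    operator: Gt
    values: ["20"]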

Now put the remaining test labels on k8s-03 and k8s-04:

[root@k8s-01 ~]#  kubectl label nodes k8s-04 disktype=hdd
node/k8s-04 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-04 disk=60
node/k8s-04 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-04 gpu=3080
node/k8s-04 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-03 gpu=3090
node/k8s-03 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-03 disk=30
node/k8s-03 labeled
[root@k8s-01 ~]#
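
Before testing, the -L flag of kubectl get nodes prints the chosen labels as columns, which makes verifying the setup easier:

# show the disktype, disk and gpu labels of both test nodes
kubectl get nodes k8s-03 k8s-04 -L disktype,disk,gpu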

Preferred (Soft) Affinity

Preferred node affinity gives node selection a soft control logic: the Pod being scheduled "should", rather than "must", land on certain nodes, and if the condition cannot be met the Pod will still be placed on a node that does not match. When several soft affinity conditions coexist, each can carry a weight attribute to rank its priority; valid values are 1 to 100, and the larger the number, the higher the priority.

The modified YAML, with soft affinity added:

apiVersion: v1
kind: Pod
metadata:
  name: "busy-affinity"
  labels:
    app: "busy-affinity"
spec:
  containers:
  - name: busy-affinity
    image: "busybox"
    command: ["/bin/sh","-c","sleep 600"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: ## hard requirement; applies at scheduling time, ignored during execution
        nodeSelectorTerms:
        - matchExpressions:
            - key: disktype
              values: ["ssd","hdd"]
              operator: In
      preferredDuringSchedulingIgnoredDuringExecution:  ## soft preferences, scored by weight
      - preference: ## the condition we prefer
          matchExpressions:
          - key: disk
            values: ["40"]
            operator: Gt   ## k8s-03 (disk=30) fails this, k8s-04 (disk=60) matches
        weight: 70 ## weight, 1-100
      - preference: ## another preferred condition
          matchExpressions:
          - key: gpu
            values: ["3080"]
            operator: Gt  ## k8s-04 (gpu=3080) fails this, k8s-03 (gpu=3090) matches
        weight: 30 ## weight, 1-100

Both k8s-03 and k8s-04 satisfy the hard condition, so the soft preferences decide: k8s-03 matches only the second preference (weight 30), while k8s-04 matches only the first (weight 70). The higher weighted sum wins, so the Pod should land on k8s-04. Delete the previous busy-affinity Pod, re-apply the YAML, and check the running Pods:

[root@k8s-01 ~]# kubectl get pods -o wide
NAME                                      READY   STATUS    RESTARTS        AGE     IP               NODE     NOMINATED NODE   READINESS GATES
busy-affinity                             1/1     Running   0               34s     10.244.7.82      k8s-04   <none>           <none>
counter                                   1/1     Running   0               8d      10.244.165.198   k8s-03   <none>           <none>
nfs-client-provisioner-69b76b8dc6-ms4xg   1/1     Running   1 (14d ago)     26d     10.244.179.21    k8s-02   <none>           <none>
nginx-5759cb8dcc-t4sdn                    1/1     Running   0               7d6h    10.244.179.50    k8s-02   <none>           <none>
nginx-liveness                            1/1     Running   2 (5d17h ago)   5d17h   10.244.61.218    k8s-01   <none>           <none>
nginx-nginx                               1/1     Running   0               5d17h   10.244.179.3     k8s-02   <none>           <none>
nginx-readiness                           1/1     Running   0               5d17h   10.244.61.219    k8s-01   <none>           <none>
nginx-test                                1/1     Running   0               45m     10.244.7.81      k8s-04   <none>           <none>
post-test                                 1/1     Running   0               5d18h   10.244.179.62    k8s-02   <none>           <none>