
Introduction to Pod Health Checks

By default, the kubelet uses only the container's running state as its health signal; it cannot see the state of the application inside the container, such as a hung process. An application can therefore stop serving while the container still looks fine, losing traffic. Health checks are introduced to make sure containers are genuinely healthy and alive.
A Pod checks container health through two kinds of probes: livenessProbe (liveness probing) and readinessProbe (readiness probing).

livenessProbe (liveness probing)

A liveness probe checks whether the application inside a container is healthy via an HTTP request, a shell command, or a TCP connection, and reports the result to the kubelet. If the application is reported unhealthy, the kubelet restarts the container according to the restart policy (restartPolicy) defined in the Pod manifest.

readinessProbe (readiness probing)

A readiness probe likewise uses an HTTP request, a shell command, or a TCP connection to check whether the application inside a container is healthy and able to serve traffic. If it is, the container is considered Ready, and only Pods that have reached the Ready state may receive requests.

For Pods managed by a Service, the Service-to-Pod association also depends on whether the Pod is Ready. After a Pod starts, its application usually needs some time to finish initializing: loading configuration or data, perhaps running some kind of warm-up. If client requests arrive before that phase completes, responses will be slow and the experience suffers badly. A Pod should therefore not handle client requests immediately after starting, but only once initialization has finished and it has transitioned to Ready.

If a container or Pod is in the NotReady state, Kubernetes removes that Pod from the Service's backend endpoints.
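
You can watch this happen on the Service's Endpoints object. A minimal sketch, assuming a Service named nginx-svc (a hypothetical name; the practice example at the end of this article uses nginx-health-svc):

#watch the Service's endpoints; addresses of NotReady Pods drop out of the ENDPOINTS column
kubectl get endpoints nginx-svc -w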

How health checks are implemented

The two probe types introduced above, livenessProbe and readinessProbe, both support the following mechanisms for checking container health:

  1. ExecAction: run a command inside the container; an exit status of 0 means the probe succeeded and the container is considered healthy
  2. HTTPGetAction: send an HTTP GET request to the container's IP, port, and path; a status code of at least 200 and below 400 means success
  3. TCPSocketAction: attempt a TCP connection to the container's IP address and a specific port; if the port is open, the probe succeeds

Each of these checks can produce one of three results:

  1. Success: the container passed the health check
  2. Failure: the container failed the health check
  3. Unknown: the check itself could not be executed, so no action is taken

livenessProbe examples

livenessProbe for ExecAction example

This approach judges container health by executing a user-defined command inside the target container: if the command's exit status is 0, the container is considered healthy. The spec.containers.livenessProbe.exec field defines this kind of check; its single attribute, command, specifies the command to run. Below is a manifest using the liveness-exec approach:

1. Create the resource manifest

Create a Pod running an Nginx container whose startup command first starts nginx, then sleeps for 60 seconds, then deletes nginx.pid.
The livenessProbe exec command tests whether nginx.pid exists; if the probe returns a nonzero status, the container is restarted according to the restart policy.
Expected behavior: roughly 60s after the container is truly Ready, nginx.pid is deleted, the exec probe starts failing, and the restart policy kicks in.

cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      exec:
        command: [ "/bin/sh", "-c", "test", "-e", "/run/nginx.pid" ]
  restartPolicy: Always

2. Create the Pod resource

kubectl apply -f ngx-health.yaml

Wait for the Pod to become Ready.
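
One way to wait is to watch the Pod until the READY column shows 1/1 and the STATUS column shows Running:

kubectl get pods/ngx-health -w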

3. View the Pod's details

#First check: the container in the Pod started successfully and the events look normal
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
  Type    Reason     Age        From                 Message
  ----    ------     ----       ----                 -------
  Normal  Scheduled  <unknown>  default-scheduler    Successfully assigned default/ngx-health to k8s-node03
  Normal  Pulling    12s        kubelet, k8s-node03  Pulling image "nginx:latest"
  Normal  Pulled     6s         kubelet, k8s-node03  Successfully pulled image "nginx:latest"
  Normal  Created    6s         kubelet, k8s-node03  Created container ngx-liveness
  Normal  Started    5s         kubelet, k8s-node03  Started container ngx-liveness
  
#Second check: the container's liveness probe has failed
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
  Type     Reason     Age                From                 Message
  ----     ------     ----               ----                 -------
  Normal   Scheduled  <unknown>          default-scheduler    Successfully assigned default/ngx-health to k8s-node03
  Normal   Pulling    52s                kubelet, k8s-node03  Pulling image "nginx:latest"
  Normal   Pulled     46s                kubelet, k8s-node03  Successfully pulled image "nginx:latest"
  Normal   Created    46s                kubelet, k8s-node03  Created container ngx-liveness
  Normal   Started    45s                kubelet, k8s-node03  Started container ngx-liveness
  Warning  Unhealthy  20s (x3 over 40s)  kubelet, k8s-node03  Liveness probe failed:
  Normal   Killing    20s                kubelet, k8s-node03  Container ngx-liveness failed liveness probe, will be restarted
  
#Third check: the image has been pulled again, and the container was re-created and restarted
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
  Type     Reason     Age                From                 Message
  ----     ------     ----               ----                 -------
  Normal   Scheduled  <unknown>          default-scheduler    Successfully assigned default/ngx-health to k8s-node03
  Warning  Unhealthy  35s (x3 over 55s)  kubelet, k8s-node03  Liveness probe failed:
  Normal   Killing    35s                kubelet, k8s-node03  Container ngx-liveness failed liveness probe, will be restarted
  Normal   Pulling    4s (x2 over 67s)   kubelet, k8s-node03  Pulling image "nginx:latest"
  Normal   Pulled     2s (x2 over 61s)   kubelet, k8s-node03  Successfully pulled image "nginx:latest"
  Normal   Created    2s (x2 over 61s)   kubelet, k8s-node03  Created container ngx-liveness
  Normal   Started    2s (x2 over 60s)   kubelet, k8s-node03  Started container ngx-liveness

The wide output below shows the same: in the first listing the Pod has been up 22s with 0 restarts;
in the second, it has been up 76s and has completed one restart.

kubectl get pods -o wide | grep ngx-health
ngx-health                          1/1     Running            0          22s     10.244.5.44   k8s-node03   <none>           <none>

kubectl get pods -o wide | grep ngx-health
ngx-health                          1/1     Running            1          76s     10.244.5.44   k8s-node03   <none>           <none>

Second probe failure and second restart:

kubectl describe pods/ngx-health | grep -A 10 Events
Events:
  Type     Reason     Age                 From                 Message
  ----     ------     ----                ----                 -------
  Normal   Scheduled  <unknown>           default-scheduler    Successfully assigned default/ngx-health to k8s-node03
  Normal   Pulled     58s (x2 over 117s)  kubelet, k8s-node03  Successfully pulled image "nginx:latest"
  Normal   Created    58s (x2 over 117s)  kubelet, k8s-node03  Created container ngx-liveness
  Normal   Started    58s (x2 over 116s)  kubelet, k8s-node03  Started container ngx-liveness
  Warning  Unhealthy  31s (x6 over 111s)  kubelet, k8s-node03  Liveness probe failed:
  Normal   Killing    31s (x2 over 91s)   kubelet, k8s-node03  Container ngx-liveness failed liveness probe, will be restarted
  Normal   Pulling    0s (x3 over 2m3s)   kubelet, k8s-node03  Pulling image "nginx:latest"
  
kubectl get pods -o wide | grep ngx-health
ngx-health                          1/1     Running            2          2m13s   10.244.5.44   k8s-node03   <none>           <none>

livenessProbe for HTTPGetAction example

An HTTP GET request is made against the container's IP address, port, and path; if the response status code is at least 200 and below 400, the container is considered healthy. The spec.containers.livenessProbe.httpGet field defines this kind of check; its configuration fields include:

  • host: the host to request; defaults to the Pod IP. It can also be set via a Host header in httpHeaders.
  • port: the port to request; required; must be in the range 1-65535
  • httpHeaders: custom request headers (see the sketch after this list)
  • path: the HTTP resource path to request, i.e. the URL path
  • scheme: the protocol for the connection, HTTP or HTTPS only; defaults to HTTP
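
The demo below uses only path, port, and scheme. For completeness, here is a hedged sketch of a probe that also sets a custom Host header through httpHeaders (the host value is made up for illustration):

livenessProbe:
  httpGet:
    path: /index.html
    port: 80
    scheme: HTTP
    httpHeaders:
    - name: Host
      value: www.example.com   #overrides the default Host header (the Pod IP)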

1. Create the resource manifest

Create a Pod running an Nginx container: start nginx, sleep 60 seconds, then delete nginx.pid.
The livenessProbe httpGet probe requests the index.html file under the nginx document root on port 80; the address defaults to the Pod IP and the protocol is HTTP. If the request fails, the container is restarted per the restart policy.

cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
  restartPolicy: Always

2. Create the Pod resource object

kubectl apply -f ngx-health.yaml

3. Check the Pod's running status

#Container being created
kubectl get pods -o wide | grep ngx-health
ngx-health                          0/1     ContainerCreating   0          7s      <none>        k8s-node02   <none>           <none>

#Container running successfully
kubectl get pods -o wide | grep ngx-health
ngx-health                          1/1     Running            0          19s     10.244.2.36   k8s-node02   <none>           <none>

4. View the Pod's detailed events

The image was pulled and the container started successfully:

kubectl describe pods/ngx-health | grep -A 10 Events
Events:
  Type    Reason     Age        From                 Message
  ----    ------     ----       ----                 -------
  Normal  Scheduled  <unknown>  default-scheduler    Successfully assigned default/ngx-health to k8s-node02
  Normal  Pulling    30s        kubelet, k8s-node02  Pulling image "nginx:latest"
  Normal  Pulled     15s        kubelet, k8s-node02  Successfully pulled image "nginx:latest"
  Normal  Created    15s        kubelet, k8s-node02  Created container ngx-liveness
  Normal  Started    14s        kubelet, k8s-node02  Started container ngx-liveness

About 60 seconds after the container became Ready, it was restarted and the image is being pulled again, as the events below show. (Note there is no Unhealthy warning here, and the access log further down shows the httpGet probe returning 200 throughout: deleting nginx.pid does not stop nginx itself; it is the wrapper shell exiting after the rm that terminates the container and triggers the restart.)

kubectl describe pods/ngx-health | grep -A 15 Events
Events:
  Type    Reason     Age               From                 Message
  ----    ------     ----              ----                 -------
  Normal  Scheduled  <unknown>         default-scheduler    Successfully assigned default/ngx-health to k8s-node02
  Normal  Pulled     63s               kubelet, k8s-node02  Successfully pulled image "nginx:latest"
  Normal  Created    63s               kubelet, k8s-node02  Created container ngx-liveness
  Normal  Started    62s               kubelet, k8s-node02  Started container ngx-liveness
  Normal  Pulling    1s (x2 over 78s)  kubelet, k8s-node02  Pulling image "nginx:latest"

After the pull, the container was created and started once more; the (x2 over ...) counters show that each step has now happened twice:

kubectl describe pods/ngx-health | grep -A 15 Events
Events:
  Type    Reason     Age                From                 Message
  ----    ------     ----               ----                 -------
  Normal  Scheduled  <unknown>          default-scheduler    Successfully assigned default/ngx-health to k8s-node02
  Normal  Pulling    18s (x2 over 95s)  kubelet, k8s-node02  Pulling image "nginx:latest"
  Normal  Pulled     2s (x2 over 80s)   kubelet, k8s-node02  Successfully pulled image "nginx:latest"
  Normal  Created    2s (x2 over 80s)   kubelet, k8s-node02  Created container ngx-liveness
  Normal  Started    1s (x2 over 79s)   kubelet, k8s-node02  Started container ngx-liveness

The wide listing first catches the old container in the Completed state, then shows the Pod running again after one restart:

kubectl get pods -o wide | grep ngx-health
ngx-health                          0/1     Completed          0          96s     10.244.2.36   k8s-node02   <none>           <none>
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep ngx-health
ngx-health                          1/1     Running            1          104s    10.244.2.36   k8s-node02   <none>           <none>

The container log shows the probe requests (note the kube-probe user agent); by default the probe fires every 10 seconds:

kubectl logs -f pods/ngx-health
10.244.2.1 - - [15/May/2020:03:01:13 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:23 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:33 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:43 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:53 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:02:03 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"

livenessProbe for TCPSocketAction example

A TCP check is made against the container's IP address and port; if a TCP connection can be established, the container is considered healthy. Compared with HTTP probing this is more efficient and cheaper, but less precise: a successful connection does not necessarily mean the page resource is available. The spec.containers.livenessProbe.tcpSocket field defines this kind of check and has two attributes:

  • host: the target IP address to connect to; defaults to the Pod IP
  • port: the target port to connect to; required

Below is a manifest using the liveness-tcp approach. It opens a connection to port 80/tcp on the Pod IP and judges the result by whether the connection can be established:

1. Create the resource manifest

apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      tcpSocket:
        port: 80
  restartPolicy: Always

2. Create the resource object

kubectl apply -f ngx-health.yaml

3. View the Pod's details

#Container created and started successfully
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
  Type    Reason     Age        From                 Message
  ----    ------     ----       ----                 -------
  Normal  Scheduled  <unknown>  default-scheduler    Successfully assigned default/ngx-health to k8s-node02
  Normal  Pulling    19s        kubelet, k8s-node02  Pulling image "nginx:latest"
  Normal  Pulled     9s         kubelet, k8s-node02  Successfully pulled image "nginx:latest"
  Normal  Created    8s         kubelet, k8s-node02  Created container ngx-liveness
  Normal  Started    8s         kubelet, k8s-node02  Started container ngx-liveness

#About 60s after the container became Ready, the Pod is already pulling the image again
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
  Type    Reason     Age                From                 Message
  ----    ------     ----               ----                 -------
  Normal  Scheduled  <unknown>          default-scheduler    Successfully assigned default/ngx-health to k8s-node02
  Normal  Pulled     72s                kubelet, k8s-node02  Successfully pulled image "nginx:latest"
  Normal  Created    71s                kubelet, k8s-node02  Created container ngx-liveness
  Normal  Started    71s                kubelet, k8s-node02  Started container ngx-liveness
  Normal  Pulling    10s (x2 over 82s)  kubelet, k8s-node02  Pulling image "nginx:latest"

#The wide output also shows the Pod has entered the Completed state; it will be restarted next
kubectl get pods -o wide | grep ngx-health
ngx-health                          0/1     Completed          0          90s     10.244.2.37   k8s-node02   <none>           <none>

Health check parameters

The sections above covered the two probe types, which run at different stages, and the probe mechanisms both of them support. A few auxiliary parameters tune their timing (a combined sketch follows the list):

  • initialDelaySeconds: how long to wait before the first check, counted from the moment the container starts
  • periodSeconds: how often to run the check; defaults to 10 seconds, minimum 1 second
  • successThreshold: how many consecutive successes are required after a failure before the check is considered successful again; defaults to 1, and must be 1 for liveness probes
  • timeoutSeconds: how long before a check times out; defaults to 1 second, minimum 1 second
  • failureThreshold: how many consecutive failures are required after a success before the check is considered failed; defaults to 3, minimum 1
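
A minimal sketch combining all five parameters on a liveness probe (the same values reappear in the practice example below). With these settings, the first probe fires 15s after the container starts, and a hung container is declared failed after two consecutive misses, i.e. no sooner than about failureThreshold × periodSeconds = 6s after its last success:

livenessProbe:
  initialDelaySeconds: 15  #wait 15s after container start before the first probe
  periodSeconds: 3         #probe every 3s
  timeoutSeconds: 1        #each probe times out after 1s
  successThreshold: 1      #one success marks the probe healthy again; must be 1 for liveness
  failureThreshold: 2      #two consecutive failures mark the container unhealthy
  tcpSocket:
    port: 80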

Health check practice

The following example uses both a readiness probe (readinessProbe) and a liveness probe (livenessProbe).

Readiness probe configuration, explained:

  1. The container runs its first readiness probe 5 seconds after starting (initialDelaySeconds); it makes an HTTP request for the index.html file under the site root. If the probe succeeds, the Pod is marked Ready.
  2. The probe then repeats at the interval set by periodSeconds; here that is 10 seconds, so the Pod is probed every 10 seconds.
  3. Each probe times out after 3 seconds. Because failureThreshold is 1, a single failed probe removes the Pod from the Service's backends, after which client requests can no longer reach it through the Service.
  4. The readiness probe keeps probing the removed Pod; once it succeeds once (the successThreshold value), the Pod is added back to the backends.

Liveness probe configuration, explained:

  1. The container runs its first liveness probe 15 seconds after starting (initialDelaySeconds); it attempts a TCP connection to port 80, and the probe succeeds if the connection can be established.
  2. Probes then run every 3 seconds with a 1-second timeout; two consecutive failures (failureThreshold: 2) trigger a restart via the restart policy.
  3. A container that has failed the check keeps being probed; if a probe succeeds once more, the container is considered healthy again.

1. Resource manifest

cat nginx-health.yaml
#create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-health-ns
  labels:
    resource: nginx-ns

---

#create deploy and pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-health-deploy
  namespace: nginx-health-ns
  labels:
    resource: nginx-deploy
spec:
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-health
  template:
    metadata:
      namespace: nginx-health-ns
      labels:
        app: nginx-health
    spec:
      restartPolicy: Always
      containers:
      - name: nginx-health-containers
        image: nginx:1.17.1
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
        readinessProbe:
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
          failureThreshold: 1
          httpGet:
            path: /index.html
            port: 80
            scheme: HTTP
        livenessProbe:
          initialDelaySeconds: 15
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
          failureThreshold: 2
          tcpSocket:
            port: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

---

#create service
apiVersion: v1
kind: Service
metadata:
  name: nginx-health-svc
  namespace: nginx-health-ns
  labels:
    resource: nginx-svc
spec:
  clusterIP: 10.106.189.88
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx-health
  sessionAffinity: ClientIP
  type: ClusterIP

2. Create the resource objects

kubectl apply -f nginx-health.yaml
namespace/nginx-health-ns created
deployment.apps/nginx-health-deploy created
service/nginx-health-svc created

3. View the created resources

k8sops@k8s-master01:/$ kubectl get all -n nginx-health-ns -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod/nginx-health-deploy-6bcc8f7f74-6wc6t   1/1     Running   0          24s   10.244.3.50   k8s-node01   <none>           <none>
pod/nginx-health-deploy-6bcc8f7f74-cns27   1/1     Running   0          24s   10.244.5.52   k8s-node03   <none>           <none>
pod/nginx-health-deploy-6bcc8f7f74-rsxjj   1/1     Running   0          24s   10.244.2.42   k8s-node02   <none>           <none>

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE   SELECTOR
service/nginx-health-svc   ClusterIP   10.106.189.88   <none>        80/TCP    25s   app=nginx-health

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS                IMAGES         SELECTOR
deployment.apps/nginx-health-deploy   3/3     3            3           25s   nginx-health-containers   nginx:1.17.1   app=nginx-health

NAME                                             DESIRED   CURRENT   READY   AGE   CONTAINERS                IMAGES         SELECTOR
replicaset.apps/nginx-health-deploy-6bcc8f7f74   3         3         3       25s   nginx-health-containers   nginx:1.17.1   app=nginx-health,pod-template-hash=6bcc8f7f74

4. Check Pod status: none of the Pods is Ready yet; they are in the Completed state and about to be restarted

k8sops@k8s-master01:/$ kubectl get pods -n nginx-health-ns -o wide
NAME                                   READY   STATUS      RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t   0/1     Completed   0          64s   10.244.3.50   k8s-node01   <none>           <none>
nginx-health-deploy-6bcc8f7f74-cns27   0/1     Completed   0          64s   10.244.5.52   k8s-node03   <none>           <none>
nginx-health-deploy-6bcc8f7f74-rsxjj   0/1     Completed   0          64s   10.244.2.42   k8s-node02   <none>           <none>
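
Since none of the Pods is Ready at this moment, their addresses should also have been removed from the Service's endpoints. A quick check:

kubectl get endpoints nginx-health-svc -n nginx-health-ns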

5. One Pod has now completed its restart and is Ready

kubectl get pods -n nginx-health-ns -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t   1/1     Running   1          73s   10.244.3.50   k8s-node01   <none>           <none>
nginx-health-deploy-6bcc8f7f74-cns27   0/1     Running   1          73s   10.244.5.52   k8s-node03   <none>           <none>
nginx-health-deploy-6bcc8f7f74-rsxjj   0/1     Running   1          73s   10.244.2.42   k8s-node02   <none>           <none>

6. All three Pods have completed their restart and are Ready

kubectl get pods -n nginx-health-ns -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t   1/1     Running   1          85s   10.244.3.50   k8s-node01   <none>           <none>
nginx-health-deploy-6bcc8f7f74-cns27   1/1     Running   1          85s   10.244.5.52   k8s-node03   <none>           <none>
nginx-health-deploy-6bcc8f7f74-rsxjj   1/1     Running   1          85s   10.244.2.42   k8s-node02   <none>           <none>
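
Once you are done experimenting, all of the demo resources (the namespace, Deployment, and Service) can be removed in one step:

kubectl delete -f nginx-health.yaml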
