Introduction to Pod Health Checks
By default, the kubelet treats a container's running state as its health indicator; it cannot observe the state of the application inside the container, for example a hung process. An unhealthy application would then keep receiving traffic it cannot serve, and requests would be lost. Kubernetes therefore provides a health-check mechanism to ensure containers are actually alive and healthy.
A Pod checks container health through two kinds of probes: livenessProbe (liveness probing) and readinessProbe (readiness probing).
livenessProbe (liveness probing)
A liveness probe checks whether the application in a container is healthy via HTTP, a shell command, or TCP, and reports the result to the kubelet. If the application is reported as unhealthy, the kubelet handles the container according to the restartPolicy defined in the Pod manifest.
readinessProbe (readiness probing)
A readiness probe also uses HTTP, a shell command, or TCP to check whether the application in a container is healthy and able to serve traffic. If it can serve traffic normally, the container is considered Ready, and only a Pod that has reached the Ready state can receive requests.
For Pods managed by a Service, the association between the Service and its Pods is also based on whether each Pod is Ready. After a Pod starts, the application usually needs some time to finish initializing, for example loading configuration or data, and some programs even need a warm-up phase. If the Pod accepted client requests before this phase completed, responses would be very slow and the user experience would suffer badly. To avoid this, a Pod should not handle client requests immediately after it starts, but only after initialization has finished and the container has turned Ready.
If a container or Pod is in the NotReady state, Kubernetes removes that Pod from the Service's backend endpoints.
How health checks are performed
The two probe types introduced above, livenessProbe (liveness) and readinessProbe (readiness), both support the following ways of checking container health:
- ExecAction: run a command inside the container; an exit status of 0 means success, i.e. the probe result is healthy
- HTTPGetAction: send an HTTP request to the container's IP, port, and path; a response code >= 200 and < 400 means success
- TCPSocketAction: perform a TCP check against the container's IP address and a specific port; an open port means success
Each of these checks can produce one of three results:
- Success: the container passed the health check
- Failure: the container did not pass the health check
- Unknown: the check itself failed to complete, so no action is taken
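In a manifest, these three actions map to the exec, httpGet, and tcpSocket fields of a probe. A minimal sketch of each shape (paths and ports here are illustrative, not from the examples below):

```yaml
livenessProbe:            # ExecAction: exit status 0 means healthy
  exec:
    command: ["cat", "/tmp/healthy"]
---
livenessProbe:            # HTTPGetAction: 2xx/3xx response means healthy
  httpGet:
    path: /healthz
    port: 8080
---
livenessProbe:            # TCPSocketAction: an open port means healthy
  tcpSocket:
    port: 8080
```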
livenessProbe liveness probing examples
livenessProbe with ExecAction
This method judges container health by running a user-defined command inside the target container: if the command's exit status is 0, the container is considered healthy. The spec.containers[].livenessProbe.exec field defines this kind of check; it has a single attribute, command, which specifies the command to run. Below is a manifest example using the liveness-exec approach:
1. Create the resource manifest
Create a Pod → run an Nginx container → start nginx → sleep for 60 seconds → then delete nginx.pid.
The exec command of the livenessProbe checks whether the nginx.pid file exists. If the probe returns a non-zero result, the container is restarted according to the restart policy.
The expectation is that 60s after the container is truly Ready, nginx.pid is deleted, the exec probe takes effect, and the container is restarted per the restart policy.
cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      exec:
        command: [ "/bin/sh", "-c", "test -e /run/nginx.pid" ]
  restartPolicy: Always
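Note that the test expression must be a single script string after -c. Splitting it into separate array elements, e.g. command: [ "/bin/sh", "-c", "test", "-e", "/run/nginx.pid" ], is a common mistake: with sh -c, only the first argument after -c is the script, and the remaining words become positional parameters ($0, $1, …), so test runs with no arguments and always exits 1, making the probe fail from the very first check. A quick local demonstration:

```shell
f=$(mktemp)                            # a file that certainly exists
/bin/sh -c 'test' '-e' "$f"            # broken: 'test' runs with no arguments
echo "broken form exit status:  $?"    # prints 1
/bin/sh -c "test -e $f"                # correct: one script string
echo "correct form exit status: $?"    # prints 0
rm -f "$f"
```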
2. Create the Pod resource
kubectl apply -f ngx-health.yaml
Wait for the Pod to become Ready.
3. View the Pod's details
# First look: the container in the Pod started successfully and the events are normal
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulling 12s kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 6s kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 6s kubelet, k8s-node03 Created container ngx-liveness
Normal Started 5s kubelet, k8s-node03 Started container ngx-liveness
# Second look: the container's livenessProbe check has failed
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulling 52s kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 46s kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 46s kubelet, k8s-node03 Created container ngx-liveness
Normal Started 45s kubelet, k8s-node03 Started container ngx-liveness
Warning Unhealthy 20s (x3 over 40s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 20s kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
# Third look: the image has been pulled again, then the container is re-created and restarted
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Warning Unhealthy 35s (x3 over 55s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 35s kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
Normal Pulling 4s (x2 over 67s) kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 2s (x2 over 61s) kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 2s (x2 over 61s) kubelet, k8s-node03 Created container ngx-liveness
Normal Started 2s (x2 over 60s) kubelet, k8s-node03 Started container ngx-liveness
The wide output below shows that on the first check the Pod has been running for 22s with a restart count of 0; on the second check, the age is 76s and the Pod has completed one restart.
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 0 22s 10.244.5.44 k8s-node03 <none> <none>
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 1 76s 10.244.5.44 k8s-node03 <none> <none>
The second failed health probe and the second restart:
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulled 58s (x2 over 117s) kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 58s (x2 over 117s) kubelet, k8s-node03 Created container ngx-liveness
Normal Started 58s (x2 over 116s) kubelet, k8s-node03 Started container ngx-liveness
Warning Unhealthy 31s (x6 over 111s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 31s (x2 over 91s) kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
Normal Pulling 0s (x3 over 2m3s) kubelet, k8s-node03 Pulling image "nginx:latest"
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 2 2m13s 10.244.5.44 k8s-node03 <none> <none>
livenessProbe with HTTPGetAction
This method calls HTTP GET against the container's IP address, port, and path; if the response status code is >= 200 and < 400, the container is considered healthy. The spec.containers[].livenessProbe.httpGet field defines this kind of check, and it supports the following configuration fields:
- host: host address for the request, defaulting to the Pod IP; it can also be set with a Host: header in httpHeaders
- port: port for the request; required, in the range 1-65535
- httpHeaders <[]Object>: custom request headers
- path: the HTTP resource path to request, i.e. the URL path
- scheme: protocol used for the connection, only HTTP or HTTPS; defaults to HTTP
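For instance, a probe that sends a custom Host: header via httpHeaders might look like this (the virtual-host name here is an assumption for illustration):

```yaml
livenessProbe:
  httpGet:
    path: /index.html
    port: 80
    scheme: HTTP
    httpHeaders:
    - name: Host
      value: www.example.com   # assumed virtual host served by the container
```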
1. Create the resource manifest
Create a Pod → run an Nginx container → start nginx → sleep for 60 seconds → then delete nginx.pid.
The httpGet method of the livenessProbe requests the index.html file under the nginx document root over HTTP on port 80, against the Pod IP by default. If the request fails, the container is restarted according to the restart policy.
cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
  restartPolicy: Always
2. Create the Pod resource object
kubectl apply -f ngx-health.yaml
3. Check the Pod's running state
# Container being created
kubectl get pods -o wide | grep ngx-health
ngx-health 0/1 ContainerCreating 0 7s <none> k8s-node02 <none> <none>
# Container running successfully
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 0 19s 10.244.2.36 k8s-node02 <none> <none>
4. View the Pod's detailed event information
The container image was pulled and the container started successfully:
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 30s kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 15s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 15s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 14s kubelet, k8s-node02 Started container ngx-liveness
About 60s after the container turned Ready, the livenessProbe check fires; below you can see the image is already being pulled again:
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulled 63s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 63s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 62s kubelet, k8s-node02 Started container ngx-liveness
Normal Pulling 1s (x2 over 78s) kubelet, k8s-node02 Pulling image "nginx:latest"
After the image pull finishes, the container is created and started again; note that the times in the Age column have been reset:
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 18s (x2 over 95s) kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 2s (x2 over 80s) kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 2s (x2 over 80s) kubelet, k8s-node02 Created container ngx-liveness
Normal Started 1s (x2 over 79s) kubelet, k8s-node02 Started container ngx-liveness
The wide output shows the Pod has been restarted once:
kubectl get pods -o wide | grep ngx-health
ngx-health 0/1 Completed 0 96s 10.244.2.36 k8s-node02 <none> <none>
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 1 104s 10.244.2.36 k8s-node02 <none> <none>
The container logs show the probe requests; by default the probe runs every 10 seconds:
kubectl logs -f pods/ngx-health
10.244.2.1 - - [15/May/2020:03:01:13 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:23 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:33 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:43 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:53 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:02:03 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
livenessProbe with TCPSocketAction
This method performs a TCP check against the container's IP address and port: if a TCP connection can be established, the container is considered healthy. Compared with the HTTP-based probe it is more efficient and cheaper, but less precise, since a successful connection does not necessarily mean the page resources are available. The spec.containers[].livenessProbe.tcpSocket field defines this kind of check, and it has two attributes:
- host: target IP address to connect to, defaulting to the Pod IP
- port: target port to connect to; required
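The port field can also reference a named containerPort rather than a number; a sketch (the container and port names below are illustrative):

```yaml
containers:
- name: web
  image: nginx:latest
  ports:
  - name: http
    containerPort: 80
  livenessProbe:
    tcpSocket:
      port: http   # resolves to the containerPort named "http", i.e. 80
```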
Below is a manifest example using the liveness-tcp approach. It opens a connection to port 80/tcp of the Pod IP and judges the result by whether the connection is established:
1. Create the resource manifest
apiVersion: v1
kind: Pod
metadata:
  name: ngx-health
spec:
  containers:
  - name: ngx-liveness
    image: nginx:latest
    command:
    - /bin/sh
    - -c
    - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
    livenessProbe:
      tcpSocket:
        port: 80
  restartPolicy: Always
2. Create the resource object
kubectl apply -f ngx-health.yaml
3. View the Pod's creation details
# Container created and started successfully
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 19s kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 9s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 8s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 8s kubelet, k8s-node02 Started container ngx-liveness
# About 60s after the container turned Ready, the Pod is already pulling the image again
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulled 72s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 71s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 71s kubelet, k8s-node02 Started container ngx-liveness
Normal Pulling 10s (x2 over 82s) kubelet, k8s-node02 Pulling image "nginx:latest"
# The wide output also shows the Pod has entered the Completed state; the next step is the restart
kubectl get pods -o wide | grep ngx-health
ngx-health 0/1 Completed 0 90s 10.244.2.37 k8s-node02 <none> <none>
Health check parameters
The sections above covered the two probe types, used at different stages, and the probe methods they both support. A few auxiliary parameters:
- initialDelaySeconds: when checking starts, counted from the moment the container has started
- periodSeconds: how often the check runs; defaults to 10 seconds, minimum 1 second
- successThreshold: number of consecutive successes required after a failure before the check is considered successful again; defaults to 1, and must be 1 for liveness probes
- timeoutSeconds: check timeout; defaults to 1 second, minimum 1 second
- failureThreshold: number of consecutive failures required after a success before the check is considered failed; defaults to 3
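These parameters combine into the detection timing. For example, with the values below the first probe fires 15s after the container starts, and a dead container is detected after at most failureThreshold × periodSeconds = 2 × 3 = 6 seconds of consecutive failures:

```yaml
livenessProbe:
  initialDelaySeconds: 15   # first probe 15s after the container starts
  periodSeconds: 3          # probe every 3s
  timeoutSeconds: 1         # each probe must answer within 1s
  failureThreshold: 2       # 2 consecutive failures trigger a restart (~6s worst case)
  tcpSocket:
    port: 80
```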
Health check practice
The following example uses both a readiness probe, readinessProbe, and a liveness probe, livenessProbe.
Readiness probe configuration explained:
- The first readiness check runs 5 seconds (initialDelaySeconds) after the container starts; it probes the index.html file under the site's document root over HTTP. If the probe succeeds, the Pod is marked Ready.
- The readiness check then repeats at the interval given by periodSeconds; the interval specified below is 10 seconds, so the readiness probe runs every 10 seconds.
- Each probe times out after 3 seconds. If the probe fails once, the Pod is removed from the Service's backend Pods; after removal, client requests can no longer reach that Pod through the Service.
- The readiness probe keeps probing the Pod; once it succeeds 1 time, per the value set by successThreshold, the Pod is added back to the backends.
Liveness probe configuration explained:
- The first liveness check runs 15 seconds (initialDelaySeconds) after the container starts; it probes port 80 of the container via tcpSocket and succeeds if the connection can be established.
- The liveness check runs every 3 seconds and each probe times out after 1 second; 2 consecutive failures trigger a restart per the restart policy.
- After a failure the liveness probe keeps probing; once it succeeds again, the Pod is considered healthy.
1. Resource manifest
cat nginx-health.yaml
#create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-health-ns
  labels:
    resource: nginx-ns
---
#create deploy and pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-health-deploy
  namespace: nginx-health-ns
  labels:
    resource: nginx-deploy
spec:
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-health
  template:
    metadata:
      namespace: nginx-health-ns
      labels:
        app: nginx-health
    spec:
      restartPolicy: Always
      containers:
      - name: nginx-health-containers
        image: nginx:1.17.1
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
        readinessProbe:
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
          failureThreshold: 1
          httpGet:
            path: /index.html
            port: 80
            scheme: HTTP
        livenessProbe:
          initialDelaySeconds: 15
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
          failureThreshold: 2
          tcpSocket:
            port: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
---
#create service
apiVersion: v1
kind: Service
metadata:
  name: nginx-health-svc
  namespace: nginx-health-ns
  labels:
    resource: nginx-svc
spec:
  clusterIP: 10.106.189.88
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx-health
  sessionAffinity: ClientIP
  type: ClusterIP
2. Create the resource objects
kubectl apply -f nginx-health.yaml
namespace/nginx-health-ns created
deployment.apps/nginx-health-deploy created
service/nginx-health-svc created
3. View the created resource objects
k8sops@k8s-master01:/$ kubectl get all -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-health-deploy-6bcc8f7f74-6wc6t 1/1 Running 0 24s 10.244.3.50 k8s-node01 <none> <none>
pod/nginx-health-deploy-6bcc8f7f74-cns27 1/1 Running 0 24s 10.244.5.52 k8s-node03 <none> <none>
pod/nginx-health-deploy-6bcc8f7f74-rsxjj 1/1 Running 0 24s 10.244.2.42 k8s-node02 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/nginx-health-svc ClusterIP 10.106.189.88 <none> 80/TCP 25s app=nginx-health
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-health-deploy 3/3 3 3 25s nginx-health-containers nginx:1.17.1 app=nginx-health
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-health-deploy-6bcc8f7f74 3 3 3 25s nginx-health-containers nginx:1.17.1 app=nginx-health,pod-template-hash=6bcc8f7f74
4. Check Pod status: none of the Pods are ready yet; they are in the Completed state and about to restart
k8sops@k8s-master01:/$ kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t 0/1 Completed 0 64s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 0/1 Completed 0 64s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj 0/1 Completed 0 64s 10.244.2.42 k8s-node02 <none> <none>
5. One Pod has now finished restarting and is Ready
kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t 1/1 Running 1 73s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 0/1 Running 1 73s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj 0/1 Running 1 73s 10.244.2.42 k8s-node02 <none> <none>
6. All three Pods have finished restarting and are Ready
kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t 1/1 Running 1 85s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 1/1 Running 1 85s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj 1/1 Running 1 85s 10.244.2.42 k8s-node02 <none> <none>