Troubleshooting Kubernetes pods that cannot be deleted
The cluster originally consisted of two Kubernetes machines, but today only one of them was powered on, leaving a single node; that is what made the pod instances impossible to delete.
First, check the current state of the pods:
```
[root@k8s ~]# kubectl get pods
NAME                      READY     STATUS             RESTARTS   AGE
nginx-controller-lv8md    1/1       Unknown            0          16h
nginx-controller-sb3fx    1/1       Unknown            2          16h
nginx2-1216651254-4b2dw   0/1       ImagePullBackOff   0          8m
nginx2-1216651254-dbtms   0/1       ImagePullBackOff   0          8m
nginx2-1216651254-fhb4r   0/1       ImagePullBackOff   0          8m
```
List the ReplicationControllers (abbreviated `rc`):
```
[root@k8s ~]# kubectl get rc
No resources found.
```
List the Services:
```
[root@k8s ~]# kubectl get svc
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.254.0.1   <none>        443/TCP   2d
```
With no rc and no Services present, try deleting all the pods:
```
[root@k8s ~]# kubectl delete pods --all
pod "nginx-controller-lv8md" deleted
pod "nginx-controller-sb3fx" deleted
pod "nginx2-1216651254-4b2dw" deleted
pod "nginx2-1216651254-dbtms" deleted
pod "nginx2-1216651254-fhb4r" deleted
```
They still cannot be deleted, however. Check the Deployments next:
```
[root@k8s ~]# kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx2    3         3         3            0           16h
[root@k8s ~]# kubectl delete deployment nginx2
deployment "nginx2" deleted
```
Why did those three nginx2 pods have no rc or Service? Because they were created with `kubectl run`, which backs them with a Deployment rather than an rc.
But how do we delete the remaining two instances?
```
[root@k8s ~]# kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
nginx-controller-lv8md   1/1       Unknown   0          20h
nginx-controller-sb3fx   1/1       Unknown   2          20h
```
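One alternative that avoids touching the node object: pods stuck on an unreachable kubelet can be force-deleted from the API server, skipping the graceful-termination handshake. The `--grace-period=0 --force` flags are standard kubectl; the sketch below only *prints* the delete commands (the here-doc replays the captured output above so the selection logic runs anywhere), and you would pipe the result to `sh` against a real cluster:

```shell
# Replay the captured `kubectl get pods` output so the filter can be tested
# without a cluster.
cat <<'EOF' > /tmp/pods.txt
NAME                     READY     STATUS    RESTARTS   AGE
nginx-controller-lv8md   1/1       Unknown   0          20h
nginx-controller-sb3fx   1/1       Unknown   2          20h
EOF

# Skip the header line, select pods whose STATUS column is "Unknown", and
# print a force-delete command for each one. --grace-period=0 --force removes
# the pod object from the API server without waiting for the lost kubelet
# to confirm shutdown.
awk 'NR > 1 && $3 == "Unknown" {
    print "kubectl delete pod " $1 " --grace-period=0 --force"
}' /tmp/pods.txt
```

Note that force deletion only removes the API object; if the node ever comes back with the containers still running, the kubelet has to clean them up itself.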
Inspect one of the remaining pods:
```
[root@k8s ~]# kubectl describe pod nginx-controller-lv8md
Name:           nginx-controller-lv8md
Namespace:      default
Node:           k8s-node/10.0.10.11   # The key point: both pods were scheduled onto k8s-node, which is currently down.
Start Time:     Tue, 13 Jun 2017 02:01:45 +0800
Labels:         app=nginx
Status:         Terminating (expires Mon, 12 Jun 2017 21:46:21 +0800)
Termination Grace Period:  30s
Reason:         NodeLost
Message:        Node k8s-node which was running pod nginx-controller-lv8md is unresponsive
IP:             172.21.42.3
Controllers:    ReplicationController/nginx-controller
Containers:
  nginx:
    Container ID:   docker://03fa59f9efc06e43ed8c9acc7d4c7533983d5733223dbb2efa5f65928d965b5b
    Image:          reg.docker.lc/share/nginx:latest
    Image ID:       docker-pullable://reg.docker.lc/share/nginx@sha256:e5c82328a509aeb7c18c1d7fb36633dc638fcf433f651bdcda59c1cc04d3ee55
    Port:           80/TCP
    State:          Running
      Started:      Tue, 13 Jun 2017 02:01:47 +0800
    Ready:          True
    Restart Count:  0
    Volume Mounts:  <none>
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
No volumes.
QoS Class:      BestEffort
Tolerations:    <none>
No events.
```
The rc and Service for these two pods were already deleted; they remain in the Unknown state because the target host cannot respond and report their status. Since that host is down anyway, just remove the node itself:
```
[root@k8s ~]# kubectl delete node k8s-node
node "k8s-node" deleted
[root@k8s ~]# kubectl get node
NAME      STATUS    AGE
k8s       Ready     2d
```
With the node gone, no stale container state is left behind:
```
[root@k8s ~]# kubectl get pods
No resources found.
```
This does clear out the broken pod instances, but at the cost of deleting the node. So if the node recovers now, will it rejoin the cluster automatically?
The master's logs picked up the k8s-node node almost immediately:
```
Jun 13 15:06:13 k8s kube-controller-manager[34050]: E0613 15:06:13.917313 34050 actual_state_of_world.go:475] Failed to set statusUpdateNeeded to needed true because nodeName="k8s-node" does not exist
Jun 13 15:06:14 k8s kube-controller-manager[34050]: I0613 15:06:14.618864 34050 event.go:217] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"k8s-node", UID:"c9864434-5006-11e7-ab16-000c29e9277a", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node k8s-node event: Registered Node k8s-node in NodeController
```
The node came back on its own: with the configuration already in place, a node rejoins the cluster as soon as it comes online.
```
[root@k8s ~]# kubectl get nodes
NAME       STATUS    AGE
k8s        Ready     2d
k8s-node   Ready     1m
```
Create an Nginx instance:
```
[root@k8s ~]# kubectl create -f Nginx.yaml
replicationcontroller "nginx-controller" created
service "nginx-service" created
```
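The Nginx.yaml manifest itself is not shown in the post. Based on the objects it creates and the details visible later (a ReplicationController `nginx-controller` selecting `app=nginx`, the image `reg.docker.lc/share/nginx:latest` on container port 80, and a Service `nginx-service` on port 8000 with external IP 10.0.10.10), it likely looked roughly like this reconstruction (not the original file; `replicas: 1` matches the single pod created):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-controller
spec:
  replicas: 1
  selector:
    app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: reg.docker.lc/share/nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - port: 8000        # Service port shown in `kubectl describe svc`
    targetPort: 80    # container port from the pod spec
  externalIPs:
  - 10.0.10.10
```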
Kubernetes scheduled the new instance onto the freshly recovered k8s-node node:
```
[root@k8s ~]# kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
nginx-controller-zsx2q   1/1       Running   0          6s
[root@k8s ~]# kubectl describe pod nginx-controller-zsx2q | grep "Node"
Node:           k8s-node/10.0.10.11
```
Scale out dynamically, adding one more replica:
```
[root@k8s ~]# kubectl scale replicationcontroller --replicas=2 nginx-controller
replicationcontroller "nginx-controller" scaled
```
Check the cluster's current state:
```
[root@k8s ~]# kubectl describe svc nginx-service
Name:              nginx-service
Namespace:         default
Labels:            <none>
Selector:          app=nginx
Type:              ClusterIP
IP:                10.254.132.82
External IPs:      10.0.10.10
Port:              <unset> 8000/TCP
Endpoints:         172.21.42.2:80,172.21.93.2:80
Session Affinity:  None
No events.
```
What happens if we delete the k8s-node node again at this point?
```
[root@k8s ~]# kubectl delete node k8s-node
node "k8s-node" deleted
```
One node is now gone:
```
[root@k8s ~]# kubectl get nodes
NAME      STATUS    AGE
k8s       Ready     2d
```
The pod count has not dropped; there are still two:
```
[root@k8s ~]# kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
nginx-controller-43qpx   1/1       Running   0          14m
nginx-controller-zsx2q   1/1       Running   0          19m
```
Checking the Service again shows that both pod IPs are now in the 172.21.93 range, meaning they run on the same node: the rc kept the replica count at two by rescheduling onto the surviving node.
```
[root@k8s ~]# kubectl describe svc nginx-service
Name:              nginx-service
Namespace:         default
Labels:            <none>
Selector:          app=nginx
Type:              ClusterIP
IP:                10.254.132.82
External IPs:      10.0.10.10
Port:              <unset> 8000/TCP
Endpoints:         172.21.93.2:80,172.21.93.3:80
Session Affinity:  None
No events.
```
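The "same node" claim can be checked mechanically: overlay networks of the flannel style typically give each node its own /24, so grouping the endpoint IPs by their first three octets shows the per-node distribution. A small sketch using the endpoint list from the output above (the /24-per-node assumption comes from this cluster's addressing, not from anything Kubernetes guarantees):

```shell
# Endpoint list copied from `kubectl describe svc nginx-service` above.
endpoints="172.21.93.2:80,172.21.93.3:80"

# Split the comma-separated list, strip the port, keep the /24 prefix,
# and count how many endpoints fall in each prefix (one prefix per node).
echo "$endpoints" | tr ',' '\n' | cut -d: -f1 | cut -d. -f1-3 | sort | uniq -c
```

A single output line with count 2 confirms both endpoints share one node's subnet.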
To bring the k8s-node node back into the cluster, restart the kubelet service on k8s-node:
```
systemctl restart kubelet
```
---------------------
This article is from yao不ke及's CSDN blog; full text at: https://blog.csdn.net/qq_19674905/article/details/80887461