CentOS 7: Pods stuck in ContainerCreating after installing k8s with yum (solved)
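Problem description
I installed a Kubernetes cluster with yum, the CentOS 7 package manager. After successfully creating a service with kubectl, I ran kubectl get pods and found that although AGE kept increasing, the status never changed.
In this article
- Analysis of the cause
- A direct fix for this problem (imperfect)
- Other solutions
Let me walk through it.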
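Problem analysis and fix
kubectl provides the describe subcommand, which prints detailed information about one or more specified resources.
Run kubectl describe pod mytomcat-9lcq5 to inspect the state of the problem Pod. The output is as follows: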
[root@kube-master app]# kubectl describe pod mytomcat-9lcq5
Name:           mytomcat-9lcq5
Namespace:      default
Node:           kube-node-2/192.168.87.145
Start Time:     Fri, 17 Apr 2020 15:53:50 +0800
Labels:         app=mytomcat
Status:         Pending
IP:
Controllers:    ReplicationController/mytomcat
Containers:
  mytomcat:
    Container ID:
    Image:                  tomcat:9-jre8-alpine
    Image ID:
    Port:                   8080/TCP
    State:                  Waiting
      Reason:               ContainerCreating
    Ready:                  False
    Restart Count:          0
    Volume Mounts:          <none>
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
No volumes.
QoS Class:      BestEffort
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From  SubObjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  --------  ------  -------
  5m  5m  1  {default-scheduler }  Normal  Scheduled  Successfully assigned mytomcat-9lcq5 to kube-node-2
  4m  4m  1  {kubelet kube-node-2}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Get https://registry.access.redhat.com/v1/_ping: net/http: TLS handshake timeout)"
  3m  3m  1  {kubelet kube-node-2}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Network timed out while trying to connect to https://registry.access.redhat.com/v1/repositories/rhel7/pod-infrastructure/images. You may want to check your internet connection or if you are behind a proxy.)"
  2m  2m  1  {kubelet kube-node-2}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Error: image rhel7/pod-infrastructure:latest not found)"
  3m  1m  3  {kubelet kube-node-2}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redhat.com/rhel7/pod-infrastructure:latest\""
In the events at the bottom of the output, Successfully assigned mytomcat-9lcq5 to kube-node-2 shows that the Pod was scheduled onto the host kube-node-2. Creating the Pod on that host then failed,
and the reason given is: image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request.
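When several Pods are stuck, it can help to reduce the describe output to just the images that failed to pull. The following is a minimal sketch, assuming the kubelet events use the "image pull failed for <image>," wording shown above; the function name failed_pull_images is hypothetical and not part of kubectl.

```shell
#!/bin/sh
# Print each distinct image that kubelet reported as "image pull failed for ...".
# Reads `kubectl describe pod` output on stdin.
failed_pull_images() {
    sed -n 's/.*image pull failed for \([^, ]*\),.*/\1/p' | sort -u
}

# Usage on a live cluster (pod name taken from the article):
#   kubectl describe pod mytomcat-9lcq5 | failed_pull_images
```

Because the function only reads stdin, it can also be run against a saved copy of the describe output.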
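From this we learn that pulling images from Red Hat's own docker registry requires authenticating with a CA certificate before the pull can succeed.
docker keeps its certificates under the /etc/docker/certs.d directory. The domain in the error message above is registry.access.redhat.com, so the certificate lives in the subdirectory of that name.
Checking with the ll command shows that /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt is a symbolic link pointing to /etc/rhsm/ca/redhat-uep.pem.
Anyone familiar with symbolic links knows that a link target shown in blinking red in the listing does not exist, so the certificate file /etc/rhsm/ca/redhat-uep.pem needs to be generated.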
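The blinking red entry depends on the terminal's color settings, so a scripted check may be more reliable. This is a small sketch, assuming a POSIX shell; the helper name is_dangling_symlink is hypothetical.

```shell
#!/bin/sh
# Return 0 when the path is a symbolic link whose target does not exist.
is_dangling_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Path taken from the article.
cert=/etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt
if is_dangling_symlink "$cert"; then
    echo "dangling: $cert -> $(readlink "$cert")"
fi
```

The -L test checks the link itself, while -e follows the link, so the two together identify a link with a missing target.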
Generate the certificate:
# openssl s_client -showcerts -servername registry.access.redhat.com -connect registry.access.redhat.com:443 </dev/null 2>/dev/null | openssl x509 -text > /etc/rhsm/ca/redhat-uep.pem
The certificate generation command sometimes fails with
unable to load certificate 139930742028176:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:707:Expecting: TRUSTED CERTIFICATE
If that happens, simply run it again.
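Re-running by hand can be wrapped in a small retry loop. The sketch below is one possible way to do it, not the article's original procedure: retry and fetch_cert are hypothetical helper names, and fetch_cert writes to a temporary file first so that a failed attempt does not leave an empty certificate at the destination.

```shell
#!/bin/sh
# Run a command up to $1 times, stopping at the first success.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i failed" >&2
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Fetch the server certificate into a temp file and install it at $1
# only if openssl x509 could parse it and the result is non-empty.
fetch_cert() {
    tmp=$(mktemp) || return 1
    if openssl s_client -showcerts -servername registry.access.redhat.com \
           -connect registry.access.redhat.com:443 </dev/null 2>/dev/null \
           | openssl x509 -text > "$tmp" && [ -s "$tmp" ]; then
        chmod 644 "$tmp"
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage (as root on the worker node):
#   retry 5 fetch_cert /etc/rhsm/ca/redhat-uep.pem
```

The attempt count of 5 and the one-second pause are arbitrary choices for illustration.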
After the command finishes, check the certificate file that the symbolic link points to:
[root@kube-node-2 registry.access.redhat.com]# ll /etc/rhsm/ca/redhat-uep.pem
-rw-r--r-- 1 root root 9233 Apr 17 16:55 /etc/rhsm/ca/redhat-uep.pem
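Now that the certificate file exists, go to the k8s master node kube-master and delete the Pods from before, then wait for them to be recreated successfully. (The second node failed to pull the image because of network problems...)
This completes creation of the Pod.
Some problems remain, however. From within mainland China, access to outside networks is occasionally unreliable, which causes Pod creation to fail. describe then shows the same messages as before, even though the certificate file exists and has content.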
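Root cause analysis and other solutions
The k8s master node schedules the Pod onto a worker node. Once there, the worker node pulls the Pod base image pod-infrastructure:latest from Red Hat's docker registry. Because that registry uses https and the certificate must be verified, the pull fails when the certificate does not exist.
In addition, because the image is hosted in Red Hat's docker registry, the handshake can fail from networks in mainland China, so the image cannot be downloaded.
So the problem becomes: how to fix the failing pull of the k8s pod-infrastructure image. Here is one approach, with the following steps: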
- Pull a pod-infrastructure image that someone else has uploaded to the official docker registry:
docker pull tianyebj/pod-infrastructure
- Tag it with your private registry address, for example:
docker tag tianyebj/pod-infrastructure 10.2.7.70:5000/dev/pod-infrastructure
- Push the image to the private registry, for example:
docker push 10.2.7.70:5000/dev/pod-infrastructure
- On every worker node, edit /etc/kubernetes/kubelet and change registry.access.redhat.com/rhel7/pod-infrastructure to the tag you just set:
sed -i "s#registry.access.redhat.com/rhel7/pod-infrastructure#<private-registry pod-infrastructure image tag>#" /etc/kubernetes/kubelet
- Restart kubelet on every worker node:
systemctl restart kubelet
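The config edit in the steps above can be wrapped in a function so that it is easy to repeat on each worker node. This is a minimal sketch of the same sed substitution; the function name set_pod_infra_image is hypothetical, and it assumes the new image reference contains no # or & characters, which would need escaping in the sed expression.

```shell
#!/bin/sh
# Replace the Red Hat pod-infrastructure image reference in a kubelet config
# file ($1) with another repository ($2). A .bak copy of the file is kept.
set_pod_infra_image() {
    sed -i.bak "s#registry.access.redhat.com/rhel7/pod-infrastructure#$2#" "$1"
}

# Usage on each worker node (registry address taken from the article's example):
#   set_pod_infra_image /etc/kubernetes/kubelet 10.2.7.70:5000/dev/pod-infrastructure
#   systemctl restart kubelet
```

The substitution only replaces the repository part, so if the configured reference carries a tag such as :latest, that tag stays in place and must exist in the private registry.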
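Notes:
- The uploaded image must be public, otherwise kubelet itself has no permission to pull it. Alternatively, ssh into each worker node, log in to the registry, and run
docker pull <private-registry pod-infrastructure image tag>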
References
https://github.com/CentOS/sig-atomic-buildscripts/issues/329
https://cloud.tencent.com/developer/article/1156329
This article is licensed under CC BY 4.0. When reposting, please credit the author and the source.
https://www.cnblogs.com/hellxz/p/k8s-pod-always-container-creating-status-problem.h