
Fixing k8s CoreDNS stuck in CrashLoopBackOff

First, a record of everything that was tried and did not work.

1- Check the logs: kubectl logs gives the specific error:

[root@i-F998A4DE ~]# kubectl logs -n kube-system coredns-fb8b8dccf-hhkfm
log is DEPRECATED and will be removed in a future version. Use logs instead.
E1230 03:03:51.298180       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E1230 03:03:51.298180       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-fb8b8dccf-hhkfm.unknownuser.log.ERROR.20201230-030351.1: no such file or directory
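
The line that matters is "dial tcp 10.96.0.1:443: connect: no route to host": CoreDNS cannot reach the apiserver through the kubernetes Service VIP, which kube-proxy implements with iptables NAT rules. A quick way to confirm this from the node itself (a sketch; 10.96.0.1 is the default kubeadm Service VIP and may differ in other clusters):

[root@master ~]# curl -k --max-time 5 https://10.96.0.1:443/version   # any HTTP response (even 403) proves the VIP routes; "no route to host" matches the coredns error
[root@master ~]# iptables-save -t nat | grep -c KUBE                  # count kube-proxy's NAT rules; a near-zero count suggests they were wiped or corrupted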

2- Inspect the pod details: kubectl describe pod gives some information that turns out not to be very useful:

[root@i-F998A4DE ~]# kubectl describe po -n kube-system coredns-fb8b8dccf-s2nj9
Name:               coredns-fb8b8dccf-s2nj9
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               master/10.252.37.41
Start Time:         Wed, 30 Dec 2020 10:28:40 +0800
Labels:             k8s-app=kube-dns
                    pod-template-hash=fb8b8dccf
Annotations:        <none>
Status:             Running
IP:                 10.244.0.3
Controlled By:      ReplicaSet/coredns-fb8b8dccf
Containers:
  coredns:
    Container ID:  docker://50bab6b378f236af89bec945083bfe1af293a71f1276c3c8df324cfbe6540a54
    Image:         k8s.gcr.io/coredns:1.3.1
    Image ID:      docker://sha256:eb516548c180f8a6e0235034ccee2428027896af16a509786da13022fe95fe8c
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 30 Dec 2020 10:29:00 +0800
      Finished:     Wed, 30 Dec 2020 10:29:01 +0800
    Ready:          False
    Restart Count:  2
    Limits:
      memory:  170Mi
    Requests:
      cpu:     100m
      memory:  70Mi
    Liveness:   http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:  http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-2gw5w (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-2gw5w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-2gw5w
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  38s (x4 over 48s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         30s                default-scheduler  Successfully assigned kube-system/coredns-fb8b8dccf-s2nj9 to master
  Normal   Pulled            11s (x3 over 29s)  kubelet, master    Container image "k8s.gcr.io/coredns:1.3.1" already present on machine
  Normal   Created           11s (x3 over 29s)  kubelet, master    Created container coredns
  Normal   Started           10s (x3 over 28s)  kubelet, master    Started container coredns
  Warning  BackOff           2s (x6 over 26s)   kubelet, master    Back-off restarting failed container
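
Most of that output is boilerplate; the parts worth reading are the container state (CrashLoopBackOff after exit code 2) and the Events table at the end. For a quicker view while debugging, the coredns pods can be selected by label and the newest events listed directly (a sketch built on standard kubectl flags):

[root@master ~]# kubectl -n kube-system get po -l k8s-app=kube-dns -o wide                # watch the RESTARTS column climb
[root@master ~]# kubectl -n kube-system get events --sort-by=.lastTimestamp | tail -n 10  # only the most recent events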

3- Modifying the coredns configuration had no effect either

[root@master ~]# kubectl edit deployment coredns -n kube-system
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-12-30T02:28:07Z"
  generation: 3
  labels:
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "6088"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/coredns
  uid: a718d791-4a46-11eb-91a1-d00df998a4de
spec:
  progressDeadlineSeconds: 600
  replicas: 2 # first change this to 0; after k8s finishes updating, change it back to 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: k8s.gcr.io/coredns:1.3.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          procMount: Default
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: Default
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2020-12-30T03:00:38Z"
    lastUpdateTime: "2020-12-30T03:00:38Z"
    message: ReplicaSet "coredns-fb8b8dccf" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-12-30T03:38:49Z"
    lastUpdateTime: "2020-12-30T03:38:49Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  readyReplicas: 2
  replicas: 2 # status field; after setting spec.replicas to 0 above and letting k8s update, this does not need to be touched
  updatedReplicas: 2
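
For the record, the replicas dance described in the comments above does not require an editor session; kubectl scale does the same thing (a sketch, assuming the standard kube-system/coredns names):

[root@master ~]# kubectl -n kube-system scale deployment coredns --replicas=0
[root@master ~]# kubectl -n kube-system scale deployment coredns --replicas=2

As the heading says, though, recreating the pods this way changed nothing: the new pods crashed with the same error.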

4- Force-deleting the coredns pods had no effect

[root@i-F998A4DE ~]# kubectl delete po coredns-fb8b8dccf-hhkfm --grace-period=0 --force -n kube-system
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "coredns-fb8b8dccf-hhkfm" force deleted
[root@i-F998A4DE flannel-dashboard]# kubectl delete po coredns-fb8b8dccf-ll2mp --grace-period=0 --force -n kube-system
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "coredns-fb8b8dccf-ll2mp" force deleted
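
Instead of force-deleting the pods one at a time, both can be removed in one command by selecting on their label (a sketch, assuming the default k8s-app=kube-dns label):

[root@master ~]# kubectl -n kube-system delete po -l k8s-app=kube-dns --grace-period=0 --force

Either way the ReplicaSet recreates the pods immediately, and the replacements crash for the same underlying reason.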

5- The kubelet logs likewise show coredns failing

[root@i-F998A4DE ~]# journalctl -f -u kubelet
-- Logs begin at Tue 2020-12-29 11:56:05 CST. --
Dec 30 11:30:38 master kubelet[20570]: W1230 11:30:38.307384   20570 container.go:409] Failed to create summary reader for "/libcontainer_16449_systemd_test_default.slice": none of the resources are being tracked.
Dec 30 11:30:40 master kubelet[20570]: E1230 11:30:40.356882   20570 pod_workers.go:190] Error syncing pod 2c0dffd5-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"
Dec 30 11:30:41 master kubelet[20570]: E1230 11:30:41.375798   20570 pod_workers.go:190] Error syncing pod 2c0dffd5-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"
Dec 30 11:30:45 master kubelet[20570]: E1230 11:30:45.899200   20570 pod_workers.go:190] Error syncing pod 1ed96c42-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-24sxn_kube-system(1ed96c42-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-24sxn_kube-system(1ed96c42-4a4f-11eb-8c6b-d00df998a4de)"
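
The kubelet journal is noisy; filtering it down to the coredns entries makes the back-off pattern easier to see (a sketch using standard journalctl and grep options):

[root@master ~]# journalctl -u kubelet --since "30 min ago" --no-pager | grep -i coredns

Note that these entries add nothing new: kubelet is simply reporting that it keeps restarting a container that keeps exiting.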

6- The host's local DNS configuration shows no problems either

[root@i-F998A4DE ~]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 114.114.114.114
nameserver 8.8.8.8
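
To rule the upstream resolvers out entirely, they can be queried directly from the node (a sketch; dig comes from the bind-utils package on CentOS):

[root@master ~]# dig @114.114.114.114 kubernetes.io +short
[root@master ~]# dig @8.8.8.8 kubernetes.io +short

If both return answers, the failure is confined to the pod-to-apiserver path inside the cluster, which matches the logs above.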

The final solution

This problem is most likely caused by scrambled or stale iptables rules.
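
Before flushing anything, it may be worth snapshotting the current rules so there is something to compare against afterwards (a sketch; iptables-save is read-only and the snapshot path is arbitrary):

[root@master ~]# iptables-save > /tmp/iptables-before-flush.txt   # dump every table to a file for later comparison
[root@master ~]# iptables -t nat -L KUBE-SERVICES -n | head       # spot-check kube-proxy's service chain, if it exists

With a snapshot saved, the fix itself is to run the following commands in order: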

[root@master ~]# systemctl stop kubelet
[root@master ~]# systemctl stop docker
[root@master ~]# iptables --flush
[root@master ~]# iptables -t nat --flush
[root@master ~]# systemctl start kubelet
[root@master ~]# systemctl start docker
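
Once docker and kubelet are back up, kube-proxy and flannel reprogram their iptables rules and the coredns pods should finally go Running. A quick way to verify (a sketch; the dnstest pod name is arbitrary, and busybox:1.28 is a common choice for in-cluster DNS tests because nslookup is broken in many later busybox tags):

[root@master ~]# kubectl -n kube-system get po -l k8s-app=kube-dns    # expect 1/1 Running, restarts no longer climbing
[root@master ~]# kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default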
A side note: the machine that hit this problem is a cloud-platform server, and iptables rules had to be added before it could be reached remotely via ssh (22), vnc (3389), or \\ip (445). If the firewall is up and no access rules were configured through firewall-cmd, flushing the iptables rules will cut off remote access. If the ports were opened with firewall-cmd, however, flushing the iptables rules does not affect the system's own firewall. Starting with CentOS 7, the iptables service's startup script is dropped and firewalld takes the place of the iptables service. RHEL 7 likewise manages the netfilter subsystem with firewalld by default, although the commands it invokes under the hood are still iptables: firewalld is a front-end controller for iptables that implements persistent network traffic rules.

This is my first blog post; if anything in it is wrong, corrections are welcome!

Reference: https://www.cnblogs.com/sandshell/p/11752539.html