Masters still hit certificate expiry after renewing with kubeadm renew
Background
kubernetes: 1.16.3
masters: 3
The cluster was deployed with kubeadm. With more than 30 days still left on the certificates, we ran kubeadm alpha certs renew all to renew them all and assumed we were safe. Yet when the original expiry date arrived, API alerts started firing anyway.
Problem
api-server logs:
E0721 08:09:28.129981 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.133091 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.133460 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
E0721 08:09:28.135093 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
E0721 08:09:28.139986 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.141188 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.143084 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
Troubleshooting
1. kubectl was unusable, reporting that the certificate had expired or was invalid:
$ kubectl get node
Unable to connect to the server: x509: certificate has expired or is not yet valid
2. We then copied /etc/kubernetes/admin.conf to ~/.kube/config, but the command still failed with the same error as in step 1, which made it clear the problem was on the api-server side.
3. Check the expiry dates of all certificates under the pki directory (/etc/kubernetes/pki):
$ for i in $(ls *.crt); do echo "===== $i ====="; openssl x509 -in $i -text -noout | grep -A 3 'Validity'; done
===== apiserver.crt =====
        Validity
            Not Before: Jul 21 08:08:33 2020 GMT
            Not After : Apr 14 05:58:54 2022 GMT
        Subject: CN=kube-apiserver
===== apiserver-kubelet-client.crt =====
        Validity
            Not Before: Jul 21 08:08:33 2020 GMT
            Not After : Apr 14 06:00:03 2022 GMT
        Subject: O=system:masters, CN=kube-apiserver-kubelet-client
===== ca.crt =====
        Validity
            Not Before: Jul 21 08:08:33 2020 GMT
            Not After : Jul 19 08:08:33 2030 GMT
        Subject: CN=kubernetes
===== front-proxy-ca.crt =====
        Validity
            Not Before: Jul 21 08:08:34 2020 GMT
            Not After : Jul 19 08:08:34 2030 GMT
        Subject: CN=front-proxy-ca
===== front-proxy-client.crt =====
        Validity
            Not Before: Jul 21 08:08:34 2020 GMT
            Not After : Apr 14 06:00:40 2022 GMT
        Subject: CN=front-proxy-client
4. None of the certificates under pki had expired, so the problem clearly lay in the running containers; restarting kubelet alone did not fix it.
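The mismatch (valid files on disk, yet 401s from the API) can be confirmed by inspecting the certificate the running api-server actually serves, rather than the file on disk. A minimal sketch; the default endpoint is an assumption based on kubeadm's standard api-server port:

```shell
# Print the expiry of the certificate actually presented on a TLS endpoint.
# A running process may still hold the pre-renewal certificate in memory even
# though the file under /etc/kubernetes/pki has been renewed.
served_cert_enddate() {
  local endpoint="${1:-127.0.0.1:6443}"   # kubeadm's default api-server port
  echo \
    | openssl s_client -connect "$endpoint" 2>/dev/null \
    | openssl x509 -noout -enddate
}
```

Compare its output against the Not After date of apiserver.crt printed above; a difference means the process never reloaded the renewed file.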
5. Restarting kubelet does not recreate the containers, so they had to be stopped and removed by hand:
systemctl stop kubelet
docker ps -q | xargs docker stop
df -Th | grep "docker" | awk '{print $NF}' | xargs umount
df -Th | grep "kubelet" | awk '{print $NF}' | xargs umount
docker ps -a -q | xargs docker rm
systemctl restart kubelet
6. After the restart, every master recovered except the primary control node (the first server on which kubeadm init was run).
7. The error logs showed that kubelet on the primary control node could not start:
Jul 21 17:53:03 master001 kubelet[23047]: E0721 17:53:03.545713 23047 bootstrap.go:250] unable to load TLS configuration from existing bootstrap client config: tls: private key does not match public key
Jul 21 17:53:03 master001 kubelet[23047]: F0721 17:53:03.545749 23047 server.go:271] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
8. Comparing with other clusters, none of them had an /etc/kubernetes/bootstrap-kubelet.conf file either.
9. We then examined the kubelet configuration:
$ cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
On startup, kubelet reads kubelet.conf first and falls back to bootstrap-kubelet.conf only when kubelet.conf is unusable.
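That fallback can be sketched as a small selection function. This is a deliberate simplification: the real kubelet also falls back when the client credentials inside kubelet.conf are invalid, not only when the file is missing, and the directory parameter exists purely for illustration:

```shell
# Simplified sketch of which kubeconfig kubelet will use at startup:
# kubelet.conf wins; bootstrap-kubelet.conf is only a fallback.
pick_kubelet_kubeconfig() {
  local dir="${1:-/etc/kubernetes}"
  if [ -f "$dir/kubelet.conf" ]; then
    echo "$dir/kubelet.conf"
  elif [ -f "$dir/bootstrap-kubelet.conf" ]; then
    echo "$dir/bootstrap-kubelet.conf"
  else
    echo "no kubeconfig found" >&2
    return 1
  fi
}
```

This explains the two log lines above: kubelet.conf existed but its credentials were bad, so kubelet tried the bootstrap path and failed because that file does not exist on a long-running node.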
10. Comparing kubelet.conf on the primary control node with the other masters revealed the difference.
Primary control node:
users:
- name: system:node:master001
user:
client-certificate-data: *****
client-key-data: *****
***** stands for the base64-encoded certificate and key data embedded inline (not a file reference).
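To see which identity and expiry that embedded blob actually carries, it can be decoded and fed to openssl. A sketch; the default path is kubeadm's standard kubelet.conf location:

```shell
# Decode the client certificate embedded in a kubeconfig and print its
# subject and expiry. The *-data fields are base64-encoded PEM, so they
# can be inspected directly with openssl.
embedded_cert_info() {
  local kubeconf="${1:-/etc/kubernetes/kubelet.conf}"
  grep 'client-certificate-data' "$kubeconf" \
    | awk '{print $2}' \
    | base64 -d \
    | openssl x509 -noout -subject -enddate
}
```

On the failing node this shows a notAfter matching the original (pre-renewal) expiry date, since kubeadm renew never rewrote the embedded data.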
Other master nodes:
users:
- name: default-auth
user:
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
The other masters reference the certificate file directly. When the first master initializes the cluster, kubelet has not yet joined, so /var/lib/kubelet/pki/kubelet-client-current.pem does not exist yet and kubeadm embeds the certificate data inline instead. The fix is simply to edit kubelet.conf by hand so that it references the rotated file.
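The manual edit amounts to two substitutions, replacing the inline data fields with file references so the first master's kubelet.conf matches the others. A hedged sketch: it assumes the rotated PEM already exists at the standard path, and it backs the file up first since a bad edit leaves kubelet unable to start:

```shell
# Rewrite kubelet.conf: drop the embedded client cert/key and point at the
# auto-rotated PEM instead, matching the other masters.
fix_kubelet_conf() {
  local conf="${1:-/etc/kubernetes/kubelet.conf}"
  local pem="${2:-/var/lib/kubelet/pki/kubelet-client-current.pem}"
  cp "$conf" "$conf.bak"   # keep a backup in case the edit goes wrong
  sed -i \
    -e "s|client-certificate-data: .*|client-certificate: $pem|" \
    -e "s|client-key-data: .*|client-key: $pem|" \
    "$conf"
}
```

After the edit, restart kubelet (systemctl restart kubelet) so it picks up the file-based credentials.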
11. After restarting the first master, the whole cluster returned to normal.
Summary
1. After kubeadm renew updates the certificates, api-server, controller-manager, and scheduler do not reload the certificate files; their containers must be recreated.
2. kubelet.conf on the primary control node embeds the certificate data inline rather than referencing a file, and kubeadm renew does not update it, so it has to be fixed by hand.