Recovering a Reset Master in a kubeadm HA Master Cluster_Kubernetes Chinese Community
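Introduction
In a cluster with HA masters, a single-point failure usually does not affect normal operation, and repairing it promptly avoids potential loss of data and state. This article shows how to restore a K8s HA master cluster built with kubeadm after one of its master hosts has had hardware replaced, its operating system reset, or its k8s configuration reset.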
Prerequisites
- A Kubernetes cluster with HA masters; for the setup procedure, see the kubeadm HA cluster setup guide
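Recovering from a Single-Node Reset
Reproducing the Failure
First, log in to one of the masters and run the command below to simulate a single-node reset. After it runs, the k8s HA master cluster is in a single-point failure state.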
kubeadm reset -f
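Recovering from the Failure
The whole procedure in this section is also shown in a demo video: DEMO
First, on a healthy master, run the command below to get the ID of the failed member in the etcd cluster.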
ETCD=`docker ps|grep etcd|grep -v POD|awk '{print $1}'`
docker exec \
  -it ${ETCD} \
  etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --cert-file /etc/kubernetes/pki/etcd/peer.crt \
  --key-file /etc/kubernetes/pki/etcd/peer.key \
  cluster-health
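Picking the failed ID out of the `cluster-health` output by eye is easy to get wrong, so the output can be filtered instead. The following is a minimal sketch and not part of the original procedure. The function name `unhealthy_members` is hypothetical, and the sample text assumes the `member <ID> is healthy|unhealthy|unreachable: ...` wording printed by the etcdctl v2 API. The two healthy member IDs and the URLs in the sample are made up for illustration, so check the pattern against the real output of your etcd version.

```shell
# Print the IDs of etcd members that cluster-health does not report as healthy.
# Assumes lines of the form "member <ID> is <state>: ..." (etcdctl v2 wording).
unhealthy_members() {
  awk '$1 == "member" && $3 == "is" && $4 !~ /^healthy/ { print $2 }'
}

# Illustrative sample only, not captured from a real cluster.
sample_output='member 19c5f5e4748dc98b is unreachable: [https://10.130.29.81:2379] are all unreachable
member 3f2a9c1d5e6b7a80 is healthy: got healthy result from https://10.130.29.80:2379
member a1b2c3d4e5f60718 is healthy: got healthy result from https://10.130.29.82:2379
cluster is degraded'

# Prints the ID of the single member that is not healthy in the sample.
printf '%s\n' "$sample_output" | unhealthy_members
```

On a live master the input would be the output of the `cluster-health` command above, piped into the function in place of the sample text.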
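In this example the failed member ID is 19c5f5e4748dc98b. Because the failed node has been reset, the etcd instance behind that ID is gone and can no longer be reached, so run the command below to remove the failed member from the etcd cluster directly.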
ETCD=`docker ps|grep etcd|grep -v POD|awk '{print $1}'`
docker exec \
  -it ${ETCD} \
  etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --cert-file /etc/kubernetes/pki/etcd/server.crt \
  --key-file /etc/kubernetes/pki/etcd/server.key \
  member remove 19c5f5e4748dc98b
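Next, join the new (reset) node to the cluster so that it forms a three-node HA master again. Rebuilding the master uses the kubeadm configuration file from the initial HA master deployment, which is reused here as-is. The file used in this example is shown below for reference. Note that one setting in it may need to be changed to match the current state of the cluster: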
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- 10.130.29.80
- 10.130.29.81
- 10.130.29.82
- centos-7-x86-64-29-80
- centos-7-x86-64-29-81
- centos-7-x86-64-29-82
- 10.130.29.83
kubeProxy:
  config:
    mode: ipvs
etcd:
  local:
    extraArgs:
      listen-client-urls: https://127.0.0.1:2379,https://10.130.29.81:2379
      advertise-client-urls: https://10.130.29.81:2379
      listen-peer-urls: https://10.130.29.81:2380
      initial-advertise-peer-urls: https://10.130.29.81:2380
      # Note: this value must be edited. Make sure the HOST=IP address pairs of all etcd nodes, including the reset node, are listed here; otherwise etcd on the new node fails to start.
      initial-cluster: centos-7-x86-64-29-80=https://10.130.29.80:2380,centos-7-x86-64-29-81=https://10.130.29.81:2380,centos-7-x86-64-29-82=https://10.130.29.82:2380
      initial-cluster-state: existing
    serverCertSANs:
      - centos-7-x86-64-29-81
      - 10.130.29.81
    peerCertSANs:
      - centos-7-x86-64-29-81
      - 10.130.29.81
networking:
  # This CIDR is a Calico default. Substitute or remove for your CNI provider.
  podSubnet: 172.168.0.0/16
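Because a missing entry in `initial-cluster` keeps etcd on the rebuilt node from starting, it can be worth checking the file before running any kubeadm phase. The following is a minimal sketch, assuming the whole `initial-cluster` value sits on a single line as in the example config. The function name `has_initial_cluster_entry` and the throwaway file are hypothetical and exist only for this illustration.

```shell
# Succeed only if the initial-cluster line of a kubeadm config lists
# HOST=https://IP:2380. Arguments: config file, hostname, IP address.
# Assumes the whole initial-cluster value is on one line.
has_initial_cluster_entry() {
  grep -E '^[[:space:]]*initial-cluster:' "$1" |
    tr ', ' '\n\n' |
    grep -qxF "$2=https://$3:2380"
}

# Demo on a throwaway file that contains only the relevant line.
tmp_config=$(mktemp)
trap 'rm -f "$tmp_config"' EXIT
cat > "$tmp_config" <<'EOF'
      initial-cluster: centos-7-x86-64-29-80=https://10.130.29.80:2380,centos-7-x86-64-29-81=https://10.130.29.81:2380,centos-7-x86-64-29-82=https://10.130.29.82:2380
      initial-cluster-state: existing
EOF

if has_initial_cluster_entry "$tmp_config" centos-7-x86-64-29-81 10.130.29.81; then
  echo "entry present"
else
  echo "entry missing"
fi
```

The pattern requires a colon directly after `initial-cluster`, so the `initial-cluster-state` line is not picked up by mistake.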
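If you deployed the cluster by following the kubeadm HA cluster setup guide, this file is stored at /etc/kubernetes/kubeadm-config.yaml on each master. Otherwise, replace every "/etc/kubernetes/kubeadm-config.yaml" in the command below with the path of the kubeadm-config.yaml file on the reset master. Also set the host and ip variables to the hostname and IP address of the failed node.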
ETCD=`docker ps|grep etcd|grep -v POD|awk '{print $1}'`
host=centos-7-x86-64-29-81
ip=10.130.29.81
bash -c """
ssh $host 'mkdir -p /etc/kubernetes/pki/etcd'
scp /etc/kubernetes/pki/ca.crt $host:/etc/kubernetes/pki/ca.crt
scp /etc/kubernetes/pki/ca.key $host:/etc/kubernetes/pki/ca.key
scp /etc/kubernetes/pki/sa.key $host:/etc/kubernetes/pki/sa.key
scp /etc/kubernetes/pki/sa.pub $host:/etc/kubernetes/pki/sa.pub
scp /etc/kubernetes/pki/front-proxy-ca.crt $host:/etc/kubernetes/pki/front-proxy-ca.crt
scp /etc/kubernetes/pki/front-proxy-ca.key $host:/etc/kubernetes/pki/front-proxy-ca.key
scp /etc/kubernetes/pki/etcd/ca.crt $host:/etc/kubernetes/pki/etcd/ca.crt
scp /etc/kubernetes/pki/etcd/ca.key $host:/etc/kubernetes/pki/etcd/ca.key
scp /etc/kubernetes/admin.conf $host:/etc/kubernetes/admin.conf
docker exec \
  -it ${ETCD} \
  etcdctl \
  --ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --cert-file /etc/kubernetes/pki/etcd/peer.crt \
  --key-file /etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://127.0.0.1:2379 \
  member add $host https://$ip:2380
ssh ${host} '
  kubeadm alpha phase certs all --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase kubeconfig controller-manager --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase kubeconfig scheduler --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase kubelet config write-to-disk --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase kubelet write-env-file --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase kubeconfig kubelet --config /etc/kubernetes/kubeadm-config.yaml
  systemctl restart kubelet
  kubeadm alpha phase etcd local --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase kubeconfig all --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase controlplane all --config /etc/kubernetes/kubeadm-config.yaml
  kubeadm alpha phase mark-master --config /etc/kubernetes/kubeadm-config.yaml'
"""
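Each `scp` line in the script copies a file to the same path on the target host, so the file list can be kept in one place and looped over. The following is a sketch under that assumption, not the author's script. `SHARED_FILES`, `copy_shared_files`, and the `DRY_RUN` switch are hypothetical names introduced here, and the file list is copied from the script above.

```shell
# Files shared by all masters, taken from the scp lines of the recovery script.
SHARED_FILES='
/etc/kubernetes/pki/ca.crt
/etc/kubernetes/pki/ca.key
/etc/kubernetes/pki/sa.key
/etc/kubernetes/pki/sa.pub
/etc/kubernetes/pki/front-proxy-ca.crt
/etc/kubernetes/pki/front-proxy-ca.key
/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd/ca.key
/etc/kubernetes/admin.conf
'

# Copy each shared file to the same path on the target host ($1).
# With DRY_RUN=1 (the default here) the commands are printed instead of run.
copy_shared_files() {
  for f in $SHARED_FILES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp $f $1:$f"
    else
      scp "$f" "$1:$f" || return 1
    fi
  done
}

# Dry run against the reset master from the example.
copy_shared_files centos-7-x86-64-29-81
```

Setting `DRY_RUN=0` performs the copies. As in the original script, /etc/kubernetes/pki/etcd must already exist on the target host before the files are copied.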
At this point, recovery from the single-node reset of the HA master cluster is complete.