Installing Kubernetes 1.5 with kubeadm
In "When Docker Meets systemd" I mentioned the task I have been working on these past couple of days: using kubeadm to install and deploy the latest Kubernetes release, k8s 1.5.1, on Ubuntu 16.04.
In the middle of the year, Docker announced that the swarmkit toolkit would be integrated into the Docker engine, an announcement that caused quite a stir in the lightweight-container world. Developers are lazy, after all ^0^: with docker swarmkit built in, what incentive is left to install a separate container orchestration tool, even if docker engine is not quite the IE browser of its heyday in terms of ubiquity? In response to this market move by Docker Inc., Kubernetes, the leader in container cluster management and service orchestration, released Kubernetes 1.4.0 three months later. That release added the kubeadm tool. kubeadm works a bit like the swarm kit tooling integrated into docker engine: it aims to improve the developer experience of installing, debugging and using k8s and to lower the barrier to entry. In theory, two commands, init and join, are enough to stand up a complete Kubernetes cluster.
However, like swarmkit when it first entered the docker engine, kubeadm is still under active development and not all that stable. Even in the current latest k8s 1.5.1 it remains in Alpha, and it is officially not recommended for production clusters. Every time you run kubeadm init, it prints the following reminder:
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
Still, our previously deployed k8s 1.3.7 cluster has been running well, which gives us the confidence to keep going, and to go well, down the k8s road. But the deployment and management experience of k8s really is tedious, so we decided to find out whether kubeadm could give us a better-than-expected experience. The lessons learned from installing kubernetes 1.3.7 on aliyun ubuntu 14.04 gave me a tiny bit of confidence, yet the actual installation was still full of twists and turns. That has to do both with kubeadm being unstable and with the quality of cni and the third-party network add-ons; a problem on either side makes the install process painfully bumpy.
I. Environment and Constraints
Of the three operating systems kubeadm supports, Ubuntu 16.04+, CentOS 7 and HypriotOS v1.0.1+, we chose Ubuntu 16.04. Since Aliyun does not yet offer an official 16.04 image, we created two new Ubuntu 14.04 ECS instances and upgraded them by hand to Ubuntu 16.04.1 with apt-get; the exact version is Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-58-generic x86_64).
Ubuntu 16.04 uses systemd as its init system; for installing and configuring Docker you can refer to my post "When Docker Meets systemd". For Docker I chose the latest stable release available at the time: 1.12.5.
# docker version
Client:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:42:17 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:42:17 2016
 OS/Arch:      linux/amd64
As for the Kubernetes version, as mentioned above, we use the freshly released Kubernetes 1.5.1. 1.5.1 is an urgent fix on top of 1.5.0, mainly "to address default flag values which in isolation were not problematic, but in concert could result in an insecure cluster". The official advice is to skip 1.5.0 and go straight to 1.5.1.
Let me stress this again: installing, configuring and actually getting Kubernetes to work is hard, doing it on Aliyun is even harder, and sometimes you need a bit of luck. Kubernetes, Docker, cni and the various network add-ons are all under active development; a step, tip or trick that works today may be outdated tomorrow, so please keep that in mind when following the steps in this post ^0^.
II. Preparing the Installation Packages
For this install we created two new ECS instances, one as the master node and one as the minion node. With a default kubeadm installation the master node does not take part in Pod scheduling and carries no workload; that is, no Pods for non-core components are created on the master node. This restriction can later be lifted with the kubectl taint command, but more on that below.
Cluster topology:
master node: 10.47.217.91, hostname: iZ25beglnhtZ
minion node: 10.28.61.30, hostname: iZ2ze39jeyizepdxhwqci6Z
The main reference for this installation is the official Kubernetes guide "Installing Kubernetes on Linux with kubeadm".
In this section we prepare the packages, that is, we download kubeadm and the k8s core components needed for this install onto both of the nodes above. Note: if you have a proxy/accelerator, the download steps below will go especially smoothly; if not, …
The following commands must be executed on both nodes.
1. Add the apt key
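If your nodes cannot reach packages.cloud.google.com directly, one option is to point apt at whatever proxy/accelerator you do have. A minimal sketch, in which the proxy address http://127.0.0.1:8118 is only a placeholder for your own proxy:

# cat <<EOF > /etc/apt/apt.conf.d/95proxy
Acquire::http::Proxy "http://127.0.0.1:8118/";
Acquire::https::Proxy "http://127.0.0.1:8118/";
EOF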
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
OK
2. Add the Kubernetes source and refresh the package index
Add the Kubernetes source under the sources.list.d directory:
# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

# cat /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
Refresh the package index:
# apt-get update
... ...
Hit:2 http://mirrors.aliyun.com/ubuntu xenial InRelease
Hit:3 https://apt.dockerproject.org/repo ubuntu-xenial InRelease
Get:4 http://mirrors.aliyun.com/ubuntu xenial-security InRelease [102 kB]
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial InRelease [6,299 B]
Get:5 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 Packages [1,739 B]
Get:6 http://mirrors.aliyun.com/ubuntu xenial-updates InRelease [102 kB]
Get:7 http://mirrors.aliyun.com/ubuntu xenial-proposed InRelease [253 kB]
Get:8 http://mirrors.aliyun.com/ubuntu xenial-backports InRelease [102 kB]
Fetched 568 kB in 19s (28.4 kB/s)
Reading package lists... Done
3. Download the Kubernetes core components
For this install we can fetch the Kubernetes core components via apt-get, including kubelet, kubeadm, kubectl and kubernetes-cni:
# apt-get install -y kubelet kubeadm kubectl kubernetes-cni
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libtimedate-perl
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  ebtables ethtool socat
The following NEW packages will be installed:
  ebtables ethtool kubeadm kubectl kubelet kubernetes-cni socat
0 upgraded, 7 newly installed, 0 to remove and 0 not upgraded.
Need to get 37.6 MB of archives.
After this operation, 261 MB of additional disk space will be used.
Get:2 http://mirrors.aliyun.com/ubuntu xenial/main amd64 ebtables amd64 2.0.10.4-3.4ubuntu1 [79.6 kB]
Get:6 http://mirrors.aliyun.com/ubuntu xenial/main amd64 ethtool amd64 1:4.5-1 [97.5 kB]
Get:7 http://mirrors.aliyun.com/ubuntu xenial/universe amd64 socat amd64 1.7.3.1-1 [321 kB]
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubernetes-cni amd64 0.3.0.1-07a8a2-00 [6,877 kB]
Get:3 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.5.1-00 [15.1 MB]
Get:4 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubectl amd64 1.5.1-00 [7,954 kB]
Get:5 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.6.0-alpha.0-2074-a092d8e0f95f52-00 [7,120 kB]
Fetched 37.6 MB in 36s (1,026 kB/s)
... ...
Unpacking kubeadm (1.6.0-alpha.0-2074-a092d8e0f95f52-00) ...
Processing triggers for systemd (229-4ubuntu13) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up ebtables (2.0.10.4-3.4ubuntu1) ...
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Setting up ethtool (1:4.5-1) ...
Setting up kubernetes-cni (0.3.0.1-07a8a2-00) ...
Setting up socat (1.7.3.1-1) ...
Setting up kubelet (1.5.1-00) ...
Setting up kubectl (1.5.1-00) ...
Setting up kubeadm (1.6.0-alpha.0-2074-a092d8e0f95f52-00) ...
Processing triggers for systemd (229-4ubuntu13) ...
Processing triggers for ureadahead (0.100.0-19) ...
... ...
The downloaded kube components are not started automatically. Under /lib/systemd/system we can see kubelet.service:
# ls /lib/systemd/system|grep kube
kubelet.service

//kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
The kubelet version:
# kubelet --version
Kubernetes v1.5.1
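If you want to double-check how systemd sees the kubelet unit at this point, the usual systemctl queries work; this is just a sanity check, not part of the official procedure, and the output will differ from machine to machine:

# systemctl is-enabled kubelet
# systemctl status kubelet --no-pager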
All of the k8s core components are now in place; next we will bootstrap the kubernetes cluster. That is also where the problems start, and those problems and how we dealt with them are the real subject of this post.
III. Initializing the Cluster
As said before, in theory kubeadm builds a cluster with just the init and join commands, where init initializes the cluster on the master node. Unlike the pre-1.4 deployment methods, the k8s core components installed by kubeadm all run as containers on the master node. So before kubeadm init, it is best to put an accelerator/proxy in front of the docker engine on the master node (one way to do this is sketched right after the image list below), because kubeadm pulls quite a few core component images from the gcr.io/google_containers repository, roughly these:
gcr.io/google_containers/kube-controller-manager-amd64   v1.5.1           cd5684031720   2 weeks ago    102.4 MB
gcr.io/google_containers/kube-apiserver-amd64            v1.5.1           8c12509df629   2 weeks ago    124.1 MB
gcr.io/google_containers/kube-proxy-amd64                v1.5.1           71d2b27b03f6   2 weeks ago    175.6 MB
gcr.io/google_containers/kube-scheduler-amd64            v1.5.1           6506e7b74dac   2 weeks ago    53.97 MB
gcr.io/google_containers/etcd-amd64                      3.0.14-kubeadm   856e39ac7be3   5 weeks ago    174.9 MB
gcr.io/google_containers/kubedns-amd64                   1.9              26cf1ed9b144   5 weeks ago    47 MB
gcr.io/google_containers/dnsmasq-metrics-amd64           1.0              5271aabced07   7 weeks ago    14 MB
gcr.io/google_containers/kube-dnsmasq-amd64              1.4              3ec65756a89b   3 months ago   5.13 MB
gcr.io/google_containers/kube-discovery-amd64            1.0              c5e0c9a457fc   3 months ago   134.2 MB
gcr.io/google_containers/exechealthz-amd64               1.2              93a43bfb39bf   3 months ago   8.375 MB
gcr.io/google_containers/pause-amd64                     3.0              99e59f495ffa   7 months ago   746.9 kB
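One common way to put the docker engine behind a proxy on a systemd host is a drop-in unit file. A minimal sketch, where http://127.0.0.1:8118 is merely a placeholder for whatever accelerator/proxy you actually use:

# mkdir -p /etc/systemd/system/docker.service.d
# cat <<EOF > /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:8118" "HTTPS_PROXY=http://127.0.0.1:8118" "NO_PROXY=localhost,127.0.0.1,mirrors.aliyun.com"
EOF
# systemctl daemon-reload
# systemctl restart docker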
In the kubeadm documentation, installing the Pod network is a separate step; kubeadm init does not choose and install a default Pod network for you. Our first choice for the Pod network is Flannel, not only because our previous cluster ran flannel and it behaved stably, but also because Flannel is the overlay network add-on coreos built specifically for k8s; the flannel repository's readme.md even states: "flannel is a network fabric for containers, designed for Kubernetes". To use Flannel, the kubeadm docs require passing the option --pod-network-cidr=10.244.0.0/16 to init.
1. Run kubeadm init
Run the kubeadm init command:
# kubeadm init --pod-network-cidr=10.244.0.0/16
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[init] Using Kubernetes version: v1.5.1
[tokens] Generated token: "2e7da9.7fc5668ff26430c7"
[certificates] Generated Certificate Authority key and certificate.
[certificates] Generated API Server key and certificate
[certificates] Generated Service Account signing keys
[certificates] Created keys and certificates in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[apiclient] Created API client, waiting for the control plane to become ready  //if no accelerator/proxy is hooked up, init may hang here
[apiclient] All control plane components are healthy after 54.789750 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node is ready after 1.003053 seconds
[apiclient] Creating a test deployment
[apiclient] Test deployment succeeded
[token-discovery] Created the kube-discovery deployment, waiting for it to become ready
[token-discovery] kube-discovery is ready after 62.503441 seconds
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
    http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node:

kubeadm join --token=2e7da9.7fc5668ff26430c7 123.56.200.187
What has changed on the master node after a successful init? All the k8s core components are up and running:
# ps -ef|grep kube
root      2477  2461  1 16:36 ?  00:00:04 kube-proxy --kubeconfig=/run/kubeconfig
root     30860     1 12 16:33 ?  00:01:09 /usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.96.0.10 --cluster-domain=cluster.local
root     30952 30933  0 16:33 ?  00:00:01 kube-scheduler --address=127.0.0.1 --leader-elect --master=127.0.0.1:8080
root     31128 31103  2 16:33 ?  00:00:11 kube-controller-manager --address=127.0.0.1 --leader-elect --master=127.0.0.1:8080 --cluster-name=kubernetes --root-ca-file=/etc/kubernetes/pki/ca.pem --service-account-private-key-file=/etc/kubernetes/pki/apiserver-key.pem --cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem --insecure-experimental-approve-all-kubelet-csrs-for-group=system:kubelet-bootstrap --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16
root     31223 31207  2 16:34 ?  00:00:10 kube-apiserver --insecure-bind-address=127.0.0.1 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota --service-cluster-ip-range=10.96.0.0/12 --service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem --client-ca-file=/etc/kubernetes/pki/ca.pem --tls-cert-file=/etc/kubernetes/pki/apiserver.pem --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem --token-auth-file=/etc/kubernetes/pki/tokens.csv --secure-port=6443 --allow-privileged --advertise-address=123.56.200.187 --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --anonymous-auth=false --etcd-servers=http://127.0.0.1:2379
root     31491 31475  0 16:35 ?  00:00:00 /usr/local/bin/kube-discovery
And most of them run in container form:
# docker ps
CONTAINER ID   IMAGE                                                           COMMAND                  CREATED         STATUS         PORTS   NAMES
c16c442b7eca   gcr.io/google_containers/kube-proxy-amd64:v1.5.1                "kube-proxy --kubecon"   6 minutes ago   Up 6 minutes           k8s_kube-proxy.36dab4e8_kube-proxy-sb4sm_kube-system_43fb1a2c-cb46-11e6-ad8f-00163e1001d7_2ba1648e
9f73998e01d7   gcr.io/google_containers/kube-discovery-amd64:1.0               "/usr/local/bin/kube-"   8 minutes ago   Up 8 minutes           k8s_kube-discovery.7130cb0a_kube-discovery-1769846148-6z5pw_kube-system_1eb97044-cb46-11e6-ad8f-00163e1001d7_fd49c2e3
dd5412e5e15c   gcr.io/google_containers/kube-apiserver-amd64:v1.5.1            "kube-apiserver --ins"   9 minutes ago   Up 9 minutes           k8s_kube-apiserver.1c5a91d9_kube-apiserver-iz25beglnhtz_kube-system_eea8df1717e9fea18d266103f9edfac3_8cae8485
60017f8819b2   gcr.io/google_containers/etcd-amd64:3.0.14-kubeadm              "etcd --listen-client"   9 minutes ago   Up 9 minutes           k8s_etcd.c323986f_etcd-iz25beglnhtz_kube-system_3a26566bb004c61cd05382212e3f978f_06d517eb
03c2463aba9c   gcr.io/google_containers/kube-controller-manager-amd64:v1.5.1   "kube-controller-mana"   9 minutes ago   Up 9 minutes           k8s_kube-controller-manager.d30350e1_kube-controller-manager-iz25beglnhtz_kube-system_9a40791dd1642ea35c8d95c9e610e6c1_3b05cb8a
fb9a724540a7   gcr.io/google_containers/kube-scheduler-amd64:v1.5.1            "kube-scheduler --add"   9 minutes ago   Up 9 minutes           k8s_kube-scheduler.ef325714_kube-scheduler-iz25beglnhtz_kube-system_dc58861a0991f940b0834f8a110815cb_9b3ccda2
.... ...
These core components, however, are not running on the pod network (that's right, the pod network does not exist yet); they use the host network. Take the kube-apiserver pod's info as an example:
kube-system kube-apiserver-iz25beglnhtz 1/1 Running 0 1h 10.47.217.91 iz25beglnhtz
The kube-apiserver's IP is the host IP, so we can infer the container is on the host network, which is also visible in the network attributes of its corresponding pause container:
# docker ps |grep apiserver
a5a76bc59e38   gcr.io/google_containers/kube-apiserver-amd64:v1.5.1   "kube-apiserver --ins"   About an hour ago   Up About an hour   k8s_kube-apiserver.2529402_kube-apiserver-iz25beglnhtz_kube-system_25d646be9a0092138dc6088fae6f1656_ec0079fc
ef4d3bf057a6   gcr.io/google_containers/pause-amd64:3.0               "/pause"                 About an hour ago   Up About an hour   k8s_POD.d8dbe16c_kube-apiserver-iz25beglnhtz_kube-system_25d646be9a0092138dc6088fae6f1656_bbfd8a31
Inspecting the pause container shows the value of its NetworkMode:
"NetworkMode": "host",
If something goes wrong partway through kubeadm init, for example you forgot to hook up the accelerator and init hangs, you will probably ctrl+c out of it. After reconfiguring and running kubeadm init again, you may be greeted with output like this:
# kubeadm init --pod-network-cidr=10.244.0.0/16
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
    Port 10250 is in use
    /etc/kubernetes/manifests is not empty
    /etc/kubernetes/pki is not empty
    /var/lib/kubelet is not empty
    /etc/kubernetes/admin.conf already exists
    /etc/kubernetes/kubelet.conf already exists
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
kubeadm automatically checks the current environment for "leftovers" from a previous run. If there are any, they must be cleaned up before init can run again. We can clean the environment with "kubeadm reset" and start over.
# kubeadm reset
[preflight] Running pre-flight checks
[reset] Draining node: "iz25beglnhtz"
[reset] Removing node: "iz25beglnhtz"
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf]
2. Install the flannel pod network
After kubeadm init, if you poke around the current cluster state or the core components' logs, you will notice some "anomalies". For example, the kubelet log keeps spewing the following error:
Dec 26 16:36:48 iZ25beglnhtZ kubelet[30860]: E1226 16:36:48.365885 30860 docker_manager.go:2201] Failed to setup network for pod "kube-dns-2924299975-pddz5_kube-system(43fd7264-cb46-11e6-ad8f-00163e1001d7)" using network plugins "cni": cni config unintialized; Skipping pod
With kubectl get pod --all-namespaces -o wide you will also find the kube-dns pod stuck in ContainerCreating.
None of this matters yet, because we have not installed a Pod network for the cluster. As said before, we are going with the Flannel network, so we run the following install command:
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
After a short wait, let's look at the cluster info on the master node again:
# ps -ef|grep kube|grep flannel
root      6517  6501  0 17:20 ?  00:00:00 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
root      6573  6546  0 17:20 ?  00:00:00 /bin/sh -c set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done

# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   dummy-2088944543-s0c5g                 1/1       Running   0          50m
kube-system   etcd-iz25beglnhtz                      1/1       Running   0          50m
kube-system   kube-apiserver-iz25beglnhtz            1/1       Running   0          50m
kube-system   kube-controller-manager-iz25beglnhtz   1/1       Running   0          50m
kube-system   kube-discovery-1769846148-6z5pw        1/1       Running   0          50m
kube-system   kube-dns-2924299975-pddz5              4/4       Running   0          49m
kube-system   kube-flannel-ds-5ww9k                  2/2       Running   0          4m
kube-system   kube-proxy-sb4sm                       1/1       Running   0          49m
kube-system   kube-scheduler-iz25beglnhtz            1/1       Running   0          49m
At least all of the cluster's core components are now running. It looks like a success.
3. minion node: join the cluster
Next it is the minion node's turn to join the cluster, using kubeadm's second command: kubeadm join.
Run on the minion node (note: make sure the master node's port 9898 is open in the firewall):
# kubeadm join --token=2e7da9.7fc5668ff26430c7 123.56.200.187
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[tokens] Validating provided token
[discovery] Created cluster info discovery client, requesting info from "http://123.56.200.187:9898/cluster-info/v1/?token-id=2e7da9"
[discovery] Cluster info object received, verifying signature using given token
[discovery] Cluster info signature and contents are valid, will use API endpoints [https://123.56.200.187:6443]
[bootstrap] Trying to connect to endpoint https://123.56.200.187:6443
[bootstrap] Detected server version: v1.5.1
[bootstrap] Successfully established connection with endpoint "https://123.56.200.187:6443"
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
[csr] Received signed certificate from the API server:
Issuer: CN=kubernetes | Subject: CN=system:node:iZ2ze39jeyizepdxhwqci6Z | CA: false
Not before: 2016-12-26 09:31:00 +0000 UTC Not After: 2017-12-26 09:31:00 +0000 UTC
[csr] Generating kubelet configuration
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"

Node join complete:
* Certificate signing request sent to master and response received.
* Kubelet informed of new secure connection details.

Run 'kubectl get nodes' on the master to see this machine join.
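If join ever hangs at the [discovery] step, it is worth verifying from the minion node that the master's discovery port 9898 and secure API port 6443 are reachable. A minimal check, assuming netcat is installed:

# nc -zv -w 5 123.56.200.187 9898
# nc -zv -w 5 123.56.200.187 6443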
That also went smoothly. The k8s components we see on the minion node:
d85cf36c18ed   gcr.io/google_containers/kube-proxy-amd64:v1.5.1      "kube-proxy --kubecon"   About an hour ago   Up About an hour   k8s_kube-proxy.36dab4e8_kube-proxy-lsn0t_kube-system_b8eddf1c-cb4e-11e6-ad8f-00163e1001d7_5826f32b
a60e373b48b8   gcr.io/google_containers/pause-amd64:3.0              "/pause"                 About an hour ago   Up About an hour   k8s_POD.d8dbe16c_kube-proxy-lsn0t_kube-system_b8eddf1c-cb4e-11e6-ad8f-00163e1001d7_46bfcf67
a665145eb2b5   quay.io/coreos/flannel-git:v0.6.1-28-g5dde68d-amd64   "/bin/sh -c 'set -e -"   About an hour ago   Up About an hour   k8s_install-cni.17d8cf2_kube-flannel-ds-tr8zr_kube-system_06eca729-cb72-11e6-ad8f-00163e1001d7_01e12f61
5b46f2cb0ccf   gcr.io/google_containers/pause-amd64:3.0              "/pause"                 About an hour ago   Up About an hour   k8s_POD.d8dbe16c_kube-flannel-ds-tr8zr_kube-system_06eca729-cb72-11e6-ad8f-00163e1001d7_ac880d20
On the master node we check the current cluster status:
# kubectl get nodes
NAME                      STATUS         AGE
iz25beglnhtz              Ready,master   1h
iz2ze39jeyizepdxhwqci6z   Ready          21s
The k8s cluster has been created "successfully"! But has it really? The real "fun" is only just beginning :(!
IV. Flannel Pod Network Problems
The afterglow of the successful join had barely faded when I ran into problems with the Flannel pod network, and the troubleshooting officially began :(.
1. flannel on the minion node errors out from time to time
Right after the join everything looked fine, but before long an error showed up in kubectl get pod --all-namespaces:
kube-system kube-flannel-ds-tr8zr 1/2 CrashLoopBackOff 189 16h
We found this was caused by one of the containers in the flannel pod on the minion node failing; the specific error we traced:
# docker logs bc0058a15969
E1227 06:17:50.605110       1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-tr8zr': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-tr8zr: dial tcp 10.96.0.1:443: i/o timeout
10.96.0.1 is the cluster IP of the apiserver service on the pod network, and the flannel component on the minion node cannot even reach that cluster IP! What makes the problem stranger is that sometimes, after the Pod has been restarted many times or deleted and recreated, it suddenly flips back to running. Very odd behavior.
Among flannel's github issues, at least two open issues are closely related to this problem:
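When this happens, a rough way to check from the minion node whether the service VIP is reachable at all, under the assumption that kube-proxy's iptables rules are what should be forwarding 10.96.0.1:443 to the real apiserver address:

# nc -zv -w 5 10.96.0.1 443
# iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1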
https://github.com/coreos/flannel/issues/545
https://github.com/coreos/flannel/issues/535
There is no definite fix for this yet. Once the flannel pod on the minion node recovers to running on its own, we can carry on.
2. A workaround for flannel pod startup failures on the minion node
In the issue below, a number of developers discuss one possible cause of the flannel pod failing to start on the minion node, along with a temporary workaround:
https://github.com/kubernetes/kubernetes/issues/34101
The gist is that kube-proxy on the minion node uses the wrong interface, and the problem can be worked around as follows, executed on the minion node:
# kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--cluster-cidr=10.244.0.0/16"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'
daemonset "kube-proxy" configured
pod "kube-proxy-lsn0t" deleted
pod "kube-proxy-sb4sm" deleted
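To confirm the extra flag actually landed on the daemonset, something like the following can be used (a small sketch; jsonpath output formatting differs slightly across kubectl versions):

# kubectl -n kube-system get ds kube-proxy -o jsonpath='{.spec.template.spec.containers[0].command}'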
After running that, the flannel pod status:
kube-system   kube-flannel-ds-qw291   2/2   Running   8    17h
kube-system   kube-flannel-ds-x818z   2/2   Running   17   1h
After 17 restarts, the flannel pod on the minion node finally came up ok. The startup log of its flannel container:
# docker logs 1f64bd9c0386
I1227 07:43:26.670620       1 main.go:132] Installing signal handlers
I1227 07:43:26.671006       1 manager.go:133] Determining IP address of default interface
I1227 07:43:26.670825       1 kube.go:233] starting kube subnet manager
I1227 07:43:26.671514       1 manager.go:163] Using 59.110.67.15 as external interface
I1227 07:43:26.671575       1 manager.go:164] Using 59.110.67.15 as external endpoint
I1227 07:43:26.746811       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1227 07:43:26.749785       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1227 07:43:26.752343       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1227 07:43:26.755126       1 manager.go:246] Lease acquired: 10.244.1.0/24
I1227 07:43:26.755444       1 network.go:58] Watching for L3 misses
I1227 07:43:26.755475       1 network.go:66] Watching for new subnet leases
I1227 07:43:27.755830       1 network.go:153] Handling initial subnet events
I1227 07:43:27.755905       1 device.go:163] calling GetL2List() dev.link.Index: 10
I1227 07:43:27.756099       1 device.go:168] calling NeighAdd: 123.56.200.187, ca:68:7c:9b:cc:67
The issue also says that explicitly specifying --api-advertise-addresses at kubeadm init time avoids this problem. For now, though, do not list more than one IP after that flag. The documentation says multiple addresses are supported, but in practice, when you explicitly pass two or more IPs, for example:
# kubeadm init --api-advertise-addresses=10.47.217.91,123.56.200.187 --pod-network-cidr=10.244.0.0/16
the master initializes successfully, but the minion node panics when it runs the join command:
# kubeadm join --token=92e977.f1d4d090906fc06a 10.47.217.91
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
... ...
[bootstrap] Successfully established connection with endpoint "https://10.47.217.91:6443"
[bootstrap] Successfully established connection with endpoint "https://123.56.200.187:6443"
E1228 10:14:05.405294   28378 runtime.go:64] Observed a panic: "close of closed channel" (close of closed channel)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:70
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:63
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:49
/usr/local/go/src/runtime/asm_amd64.s:479
/usr/local/go/src/runtime/panic.go:458
/usr/local/go/src/runtime/chan.go:311
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:85
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:96
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:97
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:52
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:93
/usr/local/go/src/runtime/asm_amd64.s:2086
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
panic: close of closed channel [recovered]
    panic: close of closed channel

goroutine 29 [running]:
panic(0x1342de0, 0xc4203eebf0)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
k8s.io/kubernetes/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:56 +0x126
panic(0x1342de0, 0xc4203eebf0)
    /usr/local/go/src/runtime/panic.go:458 +0x243
k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection.func1.1()
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:85 +0x29d
k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc420563ee0)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:96 +0x5e
k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc420563ee0, 0x12a05f200, 0x0, 0xc420022e01, 0xc4202c2060)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:97 +0xad
k8s.io/kubernetes/pkg/util/wait.Until(0xc420563ee0, 0x12a05f200, 0xc4202c2060)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:52 +0x4d
k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection.func1(0xc4203a82f0, 0xc420269b90, 0xc4202c2060, 0xc4202c20c0, 0xc4203d8d80, 0x401, 0x480, 0xc4201e75e0, 0x17, 0xc4201e7560, ...)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:93 +0x100
created by k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:94 +0x3ed
This join panic is discussed in detail in this issue: https://github.com/kubernetes/kubernetes/issues/36988
3. open /run/flannel/subnet.env: no such file or directory
As mentioned earlier, by default and for safety reasons the master node carries no workload and takes no part in pod scheduling. With so few machines here, the master node has to pitch in. The following command lets the master node take part in pod scheduling as well:
# kubectl taint nodes --all dedicated-
node "iz25beglnhtz" tainted
Next, we create a deployment from the following manifest file:
//run-my-nginx.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.10.1
        ports:
        - containerPort: 80
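Creating it is then just (assuming the manifest was saved as run-my-nginx.yaml in the current directory):

# kubectl create -f run-my-nginx.yaml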
After the create, we find that the my-nginx pod scheduled onto the master starts fine, while the pod on the minion node keeps failing; the failure reason we see:
Events:
  FirstSeen   LastSeen   Count   From                                SubObjectPath   Type      Reason       Message
  ---------   --------   -----   ----                                -------------   --------  ------       -------
  28s         28s        1       {default-scheduler }                                Normal    Scheduled    Successfully assigned my-nginx-2560993602-0440x to iz2ze39jeyizepdxhwqci6z
  27s         1s         26      {kubelet iz2ze39jeyizepdxhwqci6z}                   Warning   FailedSync   Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-2560993602-0440x_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-2560993602-0440x_default(ba5ce554-cbf1-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"
There is indeed no /run/flannel/subnet.env on the minion node, but the master node does have that file:
// /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
So I created /run/flannel/subnet.env on the minion node by hand and copied in the contents of the master node's file of the same name. A little while later, the my-nginx pod on the minion node went from error to running.
4. no IP addresses available in network: cbr0
We change replicas of the earlier my-nginx deployment to 3 and create a my-nginx service fronting that deployment's pods:
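One way to do that copy in a single step, a sketch that assumes root ssh access from the master to the minion node at 10.28.61.30 (and note that the FLANNEL_SUBNET value really belongs to the master's own lease, so treat this purely as a stopgap):

# ssh [email protected] "mkdir -p /run/flannel"
# scp /run/flannel/subnet.env [email protected]:/run/flannel/subnet.env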
//my-nginx-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30062
    protocol: TCP
  selector:
    run: my-nginx
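The two changes can be applied along these lines (a sketch; kubectl scale is just one way to bump the replicas to 3 without editing run-my-nginx.yaml):

# kubectl scale deployment my-nginx --replicas=3
# kubectl create -f my-nginx-svc.yaml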
After that, we test service connectivity with curl localhost:30062. Requests load-balanced via the VIP to the my-nginx pods on the master node all get a Response, while requests balanced to the pod on the minion node just block until they time out. Only when looking at the pod details did we see that the my-nginx pod newly scheduled onto the minion node never started ok; the error:
Events:
  FirstSeen   LastSeen   Count   From                                SubObjectPath   Type      Reason       Message
  ---------   --------   -----   ----                                -------------   --------  ------       -------
  2m          2m         1       {default-scheduler }                                Normal    Scheduled    Successfully assigned my-nginx-1948696469-ph11m to iz2ze39jeyizepdxhwqci6z
  2m          0s         177     {kubelet iz2ze39jeyizepdxhwqci6z}                   Warning   FailedSync   Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-ph11m_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-ph11m_default(3700d74a-cc12-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": no IP addresses available in network: cbr0; Skipping pod"
Looking at the /var/lib/cni/networks/cbr0 directory on the minion node, we find the following files:
10.244.1.10 10.244.1.12 10.244.1.14 10.244.1.16 10.244.1.18 10.244.1.2 10.244.1.219 10.244.1.239 10.244.1.3 10.244.1.5 10.244.1.7 10.244.1.9 10.244.1.100 10.244.1.120 10.244.1.140 10.244.1.160 10.244.1.180 10.244.1.20 10.244.1.22 10.244.1.24 10.244.1.30 10.244.1.50 10.244.1.70 10.244.1.90 10.244.1.101 10.244.1.121 10.244.1.141 10.244.1.161 10.244.1.187 10.244.1.200 10.244.1.220 10.244.1.240 10.244.1.31 10.244.1.51 10.244.1.71 10.244.1.91 10.244.1.102 10.244.1.122 10.244.1.142 10.244.1.162 10.244.1.182 10.244.1.201 10.244.1.221 10.244.1.241 10.244.1.32 10.244.1.52 10.244.1.72 10.244.1.92 10.244.1.103 10.244.1.123 10.244.1.143 10.244.1.163 10.244.1.183 10.244.1.202 10.244.1.222 10.244.1.242 10.244.1.33 10.244.1.53 10.244.1.73 10.244.1.93 10.244.1.104 10.244.1.124 10.244.1.144 10.244.1.164 10.244.1.184 10.244.1.203 10.244.1.223 10.244.1.243 10.244.1.34 10.244.1.54 10.244.1.74 10.244.1.94 10.244.1.105 10.244.1.125 10.244.1.145 10.244.1.165 10.244.1.185 10.244.1.204 10.244.1.224 10.244.1.244 10.244.1.35 10.244.1.55 10.244.1.75 10.244.1.95 10.244.1.106 10.244.1.126 10.244.1.146 10.244.1.166 10.244.1.186 10.244.1.205 10.244.1.225 10.244.1.245 10.244.1.36 10.244.1.56 10.244.1.76 10.244.1.96 10.244.1.107 10.244.1.127 10.244.1.147 10.244.1.167 10.244.1.187 10.244.1.206 10.244.1.226 10.244.1.246 10.244.1.37 10.244.1.57 10.244.1.77 10.244.1.97 10.244.1.108 10.244.1.128 10.244.1.148 10.244.1.168 10.244.1.188 10.244.1.207 10.244.1.227 10.244.1.247 10.244.1.38 10.244.1.58 10.244.1.78 10.244.1.98 10.244.1.109 10.244.1.129 10.244.1.149 10.244.1.169 10.244.1.189 10.244.1.208 10.244.1.228 10.244.1.248 10.244.1.39 10.244.1.59 10.244.1.79 10.244.1.99 10.244.1.11 10.244.1.13 10.244.1.15 10.244.1.17 10.244.1.19 10.244.1.209 10.244.1.229 10.244.1.249 10.244.1.4 10.244.1.6 10.244.1.8 last_reserved_ip 10.244.1.110 10.244.1.130 10.244.1.150 10.244.1.170 10.244.1.190 10.244.1.21 10.244.1.23 10.244.1.25 10.244.1.40 10.244.1.60 10.244.1.80 10.244.1.111 10.244.1.131 10.244.1.151 10.244.1.171 10.244.1.191 10.244.1.210 10.244.1.230 10.244.1.250 10.244.1.41 10.244.1.61 10.244.1.81 10.244.1.112 10.244.1.132 10.244.1.152 10.244.1.172 10.244.1.192 10.244.1.211 10.244.1.231 10.244.1.251 10.244.1.42 10.244.1.62 10.244.1.82 10.244.1.113 10.244.1.133 10.244.1.153 10.244.1.173 10.244.1.193 10.244.1.212 10.244.1.232 10.244.1.252 10.244.1.43 10.244.1.63 10.244.1.83 10.244.1.114 10.244.1.134 10.244.1.154 10.244.1.174 10.244.1.194 10.244.1.213 10.244.1.233 10.244.1.253 10.244.1.44 10.244.1.64 10.244.1.84 10.244.1.115 10.244.1.135 10.244.1.155 10.244.1.175 10.244.1.195 10.244.1.214 10.244.1.234 10.244.1.254 10.244.1.45 10.244.1.65 10.244.1.85 10.244.1.116 10.244.1.136 10.244.1.156 10.244.1.176 10.244.1.196 10.244.1.215 10.244.1.235 10.244.1.26 10.244.1.46 10.244.1.66 10.244.1.86 10.244.1.117 10.244.1.137 10.244.1.157 10.244.1.177 10.244.1.197 10.244.1.216 10.244.1.236 10.244.1.27 10.244.1.47 10.244.1.67 10.244.1.87 10.244.1.118 10.244.1.138 10.244.1.158 10.244.1.178 10.244.1.198 10.244.1.217 10.244.1.237 10.244.1.28 10.244.1.48 10.244.1.68 10.244.1.88 10.244.1.119 10.244.1.139 10.244.1.159 10.244.1.179 10.244.1.199 10.244.1.218 10.244.1.238 10.244.1.29 10.244.1.49 10.244.1.69 10.244.1.89
Every IP in the 10.244.1.x range has been claimed, so naturally there is no available IP left for new pods. Why the range is exhausted is still unclear. The two open issues below are related to this problem:
https://github.com/containernetworking/cni/issues/306
https://github.com/kubernetes/kubernetes/issues/21656
Inside /var/lib/cni/networks/cbr0, the following command releases IPs that were most likely leaked by kubelet. Each file in that directory is named after an allocated IP and contains the id of the container holding it, so any file whose container id no longer appears in docker ps -a is treated as a stale allocation and removed:
for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z $(docker ps -a | grep $hash | awk '{print $1}') ]; then grep -irl $hash ./; fi; done | xargs rm
After running it, the directory listing becomes:
ls -l
total 32
drw-r--r-- 2 root root 12288 Dec 27 17:11 ./
drw-r--r-- 3 root root  4096 Dec 27 13:52 ../
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.2
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.3
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.4
-rw-r--r-- 1 root root    10 Dec 27 17:11 last_reserved_ip
The pod still fails, but this time for a different reason:
Events:
  FirstSeen   LastSeen   Count   From                                SubObjectPath   Type      Reason       Message
  ---------   --------   -----   ----                                -------------   --------  ------       -------
  23s         23s        1       {default-scheduler }                                Normal    Scheduled    Successfully assigned my-nginx-1948696469-7p4nn to iz2ze39jeyizepdxhwqci6z
  22s         1s         22      {kubelet iz2ze39jeyizepdxhwqci6z}                   Warning   FailedSync   Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-7p4nn_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-7p4nn_default(a40fe652-cc14-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": \"cni0\" already has an IP address different from 10.244.1.1/24; Skipping pod"
And the files under /var/lib/cni/networks/cbr0 start piling up rapidly again! The problem is at an impasse.
5. flannel vxlan does not work, and switching the backend to udp does not help either
By this point I was pretty much worn out, so I ran kubeadm reset on both nodes and prepared to start over.
After kubeadm reset, the bridge device cni0 and the network interface flannel.1 created by flannel were still alive and well. To make sure the environment really returned to its initial state, we can delete both devices with the following commands:
# ifconfig cni0 down
# brctl delbr cni0
# ip link delete flannel.1
"Tempered" by the previous problems, re-running init and join this time went exceptionally smoothly, and the minion node showed no further anomalies.
# kubectl get nodes -o wide
NAME                      STATUS         AGE       EXTERNAL-IP
iz25beglnhtz              Ready,master   5m        <none>
iz2ze39jeyizepdxhwqci6z   Ready          51s       <none>

# kubectl get pod --all-namespaces
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
default       my-nginx-1948696469-71h1l              1/1       Running   0          3m
default       my-nginx-1948696469-zwt5g              1/1       Running   0          3m
default       my-ubuntu-2560993602-ftdm6             1/1       Running   0          3m
kube-system   dummy-2088944543-lmlbh                 1/1       Running   0          5m
kube-system   etcd-iz25beglnhtz                      1/1       Running   0          6m
kube-system   kube-apiserver-iz25beglnhtz            1/1       Running   0          6m
kube-system   kube-controller-manager-iz25beglnhtz   1/1       Running   0          6m
kube-system   kube-discovery-1769846148-l5lfw        1/1       Running   0          5m
kube-system   kube-dns-2924299975-mdq5r              4/4       Running   0          5m
kube-system   kube-flannel-ds-9zwr1                  2/2       Running   0          5m
kube-system   kube-flannel-ds-p7xh2                  2/2       Running   0          1m
kube-system   kube-proxy-dwt5f                       1/1       Running   0          5m
kube-system   kube-proxy-vm6v2                       1/1       Running   0          1m
kube-system   kube-scheduler-iz25beglnhtz            1/1       Running   0          6m
Next we create the my-nginx deployment and service again to test flannel connectivity. curl-ing the my-nginx service's nodeport reaches the two nginx pods on the master, but the pod on the minion node still cannot be reached.
The flannel docker log on the master:
I1228 02:52:22.097083       1 network.go:225] L3 miss: 10.244.1.2
I1228 02:52:22.097169       1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60
I1228 02:52:22.097335       1 network.go:236] AddL3 succeeded
I1228 02:52:55.169952       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:00.801901       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:03.801923       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:04.801764       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:05.801848       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:06.888269       1 network.go:225] L3 miss: 10.244.1.2
I1228 02:53:06.888340       1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60
I1228 02:53:06.888507       1 network.go:236] AddL3 succeeded
I1228 02:53:39.969791       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:45.153770       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:48.154822       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:49.153774       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:50.153734       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:52.154056       1 network.go:225] L3 miss: 10.244.1.2
I1228 02:53:52.154110       1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60
I1228 02:53:52.154256       1 network.go:236] AddL3 succeeded
The log is full of "Ignoring not a miss" lines, which suggests something is wrong with the vxlan network. The symptom is quite close to what is described in this issue:
https://github.com/coreos/flannel/issues/427
Flannel uses vxlan as its default backend, on the kernel's default vxlan udp port 8472. Flannel also supports a udp backend on udp port 8285, so let's try switching the flannel backend. The steps:
- download https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml to a local file;
- edit kube-flannel.yml: in the net-conf.json property, add a "Backend" field:
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "udp",
        "Port": 8285
      }
    }
---
... ...
- delete and reinstall the pod network:
# kubectl delete -f kube-flannel.yml
configmap "kube-flannel-cfg" deleted
daemonset "kube-flannel-ds" deleted

# kubectl apply -f kube-flannel.yml
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created

# netstat -an|grep 8285
udp        0      0 123.56.200.187:8285     0.0.0.0:*
Testing showed that the udp port is open: tcpdump -i flannel0 on both nodes shows udp packets being sent and received. Yet the pod network between the two nodes is still broken.
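For reference, the kind of captures used here (flannel0 is the TUN device created by the udp backend; watching udp port 8285 on the external interface shows the encapsulated traffic, with eth0 standing in for whatever your outbound interface is actually called):

# tcpdump -ni flannel0 icmp
# tcpdump -ni eth0 udp port 8285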
6. failed to register network: failed to acquire lease: node "iz25beglnhtz" not found
Under normal circumstances, the flannel pod startup logs on the master node and the minion node look like this:
flannel running on the master node:
I1227 04:56:16.577828       1 main.go:132] Installing signal handlers
I1227 04:56:16.578060       1 kube.go:233] starting kube subnet manager
I1227 04:56:16.578064       1 manager.go:133] Determining IP address of default interface
I1227 04:56:16.578576       1 manager.go:163] Using 123.56.200.187 as external interface
I1227 04:56:16.578616       1 manager.go:164] Using 123.56.200.187 as external endpoint
E1227 04:56:16.579079       1 network.go:106] failed to register network: failed to acquire lease: node "iz25beglnhtz" not found
I1227 04:56:17.583744       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1227 04:56:17.585367       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1227 04:56:17.587765       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1227 04:56:17.589943       1 manager.go:246] Lease acquired: 10.244.0.0/24
I1227 04:56:17.590203       1 network.go:58] Watching for L3 misses
I1227 04:56:17.590255       1 network.go:66] Watching for new subnet leases
I1227 07:43:27.164103       1 network.go:153] Handling initial subnet events
I1227 07:43:27.164211       1 device.go:163] calling GetL2List() dev.link.Index: 5
I1227 07:43:27.164350       1 device.go:168] calling NeighAdd: 59.110.67.15, ca:50:97:1f:c2:ea
flannel running on the minion node:
# docker logs 1f64bd9c0386
I1227 07:43:26.670620       1 main.go:132] Installing signal handlers
I1227 07:43:26.671006       1 manager.go:133] Determining IP address of default interface
I1227 07:43:26.670825       1 kube.go:233] starting kube subnet manager
I1227 07:43:26.671514       1 manager.go:163] Using 59.110.67.15 as external interface
I1227 07:43:26.671575       1 manager.go:164] Using 59.110.67.15 as external endpoint
I1227 07:43:26.746811       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1227 07:43:26.749785       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1227 07:43:26.752343       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1227 07:43:26.755126       1 manager.go:246] Lease acquired: 10.244.1.0/24
I1227 07:43:26.755444       1 network.go:58] Watching for L3 misses
I1227 07:43:26.755475       1 network.go:66] Watching for new subnet leases
I1227 07:43:27.755830       1 network.go:153] Handling initial subnet events
I1227 07:43:27.755905       1 device.go:163] calling GetL2List() dev.link.Index: 10
I1227 07:43:27.756099       1 device.go:168] calling NeighAdd: 123.56.200.187, ca:68:7c:9b:cc:67
But during the tests for problem 5 above, we noticed errors like the following in the flannel containers' startup logs:
master node:
# docker logs c2d1cee3df3d
I1228 06:53:52.502571       1 main.go:132] Installing signal handlers
I1228 06:53:52.502735       1 manager.go:133] Determining IP address of default interface
I1228 06:53:52.503031       1 manager.go:163] Using 123.56.200.187 as external interface
I1228 06:53:52.503054       1 manager.go:164] Using 123.56.200.187 as external endpoint
E1228 06:53:52.503869       1 network.go:106] failed to register network: failed to acquire lease: node "iz25beglnhtz" not found
I1228 06:53:52.503899       1 kube.go:233] starting kube subnet manager
I1228 06:53:53.522892       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1228 06:53:53.524325       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1228 06:53:53.526622       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1228 06:53:53.528438       1 manager.go:246] Lease acquired: 10.244.0.0/24
I1228 06:53:53.528744       1 network.go:58] Watching for L3 misses
I1228 06:53:53.528777       1 network.go:66] Watching for new subnet leases
minion node:
# docker logs dcbfef45308b
I1228 05:28:05.012530       1 main.go:132] Installing signal handlers
I1228 05:28:05.012747       1 manager.go:133] Determining IP address of default interface
I1228 05:28:05.013011       1 manager.go:163] Using 59.110.67.15 as external interface
I1228 05:28:05.013031       1 manager.go:164] Using 59.110.67.15 as external endpoint
E1228 05:28:05.013204       1 network.go:106] failed to register network: failed to acquire lease: node "iz2ze39jeyizepdxhwqci6z" not found
I1228 05:28:05.013237       1 kube.go:233] starting kube subnet manager
I1228 05:28:06.041602       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1228 05:28:06.042863       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1228 05:28:06.044896       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1228 05:28:06.046497       1 manager.go:246] Lease acquired: 10.244.1.0/24
I1228 05:28:06.046780       1 network.go:98] Watching for new subnet leases
I1228 05:28:07.047052       1 network.go:191] Subnet added: 10.244.0.0/24
Both nodes log the same "register network" failure: failed to register network: failed to acquire lease: node "xxxx" not found. It is hard to say whether these two errors are what breaks the network between the two nodes. Over the whole course of testing, this problem came and went. A similar discussion can be found in this flannel issue:
https://github.com/coreos/flannel/issues/435
The many problems with the Flannel pod network made me decide to stop using Flannel, for now, in this kubeadm-created kubernetes cluster.
V. Calico Pod Network
Among the pod network add-ons Kubernetes supports, besides Flannel there are also calico, Weave Net and others. Here we try Calico, a pod network built on BGP (the Border Gateway Protocol). The Calico project has dedicated documentation for installing its Pod network on a kubeadm-created k8s cluster, and we meet all the requirements and constraints it lists. For example:
the master node carries the kubeadm.alpha.kubernetes.io/role: master label:
# kubectl get nodes -o wide --show-labels
NAME           STATUS         AGE       EXTERNAL-IP   LABELS
iz25beglnhtz   Ready,master   3m        <none>        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubeadm.alpha.kubernetes.io/role=master,kubernetes.io/hostname=iz25beglnhtz
Before installing calico, we once again run kubeadm reset to reset the environment and remove the network devices created by flannel; refer to the commands in the sections above, or the combined sketch right below.
1. Initialize the cluster
With calico, kubeadm init no longer needs the --pod-network-cidr=10.244.0.0/16 option:
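Roughly, the cleanup on each node amounts to something like this (a sketch; removing the cni state directory is an extra precaution against the leaked-IP files from problem 4 above and is not strictly required by kubeadm):

# kubeadm reset
# ifconfig cni0 down && brctl delbr cni0
# ip link delete flannel.1
# rm -rf /var/lib/cni/networks/cbr0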
# kubeadm init --api-advertise-addresses=10.47.217.91
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[init] Using Kubernetes version: v1.5.1
[tokens] Generated token: "531b3f.3bd900d61b78d6c9"
[certificates] Generated Certificate Authority key and certificate.
[certificates] Generated API Server key and certificate
[certificates] Generated Service Account signing keys
[certificates] Created keys and certificates in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 13.527323 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node is ready after 0.503814 seconds
[apiclient] Creating a test deployment
[apiclient] Test deployment succeeded
[token-discovery] Created the kube-discovery deployment, waiting for it to become ready
[token-discovery] kube-discovery is ready after 1.503644 seconds
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
    http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node:

kubeadm join --token=531b3f.3bd900d61b78d6c9 10.47.217.91
2. Create the calico network
# kubectl apply -f http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/hosted/kubeadm/calico.yaml
configmap "calico-config" created
daemonset "calico-etcd" created
service "calico-etcd" created
daemonset "calico-node" created
deployment "calico-policy-controller" created
job "configure-calico" created
The creation takes a while in practice, because calico needs to pull a few images:
# docker images
REPOSITORY                      TAG       IMAGE ID       CREATED         SIZE
quay.io/calico/node             v1.0.0    74bff066bc6a   7 days ago      256.4 MB
calico/ctl                      v1.0.0    069830246cf3   8 days ago      43.35 MB
calico/cni                      v1.5.5    ada87b3276f3   12 days ago     67.13 MB
gcr.io/google_containers/etcd   2.2.1     a6cd91debed1   14 months ago   28.19 MB
calico created two network devices locally on the master node:
# ip a
... ...
47: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.91.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
48: [email protected]: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 62:39:10:55:44:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3. minion node join
Run the following command to join the minion node to the cluster:
# kubeadm join --token=531b3f.3bd900d61b78d6c9 10.47.217.91
calico also created a network device on the minion node:
57988: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.136.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
After the join succeeds, we check the cluster status:
# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE
kube-system   calico-etcd-488qd                          1/1       Running   0          18m       10.47.217.91   iz25beglnhtz
kube-system   calico-node-jcb3c                          2/2       Running   0          18m       10.47.217.91   iz25beglnhtz
kube-system   calico-node-zthzp                          2/2       Running   0          4m        10.28.61.30    iz2ze39jeyizepdxhwqci6z
kube-system   calico-policy-controller-807063459-f21q4   1/1       Running   0          18m       10.47.217.91   iz25beglnhtz
kube-system   dummy-2088944543-rtsfk                     1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   etcd-iz25beglnhtz                          1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-apiserver-iz25beglnhtz                1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-controller-manager-iz25beglnhtz       1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-discovery-1769846148-51wdk            1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-dns-2924299975-fhf5f                  4/4       Running   0          23m       192.168.91.1   iz25beglnhtz
kube-system   kube-proxy-2s7qc                           1/1       Running   0          4m        10.28.61.30    iz2ze39jeyizepdxhwqci6z
kube-system   kube-proxy-h2qds                           1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-scheduler-iz25beglnhtz                1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
All components are ok. That looks like a good omen! But whether the cross-node pod network is actually connected still needs further investigation.
4. Probing cross-node pod network connectivity
We reuse my-nginx-svc.yaml and run-my-nginx.yaml from the flannel tests above to create the my-nginx service and deployment. Note: before that, run "kubectl taint nodes --all dedicated-" on the master node so that the master node can carry workload.
Unfortunately, the result was very similar to flannel's: http requests dispatched to the pods on the master node got nginx responses, while the pod on the minion node still could not be reached.
This time I did not want to dwell on calico; I wanted to move on quickly and see whether the next candidate, weave net, would meet our needs.
VI. Weave Network for Pods
After so many attempts, the results were disheartening. Weave network seemed like the last straw to clutch at. With the groundwork already laid above, I will not list the detailed output of every command here. Weave network also has dedicated official documentation on integrating with a kubernetes cluster, and that is mainly what we followed.
1. Install the weave network add-on
After kubeadm reset we re-initialized the cluster, and then installed the weave network add-on:
# kubectl apply -f https://git.io/weave-kube
daemonset "weave-net" created
With both Flannel and calico, installing the pod network add-on at least went smoothly. With Weave network, we were hit over the head right out of the gate :(:
# kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE       IP             NODE
kube-system   dummy-2088944543-4kxtk                 1/1       Running             0          42m       10.47.217.91   iz25beglnhtz
kube-system   etcd-iz25beglnhtz                      1/1       Running             0          42m       10.47.217.91   iz25beglnhtz
kube-system   kube-apiserver-iz25beglnhtz            1/1       Running             0          42m       10.47.217.91   iz25beglnhtz
kube-system   kube-controller-manager-iz25beglnhtz   1/1       Running             0          42m       10.47.217.91   iz25beglnhtz
kube-system   kube-discovery-1769846148-pzv8p        1/1       Running             0          42m       10.47.217.91   iz25beglnhtz
kube-system   kube-dns-2924299975-09dcb              0/4       ContainerCreating   0          42m       <none>         iz25beglnhtz
kube-system   kube-proxy-z465f                       1/1       Running             0          42m       10.47.217.91   iz25beglnhtz
kube-system   kube-scheduler-iz25beglnhtz            1/1       Running             0          42m       10.47.217.91   iz25beglnhtz
kube-system   weave-net-3wk9h                        0/2       CrashLoopBackOff    16         17m       10.47.217.91   iz25beglnhtz
After installation, the weave-net pod shows CrashLoopBackOff. Following its container log, we get this error:
docker logs cde899efa0af
time="2016-12-28T08:25:29Z" level=info msg="Starting Weaveworks NPC 1.8.2"
ti