
Kubernetes Learning: 7. Deploying the Master Node

Deploying the k8s master node

As introduced in earlier chapters, the k8s control-plane node handles scheduling and management for the whole cluster, so it is a critical part of the system. The k8s master node consists of three main components:
1. kube-apiserver provides the unified entry point for all resource operations;
2. kube-scheduler is the resource scheduler: it places Pods onto specific compute nodes according to its scheduling algorithm;
3. kube-controller-manager is another key management and control component that runs on the control-plane node.

kube-scheduler, kube-controller-manager and kube-apiserver are closely related in function;
only one kube-scheduler process and one kube-controller-manager process can be active at any time; if multiple instances are run, a leader must be chosen by election (see the sketch below).
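Once kube-controller-manager and kube-scheduler are running (they are started later in this chapter), you can see which instance currently holds the lock. A quick sketch, assuming this release records leader-election state as annotations on Endpoints objects in the kube-system namespace:

# Show the current leader of kube-controller-manager and kube-scheduler
kubectl -n kube-system get endpoints kube-controller-manager -o yaml | grep control-plane.alpha.kubernetes.io/leader
kubectl -n kube-system get endpoints kube-scheduler -o yaml | grep control-plane.alpha.kubernetes.io/leader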

Since traffic between k8s nodes is encrypted, first confirm that the certificate files are in place:

[root@wecloud-test-k8s-1 ssl]# cd /etc/kubernetes/ssl/
[root@wecloud-test-k8s-1 ssl]# ls
admin-key.pem  admin.pem  ca-key.pem  ca.pem  kube-proxy-key.pem  kube-proxy.pem  kubernetes-key.pem  kubernetes.pem
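If you want to double-check these certificates before continuing, a quick sketch using openssl (only the file names listed above are assumed):

# Inspect the subject and validity period of the apiserver certificate
openssl x509 -noout -subject -dates -in /etc/kubernetes/ssl/kubernetes.pem
# Verify that the certificates were signed by our CA
openssl verify -CAfile /etc/kubernetes/ssl/ca.pem \
    /etc/kubernetes/ssl/kubernetes.pem /etc/kubernetes/ssl/admin.pem /etc/kubernetes/ssl/kube-proxy.pem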

Deploying the master node

Download the binaries

For convenience we deploy from the binary release. Download the kubernetes-server package for the desired version from the official site (the server package already includes the client binaries):

[root@wecloud-test-k8s-1 ~]# wget https://dl.k8s.io/v1.8.10/kubernetes-server-linux-amd64.tar.gz
[root@wecloud-test-k8s-1 ~]# tar xvf kubernetes-server-linux-amd64.tar.gz
[root@wecloud-test-k8s-1 ~]# cd kubernetes/
[root@wecloud-test-k8s-1 kubernetes]# cp server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubectl,kube-proxy,kubelet} /usr/local/bin/
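A quick way to confirm that the binaries were copied correctly and are the expected version:

kube-apiserver --version
kube-controller-manager --version
kube-scheduler --version
kubectl version --client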

Configure and start kube-apiserver

Create the kube-apiserver service unit file

The kube-apiserver systemd unit file (/usr/lib/systemd/system/kube-apiserver.service) looks like this:

[Unit]
Description=Kubernetes API Service
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
After=etcd.service

[Service]
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/apiserver
ExecStart=/usr/local/bin/kube-apiserver \
        $KUBE_LOGTOSTDERR \
        $KUBE_LOG_LEVEL \
        $KUBE_ETCD_SERVERS \
        $KUBE_API_ADDRESS \
        $KUBE_API_PORT \
        $KUBELET_PORT \
        $KUBE_ALLOW_PRIV \
        $KUBE_SERVICE_ADDRESSES \
        $KUBE_ADMISSION_CONTROL \
        $KUBE_API_ARGS
Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

This unit file references two configuration files: /etc/kubernetes/config and /etc/kubernetes/apiserver. The /etc/kubernetes/config file is shared by kube-apiserver, kube-controller-manager, kube-scheduler, kubelet and kube-proxy.

/etc/kubernetes/config contains:

###
# kubernetes system config
#   
# The following values are used to configure various aspects of all
# kubernetes services, including
#
#   kube-apiserver.service
#   kube-controller-manager.service
#   kube-scheduler.service
#   kubelet.service
#   kube-proxy.service
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"

# journal message level, 0 is debug
KUBE_LOG_LEVEL="--v=0"

# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"

# How the controller-manager, scheduler, and proxy find the apiserver
#KUBE_MASTER="--master=http://sz-pg-oam-docker-test-001.tendcloud.com:8080"
KUBE_MASTER="--master=http://192.168.99.183:8080"

The other file, /etc/kubernetes/apiserver, is the main configuration file for kube-apiserver:

###
## kubernetes system config
##
## The following values are used to configure the kube-apiserver
##
#
## The address on the local server to listen to.
#KUBE_API_ADDRESS="--insecure-bind-address=sz-pg-oam-docker-test-001.tendcloud.com"
KUBE_API_ADDRESS="--advertise-address=192.168.99.183 --bind-address=192.168.99.183 --insecure-bind-address=192.168.99.183"
#
## The port on the local server to listen on.
#KUBE_API_PORT="--port=8080"
#
## Port minions listen on
#KUBELET_PORT="--kubelet-port=10250"
#
## Comma separated list of nodes in the etcd cluster
KUBE_ETCD_SERVERS="--etcd-servers=https://192.168.99.189:2379,https://192.168.99.185:2379,https://192.168.99.196:2379"
#
## Address range to use for services
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"
#
## default admission control policies
KUBE_ADMISSION_CONTROL="--admission-control=ServiceAccount,NamespaceLifecycle,NamespaceExists,LimitRanger,ResourceQuota"
#
## Add your own!
KUBE_API_ARGS="--authorization-mode=RBAC --runtime-config=rbac.authorization.k8s.io/v1beta1 --kubelet-https=true --experimental-bootstrap-token-auth --token-auth-file=/etc/kubernetes/token.csv --service-node-port-range=30000-32767 --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem --client-ca-file=/etc/kubernetes/ssl/ca.pem --service-account-key-file=/etc/kubernetes/ssl/ca-key.pem --etcd-cafile=/etc/kubernetes/ssl/ca.pem --etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem --etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem --enable-swagger-ui=true --apiserver-count=3 --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/var/lib/audit.log --event-ttl=1h"

kube-scheduler and kube-controller-manager are usually deployed on the same machine as kube-apiserver, and they talk to kube-apiserver over the insecure port;
kubelet, kube-proxy and kubectl run on the Node machines; if they access kube-apiserver through the secure port, they must first pass TLS certificate authentication and then RBAC authorization;
kube-proxy and kubectl obtain RBAC authorization through the User and Group specified in the certificates they use;
if the kubelet TLS Bootstrap mechanism is used, do not set the --kubelet-certificate-authority, --kubelet-client-certificate and --kubelet-client-key options, otherwise kube-apiserver will later fail to verify the kubelet certificate with an "x509: certificate signed by unknown authority" error;
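Note that --token-auth-file points to /etc/kubernetes/token.csv. If that file was not created in an earlier chapter, a minimal sketch of how it is typically generated (the token value is random; kubelet-bootstrap is the user name the kubelets will later bootstrap with):

# Generate a random bootstrap token and write token.csv in the format: token,user,uid,"group"
BOOTSTRAP_TOKEN=$(head -c 16 /dev/urandom | od -An -t x | tr -d ' ')
cat > /etc/kubernetes/token.csv <<EOF
${BOOTSTRAP_TOKEN},kubelet-bootstrap,10001,"system:kubelet-bootstrap"
EOF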

Start kube-apiserver

Start the kube-apiserver service and enable it at boot:

[root@wecloud-test-k8s-1 kubernetes]# systemctl daemon-reload
[root@wecloud-test-k8s-1 kubernetes]# systemctl enable kube-apiserver
[root@wecloud-test-k8s-1 kubernetes]# systemctl start kube-apiserver
[root@wecloud-test-k8s-1 kubernetes]# systemctl status kube-apiserver
● kube-apiserver.service - Kubernetes API Service
   Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: active (running) since 二 2018-04-10 22:41:56 CST; 11min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 19418 (kube-apiserver)
   CGroup: /system.slice/kube-apiserver.service
           └─19418 /usr/local/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=https://192.168.99.189:2379,https://192.168.99.1...

4月10 22:42:11 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:11.414685   19418 storage_rbac.go:257] created role...ystem
4月10 22:42:11 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:11.832312   19418 storage_rbac.go:257] created role...ystem
4月10 22:42:12 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:12.229856   19418 storage_rbac.go:257] created role...ystem
4月10 22:42:12 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:12.497168   19418 storage_rbac.go:257] created role...ublic
4月10 22:42:12 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:12.703731   19418 storage_rbac.go:287] created role...ublic
4月10 22:42:12 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:12.877033   19418 storage_rbac.go:287] created role...ystem
4月10 22:42:13 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:13.192097   19418 storage_rbac.go:287] created role...ystem
4月10 22:42:13 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:13.454727   19418 storage_rbac.go:287] created role...ystem
4月10 22:42:13 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:13.634617   19418 storage_rbac.go:287] created role...ystem
4月10 22:42:13 wecloud-test-k8s-1.novalocal kube-apiserver[19418]: I0410 22:42:13.913096   19418 storage_rbac.go:287] created role...ystem
Hint: Some lines were ellipsized, use -l to show in full.
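A simple sanity check is to hit the apiserver's healthz endpoint. The insecure port needs no credentials; the secure port (6443 by default) requires a client certificate. A sketch using the admin certificate generated earlier, where both requests should return "ok":

# Insecure port
curl http://192.168.99.183:8080/healthz
# Secure port, authenticating with the admin client certificate
curl --cacert /etc/kubernetes/ssl/ca.pem \
     --cert /etc/kubernetes/ssl/admin.pem \
     --key /etc/kubernetes/ssl/admin-key.pem \
     https://192.168.99.183:6443/healthz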

Configure and start kube-controller-manager

Create the kube-controller-manager service unit file

The kube-controller-manager service is configured in /usr/lib/systemd/system/kube-controller-manager.service:

[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/controller-manager
ExecStart=/usr/local/bin/kube-controller-manager \
        $KUBE_LOGTOSTDERR \
        $KUBE_LOG_LEVEL \
        $KUBE_MASTER \
        $KUBE_CONTROLLER_MANAGER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

This service also uses the /etc/kubernetes/config configuration file.

It additionally needs the /etc/kubernetes/controller-manager configuration file:

###
# The following values are used to configure the kubernetes controller-manager

# defaults from config and apiserver should be adequate

# Add your own!
KUBE_CONTROLLER_MANAGER_ARGS="--address=127.0.0.1 --service-cluster-ip-range=10.254.0.0/16 --cluster-name=kubernetes --cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem --cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem  --service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem --root-ca-file=/etc/kubernetes/ssl/ca.pem --leader-elect=true"

--service-cluster-ip-range specifies the CIDR range of cluster Service IPs and must match the value passed to kube-apiserver;
--root-ca-file is used to verify the kube-apiserver certificate; only when this flag is set will the CA certificate be placed into the ServiceAccount of Pod containers;

Start kube-controller-manager

Start the kube-controller-manager service and enable it at boot:

[root@wecloud-test-k8s-1 ~]# systemctl daemon-reload
[root@wecloud-test-k8s-1 ~]# systemctl enable kube-controller-manager.service
Created symlink from /etc/systemd/system/multi-user.target.wants/kube-controller-manager.service to /usr/lib/systemd/system/kube-controller-manager.service.
[root@wecloud-test-k8s-1 ~]# systemctl start kube-controller-manager.service
[root@wecloud-test-k8s-1 ~]# systemctl status kube-controller-manager.service
● kube-controller-manager.service - Kubernetes Controller Manager
   Loaded: loaded (/usr/lib/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
   Active: active (running) since 三 2018-04-11 09:25:32 CST; 4s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 20400 (kube-controller)
   CGroup: /system.slice/kube-controller-manager.service
           └─20400 /usr/local/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://192.168.99.183:8080 --address=127.0.0....

4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.625674   20400 controller_utils.go:1048]...ller
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.644221   20400 controller_utils.go:1048]...ller
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.645379   20400 controller_utils.go:1048]...ller
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.646144   20400 controller_utils.go:1048]...ller
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.710293   20400 controller_utils.go:1048]...ller
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.719435   20400 controller_utils.go:1048]...ller
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.719475   20400 garbagecollector.go:145] ...bage
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.723843   20400 controller_utils.go:1048]...ller
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.723870   20400 disruption.go:296] Sendin...ver.
4月11 09:25:33 wecloud-test-k8s-1.novalocal kube-controller-manager[20400]: I0411 09:25:33.726803   20400 controller_utils.go:1048]...ller
Hint: Some lines were ellipsized, use -l to show in full.
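Since --address=127.0.0.1 is set, kube-controller-manager serves its health and metrics endpoints only on localhost (port 10252 in this release). A quick local check:

# Expect "ok"
curl http://127.0.0.1:10252/healthz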

Configure and start kube-scheduler

Create the kube-scheduler service unit file and configuration file

The kube-scheduler systemd unit file is /usr/lib/systemd/system/kube-scheduler.service:

cat /usr/lib/systemd/system/kube-scheduler.service
[Unit]
Description=Kubernetes Scheduler Plugin
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/scheduler
ExecStart=/usr/local/bin/kube-scheduler \
            $KUBE_LOGTOSTDERR \
            $KUBE_LOG_LEVEL \
            $KUBE_MASTER \
            $KUBE_SCHEDULER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

The kube-scheduler configuration file is /etc/kubernetes/scheduler, with the following content:

###
# kubernetes scheduler config

# default config should be adequate

# Add your own!
KUBE_SCHEDULER_ARGS="--leader-elect=true --address=127.0.0.1"

Start the kube-scheduler service

Start the kube-scheduler service and enable it at boot:

[root@wecloud-test-k8s-1 ~]# systemctl daemon-reload
[root@wecloud-test-k8s-1 ~]# systemctl enable kube-scheduler.service
Created symlink from /etc/systemd/system/multi-user.target.wants/kube-scheduler.service to /usr/lib/systemd/system/kube-scheduler.service.
[root@wecloud-test-k8s-1 ~]# systemctl start kube-scheduler.service
[root@wecloud-test-k8s-1 ~]# systemctl status kube-scheduler.service
● kube-scheduler.service - Kubernetes Scheduler Plugin
   Loaded: loaded (/usr/lib/systemd/system/kube-scheduler.service; enabled; vendor preset: disabled)
   Active: active (running) since 三 2018-04-11 09:30:38 CST; 3s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 20536 (kube-scheduler)
   CGroup: /system.slice/kube-scheduler.service
           └─20536 /usr/local/bin/kube-scheduler --logtostderr=true --v=0 --master=http://192.168.99.183:8080 --leader-elect=true --addre...

4月11 09:30:38 wecloud-test-k8s-1.novalocal systemd[1]: Started Kubernetes Scheduler Plugin.
4月11 09:30:38 wecloud-test-k8s-1.novalocal systemd[1]: Starting Kubernetes Scheduler Plugin...
4月11 09:30:38 wecloud-test-k8s-1.novalocal kube-scheduler[20536]: I0411 09:30:38.844579   20536 controller_utils.go:1041] Waiting...oller
4月11 09:30:38 wecloud-test-k8s-1.novalocal kube-scheduler[20536]: I0411 09:30:38.944956   20536 controller_utils.go:1048] Caches ...oller
4月11 09:30:38 wecloud-test-k8s-1.novalocal kube-scheduler[20536]: I0411 09:30:38.945311   20536 leaderelection.go:174] attempting...se...
4月11 09:30:39 wecloud-test-k8s-1.novalocal kube-scheduler[20536]: I0411 09:30:39.014761   20536 leaderelection.go:184] successful...duler
4月11 09:30:39 wecloud-test-k8s-1.novalocal kube-scheduler[20536]: I0411 09:30:39.015057   20536 event.go:218] Event(v1.ObjectReference...
Hint: Some lines were ellipsized, use -l to show in full.
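kube-scheduler behaves the same way: with --address=127.0.0.1 its health endpoint (port 10251 in this release) is only reachable locally:

# Expect "ok"
curl http://127.0.0.1:10251/healthz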

Verify the master node

The status of the k8s components can be checked with kubectl:

[root@wecloud-test-k8s-1 ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok                   
scheduler            Healthy   ok                   
etcd-2               Healthy   {"health": "true"}   
etcd-1               Healthy   {"health": "true"}   
etcd-0               Healthy   {"health": "true"}   

Here is a problem I ran into and how I resolved it: when running the status check repeatedly, I noticed that some etcd members would intermittently report an Unhealthy status.

[root@wecloud-test-k8s-1 ~]# kubectl get componentstatuses
NAME                 STATUS      MESSAGE                                  ERROR
controller-manager   Healthy     ok                                       
scheduler            Healthy     ok                                       
etcd-0               Healthy     {"health": "true"}                       
etcd-2               Healthy     {"health": "true"}                       
etcd-1               Unhealthy   HTTP probe failed with statuscode: 503   

[root@wecloud-test-k8s-1 ~]# kubectl get componentstatuses
NAME                 STATUS      MESSAGE                                  ERROR
scheduler            Healthy     ok                                       
controller-manager   Healthy     ok                                       
etcd-0               Healthy     {"health": "true"}                       
etcd-2               Unhealthy   HTTP probe failed with statuscode: 503   
etcd-1               Unhealthy   HTTP probe failed with statuscode: 503   

The symptom was that the etcd health status was very unstable. The logs showed that heartbeats between the etcd members were failing:

# ssh 192.168.99.189
[root@wecloud-test-k8s-2 ~]# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since 一 2018-04-09 22:56:31 CST; 1 day 10h ago
     Docs: https://github.com/coreos
 Main PID: 17478 (etcd)
   CGroup: /system.slice/etcd.service
           └─17478 /usr/local/bin/etcd --name infra1 --cert-file=/etc/kubernetes/ssl/kubernetes.pem --key-file=/etc/kubernetes/ssl/kubern...

4月11 09:33:35 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 [quorum:2] has received 1 MsgVoteResp votes and 1 vote ...ctions
4月11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 received MsgVoteResp from c9b9711086e865e3 at term 337
4月11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 [quorum:2] has received 2 MsgVoteResp votes and 1 vote ...ctions
4月11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 became leader at term 337
4月11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: raft.node: e23bf6fd185b2dc5 elected leader e23bf6fd185b2dc5 at term 337
4月11 09:33:41 wecloud-test-k8s-2.novalocal etcd[17478]: timed out waiting for read index response
4月11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: failed to send out heartbeat on time (exceeded the 100ms timeout for 401...516ms)
4月11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: server is likely overloaded
4月11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: failed to send out heartbeat on time (exceeded the 100ms timeout for 401.80886ms)
4月11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: server is likely overloaded
Hint: Some lines were ellipsized, use -l to show in full.

The key error message is: failed to send out heartbeat on time (exceeded the 100ms timeout for 401.80886ms)

Heartbeat failures are usually related to the following factors: slow disks, insufficient CPU, or an unstable network.

etcd uses the Raft algorithm: the leader periodically sends heartbeats to every follower, and if the leader misses two consecutive heartbeat intervals, etcd prints this log line as a warning. In most cases the issue is caused by a slow disk: the leader attaches some metadata to its heartbeats and has to persist that data to disk before it can send them. The disk writes may be competing with other applications, or the disk may be virtual or SATA and simply too slow; in that case only better, faster disk hardware will solve the problem. The metric etcd exposes to Prometheus, wal_fsync_duration_seconds, shows the average time spent syncing the WAL; it should normally be below 10ms.
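To check this metric without Prometheus, you can scrape the /metrics endpoint of a member directly. A sketch, assuming the etcd members accept the same client certificates configured earlier:

curl --cacert /etc/kubernetes/ssl/ca.pem \
     --cert /etc/kubernetes/ssl/kubernetes.pem \
     --key /etc/kubernetes/ssl/kubernetes-key.pem \
     https://192.168.99.189:2379/metrics | grep wal_fsync_duration_seconds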

The second possible cause is insufficient CPU. If monitoring shows that CPU utilization is indeed high, move etcd to a better machine, then use cgroups to give the etcd process dedicated cores, or raise etcd's priority.

The third possible cause is a slow network. If Prometheus shows poor network quality, such as high latency or a high packet-loss rate, moving etcd to a less congested network will solve the problem. But if etcd is deployed across data centers, long round-trip times are unavoidable; in that case tune heartbeat-interval according to the RTT between the data centers, and set election-timeout to at least 5 times heartbeat-interval.

This experiment runs on OpenStack VMs, where slow disk I/O is a known issue, so the heartbeat-interval needs to be increased.

On each etcd node, edit /etc/etcd/etcd.conf and add the following (a 6-second heartbeat interval and a 30-second election timeout):

ETCD_HEARTBEAT_INTERVAL=6000     
ETCD_ELECTION_TIMEOUT=30000

Then restart the etcd service.
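For example, on each etcd node in turn, restart the service and confirm the cluster is healthy. A sketch using the v2 etcdctl interface (the default for this etcd release) with the certificates configured earlier:

systemctl restart etcd
etcdctl --ca-file=/etc/kubernetes/ssl/ca.pem \
        --cert-file=/etc/kubernetes/ssl/kubernetes.pem \
        --key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
        --endpoints=https://192.168.99.189:2379,https://192.168.99.185:2379,https://192.168.99.196:2379 \
        cluster-health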

Summary

The k8s master node is responsible for scheduling, management and serving the API, so it should be designed for high availability. However, the k8s master does not provide high availability by itself; we can use tools such as haproxy and keepalived to build a highly available, load-balanced setup.
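As a rough illustration only (not part of this chapter's deployment), a load balancer in front of several apiservers could be configured roughly as follows, with keepalived managing a virtual IP that kubectl and the nodes would point at. The backend list is hypothetical apart from the master deployed in this chapter:

# On a load-balancer node with haproxy installed, append a TCP proxy for the apiservers
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
frontend kube-apiserver
    bind *:6443
    mode tcp
    default_backend kube-masters

backend kube-masters
    mode tcp
    balance roundrobin
    server master1 192.168.99.183:6443 check
    # add the remaining master nodes here once they are deployed
EOF
systemctl restart haproxy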