
Deploying a K8s Cluster on Alibaba Cloud

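First, my environment and configuration: the master is an Alibaba Cloud instance with 1 CPU core and 2 GB of RAM running Ubuntu 18.04. Two cores is better, because kubeadm requires at least 2 CPUs on the master; with only 1 core that is still fine, because the check can be ignored, as shown later. The node is also 1 core / 2 GB.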

OK, let's get started.

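1. Update the system package sources

If the system's default mirror points to servers abroad, downloads will be very slow. You can open /etc/apt/sources.list and replace the entries with a mirror in China, then refresh the package index:

apt update

2. Upgrade the packages

Upgrade the installed software to the latest stable versions. If the package index was not refreshed first, the download can fail like this:

apt upgrade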
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libcurl4
The following packages will be upgraded:
  curl libcurl4
2 upgraded, 0 newly installed, 0 to remove and 46 not upgraded.
Need to get 378 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Ign:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
Ign:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
Err:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
  404  Not Found [IP: 100.100.2.148 80]
Err:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
  404  Not Found [IP: 100.100.2.148 80]
E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/curl_7.58.0-2ubuntu3.14_amd64.deb  404  Not Found [IP: 100.100.2.148 80]
E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/libcurl4_7.58.0-2ubuntu3.14_amd64.deb  404  Not Found [IP: 100.100.2.148 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

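Without refreshing the index you will run into this error, so remember to run the update. The message itself already hints at the fix: run apt-get update or try with --fix-missing.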
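For the mirror switch mentioned in step 1, a minimal sketch follows. It assumes the stock Ubuntu entries (archive.ubuntu.com, a country-prefixed variant such as cn.archive.ubuntu.com, and security.ubuntu.com) and rewrites them to the Aliyun Ubuntu mirror at http://mirrors.aliyun.com/ubuntu. The function name use_aliyun_mirror is hypothetical.

```shell
# Sketch (assumption: stock archive/security.ubuntu.com entries).
# Rewrites the Ubuntu mirror URLs in a sources.list to the Aliyun mirror,
# keeping a .bak copy of the original file next to it.
use_aliyun_mirror() {
    list_file="$1"
    cp "$list_file" "$list_file.bak"   # backup of the untouched file
    sed -i -E \
        -e 's#https?://([a-z]+\.)?archive\.ubuntu\.com/ubuntu#http://mirrors.aliyun.com/ubuntu#g' \
        -e 's#https?://security\.ubuntu\.com/ubuntu#http://mirrors.aliyun.com/ubuntu#g' \
        "$list_file"
}

# Usage (as root), then refresh the index:
#   use_aliyun_mirror /etc/apt/sources.list && apt update
```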

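3. Install Docker

You can also install Docker by any other method you prefer.

apt-get install docker.io

To enable Docker at boot and start it now, run:

systemctl enable docker

systemctl start docker

To configure a Docker registry mirror (accelerator), open /etc/docker/daemon.json and add or modify the registry-mirrors entry, adding https://registry.docker-cn.com. You can also use an accelerator address from Alibaba Cloud, Tencent Cloud or similar providers (the docker-cn mirror has reportedly been discontinued, so one of those may be the safer choice).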

Example:

{
	"registry-mirrors": [
		"https://registry.docker-cn.com"
	]
}

Restart Docker to apply the configuration:

sudo systemctl daemon-reload

sudo systemctl restart docker
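The cgroup-driver fix later in this guide rewrites daemon.json with cat >, which would drop a registry mirror configured here. A sketch that writes both settings into one file, assuming you pass your own accelerator URL (write_daemon_json is a hypothetical helper name):

```shell
# Sketch: write daemon.json with both the registry mirror and the systemd
# cgroup driver, so neither setting is lost when the file is rewritten.
# Arguments: target path, mirror URL (your own accelerator address).
write_daemon_json() {
    target="$1"
    mirror="$2"
    mkdir -p "$(dirname "$target")"
    cat > "$target" <<EOF
{
  "registry-mirrors": ["$mirror"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
}

# Usage (as root; the URL is a placeholder):
#   write_daemon_json /etc/docker/daemon.json https://<your-accelerator-address>
#   systemctl daemon-reload && systemctl restart docker
```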

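4. Install K8s

Disable swap (by default the kubelet refuses to run while swap is enabled)

# Turn swap off for the current boot
swapoff -a
# Disable swap permanently (comment out the swap entries in /etc/fstab)
swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab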
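A variant of the fstab edit above, wrapped as a function so it can be tried on a copy of the file first (the name disable_swap_in_fstab is hypothetical). Unlike the original sed command, this sketch skips lines that are already commented out and accepts tabs as field separators, so running it twice is harmless:

```shell
# Sketch: comment out active swap entries in an fstab-style file.
# Only lines that do not already start with '#' and contain a
# whitespace-delimited "swap" field are touched.
disable_swap_in_fstab() {
    fstab="${1:-/etc/fstab}"
    sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Usage (as root):
#   swapoff -a && disable_swap_in_fstab /etc/fstab
#   swapon --show    # prints nothing once no swap is active
```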

Let bridged IPv4 and IPv6 traffic pass through iptables:

cat >/etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
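The two net.bridge.* keys only exist once the br_netfilter kernel module is loaded, so a sketch that also lists that module for loading at boot may help. The function name write_k8s_net_conf and its optional ROOT argument (for writing into a scratch directory first) are assumptions for illustration:

```shell
# Sketch: write the module list and sysctl settings for bridged traffic.
# ROOT defaults to empty, i.e. the real /etc; pass a directory to test first.
write_k8s_net_conf() {
    root="${1:-}"
    mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
    # Load br_netfilter at boot so the net.bridge.* keys exist
    echo br_netfilter > "$root/etc/modules-load.d/k8s.conf"
    cat > "$root/etc/sysctl.d/k8s.conf" <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
}

# Usage (as root):
#   write_k8s_net_conf && modprobe br_netfilter && sysctl --system
```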

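Run the following to install the HTTPS transport tool and k8s. (--allow-unauthenticated skips package signature verification; once the repository key has been imported as described below, the flag is no longer needed.)

apt-get update && apt-get install -y apt-transport-https curl
apt-get install -y kubelet kubeadm kubectl --allow-unauthenticated

#Common commands for managing the kubelet service:
#reload unit files and restart kubelet
systemctl daemon-reload
systemctl restart kubelet
#stop, enable at boot, start
systemctl stop kubelet
systemctl enable kubelet
systemctl start kubelet

Run the following to check that the installation works:

kubeadm version

#Sample output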
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:37:34Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
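To read just the version number, a small filter over the long output is enough; as far as I know, kubeadm version -o short prints the same value directly. The function name git_version is hypothetical:

```shell
# Sketch: extract the GitVersion value from `kubeadm version` output on stdin.
git_version() {
    grep -o 'GitVersion:"[^"]*"' | cut -d'"' -f2
}

# Usage:
#   kubeadm version | git_version
```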

If the installation fails as follows, the system's mirror does not carry the k8s packages.

E: Unable to locate package kubelet
E: Unable to locate package kubeadm
E: Unable to locate package kubectl

Open /etc/apt/sources.list and add this line:

deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main

Refresh the package index first (apt-get update), then run the K8s install command again.
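Running the step above twice would add the repository line twice, and apt warns about duplicate entries. A sketch that appends the line only when it is missing (add_k8s_repo is a hypothetical name; the path defaults to /etc/apt/sources.list as in the text):

```shell
# Sketch: idempotently add the Aliyun Kubernetes apt repository line.
add_k8s_repo() {
    list_file="${1:-/etc/apt/sources.list}"
    line='deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main'
    # Append only if an identical line is not already present
    grep -qxF "$line" "$list_file" 2>/dev/null || echo "$line" >> "$list_file"
}

# Usage (as root):
#   add_k8s_repo && apt-get update && apt-get install -y kubelet kubeadm kubectl
```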

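When refreshing the index at this step (the apt-get update that runs together with the curl install), you may get this error: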

The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
Reading package lists... Done
W: GPG error: https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
E: The repository 'https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

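Import the missing keys (the key IDs are the values after NO_PUBKEY; replace them with the ones from your own error message). Here two keys are missing, so pass both:

apt-key adv --keyserver keyserver.ubuntu.com --recv-keys FEEA9169307EA071 8B57C5C2836F4BEB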
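Since the error can list several missing keys, a sketch that collects every ID after NO_PUBKEY from apt's output, so they can all be passed to one apt-key call (missing_apt_keys is a hypothetical helper):

```shell
# Sketch: print each distinct key ID that follows NO_PUBKEY on stdin.
missing_apt_keys() {
    grep -o 'NO_PUBKEY [0-9A-F]*' | awk '{print $2}' | sort -u
}

# Usage (as root; apt-key is still available on Ubuntu 18.04):
#   apt-key adv --keyserver keyserver.ubuntu.com \
#       --recv-keys $(apt-get update 2>&1 | missing_apt_keys)
```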

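The commands above install kubelet, kubeadm and kubectl: kubelet is the k8s node service, kubectl is the k8s management client, and kubeadm is the deployment tool.

If this machine is only a worker node, you can stop here.

To add the other Alibaba Cloud server to the cluster, just run the following on it (how to obtain this command is explained further below; read everything first, then come back to this step):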

kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t         --discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060

Errors you may run into:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

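#Fix:
#This is caused by Docker and Kubernetes using different cgroup drivers
#Change the Docker config file (this overwrites daemon.json, so re-add any registry-mirrors entry configured earlier)
cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
#Restart Docker
systemctl restart docker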

[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

#Fix:
swapoff -a
kubeadm reset
systemctl daemon-reload
systemctl restart kubelet
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
#Run the join command again; the node joins successfully
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
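The cleanup sequence above can be wrapped into one function. This sketch uses kubeadm reset -f, which skips the confirmation prompt; reset_k8s_node is a hypothetical name, and its DRY_RUN switch is a hypothetical convenience for printing the commands instead of running them, not a kubeadm option:

```shell
# Sketch: reset a node before retrying kubeadm join/init.
# With DRY_RUN=1 the commands are only printed.
reset_k8s_node() {
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run swapoff -a
    run kubeadm reset -f
    run systemctl daemon-reload
    run systemctl restart kubelet
    run iptables -F
    run iptables -t nat -F
    run iptables -t mangle -F
    run iptables -X
}

# Usage (as root):
#   DRY_RUN=1 reset_k8s_node   # review first
#   reset_k8s_node
```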

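#However, the node will show as NotReady. In K8s the whole cluster sits on one flat network; inside Google's infrastructure that network exists natively, but in our own environment it has to be set up. So the next step is deploying the network, here with flannel, the network add-on originally developed by CoreOS. (This step may fail at first; keep reading, set up the master, and it will make sense when you come back here.)
#Run the image-pull script described later to pull the images, then go back to the master and run this again
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
#"kubectl get nodes" then shows all nodes as Ready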

root@ubuntu:~# kubectl get nodes
NAME      STATUS   ROLES                  AGE   VERSION
master    Ready    control-plane,master   26m   v1.22.2
node      Ready    <none>                 15s   v1.22.2
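When waiting for nodes to become Ready after applying flannel, the kubectl get nodes table can be polled. A sketch that succeeds only when every listed node's STATUS column is exactly Ready (all_nodes_ready is a hypothetical name; it assumes the default column layout shown above):

```shell
# Sketch: read `kubectl get nodes` output on stdin, skip the header,
# and exit 0 only if at least one node is listed and all are Ready.
all_nodes_ready() {
    awk 'NR > 1 { total++; if ($2 == "Ready") ready++ }
         END { exit ((total > 0 && ready == total) ? 0 : 1) }'
}

# Usage, polling every 10 s:
#   until kubectl get nodes | all_nodes_ready; do sleep 10; done
```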

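5. Initialize the master

Run the command below to initialize; it automatically downloads the Docker images it needs from the network.

This command deploys the master (control-plane) node.

Run kubeadm version to check the version; the value in GitVersion:"v1.22.2" is the version number.

Run the following to initialize (remember to replace the IP with your own):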

kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96

--ignore-preflight-errors=NumCPU is needed when the machine has only one CPU, for example an entry-level single-core student server.
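The flag choice can be derived from the CPU count. A sketch, assuming the same pod CIDR (10.244.0.0/16, which is also flannel's default network) and that you pass your own advertise address; kubeadm_init_cmd is a hypothetical helper that only prints the command:

```shell
# Sketch: print the init command, adding NumCPU to the ignored preflight
# errors only when fewer than 2 CPUs are available (defaults to nproc).
kubeadm_init_cmd() {
    advertise_ip="$1"
    cpus="${2:-$(nproc)}"
    cmd="kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=$advertise_ip"
    if [ "$cpus" -lt 2 ]; then
        cmd="$cmd --ignore-preflight-errors=NumCPU"
    fi
    echo "$cmd"
}

# Usage: print the command, check it, then run it
#   kubeadm_init_cmd 39.96.46.96
```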

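However, because this requires connecting to Google's registry, the downloads may fail.

We can list the required images with kubeadm config images list and pull them manually with Docker. This is somewhat tedious, because the images also have to be renamed (retagged) by hand.

To pull an image: docker pull {image name}

Google's registry is unreachable, but Docker Hub has backups of the required images.

The mirrorgooglecontainers repository backs up these images. Unfortunately, its copies are not always the latest. The google_containers repository on Alibaba Cloud should have the most up-to-date backups.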

For example, if the following images are required:

k8s.gcr.io/kube-apiserver:v1.22.2
k8s.gcr.io/kube-controller-manager:v1.22.2
k8s.gcr.io/kube-scheduler:v1.22.2
k8s.gcr.io/kube-proxy:v1.22.2
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4

pull the corresponding images:

docker pull mirrorgooglecontainers/kube-apiserver:v1.22.2
docker pull mirrorgooglecontainers/kube-controller-manager:v1.22.2
docker pull mirrorgooglecontainers/kube-scheduler:v1.22.2
docker pull mirrorgooglecontainers/kube-proxy:v1.22.2
docker pull mirrorgooglecontainers/pause:3.5
docker pull mirrorgooglecontainers/etcd:3.5.0-0
docker pull coredns/coredns:1.8.4

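Rename the images with docker tag {old name:tag} {new name:tag} (two arguments separated by a space).

Since all sorts of things can go wrong here, below is a one-click script written by someone else that does this whole step at once.

touch pullk8s.sh	# create the script file
nano pullk8s.sh		# edit the script

Then paste in the following: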

for i in `kubeadm config images list`; do
    imageName=${i#k8s.gcr.io/}
    # Assumption: the Aliyun mirror stores coredns/coredns:vX as coredns:vX, so drop the extra path
    srcName=${imageName#coredns/}
    docker pull registry.aliyuncs.com/google_containers/$srcName
    docker tag registry.aliyuncs.com/google_containers/$srcName k8s.gcr.io/$imageName
    docker rmi registry.aliyuncs.com/google_containers/$srcName
done

Save the file:

Ctrl + O
Enter
Ctrl + X

Make the script executable:

chmod +x pullk8s.sh

Run the script:

sh pullk8s.sh

Then run docker images to check that all the required images are present.

root@ubuntu:~# docker images
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy                v1.22.2             cba2a99699bd        2 weeks ago         116MB
k8s.gcr.io/kube-apiserver            v1.22.2             41ef50a5f06a        2 weeks ago         171MB
k8s.gcr.io/kube-controller-manager   v1.22.2             da5fd66c4068        2 weeks ago         161MB
k8s.gcr.io/kube-scheduler            v1.22.2             f52d4c527ef2        2 weeks ago         94.4MB
k8s.gcr.io/coredns                   1.8.4               70f311871ae1        3 months ago        41.6MB
k8s.gcr.io/etcd                      3.5.0-0             303ce5db0e90        3 months ago        288MB
k8s.gcr.io/pause                     3.5                 da86e6ba6ca1        2 years ago         742kB
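To see which required images are still missing, the two lists can be compared line by line. A sketch that takes both lists as files (missing_images is a hypothetical name). With the output above it would flag the CoreDNS image, since kubeadm v1.22 expects k8s.gcr.io/coredns/coredns:v1.8.4 rather than k8s.gcr.io/coredns:1.8.4:

```shell
# Sketch: print lines of the required-image list that do not appear
# exactly in the present-image list. Prints nothing when all are present.
missing_images() {
    required="$1"   # output of: kubeadm config images list
    present="$2"    # output of: docker images --format '{{.Repository}}:{{.Tag}}'
    grep -vxF -f "$present" "$required"
}

# Usage:
#   kubeadm config images list > /tmp/required.txt
#   docker images --format '{{.Repository}}:{{.Tag}}' > /tmp/present.txt
#   missing_images /tmp/required.txt /tmp/present.txt
```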

The script may also fail (for CoreDNS, for example); if so, pull that image manually:

Error response from daemon: pull access denied for registry.aliyuncs.com/google_containers/coredns/coredns, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Error response from daemon: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
Error: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
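
#Manual workaround: pull CoreDNS from Docker Hub
docker pull coredns/coredns:1.8.4

#Retag command format:
docker tag OLD_IMAGE[:TAG] NEW_IMAGE[:TAG]
#e.g. give CoreDNS the name kubeadm expects
docker tag coredns/coredns:1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4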

Finally, run the initialization command from the beginning of this section.

kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96

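On Alibaba Cloud ECS the public IP is not configured on the instance's network interface, so etcd cannot bind to it and fails to start; as a result, kubeadm init fails with a "timeout" error.

Solution:

1. Open two SSH sessions (two tabs in your SSH client): one to run the initialization, the other to edit the config file while initialization is in progress. Note the timing: every run of kubeadm init regenerates the etcd config file, so if you edit it beforehand, kubeadm init overwrites your change and it has no effect.

2. Run the kubeadm init … command above. It will hang at:

Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed

3. Once that command is running, kubeadm has started initializing the master node, but because the etcd config file is wrong, etcd cannot start, so the file has to be edited.
The file is "/etc/kubernetes/manifests/etcd.yaml".

#Change these two lines
--listen-client-urls=https://127.0.0.1:2379,https://39.96.46.96:2379
--listen-peer-urls=https://39.96.46.96:2380
#After the change
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380

4. The options to focus on are "--listen-client-urls" and "--listen-peer-urls": remove the public IP from "--listen-client-urls", and change "--listen-peer-urls" to the local address.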
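Steps 3 and 4 can be scripted for the second SSH session. A sketch, assuming the manifest contains the two lines exactly as shown above and that the IP you pass is the one given to --apiserver-advertise-address. It waits for kubeadm to write the file, then edits it in place; the kubelet picks up changes to static Pod manifests on its own. fix_etcd_listen_urls is a hypothetical name:

```shell
# Sketch: drop the public IP from --listen-client-urls and point
# --listen-peer-urls at 127.0.0.1 in the generated etcd manifest.
fix_etcd_listen_urls() {
    manifest="$1"
    public_ip="$2"
    ip_re=$(printf '%s' "$public_ip" | sed 's/\./\\./g')   # escape dots for sed
    while [ ! -f "$manifest" ]; do sleep 1; done              # wait for kubeadm to write it
    sed -i \
        -e "/--listen-client-urls=/ s#,https://$ip_re:2379##" \
        -e "s#--listen-peer-urls=https://$ip_re:2380#--listen-peer-urls=https://127.0.0.1:2380#" \
        "$manifest"
}

# Usage (second SSH session, as root, while kubeadm init is waiting):
#   fix_etcd_listen_urls /etc/kubernetes/manifests/etcd.yaml 39.96.46.96
```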

After a short wait, the master initialization completes.

Possible problem:

[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

#Run
swapoff -a && kubeadm reset  && systemctl daemon-reload && systemctl restart kubelet  && iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
#Then run the initialization again and it works
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96

Possible problem:

error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns/coredns:v1.8.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1

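Open https://www.ipaddress.com/ and look up the hostname k8s.gcr.io to get its IP (142.250.113.82 when I checked; yours may differ). Then open the local hosts file; on Linux:

vim /etc/hosts

Add the hostname and IP in the following form (use sudo if you are not root):

142.250.113.82  k8s.gcr.io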
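A sketch for adding a hosts entry only when the hostname is not already listed, so repeated runs do not pile up duplicates (add_hosts_entry is a hypothetical name). The IP looked up above can change over time, so check it again before using it; the same function works for the raw.githubusercontent.com entry mentioned later:

```shell
# Sketch: append "IP  HOST" to a hosts file unless a non-comment line
# already lists HOST.
add_hosts_entry() {
    hosts_file="$1"
    ip="$2"
    host="$3"
    # awk exits 0 if some non-comment line already contains the hostname
    if ! awk -v h="$host" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
        printf '%s  %s\n' "$ip" "$host" >> "$hosts_file"
    fi
}

# Usage (as root):
#   add_hosts_entry /etc/hosts 142.250.113.82 k8s.gcr.io
```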

Still a problem:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

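This is caused by Docker and Kubernetes using different cgroup drivers.

Solution: change the Docker config file (this overwrites daemon.json, so re-add any registry-mirrors entry configured earlier)

cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF

Restart Docker:

systemctl restart docker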

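After that there may still be errors, but these are simple: fix whatever each error message points at. Here are the ones I ran into. They are leftovers from earlier init attempts, and kubeadm reset also cleans them up in one step.

[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists

#Fix (other manifest files reported the same way are handled the same way)
cd /etc/kubernetes/manifests/
rm kube-apiserver.yaml

[ERROR Port-10250]: Port 10250 is in use

#Fix (other ports reported the same way are handled the same way)
#Note: kubelet runs under systemd and may be restarted after kill; systemctl stop kubelet stops it for good
lsof -i:10250
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kubelet 22055 root   27u  IPv6 773301      0t0  TCP *:10250 (LISTEN)
kill -9 22055

[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty

#Fix
cd /var/lib/etcd/
rm -r member/

Run the initialization command again and it succeeds.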

#Output on success
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t \
        --discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060
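If you did not copy the join command, it can be recovered from a saved copy of the kubeadm init output. A sketch that joins the backslash-continued lines into one (join_cmd_from_log is a hypothetical name). Tokens expire (after 24 hours by default); in that case, running kubeadm token create --print-join-command on the master prints a fresh command instead:

```shell
# Sketch: find the "kubeadm join" line in a saved init log, append the
# backslash-continued lines, and squeeze whitespace to single spaces.
join_cmd_from_log() {
    awk '/^[ \t]*kubeadm join/ { grab = 1 }
         grab {
             line = line $0
             if ($0 !~ /\\$/) { print line; exit }
             sub(/\\$/, "", line)
         }' "$1" | tr -s ' \t' ' '
}

# Usage:
#   kubeadm init ... | tee /tmp/init.log
#   join_cmd_from_log /tmp/init.log
```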

Run the following on the master node:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
#Check the nodes
root@ubuntu:~# kubectl get nodes
NAME     STATUS      ROLES                  AGE   VERSION
master   NotReady    control-plane,master   26h   v1.22.2
node     Ready       <none>                 15s   v1.22.2

#Add the network plugin
sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
#Output
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

kubectl get pods --all-namespaces
#If some pods are stuck in Pending, as below
NAMESPACE     NAME                             READY   STATUS     RESTARTS      AGE
kube-system   coredns-78fcd69978-fkkmh         0/1     Pending    0             17m
kube-system   coredns-78fcd69978-qrx2c         0/1     Pending    0             17m
kube-system   etcd-ubuntu                      1/1     Running    0             17m
kube-system   kube-apiserver-ubuntu            1/1     Running    1 (19m ago)   17m
kube-system   kube-controller-manager-ubuntu   1/1     Running    2 (20m ago)   17m
kube-system   kube-flannel-ds-g97gm            0/1     Init:0/1   0             80s
kube-system   kube-proxy-f6ctf                 1/1     Running    0             17m
kube-system   kube-scheduler-ubuntu            1/1     Running    2 (19m ago)   17m

#In my case, adding "185.199.111.133 raw.githubusercontent.com" to the hosts file and running the network-plugin command again fixed it
NAMESPACE     NAME                             READY   STATUS    RESTARTS      AGE
kube-system   coredns-78fcd69978-fkkmh         1/1     Running   0             28m
kube-system   coredns-78fcd69978-qrx2c         1/1     Running   0             28m
kube-system   etcd-ubuntu                      1/1     Running   0             28m
kube-system   kube-apiserver-ubuntu            1/1     Running   1 (30m ago)   28m
kube-system   kube-controller-manager-ubuntu   1/1     Running   2 (31m ago)   28m
kube-system   kube-flannel-ds-g97gm            1/1     Running   0             11m
kube-system   kube-proxy-f6ctf                 1/1     Running   0             28m
kube-system   kube-scheduler-ubuntu            1/1     Running   2 (30m ago)   28m

kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   26h   v1.22.2
node     Ready    <none>                 15s   v1.22.2
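To spot pods stuck like the Pending and Init:0/1 ones above, the STATUS column can be filtered. A sketch assuming the default kubectl get pods --all-namespaces layout (pods_not_running is a hypothetical name); note that pods from completed Jobs would also be listed:

```shell
# Sketch: print namespace/name of every pod whose STATUS is not Running.
# Reads `kubectl get pods --all-namespaces` output on stdin.
pods_not_running() {
    awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces | pods_not_running
```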

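At this point the basic k8s cluster installation is complete. I don't need the dashboard for now, so I haven't installed it; I'll come back and update this when I do, haha.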