Deploying a K8s Cluster on Alibaba Cloud
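First, my environment and configuration: an Alibaba Cloud ECS instance with 1 vCPU and 2 GB RAM running Ubuntu 18.04. Two vCPUs would be better, because kubeadm's preflight check requires at least 2 CPUs on the master. With only one it still works, because that check can be ignored later. The node is also 1 vCPU / 2 GB.
Now let's get to the main topic.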
1. Update the system package sources
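If the system's default mirror points to servers abroad, downloads will be very slow. You can open /etc/apt/sources.list and replace the entries with a mirror in China.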
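The replacement can also be scripted. Below is a minimal sketch that is not part of the original setup. The function name `switch_apt_mirror` is my own. The sketch assumes that the file still uses the stock `archive.ubuntu.com` / `security.ubuntu.com` hosts and that `https://mirrors.aliyun.com` is the mirror you want. On Alibaba Cloud ECS the image may already point at an Aliyun mirror, in which case nothing changes. The function keeps a `.bak` copy and takes the path as an argument, so you can try it on a copy first.

```shell
#!/bin/sh
# Sketch: point an apt sources.list at a domestic mirror.
# Assumption: the file uses the stock hosts archive.ubuntu.com / security.ubuntu.com
# (optionally with a country prefix such as cn.archive.ubuntu.com).
switch_apt_mirror() {
    file=$1                                   # e.g. /etc/apt/sources.list
    mirror=${2:-https://mirrors.aliyun.com}   # assumed mirror base URL
    cp "$file" "$file.bak"                    # keep a backup of the original
    sed -i \
        -e "s#http://\([a-z]*\.\)\{0,1\}archive\.ubuntu\.com#$mirror#g" \
        -e "s#http://security\.ubuntu\.com#$mirror#g" \
        "$file"
}
# On a real host (as root): switch_apt_mirror /etc/apt/sources.list && apt update
```

Run `apt update` afterwards so that apt reads the package lists from the new mirror.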
apt update
2. Upgrade the installed packages
Upgrade the system's software components to the latest stable versions.
apt upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libcurl4
The following packages will be upgraded:
  curl libcurl4
2 upgraded, 0 newly installed, 0 to remove and 46 not upgraded.
Need to get 378 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Ign:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
Ign:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
Err:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
  404 Not Found [IP: 100.100.2.148 80]
Err:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
  404 Not Found [IP: 100.100.2.148 80]
E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/curl_7.58.0-2ubuntu3.14_amd64.deb 404 Not Found [IP: 100.100.2.148 80]
E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/libcurl4_7.58.0-2ubuntu3.14_amd64.deb 404 Not Found [IP: 100.100.2.148 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
If you skip the update, you will run into this error, so remember to run it. The message itself even gives the hint: run apt-get update or try with --fix-missing.
3. Install Docker
(You can also install it by following a different guide.)
apt-get install docker.io
To start Docker now and on every boot, run:
systemctl enable docker
systemctl start docker
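To configure a Docker registry mirror, open /etc/docker/daemon.json and add or modify registry-mirrors so that it includes https://registry.docker-cn.com. You can also use the registry mirror addresses provided by Alibaba Cloud, Tencent Cloud, and others.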
Example:
{
  "registry-mirrors": [
    "https://registry.docker-cn.com"
  ]
}
Restart Docker so the configuration takes effect:
sudo systemctl daemon-reload
sudo systemctl restart docker
4. Install K8s
Disable swap
# Temporarily turn off swap
swapoff -a
# Permanently disable swap (also comment out the swap entry in /etc/fstab)
swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
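The sed one-liner comments out every line containing " swap ", including lines that are already commented. Running it twice therefore produces "##". Below is a slightly more defensive sketch, which is my own variant and not from the original. It only changes uncommented entries whose filesystem-type field is `swap`. It takes the fstab path as a parameter, so you can try it on a copy first. The function name `disable_swap_in_fstab` is hypothetical.

```shell
#!/bin/sh
# Sketch: comment out active swap entries in an fstab file.
# Only uncommented lines whose 3rd field (filesystem type) is "swap" are changed,
# so running it twice does not produce "##".
disable_swap_in_fstab() {
    fstab=${1:-/etc/fstab}
    swap_tmp=$(mktemp) || return 1
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$fstab" > "$swap_tmp" \
        && cat "$swap_tmp" > "$fstab"    # cat keeps the original file's owner and mode
    rm -f "$swap_tmp"
}
# On a real host (as root): swapoff -a && disable_swap_in_fstab /etc/fstab
```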
Pass bridged IPv4 and IPv6 traffic through iptables:
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
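This variant also makes the settings persist across reboots. It is a sketch built on two assumptions. First, the function name `write_k8s_sysctl` is hypothetical. Second, loading `br_netfilter` comes from the upstream kubeadm installation guide, not from this post. The `net.bridge.*` keys only exist once that module is loaded. The root directory is a parameter, so the generated files can be inspected in a scratch directory before anything is written to `/`.

```shell
#!/bin/sh
# Sketch: write the bridge-netfilter settings under a configurable root directory.
# On a real host: write_k8s_sysctl "" && modprobe br_netfilter && sysctl --system
write_k8s_sysctl() {
    root=${1:-}
    mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
    # Assumption (upstream kubeadm docs): the net.bridge.* keys appear only after
    # br_netfilter is loaded, so have systemd load it at boot.
    echo br_netfilter > "$root/etc/modules-load.d/k8s.conf"
    cat > "$root/etc/sysctl.d/k8s.conf" <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
}
```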
Run the following commands to install the HTTPS transport tool and K8s.
apt-get update && apt-get install -y apt-transport-https curl
apt-get install -y kubelet kubeadm kubectl --allow-unauthenticated
# Common commands
# Reload unit files and restart the kubelet service:
systemctl daemon-reload
systemctl restart kubelet
# Stop the kubelet service, enable it at boot, and start it:
sudo systemctl stop kubelet
sudo systemctl enable kubelet
sudo systemctl start kubelet
Run the following command to check that the installation works:
kubeadm version
# Example output
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:37:34Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
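If you only need the version string in a script, `kubeadm version -o short` prints it directly. The sketch below instead extracts the `GitVersion` field from the full output shown above, which also works on saved logs. The helper name `git_version` is hypothetical.

```shell
#!/bin/sh
# Sketch: pull the GitVersion field out of "kubeadm version" output read on stdin.
git_version() {
    sed -n 's/.*GitVersion:"\([^"]*\)".*/\1/p'
}
# On a real host: kubeadm version | git_version
```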
If the installation fails as shown below, the system's package source does not contain the K8s packages.
E: Unable to locate package kubelet
E: Unable to locate package kubeadm
E: Unable to locate package kubectl
Open /etc/apt/sources.list and add the following line:
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
Run the package update command first, then run the K8s install command again.
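The following variant is slightly tidier, though it is a sketch and not the author's method. It puts the repository line in its own file under `/etc/apt/sources.list.d/`, which apt reads alongside `sources.list`. It also adds the line only if it is not already there, so re-running the step does not create duplicate entries. The function name `add_k8s_repo` and the file name `kubernetes.list` are my own choices.

```shell
#!/bin/sh
# Sketch: add the Aliyun Kubernetes apt repository exactly once.
# $1: root directory (empty for the real system, a scratch dir for testing)
add_k8s_repo() {
    root=${1:-}
    list="$root/etc/apt/sources.list.d/kubernetes.list"
    line='deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main'
    mkdir -p "$(dirname "$list")"
    # -q quiet, -x whole-line match, -F fixed string: append only if missing
    grep -qxF "$line" "$list" 2>/dev/null || echo "$line" >> "$list"
}
# On a real host (as root): add_k8s_repo && apt-get update
```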
At this step (the apt-get update that runs before installing curl), you may run into this error:
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
Reading package lists... Done
W: GPG error: https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
E: The repository 'https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
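Just run the following. The key is the value after NO_PUBKEY, so replace it with the key from your own error, and repeat the command for each NO_PUBKEY value listed: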
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys FEEA9169307EA071
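The error above lists two key IDs, but the command imports only one. The following small sketch extracts every distinct `NO_PUBKEY` ID from apt's output so that each one can be passed to `apt-key adv` in turn. The helper name `missing_keys` is hypothetical. `apt-key` still works on Ubuntu 18.04 but is deprecated on newer releases.

```shell
#!/bin/sh
# Sketch: print each distinct key ID that apt reported as NO_PUBKEY (reads stdin).
missing_keys() {
    grep -o 'NO_PUBKEY [0-9A-F]*' | awk '{ print $2 }' | sort -u
}
# On a real host (as root):
#   apt-get update 2>&1 | missing_keys | while read -r key; do
#       apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$key"
#   done
```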
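The apt-get install -y kubelet kubeadm kubectl command installs three components. kubelet is the K8s service that runs on every node, kubectl is the K8s management client, and kubeadm is the deployment tool.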
If this machine is only a worker node, you can stop here.
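To join another Alibaba Cloud server to the cluster, just run the command below on it. How to obtain this command is explained further down, so read everything first, then come back to this step: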
kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t --discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060
Errors you may run into:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
# Fix:
# This is caused by Docker and Kubernetes using different cgroup drivers
# Change Docker's configuration file
cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
# Restart Docker
systemctl restart docker
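One thing to watch out for: `cat > /etc/docker/daemon.json` replaces the whole file, so the `registry-mirrors` setting from step 3 is silently lost. The sketch below writes both settings into one JSON object. The function name `write_daemon_json` is hypothetical, and the default mirror is the one used earlier in this post.

```shell
#!/bin/sh
# Sketch: write daemon.json with both the systemd cgroup driver and a registry mirror.
# $1: target file (normally /etc/docker/daemon.json)
# $2: registry mirror URL
write_daemon_json() {
    file=${1:-/etc/docker/daemon.json}
    mirror=${2:-https://registry.docker-cn.com}
    mkdir -p "$(dirname "$file")"
    cat > "$file" <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["$mirror"]
}
EOF
}
# On a real host (as root), afterwards:
#   systemctl restart docker
#   docker info --format '{{.CgroupDriver}}'   # expected to print: systemd
```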
[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
# Fix:
swapoff -a
kubeadm reset
systemctl daemon-reload
systemctl restart kubelet
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# Run the join command again; this time the node joins successfully
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
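# However, the node will still show as NotReady. K8s assumes that the whole cluster sits on one flat network. Inside Google's own network architecture this exists natively, but in our own environment it has to be set up. So the next step is to deploy the network layer. Here we use the flannel network add-on, originally developed by CoreOS. (This step may fail. Keep reading, set up the master first, then come back here and it will make sense.)
# Run the Docker image pull script described later to pull the images, then go back to the master and run: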
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Running "kubectl get nodes" now shows all nodes in the Ready state
root@ubuntu:~# kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   26m   v1.22.2
node     Ready    <none>                 15s   v1.22.2
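5. Initialize the master
Run the following command to initialize. It automatically downloads the Docker images it needs from the network.
This command deploys the master (control-plane) node.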
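Run kubeadm version to check the version. The value in GitVersion:"v1.22.2" is the version number.
Run the following command to initialize (remember to replace the IP with your own):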
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96
--ignore-preflight-errors=NumCPU is needed when the machine has only one CPU, for example a 1G1M student server.
However, the images are hosted on Google's registry, so the download may fail.
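We can list the images that need to be pulled with the kubeadm config images list command and pull them manually with Docker. This is tedious, and the images also have to be renamed by hand.
Pull each one with docker pull {image name}.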
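Google's registry is not reachable, but Docker Hub has backups of the required images.
The mirrorgooglecontainers repository backs up these images, but unfortunately its backups are not always up to date. The google_containers repository on Alibaba Cloud should have the most recent ones.
For example, if the following images are required: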
k8s.gcr.io/kube-apiserver:v1.22.2
k8s.gcr.io/kube-controller-manager:v1.22.2
k8s.gcr.io/kube-scheduler:v1.22.2
k8s.gcr.io/kube-proxy:v1.22.2
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4
then pull the corresponding images:
docker pull mirrorgooglecontainers/kube-apiserver:v1.22.2
docker pull mirrorgooglecontainers/kube-controller-manager:v1.22.2
docker pull mirrorgooglecontainers/kube-scheduler:v1.22.2
docker pull mirrorgooglecontainers/kube-proxy:v1.22.2
docker pull mirrorgooglecontainers/pause:3.5
docker pull mirrorgooglecontainers/etcd:3.5.0-0
docker pull coredns/coredns:1.8.4
Rename the images with docker tag {old name:tag} {new name:tag}.
Since all sorts of problems can come up, here is a one-click script written by someone else that does this whole step at once.
touch pullk8s.sh # create the script file
nano pullk8s.sh # edit the script
Then paste in the following:
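# For each image kubeadm needs: pull it from the Alibaba Cloud mirror,
# retag it with the k8s.gcr.io name kubeadm expects, then remove the mirror tag
for i in $(kubeadm config images list); do
    imageName=${i#k8s.gcr.io/}
    docker pull registry.aliyuncs.com/google_containers/$imageName
    docker tag registry.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
    docker rmi registry.aliyuncs.com/google_containers/$imageName
done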
Save the file: press Ctrl + O, then Enter, then Ctrl + X to exit.
Make the script executable:
chmod +x pullk8s.sh
Run the script:
sh pullk8s.sh
Then run docker images to check that all the required images are ready.
root@ubuntu:~# docker images
REPOSITORY                           TAG       IMAGE ID       CREATED        SIZE
k8s.gcr.io/kube-proxy                v1.22.2   cba2a99699bd   2 weeks ago    116MB
k8s.gcr.io/kube-apiserver            v1.22.2   41ef50a5f06a   2 weeks ago    171MB
k8s.gcr.io/kube-controller-manager   v1.22.2   da5fd66c4068   2 weeks ago    161MB
k8s.gcr.io/kube-scheduler            v1.22.2   f52d4c527ef2   2 weeks ago    94.4MB
k8s.gcr.io/coredns                   1.8.4     70f311871ae1   3 months ago   41.6MB
k8s.gcr.io/etcd                      3.5.0-0   303ce5db0e90   3 months ago   288MB
k8s.gcr.io/pause                     3.5       da86e6ba6ca1   2 years ago    742kB
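Instead of comparing the lists by eye, you can diff them. In this sketch, the first file holds the output of `kubeadm config images list` and the second holds the output of `docker images --format '{{.Repository}}:{{.Tag}}'`. The helper name `missing_images` is hypothetical. The listing above shows coredns as `k8s.gcr.io/coredns:1.8.4`, whereas kubeadm 1.22 asks for `k8s.gcr.io/coredns/coredns:v1.8.4`, as the ImagePull error later in this post shows. A check like this flags that mismatch before `kubeadm init` does.

```shell
#!/bin/sh
# Sketch: print the images kubeadm needs that are not present locally.
# $1: file containing the output of `kubeadm config images list`
# $2: file containing the output of `docker images --format '{{.Repository}}:{{.Tag}}'`
missing_images() {
    # -v invert, -x whole-line match, -F fixed strings, -f read the patterns from $2
    grep -vxF -f "$2" "$1"
}
# On a real host:
#   kubeadm config images list > want.txt
#   docker images --format '{{.Repository}}:{{.Tag}}' > have.txt
#   missing_images want.txt have.txt
```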
The script may also fail. If it does, pull the image manually:
Error response from daemon: pull access denied for registry.aliyuncs.com/google_containers/coredns/coredns, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Error response from daemon: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
Error: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
docker pull coredns/coredns:1.8.4
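# Rename an image; command format:
docker tag OLD_IMAGE NEW_IMAGE
# e.g. give the coredns image the name kubeadm 1.22 looks for:
docker tag coredns/coredns:1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4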
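The pull error above comes from the script turning `k8s.gcr.io/coredns/coredns:v1.8.4` into the nested path `google_containers/coredns/coredns`, which the mirror does not have. The sketch below maps names so that the coredns path is flattened. It assumes, without verifying here, that the Aliyun mirror publishes coredns as `google_containers/coredns:v1.8.4`. The helper names `mirror_image` and `print_pull_commands` are hypothetical. The commands are only printed, so you can review them before piping them to `sh`.

```shell
#!/bin/sh
# Sketch: map a k8s.gcr.io image name to its (assumed) name in the Aliyun mirror.
# Assumption: the mirror stores coredns flat (google_containers/coredns:<tag>).
mirror_image() {
    name=${1#k8s.gcr.io/}      # e.g. coredns/coredns:v1.8.4
    name=${name#coredns/}      # flatten coredns/coredns:<tag> -> coredns:<tag>
    echo "registry.aliyuncs.com/google_containers/$name"
}

# Print (do not run) the pull and tag commands for every image given as an argument.
print_pull_commands() {
    for image in "$@"; do
        src=$(mirror_image "$image")
        echo "docker pull $src"
        echo "docker tag $src $image"
    done
}
# On a real host, after reviewing the output:
#   print_pull_commands $(kubeadm config images list) | sh
```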
Finally, run the initialization command from the beginning of this section.
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96
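The public IP is not configured inside the Alibaba Cloud ECS instance, because the network interface only carries the private IP. As a result, etcd cannot bind to the public address and fails to start, and kubeadm init fails with a "timeout" error.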
Solution:
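1. Open two SSH sessions, for example two tabs in your SSH client. Use one to run the initialization and the other to edit the config file while initialization is in progress. The edit must happen during initialization. Every run of kubeadm init regenerates the etcd config file, so any change made beforehand is overwritten and has no effect.
2. Run the kubeadm init ... command above. It will hang at: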
Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed
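3. Once that command is running, kubeadm has started initializing the master node. However, etcd cannot start because its config file is wrong, so the file has to be edited.
The file is /etc/kubernetes/manifests/etcd.yaml.
# Change these two lines in the file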
--listen-client-urls=https://127.0.0.1:2379,https://39.96.46.96:2379
--listen-peer-urls=https://39.96.46.96:2380
# After the change
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380
4. The flags that matter are --listen-client-urls and --listen-peer-urls. Remove the public IP after --listen-client-urls, and change the address in --listen-peer-urls to the local one.
After a short wait, the master node initialization completes.
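You can also make the edit with `sed` from the second SSH session. The sketch below uses a hypothetical function name, `fix_etcd_listen_urls`. It only rewrites the two flags shown above and leaves the other etcd flags, such as `--initial-advertise-peer-urls`, untouched. Dots in the IP are escaped so they match literally.

```shell
#!/bin/sh
# Sketch: apply the two etcd.yaml edits from step 3 with sed.
# Run it in the second SSH session while kubeadm init is waiting.
# $1: manifest path (normally /etc/kubernetes/manifests/etcd.yaml)
# $2: the public IP passed to --apiserver-advertise-address
fix_etcd_listen_urls() {
    manifest=$1
    ip_re=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for use in a sed regex
    sed -i \
        -e "s#\(--listen-client-urls=https://127\.0\.0\.1:2379\),https://$ip_re:2379#\1#" \
        -e "s#--listen-peer-urls=https://$ip_re:2380#--listen-peer-urls=https://127.0.0.1:2380#" \
        "$manifest"
}
# On a real host: wait until kubeadm has written the file, then patch it:
#   f=/etc/kubernetes/manifests/etcd.yaml
#   while [ ! -f "$f" ]; do sleep 1; done; fix_etcd_listen_urls "$f" 39.96.46.96
```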
Possible problem:
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
# Run:
swapoff -a && kubeadm reset && systemctl daemon-reload && systemctl restart kubelet && iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# Then run the initialization again, and it works:
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96
Possible problem:
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns/coredns:v1.8.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
Open https://www.ipaddress.com/ and search for k8s.gcr.io to get its IP (142.250.113.82 when this was written). Then open the hosts file (on Linux: vim /etc/hosts) and add the hostname and IP in the form below. Use sudo if you are not root:
142.250.113.82 k8s.gcr.io
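The same trick is used again later for `raw.githubusercontent.com`, so here is a sketch of an idempotent helper. It appends `IP hostname` only if the hostname is not already mapped. The name `add_host_entry` is hypothetical. The IPs in this post were valid when it was written and may have changed since, so look them up again before use.

```shell
#!/bin/sh
# Sketch: append "IP hostname" to a hosts file unless the hostname is already mapped.
# $1: hosts file, $2: IP address, $3: hostname
add_host_entry() {
    hosts=$1; ip=$2; name=$3
    # Exit status 0 if any uncommented line lists the hostname after its IP field.
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts"
}
# On a real host (as root): add_host_entry /etc/hosts 142.250.113.82 k8s.gcr.io
```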
Still not working:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
This is caused by Docker and Kubernetes using different cgroup drivers.
Solution:
Change Docker's configuration file:
cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
Restart Docker:
systemctl restart docker
After that there are still some errors, but these are simple: fix whatever is reported. Here are the ones I ran into:
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
# Fix (several similar errors are omitted here; the steps are identical, so I won't repeat them)
cd /etc/kubernetes/manifests/
rm kube-apiserver.yaml
[ERROR Port-10250]: Port 10250 is in use
# Fix (similar errors for other ports are omitted; the steps are identical)
lsof -i:10250
COMMAND   PID     USER   FD    TYPE   DEVICE   SIZE/OFF   NODE   NAME
kubelet   22055   root   27u   IPv6   773301   0t0        TCP    *:10250 (LISTEN)
kill -9 22055
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
# Fix
cd /var/lib/etcd/
rm -r member/
Run the initialization command again, and it succeeds.
# Output on success
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t \
--discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060
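If you did not save this output, you can print a fresh join command on the master with `kubeadm token create --print-join-command`. Bootstrap tokens expire after 24 hours by default. If you did save it, the following sketch joins the backslash-continued lines into the single command to run on the node. The helper name `extract_join_command` is hypothetical.

```shell
#!/bin/sh
# Sketch: rebuild the one-line worker join command from saved "kubeadm init" output.
# Reads the log on stdin and prints the last "kubeadm join" command found.
extract_join_command() {
    awk '
        /^[ \t]*kubeadm join / {
            cmd = $0
            sub(/^[ \t]+/, "", cmd)       # drop leading indentation
            sub(/[ \t]*\\$/, "", cmd)     # drop the trailing line-continuation backslash
            want = 1
            next
        }
        want && /^[ \t]*--discovery-token-ca-cert-hash / {
            line = $0
            sub(/^[ \t]+/, "", line)
            cmd = cmd " " line
            want = 0
        }
        END { print cmd }
    '
}
# e.g.  kubeadm init ... | tee init.log     later:  extract_join_command < init.log
```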
Run the following on the master node:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Check the master
kubectl get nodes
root@ubuntu:~# kubectl get nodes
NAME     STATUS     ROLES                  AGE   VERSION
master   NotReady   control-plane,master   26h   v1.22.2
node     Ready      <none>                 15s   v1.22.2
# Add the network plugin
sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Output
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
kubectl get pods --all-namespaces
# If the output looks like this, with some Pods stuck in Pending:
NAMESPACE     NAME                             READY   STATUS     RESTARTS      AGE
kube-system   coredns-78fcd69978-fkkmh         0/1     Pending    0             17m
kube-system   coredns-78fcd69978-qrx2c         0/1     Pending    0             17m
kube-system   etcd-ubuntu                      1/1     Running    0             17m
kube-system   kube-apiserver-ubuntu            1/1     Running    1 (19m ago)   17m
kube-system   kube-controller-manager-ubuntu   1/1     Running    2 (20m ago)   17m
kube-system   kube-flannel-ds-g97gm            0/1     Init:0/1   0             80s
kube-system   kube-proxy-f6ctf                 1/1     Running    0             17m
kube-system   kube-scheduler-ubuntu            1/1     Running    2 (19m ago)   17m
# Just add "185.199.111.133 raw.githubusercontent.com" to the hosts file and run the add-network-plugin command again, and everything comes up:
NAMESPACE     NAME                             READY   STATUS    RESTARTS      AGE
kube-system   coredns-78fcd69978-fkkmh         1/1     Running   0             28m
kube-system   coredns-78fcd69978-qrx2c         1/1     Running   0             28m
kube-system   etcd-ubuntu                      1/1     Running   0             28m
kube-system   kube-apiserver-ubuntu            1/1     Running   1 (30m ago)   28m
kube-system   kube-controller-manager-ubuntu   1/1     Running   2 (31m ago)   28m
kube-system   kube-flannel-ds-g97gm            1/1     Running   0             11m
kube-system   kube-proxy-f6ctf                 1/1     Running   0             28m
kube-system   kube-scheduler-ubuntu            1/1     Running   2 (30m ago)   28m
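To spot stuck Pods without scanning the whole table, the following sketch prints only the Pods whose STATUS column is not `Running`. The helper name `not_running_pods` is hypothetical. For any Pod it lists, `kubectl describe pod -n <namespace> <name>` shows the events that explain why.

```shell
#!/bin/sh
# Sketch: list Pods that are not Running from "kubectl get pods --all-namespaces" output.
# STATUS is the 4th field because NAMESPACE, NAME and READY never contain spaces.
not_running_pods() {
    awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 " " $4 }'
}
# On a real host: kubectl get pods --all-namespaces | not_running_pods
```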
kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   26h   v1.22.2
node     Ready    <none>                 15s   v1.22.2
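At this point, the basic K8s cluster installation is done. I don't need the dashboard for now, so I haven't installed it. I'll come back and update this post when I do, haha.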