KubeSphere Troubleshooting in Practice
Overview: I have recently been using QingCloud's KubeSphere, and the user experience is excellent: it supports private deployment with no infrastructure dependencies and no Kubernetes dependency, installs across physical machines, VMs, and cloud platforms, and can manage Kubernetes clusters of different versions and from different vendors. On top of Kubernetes it adds role-based access control, DevOps pipelines for fast CI/CD, and built-in tooling such as Harbor, GitLab, Jenkins, and SonarQube. Based on OpenPitrix it provides full application lifecycle management, covering development, testing, release, upgrade, and removal. As with any open-source project, some bugs are inevitable; this article records the troubleshooting approaches I used along the way. Many thanks to the QingCloud community for their technical support. If you are interested in Kubernetes, this home-grown platform is well worth trying, and Rancher users may enjoy comparing the two.
1. Cleaning up exited containers
After the cluster has been running for a while, some containers exit abnormally and end up in the Exited state. They should be cleaned up promptly to reclaim disk space; the command below can be run as a scheduled task.
docker rm $(docker ps -a | grep Exited | awk '{print $1}')
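The cleanup above can be scheduled, as the text suggests. A minimal sketch as a root crontab entry (the schedule, the absolute paths, and the use of `--filter status=exited` instead of grep are my assumptions):

```shell
# Hypothetical crontab entry: remove exited containers daily at 02:00.
# `docker ps -qa --filter status=exited` selects the same containers as
# grepping for "Exited", without depending on column layout.
0 2 * * * /usr/bin/docker rm $(/usr/bin/docker ps -qa --filter status=exited) >/dev/null 2>&1
```

On Docker 1.13+, `docker container prune -f` is a one-command equivalent.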
2. Cleaning up abnormal or evicted pods
- Clean up in the kubesphere-devops-system namespace:
kubectl delete pods -n kubesphere-devops-system $(kubectl get pods -n kubesphere-devops-system |grep Evicted|awk '{print $1}' )
kubectl delete pods -n kubesphere-devops-system $(kubectl get pods -n kubesphere-devops-system |grep CrashLoopBackOff|awk '{print $1}')
- For convenience, a script that cleans up Evicted/CrashLoopBackOff pods in a given namespace, or exited containers:
#!/bin/bash
# author: kaliarch
clear_evicted_pod() {
  ns=$1
  kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep Evicted | awk '{print $1}')
}
clear_crash_pod() {
  ns=$1
  kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep CrashLoopBackOff | awk '{print $1}')
}
clear_exited_container() {
  docker rm $(docker ps -a | grep Exited | awk '{print $1}')
}
echo "1. clear evicted pods"
echo "2. clear crash pods"
echo "3. clear exited containers"
read -p "Please input num: " num
case ${num} in
  "1")
    read -p "Please input namespace: " ns
    clear_evicted_pod ${ns}
    ;;
  "2")
    read -p "Please input namespace: " ns
    clear_crash_pod ${ns}
    ;;
  "3")
    clear_exited_container
    ;;
  *)
    echo "input error"
    ;;
esac
- Clean up Evicted/CrashLoopBackOff pods across all namespaces:
# list all namespaces
kubectl get ns | grep -v "NAME" | awk '{print $1}'
# clean up evicted pods in every namespace
for ns in $(kubectl get ns | grep -v "NAME" | awk '{print $1}'); do kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep Evicted | awk '{print $1}'); done
# clean up CrashLoopBackOff pods in every namespace
for ns in $(kubectl get ns | grep -v "NAME" | awk '{print $1}'); do kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep CrashLoopBackOff | awk '{print $1}'); done
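Before pointing these loops at a live cluster, the grep/awk filter can be sanity-checked offline. The sample output below is hypothetical, standing in for real `kubectl get pods` output:

```shell
# Hypothetical `kubectl get pods` output used to exercise the filter.
sample='NAME    READY   STATUS             RESTARTS
web-1   1/1     Running            0
web-2   0/1     Evicted            0
api-1   0/1     CrashLoopBackOff   7'

# Same pipeline as in the cleanup loops: print only the matching pod names.
echo "$sample" | grep Evicted | awk '{print $1}'
echo "$sample" | grep CrashLoopBackOff | awk '{print $1}'
```

As an aside, on recent kubectl versions `kubectl delete pods -A --field-selector=status.phase=Failed` can replace the Evicted loop in a single command, since evicted pods sit in the Failed phase; verify that your kubectl supports these flags first.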
3. Migrating Docker data
The Docker data directory was not customized during installation, and the 50 GB system disk has filled up over time, so the Docker data must be migrated. Here we use a symlink: first, mount the new disk at /data.
systemctl stop docker
mkdir -p /data/docker/
rsync -avz /var/lib/docker/ /data/docker/
mv /var/lib/docker /data/docker_bak
ln -s /data/docker /var/lib/
systemctl daemon-reload
systemctl start docker
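As an alternative to the symlink, Docker can be pointed at the new directory explicitly. This is a sketch under the assumption that your node reads /etc/docker/daemon.json; note that the KubeSphere installer already passes `--data-root` via DOCKER_OPTS in its systemd drop-in (shown in the network section below), and Docker refuses to start if the same setting appears in both places, so configure it in only one:

```shell
# Stop Docker, copy data to the new disk, then switch data-root.
systemctl stop docker
rsync -avz /var/lib/docker/ /data/docker/
cat > /etc/docker/daemon.json <<'EOF'
{
  "data-root": "/data/docker"
}
EOF
systemctl daemon-reload
systemctl start docker
docker info | grep 'Docker Root Dir'   # should now report /data/docker
```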
4. KubeSphere network troubleshooting
- Symptom:
On a KubeSphere node or master, containers started manually with Docker cannot reach the public network. Is something misconfigured? The cluster originally used Calico, and switching to Flannel did not help. Containers in pods deployed through KubeSphere can reach the public network; only containers started manually on a node or master cannot.
Inspecting the manually started container shows its traffic goes over docker0:
root@fd1b8101475d:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
105: eth0@if106: <BROADCAST,MULTICAST,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
Containers inside pods use the kube-ipvs0 network instead:
1: lo: <LOOPBACK,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if18: <BROADCAST,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
link/ether c2:27:44:13:df:5d brd ff:ff:ff:ff:ff:ff
inet 10.233.97.175/32 scope global eth0
valid_lft forever preferred_lft forever
- Solution:
Check the Docker startup configuration.
In /etc/systemd/system/docker.service.d/docker-options.conf, remove the flag --iptables=false: when this flag is false, Docker does not write any iptables rules, so containers attached to docker0 are never NATed to the outside.
[Service]
Environment="DOCKER_OPTS= --registry-mirror=https://registry.docker-cn.com --data-root=/var/lib/docker --log-opt max-size=10m --log-opt max-file=3 --insecure-registry=harbor.devops.kubesphere.local:30280"
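A small sketch of scripting that edit, exercised here on a stand-in copy under /tmp rather than the live file (on a real node the path is the one above, and the edit must be followed by `systemctl daemon-reload && systemctl restart docker`):

```shell
# Stand-in copy of docker-options.conf with the problematic flag present.
conf=/tmp/docker-options.conf
cat > "$conf" <<'EOF'
[Service]
Environment="DOCKER_OPTS= --iptables=false --data-root=/var/lib/docker --log-opt max-size=10m"
EOF

# Strip the flag that prevents Docker from writing iptables NAT rules.
sed -i 's/ --iptables=false//' "$conf"
cat "$conf"
```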
5. KubeSphere application route anomalies
KubeSphere application routes (Ingress) are served by nginx. Configuring TLS through the web UI causes the two hosts to share a single CA certificate; this can be worked around by editing the resource manifest directly.
⚠️ Note: the Ingress controlling the deployment is defined as:
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: prod-app-ingress
  namespace: prod-net-route
  resourceVersion: '8631859'
  labels:
    app: prod-app-ingress
  annotations:
    desc: production application route
    nginx.ingress.kubernetes.io/client-body-buffer-size: 1024m
    nginx.ingress.kubernetes.io/proxy-body-size: 2048m
    nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
    nginx.ingress.kubernetes.io/service-upstream: 'true'
spec:
  tls:
    - hosts:
        - smartms.tools.anchnet.com
      secretName: smartms-ca
    - hosts:
        - smartsds.tools.anchnet.com
      secretName: smartsds-ca
  rules:
    - host: smartms.tools.anchnet.com
      http:
        paths:
          - path: /
            backend:
              serviceName: smartms-frontend-svc
              servicePort: 80
    - host: smartsds.tools.anchnet.com
      http:
        paths:
          - path: /
            backend:
              serviceName: smartsds-frontend-svc
              servicePort: 80
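Each secretName in the manifest must refer to its own TLS secret. A sketch of creating one of them (the self-signed certificate and the /tmp paths are purely illustrative; the namespace and secret name come from the manifest):

```shell
# Generate an illustrative self-signed cert/key pair for one host; in
# production you would use your CA-issued certificate instead.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj '/CN=smartms.tools.anchnet.com' \
  -keyout /tmp/smartms.key -out /tmp/smartms.crt 2>/dev/null

# Create the matching TLS secret (run against the cluster):
# kubectl -n prod-net-route create secret tls smartms-ca \
#   --cert=/tmp/smartms.crt --key=/tmp/smartms.key

openssl x509 -in /tmp/smartms.crt -noout -subject   # check the CN
```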
6. Updating the Jenkins agent in KubeSphere
In their own scenarios, users may need different language versions or different tool versions. This section explains how to replace the built-in agents.
The default base-build image does not include the sonar-scanner tool. Every KubeSphere Jenkins agent is a Pod; replacing a built-in agent means replacing the agent's image.
Build the latest kubesphere/builder-base:advanced-1.0.0 agent image,
then update it to the specified custom image: ccr.ccs.tencentyun.com/testns/base:v1
Reference: kubesphere.io/docs/advanc…
After modifying jenkins-casc-config in KubeSphere, you need to reload the updated system configuration from the configuration-as-code page under "Manage Jenkins" on the Jenkins dashboard.
Reference: updating the base image in Jenkins. ⚠️ Modify the Jenkins configuration in KubeSphere (jenkins-casc-config) first.
7. Sending mail from a DevOps pipeline
Reference: www.cloudbees.com/blog/mail-s…
Built-in environment variables:
Variable | Description
---|---
BUILD_NUMBER | The current build number, such as "153"
BUILD_ID | The current build ID, identical to BUILD_NUMBER for builds created in 1.597+, but a YYYY-MM-DD_hh-mm-ss timestamp for older builds
BUILD_DISPLAY_NAME | The display name of the current build, which is something like "#153" by default
JOB_NAME | Name of the project of this build, such as "foo" or "foo/bar". (To strip off folder paths from a Bourne shell script, try: ${JOB_NAME##*/})
BUILD_TAG | String of "jenkins-{BUILD_NUMBER}". Convenient to put into a resource file, a jar file, etc. for easier identification
EXECUTOR_NUMBER | The unique number that identifies the current executor (among executors of the same machine) that's carrying out this build. This is the number you see in the "build executor status", except that the number starts from 0, not 1
NODE_NAME | Name of the slave if the build is on a slave, or "master" if run on master
NODE_LABELS | Whitespace-separated list of labels that the node is assigned
WORKSPACE | The absolute path of the directory assigned to the build as a workspace
JENKINS_HOME | The absolute path of the directory assigned on the master node for Jenkins to store data
JENKINS_URL | Full URL of Jenkins, like http://server:port/jenkins/ (note: only available if Jenkins URL set in system configuration)
BUILD_URL | Full URL of this build, like http://server:port/jenkins/job/foo/15/ (Jenkins URL must be set)
SVN_REVISION | Subversion revision number that's currently checked out to the workspace, such as "12345"
SVN_URL | Subversion URL that's currently checked out to the workspace
JOB_URL | Full URL of this job, like http://server:port/jenkins/job/foo/ (Jenkins URL must be set)
In the end I wrote a template adapted to my own business needs, which can be used directly:
mail to: '[email protected]',
     charset: 'UTF-8',           // or GBK/GB18030
     mimeType: 'text/plain',     // or text/html
     subject: "Kubesphere ${env.JOB_NAME} [${env.BUILD_NUMBER}] released successfully. Running Pipeline: ${currentBuild.fullDisplayName}",
     body: """
---------Anchnet Devops Kubesphere Pipeline job--------------------
Project      : ${env.JOB_NAME}
Build number : ${env.BUILD_NUMBER}
Scan info    : address: ${SONAR_HOST}
Image        : ${REGISTRY}/${QHUB_NAMESPACE}/${APP_NAME}:${IMAGE_TAG}
Build detail : SUCCESSFUL: Job ${env.JOB_NAME} [${env.BUILD_NUMBER}]
Build status : ${env.JOB_NAME} jenkins release succeeded
Build URL    : ${env.BUILD_URL}"""