Introduction to the HPA Controller
When system resource usage gets too high, we can scale a Pod's replicas in or out with the following command:
$ kubectl -n luffy scale deployment myblog --replicas=2
However, this is a manual operation. In real projects we want the cluster to detect load automatically and scale accordingly. Kubernetes provides a resource object for exactly this: Horizontal Pod Autoscaling, or HPA for short.
Basic principle: HPA monitors and analyzes the load of all Pods managed by a controller to decide whether the number of Pod replicas needs to be adjusted.
There are two API versions of HPA:
- autoscaling/v1: supports scaling based on CPU metrics only; the stable version
- autoscaling/v2beta1: supports scaling based on memory or user-defined custom metrics
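You can check which autoscaling API versions your cluster actually serves (the exact list depends on the Kubernetes version):
$ kubectl api-versions | grep autoscaling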
How do we obtain the Pods' monitoring data?
- k8s below 1.8: heapster (fully deprecated as of 1.11)
- k8s 1.8 and above: metrics-server
Question: why was heapster used in the past, and why has that project been deprecated in favor of metrics-server?
In the heapster era, the apiserver forwarded metric requests to the in-cluster heapster service via apiserver proxy. That proxy approach has several problems:
- The proxy URL looks like: http://kubernetes_master_address/api/v1/namespaces/namespace_name/services/service_name[:port_name]/proxy
- proxy merely forwards requests; it is generally meant for troubleshooting, is not stable enough, and the backing version is not controllable
- heapster's API does not have the complete authentication/authorization and client integration that the apiserver has
- Pod monitoring data is a core metric (it drives HPA decisions) and should have the same standing as the Pod itself, i.e. metrics should exist as a resource, for example under metrics.k8s.io, known as the Metric API
So starting with version 1.8, upstream gradually deprecated heapster and introduced the Metric API concept described above. metrics-server is the official implementation of that concept: it fetches metrics from the kubelets and replaces heapster.
Metrics Server
https://192.168.136.10:6443/apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
# https://192.168.136.10:6443/api/v1/namespaces/luffy/pods?limit=500
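The metrics API endpoint can also be queried through kubectl without handling certificates and tokens manually; a quick sketch (this requires metrics-server, which is installed below):
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/luffy/pods"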
The current collection flow:
Metric Server
...
Metric server collects metrics from the Summary API, exposed by Kubelet on each node.
Metrics Server registered in the main API server through Kubernetes aggregator, which was introduced in Kubernetes 1.7
...
Installation
Official code repository: https://github.com/kubernetes-sigs/metrics-server
Depending on your cluster setup, you may also need to change flags passed to the Metrics Server container. Most useful flags:
- --kubelet-preferred-address-types - The priority of node address types used when determining an address for connecting to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
- --kubelet-insecure-tls - Do not verify the CA of serving certificates presented by Kubelets. For testing purposes only.
- --requestheader-client-ca-file - Specify a root certificate bundle for verifying client certificates on incoming requests.
$ wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
Modify the args parameters:
...
      containers:
      - name: metrics-server
        image: registry.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-insecure-tls
          - --kubelet-preferred-address-types=InternalIP
...
Run the installation:
$ kubectl create -f components.yaml
$ kubectl -n kube-system get pods
$ kubectl top nodes
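A couple of extra checks can help confirm the installation (the APIService object is explained in the kube-aggregator section below):
$ kubectl get apiservice v1beta1.metrics.k8s.io
$ kubectl -n kube-system top pods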
Metric collection in the kubelet
Both heapster and metrics-server only relay and aggregate data; each of them calls the kubelet's API to fetch it. Inside the kubelet, the actual metric collection is done by the cAdvisor module. You can access port 10250 on a node to fetch the monitoring data:
- Kubelet Summary metrics: https://127.0.0.1:10250/metrics, exposes node- and pod-level aggregate data
- Cadvisor metrics: https://127.0.0.1:10250/metrics/cadvisor, exposes container-level data
Example call:
$ curl -k -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6InhXcmtaSG5ZODF1TVJ6dUcycnRLT2c4U3ZncVdoVjlLaVRxNG1wZ0pqVmcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi10b2tlbi1xNXBueiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJhZG1pbiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImViZDg2ODZjLWZkYzAtNDRlZC04NmZlLTY5ZmE0ZTE1YjBmMCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbiJ9.iEIVMWg2mHPD88GQ2i4uc_60K4o17e39tN0VI_Q_s3TrRS8hmpi0pkEaN88igEKZm95Qf1qcN9J5W5eqOmcK2SN83Dd9dyGAGxuNAdEwi0i73weFHHsjDqokl9_4RGbHT5lRY46BbIGADIphcTeVbCggI6T_V9zBbtl8dcmsd-lD_6c6uC2INtPyIfz1FplynkjEVLapp_45aXZ9IMy76ljNSA8Uc061Uys6PD3IXsUD5JJfdm7lAt0F7rn9SdX1q10F2lIHYCMcCcfEpLr4Vkymxb4IU4RCR8BsMOPIO_yfRVeYZkG4gU2C47KwxpLsJRrTUcUXJktSEPdeYYXf9w" https://localhost:10250/metrics
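The long Bearer token above is simply a ServiceAccount token (in this example it belongs to the admin ServiceAccount in the kubernetes-dashboard namespace). On clusters that still auto-create ServiceAccount token Secrets, a sketch for extracting such a token and calling the cadvisor endpoint looks like this; the ServiceAccount name and namespace here are taken from the example above and may differ on your cluster:
# extract the ServiceAccount token and call the kubelet cadvisor endpoint
$ SECRET=$(kubectl -n kubernetes-dashboard get sa admin -o jsonpath='{.secrets[0].name}')
$ TOKEN=$(kubectl -n kubernetes-dashboard get secret $SECRET -o jsonpath='{.data.token}' | base64 -d)
$ curl -k -H "Authorization: Bearer $TOKEN" https://localhost:10250/metrics/cadvisor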
Although the kubelet provides the metric endpoints, the actual monitoring logic is handled by the built-in cAdvisor module. cAdvisor used to be a standalone component; starting with k8s 1.12 the dedicated port cAdvisor listened on was removed from k8s, and all monitoring data is served uniformly through the kubelet's API.
When cAdvisor gathers metrics it actually calls the runc/libcontainer library, and libcontainer is in turn a wrapper around cgroup files. In other words, cAdvisor is itself just a forwarder: its data comes from cgroup files.
The values in the cgroup files are the ultimate source of the monitoring data. For example:
- mem usage:
  - for a docker container, it comes from /sys/fs/cgroup/memory/docker/[containerId]/memory.usage_in_bytes
  - for a pod, it comes from /sys/fs/cgroup/memory/kubepods/besteffort/pod[podId]/memory.usage_in_bytes or /sys/fs/cgroup/memory/kubepods/burstable/pod[podId]/memory.usage_in_bytes
- mem limit: if no memory limit is set, Limit = machine_mem; otherwise it comes from /sys/fs/cgroup/memory/docker/[id]/memory.limit_in_bytes
- memory utilization = memory.usage_in_bytes / memory.limit_in_bytes
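As a quick illustration of that formula (the container id is a placeholder; the paths assume the docker cgroup v1 layout shown above):
# compute the memory utilization of one docker container straight from the cgroup files
$ CID=<containerId>
$ USAGE=$(cat /sys/fs/cgroup/memory/docker/$CID/memory.usage_in_bytes)
$ LIMIT=$(cat /sys/fs/cgroup/memory/docker/$CID/memory.limit_in_bytes)
$ echo "usage: $((USAGE * 100 / LIMIT))%"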
Metrics data flow:
Question:
Metrics Server is a standalone service that only implements its own API internally. How is it exposed in the standard Kubernetes API format?
kube-aggregator
The kube-aggregator and the Metrics Server implementation
kube-aggregator is an extension mechanism for the apiserver's API: it lets developers write their own service and register it into the Kubernetes API, i.e. an extension (aggregated) API.
定義一個APIService物件:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.luffy.k8s.io
spec:
  group: luffy.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: service-A  # must be accessed over https
    namespace: luffy
    port: 443
  version: v1beta1
  versionPriority: 100
k8s will then automatically proxy requests for the following URL for us:
proxyPath := "/apis/" + apiService.Spec.Group + "/" + apiService.Spec.Version
That is, https://192.168.136.10:6443/apis/luffy.k8s.io/v1beta1/xxxx is forwarded to our service-A, and service-A only needs to implement https://service-A/apis/luffy.k8s.io/v1beta1/xxxx.
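Once such an APIService is registered, the extension API can be exercised through the apiserver like any built-in group; for example (luffy.k8s.io and service-A are the hypothetical names from the example above, not a real service):
$ kubectl get --raw /apis/luffy.k8s.io/v1beta1/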
Let's look at how metrics-server implements this:
$ kubectl get apiservice
NAME SERVICE AVAILABLE
v1beta1.metrics.k8s.io kube-system/metrics-server True
$ kubectl get apiservice v1beta1.metrics.k8s.io -oyaml
...
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
...
$ kubectl -n kube-system get svc metrics-server
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
metrics-server ClusterIP 10.110.111.146 <none> 443/TCP 11h
$ curl -k -H "Authorization: Bearer xxxx" https://10.110.111.146
{
  "paths": [
    "/apis",
    "/apis/metrics.k8s.io",
    "/apis/metrics.k8s.io/v1beta1",
    "/healthz",
    "/healthz/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/generic-apiserver-start-informers",
    "/metrics",
    "/openapi/v2",
    "/version"
  ]
}
# https://192.168.136.10:6443/apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
#
$ curl -k -H "Authorization: Bearer xxxx" https://10.110.111.146/apis/metrics.k8s.io/v1beta1/namespaces/luffy/pods/myblog-5d9ff54d4b-4rftt
$ curl -k -H "Authorization: Bearer xxxx" https://192.168.136.10:6443/apis/metrics.k8s.io/v1beta1/namespaces/luffy/pods/myblog-5d9ff54d4b-4rftt
HPA in Practice
Dynamic scaling based on CPU
Create the hpa object:
# Method 1
$ cat hpa-myblog.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-myblog-cpu
  namespace: luffy
spec:
  maxReplicas: 3
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myblog
  targetCPUUtilizationPercentage: 10
# Method 2
$ kubectl -n luffy autoscale deployment myblog --cpu-percent=10 --min=1 --max=3
The target Deployment must have requests configured; otherwise no monitoring data can be obtained and the HPA cannot scale it dynamically (see the sketch below).
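For reference, a minimal sketch of the relevant part of the container spec in the myblog Deployment (the request values here are only illustrative assumptions):
      containers:
      - name: myblog
        ...
        resources:
          requests:
            cpu: 100m
            memory: 100Mi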
Verification:
$ yum -y install httpd-tools
$ kubectl -n luffy get svc myblog
myblog ClusterIP 10.104.245.225 <none> 80/TCP 6d18h
# To see the effect faster, first scale the replicas down to 1
$ kubectl -n luffy scale deploy myblog --replicas=1
# Simulate 1000 concurrent users making a total of 100,000 requests to the page
$ ab -n 100000 -c 1000 http://10.104.245.225/blog/index/
$ kubectl get hpa
$ kubectl -n luffy get pods
After the load drops, scale-down happens only after a default 5-minute stabilization window, which can be configured via the following controller-manager flag:
--horizontal-pod-autoscaler-downscale-stabilization
The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
Scaling down is a gradual process: the value is how long the autoscaler waits after one scale-down completes before the next can start. For example, going from 3 replicas down to 1 involves roughly 2 * 5min = 10 minutes of waiting in between.
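Where exactly the flag is set depends on how the cluster was deployed (e.g. the kube-controller-manager static Pod manifest on kubeadm clusters, or the controller-manager extra args in an RKE cluster.yml). As a sketch, the extra command argument would look like this; the shorter value is only an example:
- --horizontal-pod-autoscaler-downscale-stabilization=1m0s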
Dynamic scaling based on memory
Create the hpa object:
$ cat hpa-demo-mem.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-mem
  namespace: luffy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-mem
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 30
Memory-stress demo script:
$ cat increase-mem-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: increase-mem-config
  namespace: luffy
data:
  increase-mem.sh: |
    #!/bin/bash
    mkdir /tmp/memory
    mount -t tmpfs -o size=40M tmpfs /tmp/memory
    dd if=/dev/zero of=/tmp/memory/block
    sleep 60
    rm /tmp/memory/block
    umount /tmp/memory
    rmdir /tmp/memory
Test deployment:
$ cat hpa-demo-mem-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-mem
  namespace: luffy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
      - name: increase-mem-script
        configMap:
          name: increase-mem-config
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: increase-mem-script
          mountPath: /etc/script
        resources:
          requests:
            memory: 50Mi
            cpu: 50m
        securityContext:
          privileged: true
Test:
$ kubectl create -f increase-mem-config.yaml
$ kubectl create -f hpa-demo-mem.yaml
$ kubectl create -f hpa-demo-mem-deploy.yaml
$ kubectl -n luffy exec -ti hpa-demo-mem-7fc75bf5c8-xx424 sh
/ # sh /etc/script/increase-mem.sh
# Watch the hpa and pods
$ kubectl -n luffy get hpa
$ kubectl -n luffy get po
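While the script runs, it can also help to watch the memory figures the HPA actually sees:
$ kubectl -n luffy top pods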
Dynamic scaling based on custom metrics
Besides autoscaling on CPU and memory, we can also scale based on custom monitoring metrics. For that we need the Prometheus Adapter: Prometheus monitors both application load and the cluster's own metrics, and the Prometheus Adapter lets us use the metrics collected by Prometheus to drive scaling policies. These metrics are exposed through the APIServer, so the HPA resource object can consume them directly and easily.
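As a rough sketch of what such an HPA could look like once the Prometheus Adapter exposes a custom metric through custom.metrics.k8s.io (the metric name http_requests_per_second and the target value are purely illustrative assumptions and require a matching adapter rule):
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-custom-demo
  namespace: luffy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myblog
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Pods
    pods:
      metricName: http_requests_per_second   # assumed custom metric served by the Prometheus Adapter
      targetAverageValue: 10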
Architecture diagram: