Monitoring Kubernetes with Prometheus
This post only covers how to use Prometheus to monitor a Kubernetes cluster; for Prometheus itself, see the official documentation.
Prerequisite: have the required files ready.
/data/Prometheus# ls -l
total 28
drwxr-xr-x 2 root root 4096 Jan 15 02:53 grafana
drwxr-xr-x 2 root root 4096 Jan 15 03:11 kube-state-metrics
-rw-r--r-- 1 root root   60 Jan 14 06:48 namespace.yaml
drwxr-xr-x 2 root root 4096 Jan 15 03:22 node-directory-size-metrics
drwxr-xr-x 2 root root 4096 Jan 15 03:02 node-exporter
drwxr-xr-x 2 root root 4096 Jan 15 02:55 prometheus
drwxr-xr-x 2 root root 4096 Jan 15 02:37 rbac
$ ls grafana/
grafana-configmap.yaml  grafana-core-deployment.yaml  grafana-import-dashboards-job.yaml
grafana-pvc-claim.yaml  grafana-pvc-volume.yaml       grafana-service.yaml
$ ls prometheus/
configmap.yaml  deployment.yaml  prometheus-rules.yaml  service.yaml
The grafana and prometheus directories hold the deployment files; node-exporter, kube-state-metrics, and node-directory-size-metrics are the three collectors, which act as Prometheus agents.
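To make the agent relationship concrete: Prometheus finds these collectors through Kubernetes service discovery. The real scrape configuration ships in prometheus/configmap.yaml (shown in step 5); purely as an illustration, a minimal scrape job that discovers node-exporter via endpoints might look like this:

# Illustrative only -- the real scrape configuration is in prometheus/configmap.yaml (step 5).
- job_name: 'node-exporter'
  kubernetes_sd_configs:
    - role: endpoints                    # discover the endpoints behind every Service
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      regex: prometheus-node-exporter    # keep only the node-exporter endpoints (port 9100)
      action: keep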
With the files ready, deploy step by step.
1. Create the required Namespace
All of the Deployments, Pods, and Services in this Prometheus setup live in the monitoring namespace, so create it first.
$ cat namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

$ kubectl create -f namespace.yaml
namespace "monitoring" created
2. Create the PV and PVC for Grafana
grafana# cat grafana-pvc-volume.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: grafana-pv-volume
  labels:
    type: local
spec:
  storageClassName: grafana-pv-volume
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: "/data/volume/grafana"

grafana# cat grafana-pvc-claim.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana-pvc-volume
  namespace: "monitoring"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: grafana-pv-volume

$ kubectl create -f grafana/grafana-pvc-volume.yaml -f grafana/grafana-pvc-claim.yaml
persistentvolume "grafana-pv-volume" created
persistentvolumeclaim "grafana-pvc-volume" created

$ kubectl get pvc -n monitoring
NAME                 STATUS    VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS        AGE
grafana-pvc-volume   Bound     grafana-pv-volume   10Gi       RWO            grafana-pv-volume   52s

The Bound status shows the claim has been bound to grafana-pv-volume.
3. Deploy the Grafana application. These are third-party applications, each shipping its own configuration, which is defined through a ConfigMap.
grafana# ls
grafana-configmap.yaml  grafana-core-deployment.yaml  grafana-import-dashboards-job.yaml
grafana-pvc-claim.yaml  grafana-pvc-volume.yaml       grafana-service.yaml
grafana# kubectl create -f ./    # create every file under the grafana directory
configmap "grafana-import-dashboards" created
deployment "grafana-core" created
job "grafana-import-dashboards" created
service "grafana" created
grafana# kubectl get deployment,pod -n monitoring
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/grafana-core   1         1         1            0           1m
NAME                              READY     STATUS              RESTARTS   AGE
po/grafana-core-9c7f66868-7q8lx   0/1       ContainerCreating   0          1m

Starting po/grafana-core pulls the image grafana/grafana:4.2.0.
A short description of what was created for Grafana:
grafana-pv-volume  = /data/volume/grafana, 10Gi
grafana-pvc-volume = 5Gi --> grafana-pv-volume
ConfigMap  = grafana-import-dashboards
Job        = grafana-import-dashboards
Deployment = grafana-core, replicas: 1
    containers = grafana-core, mount: grafana-pvc-volume:/var
Service    = grafana, port: 3000 --> nodePort: 30161   (3000 is Grafana's default port)
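The grafana-service.yaml behind that last line is not reproduced above; reconstructed from these values (a sketch, not the original file), it would look roughly like:

# Sketch of grafana-service.yaml, reconstructed from the summary above.
kind: Service
apiVersion: v1
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 3000        # Grafana's default port
      nodePort: 30161   # fixed NodePort, reachable as NodeIP:30161
  selector:
    app: grafana
    component: core     # matches the selector shown by 'kubectl get svc -o wide' later on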
4. The Grafana core application is now in place; next, deploy the RBAC objects for Prometheus.
prometheus/rbac# ls
grant_serviceAccount.sh  prometheus_rbac.yaml
# Create the RBAC objects first:
prometheus/rbac# kubectl create -f prometheus_rbac.yaml
clusterrolebinding "prometheus-k8s" created
clusterrolebinding "kube-state-metrics" created
clusterrole "kube-state-metrics" created
serviceaccount "kube-state-metrics" created
clusterrolebinding "prometheus" created
clusterrole "prometheus" created
serviceaccount "prometheus-k8s" created
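prometheus_rbac.yaml itself is not reproduced here; a minimal sketch of what it creates for Prometheus (object names taken from the kubectl output above, rules abbreviated) would be something like:

# Sketch only -- the real prometheus_rbac.yaml grants more than this.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-k8s
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: monitoring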
5. Create the Prometheus Deployment and Service
prometheus/prometheus# ls
configmap.yaml  deployment.yaml  prometheus-rules.yaml  service.yaml

One thing to note in configmap.yaml: since Kubernetes 1.7, cAdvisor metrics for pods are collected through the kubelet's port 4194. The kubelet must therefore be listening on 4194, where cAdvisor exposes the per-container metrics. Pay attention to this section:

prometheus/prometheus# cat configmap.yaml
    # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L37
    - job_name: 'kubernetes-nodes'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - source_labels: [__address__]
          regex: '(.*):10250'
          replacement: '${1}:10255'
          target_label: __address__

    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc.cluster.local:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}:4194/proxy/metrics
    # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L79
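The effect of that relabeling is that each node's cAdvisor is scraped through the API server proxy rather than directly. A hypothetical manual check of the resulting path, run from a pod with a service-account token (the node name 10.3.1.16 is one of the nodes listed later in this article), would look like:

# Verify the cAdvisor proxy path produced by the relabeling above (illustrative check).
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk -H "Authorization: Bearer $TOKEN" \
  https://kubernetes.default.svc.cluster.local:443/api/v1/nodes/10.3.1.16:4194/proxy/metrics | head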
prometheus-rules.yaml is the rules file (alerting and recording rules).
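The contents of prometheus-rules.yaml are not shown here; in the 1.x rule syntax used by prom/prometheus:v1.7.0, an alerting rule would look roughly like this:

# Illustrative rule in Prometheus 1.x syntax, not taken from the actual prometheus-rules.yaml.
ALERT NodeDown
  IF up{job="kubernetes-nodes"} == 0
  FOR 5m
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "Node {{ $labels.instance }} is down"
  }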
deployment.yaml and service.yaml are the deployment manifests; it is advisable to raise the resource limits in the Deployment somewhat.
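For instance, the container's resources block in deployment.yaml could be set along these lines (the values are illustrative, not from the original file):

# Illustrative resource settings for the prometheus-core container; tune to your cluster size.
resources:
  requests:
    cpu: 500m
    memory: 500Mi
  limits:
    cpu: "1"
    memory: 1Gi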
Now create all the files under the prometheus directory:
prometheus/prometheus# kubectl create -f ./
configmap "prometheus-core" created
deployment "prometheus-core" created
configmap "prometheus-rules" created
service "prometheus" created

prometheus/prometheus# kubectl get deployment,pod -n monitoring
NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/grafana-core      1         1         1            1           16m
deploy/prometheus-core   1         1         1            1           1m
NAME                                  READY     STATUS    RESTARTS   AGE
po/grafana-core-9c7f66868-wm68j       1/1       Running   0          16m
po/prometheus-core-6dc6777c5b-5nc7j   1/1       Running   0          1m
A short description of what the Prometheus deployment created:
Deployment = prometheus-core, replicas: 1
    containers = prometheus, image: prom/prometheus:v1.7.0, containerPort: 9090 (web UI)
Service    = prometheus, type: NodePort, port: 9090 (web UI)
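Once the pod is Running, a quick sanity check against the Prometheus HTTP API is possible; 37318 is the NodePort assigned to svc/prometheus in this deployment (it appears in the service listing in the next step):

# Query the 'up' series through the Prometheus HTTP API (replace NodeIP with a node address).
curl -s 'http://NodeIP:37318/api/v1/query?query=up'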
6. With Prometheus deployed, deploy its agents, i.e. the collectors:
Prometheus/prometheus# ls node-directory-size-metrics/
daemonset.yaml
Prometheus/prometheus# ls kube-state-metrics/
deployment.yaml  service.yaml
Prometheus/prometheus# ls node-exporter/
exporter-daemonset.yaml  exporter-service.yaml
# Two of these collectors are DaemonSets.
Prometheus/prometheus# kubectl create -f node-exporter/ -f kube-state-metrics/ -f node-directory-size-metrics/
daemonset "prometheus-node-exporter" created
service "prometheus-node-exporter" created
deployment "kube-state-metrics" created
service "kube-state-metrics" created
daemonset "node-directory-size-metrics" created

Prometheus/prometheus# kubectl get deploy,pod,svc -n monitoring
NAME                        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/grafana-core         1         1         1            1           26m
deploy/kube-state-metrics   2         2         2            2           1m
deploy/prometheus-core      1         1         1            1           11m
NAME                                     READY     STATUS    RESTARTS   AGE
po/grafana-core-9c7f66868-wm68j          1/1       Running   0          26m
po/kube-state-metrics-694fdcf55f-bqcp8   1/1       Running   0          1m
po/kube-state-metrics-694fdcf55f-nnqqd   1/1       Running   0          1m
po/node-directory-size-metrics-n9wx7     2/2       Running   0          1m
po/node-directory-size-metrics-ppscw     2/2       Running   0          1m
po/prometheus-core-6dc6777c5b-5nc7j      1/1       Running   0          11m
po/prometheus-node-exporter-kchmb        1/1       Running   0          1m
po/prometheus-node-exporter-lks5m        1/1       Running   0          1m
NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
svc/grafana                    NodePort    10.254.231.25   <none>        3000:30161/TCP   26m
svc/kube-state-metrics         ClusterIP   10.254.156.51   <none>        8080/TCP         1m
svc/prometheus                 NodePort    10.254.239.90   <none>        9090:37318/TCP   10m
svc/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         1m

Prometheus/prometheus# kubectl get pod -o wide -n monitoring
NAME                             READY     STATUS    RESTARTS   AGE       IP          NODE
prometheus-node-exporter-kchmb   1/1       Running   0          4m        10.3.1.16   10.3.1.16
prometheus-node-exporter-lks5m   1/1       Running   0          4m        10.3.1.17   10.3.1.17
# These two are the exporter pods; as a DaemonSet they run on both nodes, so data can be collected from every node.
With everything deployed, here is a short description of each collector:
node-exporter/exporter-daemonset.yaml:
DaemonSet = prometheus-node-exporter
    containers: name: prometheus-node-exporter
        image: prom/node-exporter:v0.14.0
        containerPort: 9100, hostPort: 9100
    hostNetwork: true        # it uses port 9100 on the host itself

Prometheus/prometheus/node-exporter# kubectl get daemonset,pod -n monitoring
NAME                             DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ds/node-directory-size-metrics   2         2         2         2            2           <none>          16h
ds/prometheus-node-exporter      2         2         2         2            2           <none>          16h

Because it is a DaemonSet, two prometheus-node-exporter Pods run accordingly, one per node.

Service = prometheus-node-exporter, clusterIP: None, port: 9100, type: ClusterIP   # headless, it has no cluster IP
# kubectl get service -n monitoring
NAME                       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
prometheus-node-exporter   ClusterIP   None         <none>        9100/TCP   16h
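Because of hostNetwork, the exporter's metrics can be fetched straight from a node IP, which makes a quick check easy (10.3.1.16 is one of the nodes from the pod listing above):

# node-exporter binds port 9100 on the host itself, so query the node IP directly:
curl -s http://10.3.1.16:9100/metrics | head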
kube-state-metrics/deployment.yaml:
Deployment = kube-state-metrics, replicas: 2
    containers: name: kube-state-metrics
        image: gcr.io/google_containers/kube-state-metrics:v0.5.0
        containerPort: 8080
Service = kube-state-metrics, port: 8080    # not exposed via NodePort

# kubectl get deployment,pod,svc -n monitoring
NAME                        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-state-metrics   2         2         2            2           16h
NAME                                     READY     STATUS    RESTARTS   AGE
po/kube-state-metrics-694fdcf55f-2mmd5   1/1       Running   0          11h
po/kube-state-metrics-694fdcf55f-bqcp8   1/1       Running   0          16h
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
svc/kube-state-metrics   ClusterIP   10.254.156.51   <none>        8080/TCP   16h
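Since this Service has only a ClusterIP, its metrics endpoint can be checked from a cluster node or from inside a pod (the address 10.254.156.51 is the ClusterIP shown above):

# kube-state-metrics serves Prometheus-format metrics on port 8080, cluster-internal only:
curl -s http://10.254.156.51:8080/metrics | head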
node-directory-size-metrics/daemonset.yaml:
# As a DaemonSet it defines no replica count and runs on every node; note that it creates no Service.
DaemonSet: name: node-directory-size-metrics
    containers: name: read-du
        image: giantswarm/tiny-tools
        mountPath: /mnt/var
        mountPath: /tmp
    containers: name: caddy
        image: dockermuenster/caddy:0.9.3
        containerPort: 9102
        mountPath: /var/www
    hostPath: /var

kubectl get daemonset,pod,svc -n monitoring
NAME                             DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ds/node-directory-size-metrics   2         2         2         2            2           <none>          16h
NAME                                   READY     STATUS    RESTARTS   AGE
po/node-directory-size-metrics-n9wx7   2/2       Running   0          16h
po/node-directory-size-metrics-ppscw   2/2       Running   0          16h

There is no Service for node-directory-size-metrics.
At this point the Prometheus deployment is complete. Finally, look at the ports it exposes:
Prometheus/prometheus# kubectl get svc -o wide -n monitoring
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
grafana                    NodePort    10.254.231.25   <none>        3000:30161/TCP   31m       app=grafana,component=core
kube-state-metrics         ClusterIP   10.254.156.51   <none>        8080/TCP         6m        app=kube-state-metrics
prometheus                 NodePort    10.254.239.90   <none>        9090:37318/TCP   16m       app=prometheus,component=core
prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         6m        app=prometheus,component=node-exporter
7. Access and use Prometheus
As shown above, Grafana's NodePort is 30161, so NodeIP:30161 opens Grafana; the default credentials are admin/admin.
After logging in, add a data source:
Add a Prometheus data source:
Configure the Prometheus data source with the parameters shown in the figure below:
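Equivalently, since the screenshots are not reproduced here, the data source can be created through Grafana's HTTP API. A sketch assuming the default admin/admin credentials and the in-cluster service URL http://prometheus:9090:

# Create the Prometheus data source via Grafana's HTTP API instead of the UI.
curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://NodeIP:30161/api/datasources \
  -d '{"name":"prometheus","type":"prometheus","url":"http://prometheus:9090","access":"proxy","isDefault":true}'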
Once the data source is added, import the dashboard template:
Deployment complete.