
How to monitor Kubernetes with Prometheus

Kubernetes is an open-source container orchestration tool developed by Google. As the de facto standard in the container space, it greatly simplifies application deployment and management. Prometheus provides an excellent answer to the question of how to monitor both Kubernetes' own components and the containers running on a Kubernetes cluster.

Prometheus is an open-source monitoring system. Based on configured jobs, it periodically scrapes metrics from specified targets, stores the data as time series locally or in remote storage (local storage by default), and analyzes the system's runtime state in real time, providing a basis for performance tuning. Its main features include:

A multi-dimensional data model.

A flexible query language (PromQL) that lets users query and aggregate the collected time-series data in real time.

Time-series collection over HTTP pull, with push supported through an intermediate gateway.

Target discovery via service discovery or static configuration.

Standalone server nodes with no external dependencies.

Its architecture is shown below:

[Prometheus architecture diagram]

Core components

Prometheus Server: scrapes, stores, and queries data according to its configuration.

Exporters: the collective name for data-collection components in the Prometheus ecosystem. An exporter exposes a scrape endpoint that publishes the metrics of an existing third-party service to Prometheus. It gathers data from the target and converts it into a format Prometheus understands. Unlike traditional collection agents, an exporter does not push data; it waits for the Prometheus server to come and scrape it.

Push Gateway: a component for push-style scenarios. Metrics are first pushed to the Push Gateway and then scraped from it by the Prometheus server.

AlertManager: the Prometheus server supports alerting rules written in PromQL; when a rule's expression is satisfied, an alert is fired and routed to AlertManager.
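To make the Push Gateway path above concrete, the sketch below builds the URL path and text payload a client would send when pushing metrics. No live gateway is assumed; the job/instance names and metric are illustrative placeholders.

```python
# Illustrative sketch of the Pushgateway push path (no live gateway assumed).
# A client PUTs/POSTs metrics in the text exposition format to a URL that
# encodes the grouping labels, e.g. /metrics/job/<job>/instance/<instance>.

def push_url(base, job, instance=None):
    """Build the Pushgateway URL path for a job (and optional instance)."""
    url = f"{base}/metrics/job/{job}"
    if instance:
        url += f"/instance/{instance}"
    return url

def payload(name, value, labels=None):
    """Render one sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return f"{name}{label_str} {value}\n"

if __name__ == "__main__":
    print(push_url("http://pushgateway:9091", "batch_job", "node2"))
    print(payload("job_last_success_unixtime", 1620000000))
```

A real client would send this payload with an HTTP PUT; the Prometheus server then scrapes the gateway like any other target.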

Data model

Prometheus stores time series; each series is uniquely identified by its metric name and a set of label key/value pairs.

Format:

<metric name>{<label name>=<label value>, …}
Labels: give the same metric distinct dimensional identities. For example, the metric http_requests_total with labels code="200", handler="prometheus", instance="node2", job="kubernetes-nodes", method="get" can be written as:

http_requests_total{code="200",handler="prometheus",instance="node2",job="kubernetes-nodes",method="get"}
Samples: the actual time-series data; each sample consists of a 64-bit floating-point value and a millisecond-precision timestamp.
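The notation above can be assembled mechanically from a metric name and a label dictionary; a minimal sketch (label values taken from the example above):

```python
def series_id(metric, labels):
    """Format <metric>{<k>="<v>",...} as in the Prometheus data model."""
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{pairs}}}" if pairs else metric

example = series_id("http_requests_total", {
    "code": "200", "handler": "prometheus", "instance": "node2",
    "job": "kubernetes-nodes", "method": "get",
})
print(example)
# → http_requests_total{code="200",handler="prometheus",instance="node2",job="kubernetes-nodes",method="get"}
```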

The metrics we care about fall into two groups: status metrics and performance metrics.

Performance metrics

These mainly cover CPU, memory, load, disk, and network; specifically:

  • performance metrics for containers and Pods
  • performance metrics for the host nodes themselves
  • cluster-related metrics on each node
  • network performance data for applications running on Kubernetes

Status metrics

  • running-state metrics of Kubernetes resource objects (Deployment, DaemonSet, Pod, etc.)
  • running-state metrics of Kubernetes platform components (e.g. kube-apiserver, kube-scheduler)

The mainstream approach: various exporters collect metrics along different dimensions and expose them in the data format Prometheus supports; Prometheus pulls the data periodically, Grafana visualizes it, and AlertManager sends alerts on anomalies.

The overall collection approach:

  • cAdvisor collects performance metrics for containers and Pods.
  • node-exporter collects performance metrics for the host nodes themselves.
  • kube-state-metrics collects (health) status metrics for Kubernetes resource objects and components.
  • the kubelet's own endpoints provide the cluster-related metrics on each node.
  • blackbox-exporter collects application network-performance (HTTP, TCP, etc.) data.

Setting up Prometheus monitoring for Kubernetes involves the following steps (assuming a Kubernetes cluster is already available). Note that these manifests use the legacy extensions/v1beta1 API; on Kubernetes 1.16 and later, use apps/v1 instead and add the required spec.selector.

1.1 Create the namespace

First create a namespace called monitoring. Everything that follows will be installed into it.

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

Create it with: kubectl create -f namespace.yaml
Conversely, if you modify a YAML file, run kubectl delete -f namespace.yaml and then create it again for the change to take effect (kubectl apply -f also updates in place).

1.2 Create node-exporter

The node-exporter probe collects host performance metrics such as memory usage. Its default port is 9100, and its metric names are prefixed with node_ (note: not every node_-prefixed metric comes from node-exporter).

Edit the node-exporter-service.yaml file.
The annotation prometheus.io/scrape: 'true' on the Service marks it for Prometheus to discover and scrape.

When monitoring a cluster, the node-exporter image must be present on every host.

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: prometheus
    component: node-exporter
spec:
  clusterIP: None
  ports:
    - name: prometheus-node-exporter
      port: 9100
      protocol: TCP
  selector:
    app: prometheus
    component: node-exporter
  type: ClusterIP

Edit the node-exporter-daemonset.yaml file.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: prometheus
    component: node-exporter
spec:
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus
        component: node-exporter
    spec:
      containers:
      - image: yourip/prometheus/node-exporter:latest
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp
          containerPort: 9100
          hostPort: 9100
      hostNetwork: true
      hostPID: true

Create them with:

kubectl create -f node-exporter-service.yaml   -f  node-exporter-daemonset.yaml
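Once the DaemonSet is running, every node serves metrics in the text exposition format on port 9100. As a sketch of what a scrape returns, the parser below extracts node_-prefixed samples; the sample text is illustrative, not real node output:

```python
# Parse a few exposition-format lines as served by node-exporter on :9100.
# SAMPLE is an illustrative fragment; real output has many more series.
SAMPLE = """\
# HELP node_memory_MemFree_bytes Memory information field MemFree_bytes.
# TYPE node_memory_MemFree_bytes gauge
node_memory_MemFree_bytes 1.073741824e+09
node_load1 0.25
"""

def parse(text):
    """Return {metric_name: float_value}, skipping comments and blanks."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        out[name] = float(value)
    return out

metrics = parse(SAMPLE)
print(metrics["node_load1"])  # → 0.25
```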

1.3 Create kube-state-metrics

Edit the state-metrics-deployment.yaml file.
The collected metric names are prefixed with kube_. kube-state-metrics exposes health/status metrics for Kubernetes resource objects and components, chiefly the state of Pods, DaemonSets, Deployments, Jobs, and other objects in the cluster. The default port is 8080.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: yourip/prometheus/kube-state-metrics:latest
        ports:
        - containerPort: 8080

Edit the state-metrics-service.yaml file.

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitoring
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics

Create them with:

kubectl create -f state-metrics-deployment.yaml   -f state-metrics-service.yaml
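kube-state-metrics encodes object state as one-hot gauges: for example, kube_pod_status_phase is 1 for a Pod's current phase and 0 for every other phase. A sketch of how a consumer recovers the phase from such samples (the sample data is invented for illustration):

```python
# kube_pod_status_phase-style samples: (pod, phase) -> 0 or 1.
SAMPLES = {
    ("web-1", "Pending"): 0, ("web-1", "Running"): 1,
    ("web-1", "Failed"): 0,  ("job-7", "Pending"): 1,
    ("job-7", "Running"): 0, ("job-7", "Failed"): 0,
}

def phase_of(pod, samples):
    """Return the phase whose gauge is 1 for this pod, if any."""
    for (p, phase), value in samples.items():
        if p == pod and value == 1:
            return phase
    return None

print(phase_of("web-1", SAMPLES))  # → Running
```

The same one-hot encoding is why PromQL dashboards filter these metrics with selectors like {phase="Running"} rather than reading a value directly.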

1.4 Create node-directory-size-metrics (optional)

Edit the node-directory-size-metrics-daemonset.yaml file.
This DaemonSet reads node directories to produce disk-usage metrics. Note that it only watches disk usage under /mnt; to track a particular directory on a machine, mount that directory under /mnt.
Query it with {app="node-directory-size-metrics"}. The default port is 9102, and the metric prefix is also node_.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-directory-size-metrics
  namespace: monitoring
  annotations:
    description: |
      This `DaemonSet` provides metrics in Prometheus format about disk usage on the nodes.
      The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger than `100M` for now.
      The other container `caddy` just hands out the contents of that file on request via `http` on `/metrics` at port `9102` which are the defaults for Prometheus.
      These are scheduled on every node in the Kubernetes cluster.
      To choose directories from the node to check, just mount them on the `read-du` container below `/mnt`.
spec:
  template:
    metadata:
      labels:
        app: node-directory-size-metrics
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
        description: |
          This `Pod` provides metrics in Prometheus format about disk usage on the node.
          The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger than `100M` for now.
          The other container `caddy` just hands out the contents of that file on request on `/metrics` at port `9102` which are the defaults for Prometheus.
          This `Pod` is scheduled on every node in the Kubernetes cluster.
          To choose directories from the node to check just mount them on `read-du` below `/mnt`.
    spec:
      containers:
      - name: read-du
        image: yourip/prometheus/tiny-tools:latest
        imagePullPolicy: Always
        # FIXME threshold via env var
        # The
        command:
        - fish
        - --command
        - |
          touch /tmp/metrics-temp
          while true
            for directory in (du --bytes --separate-dirs --threshold=100M /mnt)
              echo $directory | read size path
              echo "node_directory_size_bytes{path=\"$path\"} $size" \
                >> /tmp/metrics-temp
            end
            mv /tmp/metrics-temp /tmp/metrics
            sleep 300
          end
        volumeMounts:
        - name: host-fs-var
          mountPath: /mnt/var
          readOnly: true
        - name: metrics
          mountPath: /tmp
      - name: caddy
        image: yourip/prometheus/caddy:latest
        command:
        - "caddy"
        - "-port=9102"
        - "-root=/var/www"
        ports:
        - containerPort: 9102
        volumeMounts:
        - name: metrics
          mountPath: /var/www
      volumes:
      - name: host-fs-var
        hostPath:
          path: /var
      - name: metrics
        emptyDir:
          medium: Memory

Create it with:

kubectl create -f node-directory-size-metrics-daemonset.yaml

1.5 Create the ConfigMap

Edit the configmap.yaml file.
The five jobs defined in the ConfigMap are nodes, endpoints, services, pods, and cadvisor; these are the five job types shown on the Targets page of the Prometheus web UI.
The part that actually takes effect during Prometheus configuration is the prometheus.yaml section of the ConfigMap:

apiVersion: v1
data:
  prometheus.yaml: |
    global:
      scrape_interval: 10s
      scrape_timeout: 10s
      evaluation_interval: 10s
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["alertmanager:9093"]
    rule_files:
      - "/etc/prometheus/rules/test.yml"
    scrape_configs:
      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L37
      - job_name: 'kubernetes-nodes'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10255'
            target_label: __address__
      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L79
      - job_name: 'kubernetes-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L119
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name
      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L156
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: ${1}:${2}
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
          - source_labels: [__meta_kubernetes_pod_container_port_number]
            action: keep
            regex: 9\d{3}
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          
        metric_relabel_configs:
          - action: replace
            source_labels: [id]
            regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
            target_label: rkt_container_name
            replacement: '${2}-${1}'
          - action: replace
            source_labels: [id]
            regex: '^/system\.slice/(.+)\.service$'
            target_label: systemd_service_name
            replacement: '${1}'

      - job_name: 'discovery-node'
        file_sd_configs:
          - files: ['/etc/prometheus/test_sd_config/*.yml']
            refresh_interval: 5s  

      - job_name: 'consul-prometheus'
        consul_sd_configs:    
          - server: '10.4.**.**:8500'
            services: []
        relabel_configs:
          - source_labels: ['__meta_consul_service']
            regex: .*node.*
            action: keep

kind: ConfigMap
metadata:
  creationTimestamp: null
  name: prometheus-core
  namespace: monitoring

Create it with:

kubectl create -f configmap.yaml 
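The kubernetes-nodes job in the ConfigMap above relabels each node's address from the kubelet port 10250 to the read-only port 10255. Prometheus implicitly anchors relabel regexes, so the rule behaves like the fully-anchored substitution below, sketched in Python for illustration:

```python
import re

# Prometheus anchors relabel regexes as ^(?:...)$ internally.
RULE = re.compile(r"^(?:(.*):10250)$")

def relabel_address(address):
    """Apply the '(.*):10250' -> '${1}:10255' relabeling to __address__."""
    m = RULE.match(address)
    return f"{m.group(1)}:10255" if m else address

print(relabel_address("10.4.41.161:10250"))  # → 10.4.41.161:10255
print(relabel_address("10.4.41.161:9100"))   # → 10.4.41.161:9100 (no match, unchanged)
```

Addresses that do not match the pattern are left untouched, which is why only kubelet targets are redirected.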

1.6 Create prometheus-core

Edit the deployment.yaml file.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      containers:
      - name: prometheus
        image: 10.4.41.221/prometheus/prometheus:latest
        args: ['--config.file=/etc/prometheus/prometheus.yaml','--storage.tsdb.path=/prometheus/data/','--storage.tsdb.retention=1d']
        ports:
        - name: webui
          containerPort: 9090
        resources:
          requests:
            cpu: 500m
            memory: 500M
          limits:
            cpu: 500m
            memory: 500M
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        - name: rules-volume
          mountPath: /etc/prometheus/rules
        - name: discovery-volume
          mountPath: /etc/prometheus/test_sd_config
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-core
      - name: rules-volume
        configMap:
          name: prometheus-rules
      - name: discovery-volume
        configMap:
          name: prometheus-discovery

Create it with:

kubectl create -f deployment.yaml 

Note: in args: ['--config.file=/etc/prometheus/prometheus.yaml'], every flag must be prefixed with --. With a single -, startup fails:

Error parsing commandline arguments: unknown short flag '-c'
prometheus: error: unknown short flag '-c'

1.7 Create the Prometheus Service

Edit the service.yaml file.

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: core
  annotations:
    prometheus.io/scrape: 'true'
spec:
  type: NodePort
  ports:
    - port: 9090
      protocol: TCP
      name: webui
  selector:
    app: prometheus
    component: core

Create it with:

kubectl create -f service.yaml 

1.8 Create RBAC

Review the rbac.yaml file.
If this file is wrong, the components may all look healthy in the Kubernetes web UI while the Targets page of the Prometheus web UI stays empty.
RBAC grants read access to cluster resources for the namespace created above, so that Prometheus can fetch resource metrics through the Kubernetes API.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-k8s
  namespace: monitoring

Create it with:

kubectl create -f rbac.yaml

1.9 Create alerting rules

apiVersion: v1
data:
  test.yml: |        
    groups:
      - name: goroutines_monitoring
        rules:
        - alert: TooMuchGoroutines
          expr: go_goroutines > 20
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "too much goroutines of job prometheus."
            description: "testing"

kind: ConfigMap
metadata:
  creationTimestamp: null
  name: prometheus-rules
  namespace: monitoring

Create it with (assuming the ConfigMap above is saved as rules-configmap.yaml):

kubectl create -f rules-configmap.yaml
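The for: 1m clause in the rule above means the expression must stay true for a full minute before the alert fires; until then the alert is "pending". A conceptual sketch of that evaluation (simplified; real Prometheus re-evaluates at each evaluation_interval):

```python
def alert_state(samples, threshold=20, for_seconds=60):
    """samples: list of (unix_ts, go_goroutines value), oldest first.
    Returns 'inactive', 'pending', or 'firing' per `for:` semantics."""
    breach_start = None
    state = "inactive"
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # expression just became true
            state = "firing" if ts - breach_start >= for_seconds else "pending"
        else:
            breach_start = None   # any dip resets the pending timer
            state = "inactive"
    return state

print(alert_state([(0, 25), (30, 30), (60, 31)]))  # → firing
print(alert_state([(0, 25), (30, 10), (60, 31)]))  # → pending (timer was reset)
```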

1.10 Open the Prometheus web UI

Look up the port Prometheus exposes; here it is NodePort 32075:

kubectl -n monitoring get svc

The IP to use is not the cluster IP but the IP of the host where Prometheus is deployed; the URL is http://yourip:32075/targets
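Besides the UI, the same endpoint serves the HTTP API under /api/v1/. A sketch that builds an instant-query URL (yourip and the NodePort are the placeholders from above; actually fetching the URL would of course require network access to the cluster):

```python
from urllib.parse import urlencode

def query_url(base, promql):
    """Build the /api/v1/query URL for an instant PromQL query."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

url = query_url("http://yourip:32075", "up")
print(url)  # → http://yourip:32075/api/v1/query?query=up
# Inside the network: urllib.request.urlopen(url) returns a JSON result.
```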

Note 1: if the URL is unreachable, check the logs in the Kubernetes dashboard to track down the error.

Note 2: after the Prometheus page is reachable, the node targets may report:
Get http://10.4.41.161:10255/metrics: dial tcp ip:10255: connect: connection refused

Fix: enable the kubelet read-only port:

cd /etc/kubernetes/
vi kubelet.env
# add/adjust
--read-only-port=10255 \

service kubelet restart

Note 3: while monitoring the cluster you may notice via docker ps -a that node1 is not running the /prometheus/prometheus container; check node2 first, since the scheduler may have placed it on another node.

1.11 Configure AlertManager

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitoring
data:
  config.yml: |-
    global:     
      resolve_timeout: 1m      
      smtp_smarthost: 'smtp.neusoft.com:587'
      smtp_from: 'mail.neusoft.com'
      smtp_auth_username: '[email protected]'
      smtp_auth_password: 'BABYNJSWDw1'     
    templates:
      - '/etc/alertmanager-templates/*.tmpl'   
    route:     
      group_by: ['alertname']     
      group_wait: 30s     
      group_interval: 30s     
      repeat_interval: 1m      
      receiver: 'webhook'    
    receivers:
      - name: 'webhook'
        webhook_configs:
        - url: '[email protected]'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

Create it with: kubectl create -f configmap.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      name: alertmanager
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: 10.4.41.221/prometheus/alertmanager:latest
        args: ['--config.file=/etc/alertmanager/config.yml','--storage.path=/alertmanager']        
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: config-volume
          mountPath: /etc/alertmanager
        - name: templates-volume
          mountPath: /etc/alertmanager-templates
        - name: alertmanager
          mountPath: /alertmanager
      volumes:
      - name: config-volume
        configMap:
          name: alertmanager
      - name: templates-volume
        configMap:
          name: alertmanager-templates
      - name: alertmanager
        emptyDir: {}

Create it with: kubectl create -f deployment.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
  labels:
    name: alertmanager
  name: alertmanager
  namespace: monitoring
spec:
  selector:
    app: alertmanager
  type: NodePort
  ports:
  - name: alertmanager
    protocol: TCP
    port: 9093
    targetPort: 9093

Create it with: kubectl create -f service.yaml

1.12 Open the AlertManager web UI

http://ip:port/#/alerts

Here ip is the host where AlertManager is deployed, and port is the NodePort found via kubectl -n monitoring get svc (31555 in this case); accessing port 9093 directly does not work from outside the cluster.

The figure below shows the Alerts page in Prometheus.

[Prometheus Alerts page]

The figure below shows the AlertManager alert page.

[AlertManager alert page]

======================================================

About cAdvisor

cAdvisor is an open-source container monitoring tool from Google. It collects container performance metrics on each host, and Pod-level metrics can be derived from the container metrics.

The main metric families cAdvisor provides include:

container_cpu_*
container_fs_*
container_memory_*
container_network_*
container_spec_* (cpu/memory)
container_start_time_*
container_tasks_state_*

cAdvisor is now built into the kubelet. On every node running the kubelet, the cAdvisor metrics endpoint exposes performance metrics for all containers on that node. The data is emitted in the Prometheus exposition format, i.e. the data model Prometheus understands, so it can be scraped directly.

Before version 1.7.3, cAdvisor's metrics were merged into the kubelet's own metrics; from 1.7.3 onward they are served separately, so Prometheus scrapes them as two separate jobs.

The cAdvisor-related part of configmap.yaml configures Prometheus to scrape cAdvisor periodically; Prometheus reaches the cAdvisor metrics through the apiserver proxy API.

Note: metrics scraped from the node-exporter endpoint are likewise emitted in Prometheus format, as are those of the other exporters.
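Deriving Pod metrics from container metrics, as mentioned above, works because container_cpu_usage_seconds_total is a cumulative counter carrying a pod label: take a per-container rate and sum the rates by pod. A numeric sketch (pod/container names and values invented for illustration):

```python
def rate(v1, t1, v2, t2):
    """Counter rate between two samples, like PromQL rate() over one window."""
    return (v2 - v1) / (t2 - t1)

# (pod, container) -> two cumulative CPU-seconds samples, at t=0 and t=60.
samples = {
    ("web-1", "app"):   (100.0, 103.0),  # 3 CPU-seconds in 60s -> 0.05 cores
    ("web-1", "proxy"): (50.0, 50.6),    # 0.6 CPU-seconds in 60s -> 0.01 cores
}

# Sum per-container rates by pod, like: sum by (pod) (rate(...[1m]))
pod_cpu = {}
for (pod, _container), (v1, v2) in samples.items():
    pod_cpu[pod] = pod_cpu.get(pod, 0.0) + rate(v1, 0, v2, 60)

print(round(pod_cpu["web-1"], 3))  # → 0.06
```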

Reference articles:
1. The difference between cAdvisor and the kubelet
https://www.cnblogs.com/aguncn/p/9929684.html
2. cAdvisor in depth (section 3.1 is the key part)
https://blog.csdn.net/liukuan73/article/details/78881008

======================================================
Glossary:
spec — specification
metadata — data that describes other data
DaemonSet — a controller that runs one copy of a Pod on every node
regex — regular expression
annotation — a key/value note attached to an object
timeout — the time limit for an operation

=======================================================
Notes:
1. An instance is a single scrape endpoint, generally corresponding to one process; a job is a group of instances that serve the same purpose.

2. To list the Pods in the monitoring namespace (name, status, etc.):

kubectl get pod -n monitoring

With a Pod name from that output, you can view its logs:

kubectl logs prometheus-core-69f86f78d7-xgclc -n monitoring

=======================================================
References:

1. Main references (must-read):
https://blog.csdn.net/wenwst/article/details/76624019
https://blog.csdn.net/liukuan73/article/details/78881008

2. Main GitHub reference:
https://github.com/giantswarm/kubernetes-prometheus/tree/master/manifests/prometheus

3. Prometheus query basics (official docs):
https://prometheus.io/docs/prometheus/latest/querying/basics/

4. GitHub reference:
https://github.com/liukuan73/kubernetes-addons/blob/master/monitor/prometheus%2Bgrafana/node-exporter-daemonset.yaml

5. GitHub reference:
https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml