
Prometheus Operator: A More Elegant Monitoring Tool for Kubernetes


[TOC]

1. Introduction to Kubernetes Operators

With Kubernetes, managing and scaling web applications, mobile backends, and API services has become fairly straightforward. These applications are generally stateless, so basic Kubernetes API objects such as Deployment can scale them and recover from failures without any extra operations.

Managing stateful applications such as databases, caches, or monitoring systems, however, is a real challenge. These systems require domain-specific knowledge to scale and upgrade correctly, and to reconfigure effectively when data is lost or unavailable. We want that application-specific operational expertise encoded into software, so that complex applications can be run and managed correctly on top of Kubernetes.

An Operator is software that extends the Kubernetes API using the TPR (Third Party Resource, since upgraded to CRD) mechanism, embedding application-specific knowledge so that users can create, configure, and manage applications. Like Kubernetes built-in resources, an Operator manages not a single application instance but multiple instances across the cluster.

2. Introduction to Prometheus Operator

The Prometheus Operator for Kubernetes provides easy monitoring definitions for Kubernetes services and for the deployment and management of Prometheus instances.

Once installed, the Prometheus Operator provides the following features:

  • Create/Destroy: Easily launch a Prometheus instance for a specific application or team in a Kubernetes namespace using the Operator.
  • Simple Configuration: Configure the fundamentals of Prometheus, such as versions, persistence, retention policies, and replicas, as native Kubernetes resources.
  • Target Services via Labels: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus-specific configuration language.
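To illustrate the label-based targeting, a minimal hypothetical ServiceMonitor might look like the following; `example-app`, the `team` label, and the `web` port are placeholder names, but the structure follows the monitoring.coreos.com/v1 API:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # hypothetical name
  labels:
    team: frontend             # placeholder label
spec:
  selector:
    matchLabels:
      app: example-app         # scrape every Service carrying this label
  endpoints:
  - port: web                  # named Service port to scrape
    interval: 30s
```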

The Prometheus Operator architecture is shown below:

(Figure: Prometheus Operator architecture)

Each component in the architecture above runs in the Kubernetes cluster as a different kind of resource, and each plays its own role:

Operator: The Operator resource deploys and manages the Prometheus Server according to custom resources (Custom Resource Definitions / CRDs), and watches those custom resources for change events to react accordingly; it is the control center of the whole system.
Prometheus: The Prometheus resource declaratively describes the desired state of a Prometheus deployment.
Prometheus Server: The Prometheus Server cluster that the Operator deploys according to what the Prometheus custom resource defines; these custom resources can be seen as StatefulSets resources for managing the Prometheus Server cluster.
ServiceMonitor: ServiceMonitor is also a custom resource; it describes a list of targets monitored by Prometheus. It selects the corresponding Service endpoints via labels, and the Prometheus Server scrapes metrics from the selected Services.
Service: The Service resource corresponds to the metrics-exposing Pods in the Kubernetes cluster; a ServiceMonitor selects it so that the Prometheus Server can scrape it. Simply put, it is the object Prometheus monitors, such as a Node Exporter Service or a MySQL Exporter Service.
Alertmanager: Alertmanager is also a custom resource type; the Operator deploys an Alertmanager cluster according to its description.
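How these pieces fit together can be sketched with a minimal, hypothetical Prometheus custom resource; all names and label values below are placeholders:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                # placeholder name
spec:
  replicas: 2                  # the Operator materializes this as a StatefulSet
  serviceMonitorSelector:      # pick up every ServiceMonitor with this label
    matchLabels:
      release: example
  alerting:
    alertmanagers:             # send alerts to this Alertmanager cluster
    - namespace: monitoring
      name: alertmanager-example
      port: web
```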

3. Deploying Prometheus Operator

Environment

  • Kubernetes version: 1.12, installed via kubeadm
  • helm version: v2.11.0

We install with Helm, modifying the prometheus-operator helm chart to fit actual usage. The chart bundles Grafana and the exporters for monitoring Kubernetes. Note that I configured Grafana to store its data in MySQL; the details are covered in another article, "Deploying Prometheus and Grafana with Helm to Monitor Kubernetes".
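As a rough sketch of that Grafana-on-MySQL setup: the keys under `grafana.ini` follow Grafana's standard `[database]` settings, the host assumes the in-cluster `monitoring-mysql-mysql` Service, and the database name and credentials are placeholders (the exact values.yaml layout may differ by chart version):

```yaml
grafana:
  grafana.ini:
    database:
      type: mysql
      host: monitoring-mysql-mysql:3306   # MySQL Service inside the cluster
      name: grafana                       # placeholder database name
      user: grafana                       # placeholder credentials
      password: xxxxxx
```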

cd helm/prometheus-operator/
helm install --name prometheus-operator --namespace monitoring -f values.yaml ./

To use the Prometheus Operator more flexibly, adding custom monitoring targets is essential. Here we use ceph-exporter as an example.

The following section of values.yaml is what adds the monitoring via a ServiceMonitor:

serviceMonitor:
  enabled: true  # enable monitoring
  # on what port are the metrics exposed by ceph-exporter
  exporterPort: 9128
  # for apps that are deployed outside of the cluster, list their addresses here
  endpoints: []
  # Are we talking http or https?
  scheme: http
  # service selector label key to target ceph exporter pods
  serviceSelectorLabelKey: app
  # default rules are in templates/ceph-exporter.rules.yaml
  prometheusRules: {}
  # Custom labels to be added to the ServiceMonitor
  # In testing, adding the prometheus-operator release label to the
  # ServiceMonitor is enough for it to be monitored normally
  additionalServiceMonitorLabels:
    release: prometheus-operator
  # Custom labels to be added to the Prometheus Rules CRD
  additionalRulesLabels: {}

The most important parameter is additionalServiceMonitorLabels: in testing, a ServiceMonitor must carry a label that the Prometheus Operator already has before its targets are successfully monitored.
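The reason the release label is the one that works can be seen in the Prometheus custom resource the chart renders: its serviceMonitorSelector selects on that label. A sketch of the relevant excerpt (the exact rendered structure may differ slightly between chart versions):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-operator-prometheus
spec:
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator   # only ServiceMonitors carrying this label are scraped
```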

# kubectl get servicemonitor -n monitoring ceph-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: ceph-exporter
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: prometheus-operator
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13937459"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/ceph-exporter
  uid: 30569173-dc10-11e8-bcf3-000c293d66a5
spec:
  endpoints:
  - interval: 30s
    port: http
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: ceph-exporter
      release: ceph-exporter
# kubectl get pod -n monitoring prometheus-operator-operator-7459848949-8dddt -o yaml | more
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-10-30T00:39:37Z
  generateName: prometheus-operator-operator-7459848949-
  labels:
    app: prometheus-operator-operator
    chart: prometheus-operator-0.1.6
    heritage: Tiller
    pod-template-hash: "7459848949"
    release: prometheus-operator

Key points:

  • The ServiceMonitor's labels must include at least one that matches the labels on the prometheus-operator Pod;
  • The ServiceMonitor's spec parameters must be set correctly;
  • The Service must be reachable by Prometheus, with all endpoints healthy;
  • When you hit problems, enable debug logging for the Prometheus Operator and for Prometheus. The logs carry little other information, but the Operator's debug log lists the ServiceMonitors it currently sees, so you can confirm whether the one you installed is being matched.
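A sketch of turning on that debug logging: the Prometheus custom resource exposes a `logLevel` field, while the operator itself takes a command-line flag (the exact flag name, `--log-level`, is an assumption for this chart version):

```yaml
# Prometheus custom resource: raise Prometheus's own log verbosity
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-operator-prometheus
spec:
  logLevel: debug
# For the operator itself, add to the Deployment's container args
# (flag name assumed): --log-level=debug
```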

After the installation succeeds, inspect the related resources:

# kubectl get service,servicemonitor,ep -n monitoring
NAME                                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-operated                          ClusterIP   None             <none>        9093/TCP,6783/TCP   12d
service/ceph-exporter                                  ClusterIP   10.100.57.62     <none>        9128/TCP            46h
service/monitoring-mysql-mysql                         ClusterIP   10.108.93.155    <none>        3306/TCP            42d
service/prometheus-operated                            ClusterIP   None             <none>        9090/TCP            12d
service/prometheus-operator-alertmanager               ClusterIP   10.98.42.209     <none>        9093/TCP            6d19h
service/prometheus-operator-grafana                    ClusterIP   10.103.100.150   <none>        80/TCP              6d19h
service/prometheus-operator-kube-state-metrics         ClusterIP   10.110.76.250    <none>        8080/TCP            6d19h
service/prometheus-operator-operator                   ClusterIP   None             <none>        8080/TCP            6d19h
service/prometheus-operator-prometheus                 ClusterIP   10.111.24.83     <none>        9090/TCP            6d19h
service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.97.126.74     <none>        9100/TCP            6d19h

NAME                                                                               AGE
servicemonitor.monitoring.coreos.com/ceph-exporter                                 1d
servicemonitor.monitoring.coreos.com/prometheus-operator                           8d
servicemonitor.monitoring.coreos.com/prometheus-operator-alertmanager              6d
servicemonitor.monitoring.coreos.com/prometheus-operator-apiserver                 6d
servicemonitor.monitoring.coreos.com/prometheus-operator-coredns                   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-controller-manager   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-etcd                 6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-scheduler            6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-state-metrics        6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kubelet                   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-node-exporter             6d
servicemonitor.monitoring.coreos.com/prometheus-operator-operator                  6d
servicemonitor.monitoring.coreos.com/prometheus-operator-prometheus                6d

NAME                                                     ENDPOINTS                                                                 AGE
endpoints/alertmanager-operated                          10.244.6.174:9093,10.244.6.174:6783                                       12d
endpoints/ceph-exporter                                  10.244.2.59:9128                                                          46h
endpoints/monitoring-mysql-mysql                         10.244.6.171:3306                                                         42d
endpoints/prometheus-operated                            10.244.2.60:9090,10.244.6.175:9090                                        12d
endpoints/prometheus-operator-alertmanager               10.244.6.174:9093                                                         6d19h
endpoints/prometheus-operator-grafana                    10.244.6.106:3000                                                         6d19h
endpoints/prometheus-operator-kube-state-metrics         10.244.2.163:8080                                                         6d19h
endpoints/prometheus-operator-operator                   10.244.6.113:8080                                                         6d19h
endpoints/prometheus-operator-prometheus                 10.244.2.60:9090,10.244.6.175:9090                                        6d19h
endpoints/prometheus-operator-prometheus-node-exporter   192.168.105.92:9100,192.168.105.93:9100,192.168.105.94:9100 + 4 more...   6d19h

4. Adding Grafana dashboards

The _dashboards directory in the prometheus-operator chart above contains dashboards I have modified; they are fairly comprehensive. Import them manually through the Grafana UI, and you can then edit them freely afterwards, which is very convenient in practice. If instead you place the dashboard JSON files in the dashboards directory and install them with Helm, the installed dashboards cannot be edited directly in Grafana, which is rather awkward to work with.

5. Adding alerts to Alertmanager

Add a PrometheusRule; the following is an example:

# kubectl get prometheusrule -n monitoring ceph-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: prometheus
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: ceph-exporter
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13965150"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheusrules/ceph-exporter
  uid: 30543ec9-dc10-11e8-bcf3-000c293d66a5
spec:
  groups:
  - name: ceph-exporter.rules
    rules:
    - alert: Ceph
      annotations:
        description: There is no running ceph exporter.
        summary: Ceph exporter is down
      expr: absent(up{job="ceph-exporter"} == 1)
      for: 5m
      labels:
        severity: critical

The default rules for monitoring Kubernetes are already numerous and comprehensive; you can adjust prometheus-operator/templates/all-prometheus-rules.yaml yourself.

Alerting rules and routing can be modified in the alertmanager: section of values.yaml, shown below:

  config:
    global:
      resolve_timeout: 5m
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxxxxx@163.com'
      smtp_auth_username: 'xxxxxx@163.com'
      smtp_auth_password: 'xxxxxx'
      # The API URL to use for Slack notifications.
      slack_api_url: 'https://hooks.slack.com/services/some/api/token'
    route:
      group_by: ["job", "alertname"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'noemail'
      routes:
      - match:
          severity: critical
        receiver: critical_email_alert
      - match_re:
          alertname: "^KubeJob*"
        receiver: default_email

    receivers:
      - name: 'default_email'
        email_configs:
        - to: 'xxxxxx@163.com'
          send_resolved: true

      - name: 'critical_email_alert'
        email_configs:
        - to: 'xxxxxx@163.com'
          send_resolved: true

      - name: 'noemail'
        email_configs:
        - to: 'xxxxxx@163.com'
          send_resolved: false

  ## Alertmanager template files to format alerts
  ## ref: https://prometheus.io/docs/alerting/notifications/
  ##      https://prometheus.io/docs/alerting/notification_examples/
  ##
  templateFiles:
    template_1.tmpl: |-
      {{ define "cluster" }}{{ .ExternalURL | reReplaceAll ".*alertmanager\\.(.*)" "$1" }}{{ end }}

      {{ define "slack.k8s.text" }}
      {{- $root := . -}}
      {{ range .Alerts }}
       *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
       *Cluster:*  {{ template "cluster" $root }}
       *Description:* {{ .Annotations.description }}
       *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:>
       *Runbook:* <{{ .Annotations.runbook }}|:spiral_note_pad:>
       *Details:*
         {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
         {{ end }}
      {{ end }}
      {{ end }}

6. Summary

With the Prometheus Operator, simply defining ServiceMonitor and PrometheusRule resources dynamically adjusts the Prometheus and Alertmanager configuration. This fits Kubernetes operational habits much more naturally and makes Kubernetes monitoring more elegant.
