Prometheus Operator Tutorial: Sharding Prometheus by Service Dimension

阿新 • Published: 2020-08-10

> Original article: [https://fuckcloudnative.io/posts/aggregate-metrics-user-prometheus-operator/](https://fuckcloudnative.io/posts/aggregate-metrics-user-prometheus-operator/)

`Prometheus` itself only supports single-node deployment. It has no built-in clustering, no high availability, and no horizontal scaling, and its storage is limited by the capacity of the local disk. As the volume of collected data grows, the number of time series a single `Prometheus` instance can handle hits a ceiling. CPU and memory usage both climb at that point, and memory usually becomes the bottleneck first. The main reasons are:

+ Prometheus flushes a `Block` to disk every 2 hours. Until the flush, all of that data sits in memory, so memory consumption scales with the ingestion volume.
+ Loading historical data moves it from disk into memory. The larger the query range, the more memory is used. There is some room for optimization here.
+ Unreasonable queries also increase memory usage, for example `group` operations or `rate` over a large range.

At that point you either add memory or shard the cluster so that each instance scrapes fewer metrics. This article discusses how to split a Prometheus deployed with `Prometheus Operator` into multiple instances along the service dimension.

## 1. Splitting Prometheus by service dimension

Prometheus recommends splitting by function or service. If there are many services to scrape, each Prometheus instance is configured to collect and store the metrics of only one service or one group of services. Splitting Prometheus into multiple instances by the services they scrape also achieves a degree of horizontal scaling.

![](https://img2020.cnblogs.com/other/1737323/202008/1737323-20200810101043566-299903748.png)

In a Kubernetes cluster we can split Prometheus instances by namespace. For example, send all monitoring for Kubernetes cluster components to one Prometheus instance and all other monitoring to another.

Prometheus Operator controls the deployment of Prometheus instances through the CRD resource `Prometheus`. In that resource, the fields `serviceMonitorNamespaceSelector` and `podMonitorNamespaceSelector` take label selectors that restrict which namespaces targets are scraped from. For example, label the namespace kube-system with `monitoring-role=system` and label the other namespaces with `monitoring-role=others`.

![](https://img2020.cnblogs.com/other/1737323/202008/1737323-20200810101043865-1840444842.png)

## 2. Splitting the alerting rules

Once Prometheus is split into multiple instances, the default alerting rules can no longer be used as they are. The default rules are written against the metrics of all targets, and no single Prometheus instance can see all targets any more, so alerts would fire constantly. To solve this, the alerting rules must be split so that they map one-to-one onto the service dimension of each Prometheus instance. Following the split above, we only need two rule sets with different labels. The CRD resource `Prometheus` then selects the matching rules through the `ruleSelector` field.

![](https://img2020.cnblogs.com/other/1737323/202008/1737323-20200810101044141-620803473.png)

## 3. Centralized data storage

With alerting solved, one problem remains: the monitoring data is now scattered. Querying it in Grafana would require adding several data sources, data from different data sources cannot be aggregated in one query, and dashboards cannot show a global view, which makes querying messy.

To solve this, we can stop using Prometheus for storage. Each instance only sends the samples it collects to a remote storage `Adapter` through `Remote Write`, and Grafana's data source is pointed at the remote storage address, which gives Grafana a global view.

Here we choose [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) as the remote storage. [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) is a high-performance, low-cost, scalable time series database that can serve as long-term storage for Prometheus. It comes in a single-node version and a cluster version, both open source. If the ingestion rate is below one million data points per second, the official recommendation is to use the single-node version rather than the cluster version. For this demonstration we use the single-node version only. The architecture looks like this:

![](https://img2020.cnblogs.com/other/1737323/202008/1737323-20200810101044392-1348686213.png)

## 4. Practice

With the plan settled, let's put it into practice.

### Deploying VictoriaMetrics

First deploy a single-instance `VictoriaMetrics`. The complete yaml is:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: victoriametrics
  namespace: kube-system
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  # Matches the Service defined below (the original read "pvictoriametrics", a typo)
  serviceName: victoriametrics
  selector:
    matchLabels:
      app: victoriametrics
  replicas: 1
  template:
    metadata:
      labels:
        app: victoriametrics
    spec:
      # Node label from the author's cluster; adjust or remove for your environment
      nodeSelector:
        blog: "true"
      containers:
        - args:
            - --storageDataPath=/storage
            - --httpListenAddr=:8428
            - --retentionPeriod=1
          image: victoriametrics/victoria-metrics
          imagePullPolicy: IfNotPresent
          name: victoriametrics
          ports:
            - containerPort: 8428
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /health
              port: 8428
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /health
              port: 8428
            initialDelaySeconds: 120
            timeoutSeconds: 30
          resources:
            limits:
              cpu: 2000m
              memory: 2000Mi
            requests:
              cpu: 2000m
              memory: 2000Mi
          volumeMounts:
            - mountPath: /storage
              name: storage-volume
      restartPolicy: Always
      priorityClassName: system-cluster-critical
      volumes:
        - name: storage-volume
          persistentVolumeClaim:
            claimName: victoriametrics
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  ports:
    - name: http
      port: 8428
      protocol: TCP
      targetPort: 8428
  selector:
    app: victoriametrics
  type: ClusterIP
```

A few startup flags deserve attention:

+ **storageDataPath**: path to the data directory. VictoriaMetrics stores all data in this directory.
+ **retentionPeriod**: data retention period, in months. Older data is deleted automatically. The default is 1 month.
+ **httpListenAddr**: TCP address to listen on for HTTP requests. By default it listens on port `8428` on all network interfaces.

### Labeling the namespaces

To restrict the `namespace` targets are scraped from, we need to label each `namespace` so that every Prometheus instance only scrapes metrics from specific namespaces. Following the plan above, label kube-system with `monitoring-role=system`:

```bash
$ kubectl label ns kube-system monitoring-role=system
```

Label the other namespaces with `monitoring-role=others`. For example:

```bash
$ kubectl label ns monitoring monitoring-role=others
$ kubectl label ns default monitoring-role=others
```

### Splitting the PrometheusRule

The alerting rules need to be split into two `PrometheusRule` resources according to the monitored targets. Concretely, gather the rules related to the kube-system namespace into one PrometheusRule and change its name and labels:

```yaml
# prometheus-rules-system.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: system
    role: alert-rules
  name: prometheus-system-rules
  namespace: monitoring
spec:
  groups:
  ...
  ...
```

Put the remaining rules into a second PrometheusRule and change its name and labels as well:

```yaml
# prometheus-rules-others.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: others
    role: alert-rules
  name: prometheus-others-rules
  namespace: monitoring
spec:
  groups:
  ...
  ...
```

Then delete the default PrometheusRule:

```bash
$ kubectl -n monitoring delete prometheusrule prometheus-k8s-rules
```

Add the two new PrometheusRule resources:

```bash
$ kubectl apply -f prometheus-rules-system.yaml
$ kubectl apply -f prometheus-rules-others.yaml
```

If you really do not know how to split the rules, or would rather not do it yourself, ready-made versions are available here:

+ [prometheus-rules-system.yaml](https://gist.github.com/yangchuansheng/4310ae9f41513899dc5f0176cdf804b1)
+ [prometheus-rules-others.yaml](https://gist.github.com/yangchuansheng/102595fc50436cf4a2ce18744467718c)

### Splitting Prometheus

The next step is to split the Prometheus instance. According to the plan above we need two instances, one monitoring the `kube-system` namespace and the other monitoring the remaining namespaces:

```yaml
# prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: system
  name: system
  namespace: monitoring
spec:
  remoteWrite:
    - url: http://victoriametrics.kube-system.svc.cluster.local:8428/api/v1/write
      queueConfig:
        maxSamplesPerSend: 10000
  retention: 2h
  alerting:
    alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
  image: quay.io/prometheus/prometheus:v2.17.2
  nodeSelector:
    beta.kubernetes.io/os: linux
  # Only discover PodMonitors in namespaces labeled monitoring-role=system
  podMonitorNamespaceSelector:
    matchLabels:
      monitoring-role: system
  podMonitorSelector: {}
  replicas: 1
  resources:
    requests:
      memory: 400Mi
    limits:
      memory: 2Gi
  # Only load PrometheusRule objects carrying both labels
  ruleSelector:
    matchLabels:
      prometheus: system
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  # Only discover ServiceMonitors in namespaces labeled monitoring-role=system
  serviceMonitorNamespaceSelector:
    matchLabels:
      monitoring-role: system
  serviceMonitorSelector: {}
  version: v2.17.2
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: others
  name: others
  namespace: monitoring
spec:
  remoteWrite:
    - url: http://victoriametrics.kube-system.svc.cluster.local:8428/api/v1/write
      queueConfig:
        maxSamplesPerSend: 10000
  retention: 2h
  alerting:
    alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
  image: quay.io/prometheus/prometheus:v2.17.2
  nodeSelector:
    beta.kubernetes.io/os: linux
  podMonitorNamespaceSelector:
    matchLabels:
      monitoring-role: others
  podMonitorSelector: {}
  replicas: 1
  resources:
    requests:
      memory: 400Mi
    limits:
      memory: 2Gi
  ruleSelector:
    matchLabels:
      prometheus: others
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector:
    matchLabels:
      monitoring-role: others
  serviceMonitorSelector: {}
  # References a key in a Secret named additional-scrape-configs in this namespace
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
  version: v2.17.2
```

Configuration points to note:

+ `remoteWrite` specifies the remote storage that samples are written to.
+ `ruleSelector` selects the PrometheusRule resources.
+ The memory limit is set to `2Gi`. Adjust it to your situation.
+ `retention` sets the local on-disk retention to 2 hours. Because remote storage is configured, local data does not need to be kept for long, so keep this as short as practical.
+ Custom Prometheus configuration can be supplied through `additionalScrapeConfigs` on the others instance. You can of course split further and put it on yet another instance.

Delete the default Prometheus instance:

```bash
$ kubectl -n monitoring delete prometheus k8s
```

Create the new Prometheus instances:

```bash
$ kubectl apply -f prometheus-prometheus.yaml
```

Check that they are running:

```bash
$ kubectl -n monitoring get prometheus
NAME     VERSION   REPLICAS   AGE
system   v2.17.2   1          29h
others   v2.17.2   1          29h

$ kubectl -n monitoring get sts
NAME                READY   AGE
prometheus-system   1/1     29h
prometheus-others   1/1     29h
alertmanager-main   1/1     25d
```

Check the memory usage of each Prometheus instance:

```bash
$ kubectl -n monitoring top pod -l app=prometheus
NAME                  CPU(cores)   MEMORY(bytes)
prometheus-others-0   12m          110Mi
prometheus-system-0   121m         1182Mi
```

Finally, the Prometheus `Service` has to be changed as well. The yaml is:

```yaml
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: system
  name: prometheus-system
  namespace: monitoring
spec:
  ports:
    - name: web
      port: 9090
      targetPort: web
  selector:
    app: prometheus
    prometheus: system
  sessionAffinity: ClientIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: others
  name: prometheus-others
  namespace: monitoring
spec:
  ports:
    - name: web
      port: 9090
      targetPort: web
  selector:
    app: prometheus
    prometheus: others
  sessionAffinity: ClientIP
```

Delete the default Service:

```bash
$ kubectl -n monitoring delete svc prometheus-k8s
```

Create the new Services:

```bash
$ kubectl apply -f prometheus-service.yaml
```

### Changing the Grafana data source

After Prometheus has been split successfully, the last step is to change Grafana's data source to the `VictoriaMetrics` address. Grafana can then show a global view and run aggregate queries. Open the Grafana settings page and change the data source to `http://victoriametrics.kube-system.svc.cluster.local:8428`:

![](https://img2020.cnblogs.com/other/1737323/202008/1737323-20200810101044830-808500594.png)

Click the Explore menu:

![](https://img2020.cnblogs.com/other/1737323/202008/1737323-20200810101045251-364097158.webp)

Type `up` in the query box and press Shift+Enter to run the query:

![](https://img2020.cnblogs.com/other/1737323/202008/1737323-20200810101045665-3443023.webp)

The query result contains every `namespace`. If you are curious about my Grafana color theme, follow the WeChat official account 『雲原生實驗室』 and reply **grafana** in the chat to get it.

I wrote this article because every node in my k3s cluster is short on resources and there are many targets to monitor, so Prometheus used up all of a node's memory and kept getting `OOM`-killed. To make full use of my cloud hosts I had to find another way, and this article is the result.
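
Returning to the namespace labeling step: a namespace that carries no `monitoring-role` label matches neither instance's `matchLabels` selector, so its ServiceMonitors and PodMonitors are not picked up by either Prometheus. The snippet below is a minimal sketch for spotting such namespaces. The helper name `unlabeled_namespaces` is hypothetical, and the commented `kubectl` wiring assumes that `custom-columns` prints `<none>` for a missing label, which is worth verifying on your kubectl version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print namespaces that have no monitoring-role label.
# Input (stdin): one "NAME ROLE" pair per line; ROLE is empty or "<none>"
# when the label is absent.
unlabeled_namespaces() {
  # NF > 0 skips blank lines; $2 is the role column
  awk 'NF > 0 && ($2 == "" || $2 == "<none>") { print $1 }'
}

# Example wiring against a live cluster (left commented out so the function
# can be sourced and tried without cluster access):
#
# kubectl get ns --no-headers \
#   -o custom-columns='NAME:.metadata.name,ROLE:.metadata.labels.monitoring-role' \
#   | unlabeled_namespaces \
#   | while read -r ns; do
#       kubectl label ns "$ns" monitoring-role=others
#     done
```

Running this after creating a new namespace would be one way to keep the `others` instance from silently missing it. Whether every unlabeled namespace really belongs to `others` is a decision for your own split.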
