
6. Prometheus High Availability with Thanos

1. Thanos Architecture in Detail

1.1 What is Thanos?

Thanos is one of the high-availability solutions for Prometheus. It integrates seamlessly with Prometheus and adds several advanced capabilities, meeting the needs for long-term storage, unlimited scalability, a global query view, and non-intrusive operation.

1.2 Thanos Architecture

The architecture diagram covers several of Thanos's core components, though not all of them. A brief introduction to the components shown:

Thanos Sidecar: connects to Prometheus, exposes its data to Thanos Query for querying, and/or uploads it to object storage for long-term retention.

Thanos Query: implements the Prometheus API and provides the global query view; it aggregates the data returned by the Store API endpoints and returns the final result to the querying client (e.g. Grafana).

Thanos Store Gateway: exposes the data in object storage to Thanos Query.

Thanos Ruler: evaluates monitoring data for alerting and can also compute new metrics, exposing the results to Thanos Query and/or uploading them to object storage for long-term retention.

Thanos Compact: compacts and downsamples the data in object storage to speed up queries over large time ranges.

Thanos Receiver: receives data from Prometheus's remote-write WAL and exposes it and/or uploads it to cloud storage.
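
All of these components are subcommands of the single thanos binary. A rough sketch of how each one is typically started (addresses, paths and the objstore.yml file are placeholders, not taken from this deployment):

thanos sidecar --prometheus.url=http://localhost:9090 --tsdb.path=/prometheus --objstore.config-file=objstore.yml
thanos query   --store=<sidecar-ip>:10901 --store=<store-gateway-ip>:10901
thanos store   --data-dir=/var/thanos/store --objstore.config-file=objstore.yml
thanos rule    --query=http://<query-ip>:9090 --rule-file='/etc/thanos/rules/*.yaml' --objstore.config-file=objstore.yml
thanos compact --data-dir=/var/thanos/compact --objstore.config-file=objstore.yml
thanos receive --tsdb.path=/var/thanos/receive --objstore.config-file=objstore.yml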

1.3 Architecture Design Walkthrough

Query and Sidecar

First of all, monitoring data can no longer be queried directly from Prometheus, because there are many Prometheus instances and each instance only knows about the data it scrapes itself.

Thanos Query implements Prometheus's HTTP API, so it "understands" PromQL. Clients therefore no longer query Prometheus directly; they query Thanos Query instead, which fans the request out to the downstream components that hold the data, then aggregates and deduplicates the results before returning them to the client. This is what makes querying a distributed Prometheus setup possible.
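
Concretely, a client can call Thanos Query exactly as it would call Prometheus. A small sketch (the host is a placeholder; the dedup parameter is a Thanos-specific extension controlling replica deduplication):

# query the current value of the "up" series through Thanos Query
curl -s 'http://<thanos-query-host>:9090/api/v1/query?query=up&dedup=true'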

How does Thanos Query reach this scattered downstream data? Thanos defines an internal gRPC interface called the Store API; other components expose their data to Thanos Query through it. Thanos Query itself stays completely stateless, so it can be deployed highly available and scaled out dynamically.

Where can this scattered data come from?

First, Prometheus stores the data it scrapes on local disk. To use the data scattered across these disks directly, we can deploy a Sidecar next to each Prometheus instance. The Sidecar implements the Thanos Store API; when Thanos Query sends it a query, the Sidecar reads the monitoring data from the Prometheus instance it is bound to and returns it to Thanos Query.

Because Thanos Query aggregates and deduplicates data, high availability becomes straightforward: deploy several replicas of the same Prometheus (each with a Sidecar) and let Thanos Query query all of the Sidecars. Even if one Prometheus instance was down for a while, the aggregated and deduplicated result is still complete.

However, disk space is limited, so Prometheus's capacity for storing monitoring data is limited as well. A retention period (15 days by default) or a maximum data size is usually configured so that old data is continuously cleaned up and the disk never fills up. As a consequence we cannot look at monitoring data from further back in time, which sometimes makes troubleshooting and statistics harder.
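
For reference, retention is configured on Prometheus itself; a sketch with example values (standalone flags first, then the equivalent field of the prometheus-operator Prometheus CR):

# standalone Prometheus: limit retention by time and/or total size
prometheus --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=50GB

# Prometheus CR (prometheus-operator):
#   spec:
#     retention: 15d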

For data that needs to be kept long term but is accessed less often, the ideal place is object storage.
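
Thanos components that talk to object storage take a small YAML config, commonly passed via --objstore.config-file. A minimal sketch for an S3-compatible backend, with placeholder bucket, endpoint and credentials:

# objstore.yml (all values are placeholders)
type: S3
config:
  bucket: thanos-metrics
  endpoint: minio.example.com:9000
  access_key: <ACCESS_KEY>
  secret_key: <SECRET_KEY>
  insecure: true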

 

Store Gateway

How do we query the monitoring data that has been uploaded to object storage? In theory Thanos Query could read object storage directly, but that would make its logic very heavy. As we just saw, Thanos abstracts the Store API, and any component that implements it can serve as a data source for Thanos Query. Thanos Store Gateway is such a component: it exposes the data in object storage to Thanos Query. Internally, Store Gateway also optimizes data retrieval in two ways: it caches TSDB indexes, and it optimizes its requests to object storage (fetching everything it needs with as few requests as possible).

This gives us long-term storage of monitoring data. Since object storage capacity is effectively unlimited, we can in theory keep data for any length of time; historical monitoring data becomes queryable again, which helps troubleshooting and statistical analysis.
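
A rough sketch of running the Store Gateway and registering it with Thanos Query (addresses are placeholders; the object storage config is the same one the sidecars upload to):

# local cache dir for index data, same bucket as the sidecars
thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=objstore.yml \
  --grpc-address=0.0.0.0:10901

# on the query side it is just one more Store API endpoint
thanos query --store=<store-gateway-ip>:10901 --store=<sidecar-ip>:10901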

Ruler

One issue remains: Prometheus does not just store and query the data it scrapes; it can also be configured with rules:

  • Recording rules continuously compute new metrics from existing ones and store them, so later queries can read the precomputed series directly; this reduces query-time computation and speeds up queries.
  • Alerting rules are continuously evaluated against thresholds; when a threshold is crossed, AlertManager is notified to fire the alert. (A minimal example of both rule types follows below.)
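
A sketch of what these two kinds of rules look like in a Prometheus rule file (metric names and thresholds are made up for illustration):

groups:
- name: example-rules
  rules:
  # recording rule: precompute per-instance CPU utilisation
  - record: instance:node_cpu_utilisation:avg5m
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  # alerting rule: fire when the precomputed value stays above 90% for 10 minutes
  - alert: HighCPUUsage
    expr: instance:node_cpu_utilisation:avg5m > 0.9
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage above 90% on {{ $labels.instance }}"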

Because we deploy Prometheus in a distributed fashion, no single Prometheus instance holds the complete data locally; related series may be spread across several instances, and a single Prometheus cannot see the global view. In that situation we cannot rely on Prometheus itself for this work.

This is where Thanos Ruler shines. It obtains the global data by querying Thanos Query, computes new metrics according to the rules configuration and stores them, exposes the results to Thanos Query through the Store API, and can also upload them to object storage for long-term retention (data uploaded this way is likewise exposed to Thanos Query through Thanos Store Gateway).

It may look as if Thanos Query and Thanos Ruler query each other, but there is no conflict: Thanos Ruler provides Thanos Query with the newly computed metrics, while Thanos Query provides Thanos Ruler with the global raw metrics needed to compute them.
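
A rough sketch of how Thanos Ruler ties these pieces together on the command line (addresses and paths are placeholders):

# evaluate rules against the global view served by Thanos Query,
# send alerts to Alertmanager, and upload the resulting blocks to object storage
thanos rule \
  --query=http://<thanos-query>:9090 \
  --rule-file='/etc/thanos/rules/*.yaml' \
  --alertmanagers.url=http://<alertmanager>:9093 \
  --objstore.config-file=objstore.yml \
  --label='ruler_replica="r0"'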

At this point Thanos's core capabilities are in place: while remaining fully compatible with Prometheus, it provides a global query view, high availability, and long-term storage.

What else can be optimized?

Compact

With long-term storage in place we can query monitoring data over large time ranges, but when the range is large the amount of data queried is also large, which makes queries very slow.

When looking at a large time range we usually do not need full detail; a rough picture is enough. This is where Thanos Compact comes in: it reads data from object storage, compacts and downsamples it, and uploads the result back to object storage. Queries over large time ranges can then read only the compacted, downsampled data, which drastically reduces the amount of data read and speeds up the query.
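
A sketch of running the compactor, including the retention flags that control how long raw and downsampled data are kept (durations are example values):

# compact and downsample blocks in the bucket; keep raw data for 30d,
# 5m-resolution data for 90d and 1h-resolution data for 1y
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=objstore.yml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=1y \
  --wait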

1.4 Sidecar Mode vs. Receiver Mode

What does the Receiver do? Why is it needed, and how does it differ from the Sidecar?

Both can upload data to object storage for long-term retention; the difference lies in where the most recent data lives.

Since uploads cannot happen in real time, in Sidecar mode the most recent monitoring data stays on the Prometheus host, and Query fetches it by calling the Store API of every Sidecar. This becomes a problem when there are very many Sidecars or when the Sidecars are far from Query: every query has to call all of them, which consumes a lot of resources and is slow, and most of the time what we look at is precisely the most recent data.

Thanos Receiver was introduced to solve this. It implements Prometheus's remote write API, so every Prometheus instance can push its data to Thanos Receiver in real time. The most recent data is thereby centralized, and Thanos Query no longer has to ask every Sidecar for it; it can query Thanos Receiver directly.

In addition, Thanos Receiver uploads data to object storage for long-term retention; as before, that data is exposed to Thanos Query through Thanos Store Gateway.
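
A sketch of the Prometheus side in Receiver mode: each Prometheus remote-writes to the Receiver endpoint (the service name is a placeholder; 19291 is the Receiver's default remote-write port). With prometheus-operator the same URL would go into the remoteWrite field of the Prometheus CR:

# prometheus.yml fragment (standalone Prometheus)
remote_write:
  - url: http://thanos-receive.monitoring.svc:19291/api/v1/receive

# Prometheus CR (prometheus-operator):
#   spec:
#     remoteWrite:
#     - url: http://thanos-receive.monitoring.svc:19291/api/v1/receive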

 

Some readers may ask: at large scale, won't the Receiver come under heavy load and become a performance bottleneck? The designers of course considered this: Receiver implements consistent hashing and supports clustered deployment, so even at large scale it does not become a bottleneck.
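
A rough sketch of such a clustered setup: the replicas share a hashring file that maps incoming series onto Receiver endpoints, and replication can be enabled on top (endpoints and the replication factor are example values):

# hashrings.json, shared by all receiver replicas
[
  {
    "hashring": "default",
    "endpoints": [
      "thanos-receive-0.thanos-receive:10901",
      "thanos-receive-1.thanos-receive:10901",
      "thanos-receive-2.thanos-receive:10901"
    ]
  }
]

# each replica is started roughly like this
# (--receive.local-endpoint must match this replica's entry in the hashring)
thanos receive \
  --tsdb.path=/var/thanos/receive \
  --objstore.config-file=objstore.yml \
  --remote-write.address=0.0.0.0:19291 \
  --grpc-address=0.0.0.0:10901 \
  --receive.hashrings-file=/etc/thanos/hashrings.json \
  --receive.local-endpoint=thanos-receive-0.thanos-receive:10901 \
  --receive.replication-factor=3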

 

2. Deploying Thanos

Thanos supports cloud-native deployment, making full use of Kubernetes's scheduling and dynamic scaling capabilities. According to the official documentation, there are currently three ways to deploy Thanos on Kubernetes:

  • prometheus-operator: once prometheus-operator is installed in the cluster, Thanos can be deployed by creating CRD objects;
  • community-contributed helm charts: there are many variants, all aiming at one-command deployment of Thanos with helm;
  • kube-thanos: the official Thanos open-source project, containing jsonnet templates and YAML examples for deploying Thanos on Kubernetes.

This article deploys Thanos via prometheus-operator.

2.1 Architecture and Environment

root@deploy:~# cat /etc/issue
Ubuntu 20.04.3 LTS \n \l


192.168.1.100 deploy   # node used to deploy and manage the k8s clusters

192.168.1.101 devops-master  # cluster version v1.18.9
192.168.1.102 devops-node1
192.168.1.103 devops-node2

192.168.1.110 test-master  # cluster version v1.18.9
192.168.1.111 test-node1
192.168.1.112 test-node2

192.168.1.200 nfs-server

For setting up the k8s clusters, see: https://www.cnblogs.com/zhrx/p/15884118.html
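
The k-devops and k-test commands used below are not standard tools; they are assumed here to be simple kubectl wrappers pointing at the two clusters' kubeconfig files (the paths are hypothetical):

# ~/.bashrc on the deploy node (hypothetical kubeconfig paths)
alias k-devops='kubectl --kubeconfig=/root/.kube/devops.config'
alias k-test='kubectl --kubeconfig=/root/.kube/test.config'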

2.2 Deploy the NFS Server

root@nfs-server:~# apt install nfs-server nfs-common -y

root@nfs-server:~# vim /etc/exports 
# /etc/exports: the access control list for filesystems which may be exported
#		to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
#
/data *(rw,sync,no_root_squash)


root@nfs-server:~# showmount -e
Export list for nfs-server:
/data *

root@nfs-server:~# systemctl start nfs-server.service
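
Optionally, check from one of the Kubernetes nodes that the export can actually be mounted (a quick sketch; the nodes need nfs-common installed for NFS mounts to work):

# on a k8s node
apt install -y nfs-common
showmount -e 192.168.1.200
mount -t nfs 192.168.1.200:/data /mnt && touch /mnt/test-file && umount /mnt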

2.2.1 Create the NFS StorageClass

Apply the following in both clusters.

rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
   name: nfs-provisioner-runner
   namespace: default
rules:
   -  apiGroups: [""]
      resources: ["persistentvolumes"]
      verbs: ["get", "list", "watch", "create", "delete"]
   -  apiGroups: [""]
      resources: ["persistentvolumeclaims"]
      verbs: ["get", "list", "watch", "update"]
   -  apiGroups: ["storage.k8s.io"]
      resources: ["storageclasses"]
      verbs: ["get", "list", "watch"]
   -  apiGroups: [""]
      resources: ["events"]
      verbs: ["watch", "create", "update", "patch"]
   -  apiGroups: [""]
      resources: ["services", "endpoints"]
      verbs: ["get","create","list", "watch","update"]
   -  apiGroups: ["extensions"]
      resources: ["podsecuritypolicies"]
      resourceNames: ["nfs-provisioner"]
      verbs: ["use"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccount: nfs-provisioner
      containers:
        - name: nfs-client-provisioner
          image: registry.cn-hangzhou.aliyuncs.com/open-ali/nfs-client-provisioner
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath:  /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: zhrx/nfs
            - name: NFS_SERVER
              value: 192.168.1.200
            - name: NFS_PATH
              value: /data
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.1.200
            path: /data

class.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zhrx-nfs-storage
provisioner: zhrx/nfs
reclaimPolicy: Retain

Create the StorageClass:

kubectl apply -f rbac.yaml
kubectl apply -f deployment.yaml
kubectl apply -f class.yaml
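
A quick check that the provisioner pod is running and the StorageClass exists in each cluster:

k-devops get pods | grep nfs-client-provisioner
k-devops get storageclass zhrx-nfs-storage
k-test get pods | grep nfs-client-provisioner
k-test get storageclass zhrx-nfs-storage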

2.3 Deploy Prometheus with the thanos-sidecar Container

Download kube-prometheus (prometheus-operator): https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.5.0.tar.gz

root@deploy:~/manifest/prometheus-operator# tar xf kube-prometheus-0.5.tar.gz
root@deploy:~/manifest/prometheus-operator# cd kube-prometheus-0.5.0/manifests

The manifests point to the upstream images by default. The best approach is to pull each image locally and push it to your own harbor registry for later deployments; if your network is fine you can also deploy directly. Here I have already pulled the images, pushed them to my own harbor registry, and changed the manifests to use my registry paths.
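
For reference, mirroring one image to a private harbor looks roughly like this (the upstream image path is an assumption; check the image field of each manifest for the exact source):

docker pull quay.io/prometheus/prometheus:v2.15.2
docker tag quay.io/prometheus/prometheus:v2.15.2 harbor.zhrx.com/monitoring/prometheus:v2.15.2
docker push harbor.zhrx.com/monitoring/prometheus:v2.15.2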

Deploy the CRD-related resources:

root@deploy:~/manifest/prometheus-operator# cd kube-prometheus-0.5.0/manifests/setup/
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests/setup# ls
0namespace-namespace.yaml                                       prometheus-operator-0prometheusruleCustomResourceDefinition.yaml  prometheus-operator-clusterRoleBinding.yaml
prometheus-operator-0alertmanagerCustomResourceDefinition.yaml  prometheus-operator-0servicemonitorCustomResourceDefinition.yaml  prometheus-operator-deployment.yaml
prometheus-operator-0podmonitorCustomResourceDefinition.yaml    prometheus-operator-0thanosrulerCustomResourceDefinition.yaml     prometheus-operator-service.yaml
prometheus-operator-0prometheusCustomResourceDefinition.yaml    prometheus-operator-clusterRole.yaml                              prometheus-operator-serviceAccount.yaml
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests/setup# k-devops apply -f .     # devops cluster
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests/setup# k-test apply -f .    # test cluster
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created

Deploy the Prometheus-related pods

Edit prometheus-prometheus.yaml: add the thanos-sidecar container and a PVC template.

Note: when deploying to a different environment, change the externalLabels values.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  image: harbor.zhrx.com/monitoring/prometheus:v2.15.2
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  externalLabels:
    env: devops           # change this label for each environment
    cluster: devops-idc-cluster   # change this label for each environment
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.15.2
  storage:                      # add a PVC template; the StorageClass points to NFS
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: zhrx-nfs-storage
  thanos:          # add the thanos-sidecar container
    baseImage: harbor.zhrx.com/monitoring/thanos
    version: v0.20.0
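
As configured here, the sidecar only serves the local TSDB to Thanos Query; no object storage upload is set up. If long-term storage is wanted later, the thanos section of the Prometheus CR can, as far as I know, reference an object storage config stored in a Secret, roughly like this (secret and key names are placeholders; see the earlier S3 example for the file contents):

kubectl -n monitoring create secret generic thanos-objstore --from-file=objstore.yml

  thanos:
    baseImage: harbor.zhrx.com/monitoring/thanos
    version: v0.20.0
    objectStorageConfig:
      name: thanos-objstore
      key: objstore.yml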
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests# k-devops apply -f ./
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests# vim prometheus-prometheus.yaml 
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests# k-test apply -f ./
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created

Verification

 

# verify the thanos-sidecar container
root@deploy:~# k-devops describe pod prometheus-k8s-0 -n monitoring
.............
  thanos-sidecar:
    Container ID:  docker://7c8b3442ba8f81a5e5828c02e8e4f08b80c416375aea3adab407e9c341ed9f1b
    Image:         harbor.zhrx.com/monitoring/thanos:v0.20.0
    Image ID:      docker-pullable://harbor.zhrx.com/monitoring/thanos@sha256:8bcb077ca3c7d14fe242457d15dd3d98860255c21a673930645891138167d196
    Ports:         10902/TCP, 10901/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      sidecar
      --prometheus.url=http://localhost:9090/
      --tsdb.path=/prometheus
      --grpc-address=[$(POD_IP)]:10901
      --http-address=[$(POD_IP)]:10902
    State:          Running
      Started:      Fri, 25 Mar 2022 15:42:09 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /prometheus from prometheus-k8s-db (rw,path="prometheus-db")
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-9h89g (ro)
.............

Expose the thanos-sidecar port:

root@deploy:~/manifest/prometheus-operator# vim thanos-sidecar-nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s-nodeport
  namespace: monitoring
spec:
  ports:
    - port: 10901
      targetPort: 10901
      nodePort: 30901
  selector:
    app: prometheus
    prometheus: k8s
  type: NodePort
root@deploy:~/manifest/prometheus-operator# k-devops apply -f thanos-sidecar-nodeport.yaml
service/prometheus-k8s-nodeport created
root@deploy:~/manifest/prometheus-operator# k-test apply -f thanos-sidecar-nodeport.yaml
service/prometheus-k8s-nodeport created
root@deploy:~/manifest/prometheus-operator# 
root@deploy:~/manifest/prometheus-operator# k-devops get svc -n monitoring | grep prometheus-k8s-nodeport
prometheus-k8s-nodeport   NodePort    10.68.17.73    <none>        10901:30901/TCP              25s

2.4 Deploy the thanos-query Component

Here I deploy the thanos-query component into the devops cluster.

root@deploy:~/manifest/prometheus-operator# vim thanos-query.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
  labels:
    app: thanos-query
spec:
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos
          image: harbor.zhrx.com/monitoring/thanos:v0.20.0
          args:
            - query
            - --log.level=debug
            - --query.replica-label=prometheus_replica # prometheus-operator sets the replica label to prometheus_replica
            # Discover local store APIs using DNS SRV.
            - --store=192.168.1.101:30901
            - --store=192.168.1.110:30901
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: http
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /-/healthy
              port: http
            initialDelaySeconds: 15
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-query
  namespace: monitoring
  labels:
    app: thanos-query
spec:
  ports:
    - port: 9090
      targetPort: http
      name: http
      nodePort: 30909
  selector:
    app: thanos-query
  type: NodePort
root@deploy:~/manifest/prometheus-operator# k-devops apply -f thanos-query.yaml
deployment.apps/thanos-query created
service/thanos-query created

root@deploy:~/manifest/prometheus-operator# k-devops get pod -n monitoring | grep query
thanos-query-f9bc76679-jp297           1/1     Running   0          34s

Access thanos-query on port 30909 of the node IP.

Thanos Query has discovered the thanos-sidecar endpoints of both the devops and test clusters, so metric data from both clusters can now be queried.

Metric data from both clusters is indeed returned.
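
A final command-line sanity check (the node IP and label values come from this deployment; /api/v1/stores is a Thanos Query extension that lists the discovered Store API endpoints):

# list the Store API endpoints thanos-query has discovered
curl -s 'http://192.168.1.101:30909/api/v1/stores'

# query metrics from one specific cluster using the externalLabels set earlier
curl -sG 'http://192.168.1.101:30909/api/v1/query' \
  --data-urlencode 'query=up{cluster="devops-idc-cluster"}' \
  --data-urlencode 'dedup=true'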