Grafana+Prometheus: Building a Visual Monitoring System for JuiceFS

阿新 • Published: 2022-05-25

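As a distributed file system that holds massive amounts of data, JuiceFS needs to give users an intuitive view of how metrics such as total capacity, file count, CPU load, disk I/O, and cache change over time.

Rather than reinventing the wheel, JuiceFS exposes real-time status data through a Prometheus-compatible API. Add it to your own Prometheus Server to build time series, then use a tool such as Grafana to visualize and monitor the JuiceFS file system with little effort.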

Quick Start

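This section assumes that the Prometheus Server, Grafana, and the JuiceFS client all run on the same host, where:

Prometheus Server: collects and stores time series of the various metrics. For installation, see the official documentation.
Grafana: reads time series from Prometheus and visualizes them. For installation, see the official documentation.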

Ⅰ. Get real-time data

JuiceFS exposes data through a Prometheus-style API. Once the file system is mounted, the client's real-time metrics are available by default at http://localhost:9567/metrics.
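To inspect the endpoint from a script rather than a browser, the minimal Python sketch below fetches http://localhost:9567/metrics and parses the Prometheus text exposition format into a dict. It is a deliberately simplified parser (it assumes label values contain no `}` and ignores optional timestamps); for anything serious, the official `prometheus_client` package ships a full parser.

```python
import re
from urllib.request import urlopen

# "name{labels} value [timestamp]" or "name value [timestamp]"; timestamp ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Map 'name{labels}' to a float value, skipping blank and # HELP/# TYPE lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.group(1), m.group(2) or "", m.group(3)
            samples[name + labels] = float(value)  # float() also accepts NaN/+Inf
    return samples


def fetch_metrics(url="http://localhost:9567/metrics", timeout=5):
    """Download the client's metrics page and parse it."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for key, value in sorted(fetch_metrics().items()):
        print(key, value)
```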

Ⅱ. Add the API to Prometheus Server

Edit the Prometheus configuration file and add a new job pointing to the JuiceFS API address, for example:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "juicefs"
    static_configs:
      - targets: ["localhost:9567"]

Assuming the configuration file is named prometheus.yml, start the server with it:

./prometheus --config.file=prometheus.yml

Visit http://localhost:9090 to see the Prometheus web UI.
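To confirm that Prometheus is actually scraping JuiceFS, you can also query the Prometheus HTTP API. The sketch below assumes Prometheus listens on localhost:9090 and that the job is named juicefs, as in the configuration above; it reads /api/v1/targets and reports the health and last error of each JuiceFS target.

```python
import json
from urllib.request import urlopen


def juicefs_target_health(targets_json, job="juicefs"):
    """Map scrapeUrl -> (health, lastError) for active targets of the given job."""
    data = json.loads(targets_json)
    result = {}
    for target in data["data"]["activeTargets"]:
        if target["labels"].get("job") == job:
            result[target["scrapeUrl"]] = (target["health"], target.get("lastError", ""))
    return result


def check(prometheus="http://localhost:9090", job="juicefs"):
    """Fetch the target list from Prometheus and filter it to one job."""
    with urlopen(prometheus + "/api/v1/targets", timeout=5) as resp:
        return juicefs_target_health(resp.read().decode("utf-8"), job)


if __name__ == "__main__":
    for url, (health, error) in check().items():
        print(url, health, error)
```

A health value other than "up" usually means the mount point is not exposing metrics at the configured address.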

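Ⅲ. Visualize Prometheus data with Grafana

As shown in the figure below, create a new Data Source:

Name: for easy identification, you can use the name of the file system.
URL: the Prometheus data endpoint, http://localhost:9090 by default.

Then create a dashboard from grafana_template.json. Open the new dashboard to see visual charts of the file system.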
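The data source can also be created through Grafana's HTTP API instead of the UI. The sketch below builds the request for POST /api/datasources; it assumes Grafana runs at http://localhost:3000 with basic-auth credentials admin/admin, and the data source name myjfs is hypothetical. Adjust all three before use. Importing grafana_template.json is still simplest through the UI as described above.

```python
import base64
import json
from urllib.request import Request, urlopen

GRAFANA_URL = "http://localhost:3000"  # assumed Grafana address


def datasource_request(name, prometheus_url="http://localhost:9090",
                       user="admin", password="admin"):
    """Build (but do not send) a request that creates a Prometheus data source."""
    body = {
        "name": name,          # e.g. the file system name, for easy identification
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",     # Grafana's backend queries Prometheus on the user's behalf
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        GRAFANA_URL + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": "Basic " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    with urlopen(datasource_request("myjfs"), timeout=5) as resp:
        print(resp.status, resp.read().decode())
```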

Collecting metrics

How metrics are collected depends on how JuiceFS is deployed. Each method is described below.

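Mount point

After a JuiceFS file system is mounted with the juicefs mount command, metrics can be collected at http://localhost:9567/metrics. You can also customize the address with the --metrics option, for example: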

$ juicefs mount --metrics localhost:9567 ...

You can view these metrics with a command-line tool:

$ curl http://localhost:9567/metrics

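In addition, the root directory of every JuiceFS file system contains a hidden file named .stats, which can also be used to view the metrics. For example (assuming the mount point is /jfs):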

$ cat /jfs/.stats
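Since .stats is an ordinary file, any program can read it. The sketch below assumes its content is line-oriented text with one metric per line, similar to the /metrics output (check with cat first), and filters it by a metric-name prefix; the default prefix juicefs_ is also an assumption.

```python
from pathlib import Path


def filter_stats(text, prefix="juicefs_"):
    """Keep only the lines whose metric name starts with prefix."""
    return [line for line in text.splitlines() if line.startswith(prefix)]


def read_stats(mount_point="/jfs", prefix="juicefs_"):
    """Read <mount_point>/.stats and return the matching lines."""
    text = Path(mount_point, ".stats").read_text(encoding="utf-8")
    return filter_stats(text, prefix)


if __name__ == "__main__":
    for line in read_stats("/jfs", "juicefs_"):
        print(line)
```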

Kubernetes

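The JuiceFS CSI driver exposes metrics on port 9567 of the mount pod by default. You can customize this by adding a metrics option to mountOptions (see the CSI driver documentation for how to modify mountOptions), for example: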

apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv
  labels:
    juicefs-name: ten-pb-fs
spec:
  ...
  mountOptions:
    - metrics=0.0.0.0:9567

Add a scrape job to prometheus.yml to collect the metrics:

scrape_configs:
  - job_name: 'juicefs'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
      action: keep
      regex: juicefs-mount
    - source_labels: [__address__]
      action: replace
      regex: ([^:]+)(:\d+)?
      replacement: $1:9567
      target_label: __address__
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: node
      action: replace
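The second relabel rule rewrites every discovered pod address to port 9567. Prometheus anchors relabel regexes at both ends and leaves the target label untouched when the regex does not match. The sketch below mimics that with re.fullmatch so you can check which addresses the rule rewrites; it illustrates the rule and is not Prometheus's own code.

```python
import re

# Same pattern as the relabel rule; fullmatch reproduces Prometheus's anchoring.
ADDR_RE = re.compile(r"([^:]+)(:\d+)?")


def rewrite_address(address, port=9567):
    """Emulate: regex ([^:]+)(:\\d+)?  replacement $1:9567  target __address__."""
    m = ADDR_RE.fullmatch(address)
    if m is None:
        return address  # no match: Prometheus leaves __address__ unchanged
    return f"{m.group(1)}:{port}"


if __name__ == "__main__":
    for addr in ["10.0.0.5:8080", "10.0.0.5", "[::1]:9090"]:
        print(addr, "->", rewrite_address(addr))
```

Note that a bracketed IPv6 address does not match `[^:]+`, so such targets keep their original port.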

This assumes that the Prometheus server runs inside the Kubernetes cluster. If it runs outside the cluster, make sure it can reach the Kubernetes nodes and, following this issue, add the api_server and tls_config settings to the file above:

scrape_configs:
  - job_name: 'juicefs'
    kubernetes_sd_configs:
    - api_server: <Kubernetes API Server>
      role: pod
      tls_config:
        ca_file: <...>
        cert_file: <...>
        key_file: <...>
        insecure_skip_verify: false
    relabel_configs:
    ...

S3 gateway

The JuiceFS S3 gateway exposes metrics at http://localhost:9567/metrics by default. You can also customize the address with the --metrics option, for example:

$ juicefs gateway --metrics localhost:9567 ...

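If you deploy the JuiceFS S3 gateway in Kubernetes, you can reuse the Prometheus configuration from the Kubernetes section to collect metrics (the main difference is the regular expression for the __meta_kubernetes_pod_label_app_kubernetes_io_name label), for example: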

scrape_configs:
  - job_name: 'juicefs-s3-gateway'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: juicefs-s3-gateway
      - source_labels: [__address__]
        action: replace
        regex: ([^:]+)(:\d+)?
        replacement: $1:9567
        target_label: __address__
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
        action: replace
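The keep action is what separates the two jobs: a pod survives only if its app.kubernetes.io/name label fully matches the regex, and a missing label counts as an empty string. The hypothetical helper below mimics that filtering on plain dictionaries, which can help when checking which pods each job would scrape.

```python
import re

APP_LABEL = "__meta_kubernetes_pod_label_app_kubernetes_io_name"


def keep_targets(targets, regex, label=APP_LABEL):
    """Emulate relabel action 'keep': retain targets whose label fully matches regex."""
    pattern = re.compile(regex)
    # An absent label behaves like an empty string during relabeling.
    return [t for t in targets if pattern.fullmatch(t.get(label, ""))]


if __name__ == "__main__":
    pods = [
        {APP_LABEL: "juicefs-mount", "__address__": "10.0.0.1"},
        {APP_LABEL: "juicefs-s3-gateway", "__address__": "10.0.0.2"},
        {"__address__": "10.0.0.3"},  # pod without the label
    ]
    print(keep_targets(pods, "juicefs-s3-gateway"))
```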

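Collecting with Prometheus Operator

Prometheus Operator lets users quickly deploy and manage Prometheus in a Kubernetes environment. The ServiceMonitor CRD it provides generates scrape configurations automatically. For example (assuming the Service of the JuiceFS S3 gateway is deployed in the kube-system namespace):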

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: juicefs-s3-gateway
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: juicefs-s3-gateway
  endpoints:
    - port: metrics
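If you prefer to generate the manifest, the sketch below builds the same ServiceMonitor as a Python dict and prints it as JSON, which kubectl also accepts. It assumes the gateway's Service exposes a port named metrics that serves the metrics endpoint; the function name and its defaults are illustrative.

```python
import json


def service_monitor(name="juicefs-s3-gateway", namespace="kube-system",
                    app_label="juicefs-s3-gateway", port="metrics"):
    """Return a ServiceMonitor manifest equivalent to the YAML above."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            # Only look for Services in this namespace.
            "namespaceSelector": {"matchNames": [namespace]},
            # Select the gateway's Service by its app label.
            "selector": {"matchLabels": {"app.kubernetes.io/name": app_label}},
            # Must match a *named* port on that Service (assumed "metrics").
            "endpoints": [{"port": port}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(service_monitor(), indent=2))
```

Usage: `python servicemonitor.py | kubectl apply -f -` (the script file name is hypothetical).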

Hadoop

The JuiceFS Hadoop Java SDK can report metrics to Pushgateway or Graphite.

Pushgateway

Enable reporting metrics to Pushgateway:

<property>
  <name>juicefs.push-gateway</name>
  <value>host:port</value>
</property>

You can change the reporting frequency with the juicefs.push-interval setting; by default, metrics are reported every 10 seconds.
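For reference, a push to Pushgateway is a plain HTTP request to /metrics/job/&lt;job&gt;, optionally followed by more label pairs, with a body in the Prometheus text format. The sketch below builds such a request by hand, which can be handy for checking that a Pushgateway is reachable. It is not how the Hadoop Java SDK implements reporting, and the job, instance, and metric values are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def push_request(gateway, job, text, instance=None):
    """Build a POST to Pushgateway's /metrics/job/<job>[/instance/<instance>]."""
    # Label values containing "/" would need Pushgateway's @base64 form (not handled).
    path = "/metrics/job/" + quote(job, safe="")
    if instance:
        path += "/instance/" + quote(instance, safe="")
    if not text.endswith("\n"):
        text += "\n"  # the text format requires a trailing newline
    return Request(
        f"http://{gateway}{path}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="POST",  # POST replaces only metrics with the same names in the group
    )


if __name__ == "__main__":
    req = push_request("host:9091", "juicefs-test", "juicefs_test_metric 1", "node1")
    with urlopen(req, timeout=5) as resp:
        print(resp.status)
```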

As recommended by the Pushgateway official documentation, the Prometheus scrape configuration must set honor_labels: true.

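Note in particular that the timestamp of the metrics Prometheus scrapes from Pushgateway is not the time the JuiceFS Hadoop Java SDK reported them but the time of the scrape. See the Pushgateway official documentation for details.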

By default, Pushgateway keeps metrics only in memory. To persist them to disk, specify the file path with the --persistence.file option and the save frequency with the --persistence.interval option (every 5 minutes by default).

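Every process that uses the JuiceFS Hadoop Java SDK produces its own unique metrics, and Pushgateway remembers every metric it has ever received. Metrics therefore keep accumulating, consuming excessive memory and slowing down Prometheus scrapes, so it is recommended to clean up the metrics on Pushgateway periodically.

Run the command below periodically to clear Pushgateway's metrics. Clearing them does not stop running JuiceFS Hadoop Java SDK instances from continuing to report data. Note that Pushgateway must be started with the --web.enable-admin-api option, and that the following command clears all metrics in Pushgateway.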

$ curl -X PUT http://host:9091/api/v1/admin/wipe
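If you would rather run the cleanup from a Python scheduler than from cron, the sketch below sends the same PUT request. host:9091 is the placeholder address from the command above, and the one-hour interval is an arbitrary assumption; as with curl, this wipes every metric in Pushgateway.

```python
import time
from urllib.request import Request, urlopen


def wipe_request(gateway="host:9091"):
    """Same as: curl -X PUT http://host:9091/api/v1/admin/wipe"""
    return Request(f"http://{gateway}/api/v1/admin/wipe", method="PUT")


def wipe_forever(gateway="host:9091", interval_s=3600):
    """Wipe all Pushgateway metrics every interval_s seconds (requires --web.enable-admin-api)."""
    while True:
        with urlopen(wipe_request(gateway), timeout=10) as resp:
            print("wiped, HTTP", resp.status)
        time.sleep(interval_s)


if __name__ == "__main__":
    wipe_forever()
```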

Graphite

Enable reporting metrics to Graphite:

<property>
  <name>juicefs.push-graphite</name>
  <value>host:port</value>
</property>

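You can change the reporting frequency with the juicefs.push-interval setting; by default, metrics are reported every 10 seconds.

For all configuration parameters supported by the JuiceFS Hadoop Java SDK, see the documentation.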
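Graphite's plaintext protocol, which Carbon accepts on port 2003 by default, is one line per data point: a metric path, a value, and a Unix timestamp. The sketch below formats and sends such lines, which is useful for checking that Graphite is reachable from the Hadoop nodes. The metric path in the example is hypothetical and is not the naming scheme the SDK uses.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point: '<path> <value> <unix-timestamp>\\n'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines, host="localhost", port=2003):
    """Send pre-formatted plaintext lines to Carbon over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))


if __name__ == "__main__":
    send_to_graphite([graphite_line("juicefs.test.connectivity", 1)])
```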

Using Consul as a service registry

JuiceFS can use Consul as a registry for its metrics API. The default Consul address is 127.0.0.1:8500; you can customize it with the --consul option, for example:

$ juicefs mount --consul 1.2.3.4:8500 ...

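Once a Consul address is configured, the --metrics option no longer needs to be set: JuiceFS configures the metrics URL automatically based on its own network and port situation. If --metrics is also set, JuiceFS first tries to listen on the configured URL.

Every instance registered in Consul has the serviceName juicefs, and its serviceId has the format <IP>:<mount-point>, for example 127.0.0.1:/tmp/jfs.

The meta of each instance contains two dimensions, hostname and mountpoint; a mountpoint value of s3gateway means the instance is an S3 gateway.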
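To see what JuiceFS has registered, you can query Consul's catalog API for the juicefs service. The sketch below assumes Consul at the default 127.0.0.1:8500, reads hostname and mountpoint from each entry's ServiceMeta, and flags S3 gateways; the output field names are illustrative.

```python
import json
from urllib.request import urlopen


def parse_juicefs_instances(catalog_json):
    """Summarize entries returned by /v1/catalog/service/juicefs."""
    instances = []
    for svc in json.loads(catalog_json):
        meta = svc.get("ServiceMeta") or {}
        # ServiceAddress may be empty, in which case Consul's node Address applies.
        host = svc.get("ServiceAddress") or svc["Address"]
        instances.append({
            "id": svc["ServiceID"],
            "address": f"{host}:{svc['ServicePort']}",
            "hostname": meta.get("hostname"),
            "mountpoint": meta.get("mountpoint"),
            "is_gateway": meta.get("mountpoint") == "s3gateway",
        })
    return instances


def list_instances(consul="127.0.0.1:8500"):
    with urlopen(f"http://{consul}/v1/catalog/service/juicefs", timeout=5) as resp:
        return parse_juicefs_instances(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for inst in list_instances():
        print(inst)
```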

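Visualizing metrics

Grafana dashboard templates

JuiceFS provides several Grafana dashboard templates; import one to display the collected metrics. The currently available templates are: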

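| Template | Description |
|---|---|
| `grafana_template.json` | Displays metrics collected from mount points, S3 gateways (non-Kubernetes deployments), and the Hadoop Java SDK |
| `grafana_template_k8s.json` | Displays metrics collected from the Kubernetes CSI driver and S3 gateways (Kubernetes deployments) |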

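An example Grafana dashboard is shown below:

Summary

Use Grafana as a high-level observation tool: when something goes wrong, first check whether any metrics look abnormal, then dig deeper. It is also a good idea to set up alerts on important metrics so that you are notified of abnormal system states in real time.

If this was helpful, please follow our project Juicedata/JuiceFS! (0ᴗ0✿)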
