Prometheus + Spring Boot Application Monitoring

阿新 • Published: 2021-03-05

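1. What is Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit with an active ecosystem. In short, it is an open-source monitoring solution.

Main features of Prometheus:

A multi-dimensional data model with time series data identified by metric name and key/value pairs
PromQL, a flexible query language
No reliance on distributed storage; single server nodes are autonomous
Time series collection happens via a pull model over HTTP
Pushing time series is supported via an intermediary gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support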

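Why pull rather than push?

Pulling has the following advantages:

You can run your monitoring on your laptop while developing changes
You can more easily tell whether a target is down
You can manually go to a target and inspect its health with a web browser

The target exposes an HTTP endpoint, and the Prometheus server actively pulls data from it over HTTP. Because the server initiates the pull, it can just as well run locally (on our own computer), as long as it can reach the target's endpoint. Each scrape also works like a heartbeat check that reveals whether the target is down. And since the server decides what to pull, it can switch scrape targets freely.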
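To make the pull model concrete, here is a minimal sketch of a target that exposes an HTTP endpoint for a server to fetch. It is an illustration only, built on the HTTP server bundled with the JDK; the class name PullTargetSketch, the metric name demo_scrapes_total, and port 8000 are assumptions made for this example, not anything Prometheus or Spring Boot prescribes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pull target built on the JDK's bundled HTTP server.
class PullTargetSketch {
    // Counts how many times /metrics has been fetched.
    static final AtomicLong SCRAPES = new AtomicLong();

    // Starts the target on the given port; port 0 lets the OS pick a free one.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/metrics", exchange -> {
            // One sample line: <metric name> <value>
            String text = "demo_scrapes_total " + SCRAPES.incrementAndGet() + "\n";
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8000);
        System.out.println("Metrics at http://127.0.0.1:"
                + server.getAddress().getPort() + "/metrics");
    }
}
```

The target only answers requests and never needs to know where the monitoring server lives, which is why the puller can run anywhere that can reach this endpoint, including a browser.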

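Recall how SkyWalking does it. SkyWalking has a client side and a server side: an agent must be installed on the target service, and the agent collects the target's metrics and reports them to the server-side OAP service. This is somewhat intrusive to the target, though acceptable. Prometheus needs no agent, and a push-style workflow is still possible through the Pushgateway.

One term to settle first: metrics. The word can be rendered as either "measure" or "indicator"; this article uses "metric" throughout.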

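2. Basic concepts

2.1. Data model

Prometheus fundamentally stores all data as time series: streams of timestamped values that belong to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus can generate temporary derived time series as the result of queries.

Metric names and labels

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.

Samples form the actual time series data. Each sample consists of:

A 64-bit floating-point value
A millisecond-precision timestamp

Given a metric name and a set of labels, a time series is usually identified using this notation: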

<metric name>{<label name>=<label value>, ...}

For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" could be written like this:

api_http_requests_total{method="POST", handler="/messages"}
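The notation above can be reproduced with a few lines of code. The following is a minimal sketch, assuming a hypothetical helper class SeriesNotation that is not part of any Prometheus library; it skips the escaping of special characters in label values that a real client would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not a Prometheus API.
class SeriesNotation {

    // Renders <metric name>{<label name>="<label value>", ...}.
    // Label values are not escaped here, which a real client must do.
    static String format(String metricName, Map<String, String> labels) {
        if (labels.isEmpty()) {
            return metricName;
        }
        StringJoiner joined = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, String> label : labels.entrySet()) {
            joined.add(label.getKey() + "=\"" + label.getValue() + "\"");
        }
        return metricName + joined;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the labels in insertion order.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("method", "POST");
        labels.put("handler", "/messages");
        System.out.println(format("api_http_requests_total", labels));
    }
}
```

Two series with the same metric name but different label values, for example method="GET", are distinct time series.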

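2.2. Metric types

Counter

A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, a counter can represent the number of requests served, tasks completed, or errors.

Do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.

Gauge

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

Gauges are typically used for measured values such as temperature or current memory usage, and also for "counts" that can go up and down, such as the number of concurrent requests.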
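The difference between the two types can be sketched in plain Java. This is an illustration under simplifying assumptions: the nested classes Counter and Gauge are hypothetical stand-ins rather than a client library's types, and they store whole numbers although Prometheus sample values are floating point.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for illustration; not a client library API.
class CounterVsGauge {

    // A counter only ever goes up (a process restart resets it to zero).
    static class Counter {
        private final AtomicLong value = new AtomicLong();

        void increment(long amount) {
            if (amount < 0) {
                throw new IllegalArgumentException("a counter can only increase");
            }
            value.addAndGet(amount);
        }

        long get() {
            return value.get();
        }
    }

    // A gauge may move in either direction.
    static class Gauge {
        private final AtomicLong value = new AtomicLong();

        void increment() {
            value.incrementAndGet();
        }

        void decrement() {
            value.decrementAndGet();
        }

        long get() {
            return value.get();
        }
    }

    public static void main(String[] args) {
        Counter requestsServed = new Counter();
        Gauge requestsInFlight = new Gauge();

        // A request starts: both values rise.
        requestsServed.increment(1);
        requestsInFlight.increment();

        // The request finishes: only the gauge goes back down.
        requestsInFlight.decrement();

        System.out.println("served=" + requestsServed.get()
                + " inFlight=" + requestsInFlight.get());
    }
}
```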

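Histogram

A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.

A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

Cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
The total sum of all observed values, exposed as <basename>_sum
The count of events that have been observed, exposed as <basename>_count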
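The word "cumulative" is the part that is easy to miss: each bucket counts every observation less than or equal to its upper bound, so the le="+Inf" bucket always equals the total count. The sketch below illustrates this under stated assumptions; CumulativeHistogram is a hypothetical class written for this article, not a client library type, and it is not thread-safe.

```java
import java.util.Arrays;

// Hypothetical illustration of cumulative buckets; not a client library API.
class CumulativeHistogram {
    private final double[] upperBounds;  // sorted finite bucket bounds
    private final long[] bucketCounts;   // last slot is the le="+Inf" bucket
    private double sum;
    private long count;

    CumulativeHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.bucketCounts = new long[upperBounds.length + 1];
    }

    void observe(double value) {
        // An observation counts toward every bucket whose inclusive
        // upper bound is at least the value, and always toward +Inf.
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                bucketCounts[i]++;
            }
        }
        bucketCounts[upperBounds.length]++;
        sum += value;
        count++;
    }

    // Renders the series a scrape would see for this histogram.
    String expose(String basename) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < upperBounds.length; i++) {
            out.append(basename).append("_bucket{le=\"").append(upperBounds[i])
               .append("\"} ").append(bucketCounts[i]).append('\n');
        }
        out.append(basename).append("_bucket{le=\"+Inf\"} ")
           .append(bucketCounts[upperBounds.length]).append('\n');
        out.append(basename).append("_sum ").append(sum).append('\n');
        out.append(basename).append("_count ").append(count).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        CumulativeHistogram h = new CumulativeHistogram(0.5, 1.0);
        h.observe(0.25);
        h.observe(0.75);
        h.observe(3.0);
        System.out.print(h.expose("demo_request_duration_seconds"));
    }
}
```

With the three observations in main, the 0.5 bucket holds 1, the 1.0 bucket holds 2, and the +Inf bucket holds 3, matching the count.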

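Summary

Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.

A summary with a base metric name of <basename> exposes multiple time series during a scrape:

Streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
The total sum of all observed values, exposed as <basename>_sum
The count of events that have been observed, exposed as <basename>_count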
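As a rough illustration of how a summary differs from a histogram, the sketch below computes quantiles on the client side over recent observations, while the sum and count keep covering everything ever observed. It rests on two simplifying assumptions that real client libraries do not share: the window holds the last N observations instead of a span of time, and the quantile is found by sorting (nearest-rank) instead of a streaming estimator. WindowedSummary is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; real summaries use time windows and
// streaming quantile estimation rather than sorting.
class WindowedSummary {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;
    private long count;

    WindowedSummary(int windowSize) {
        this.windowSize = windowSize;
    }

    void observe(double value) {
        if (window.size() == windowSize) {
            window.removeFirst();  // the oldest observation slides out
        }
        window.addLast(value);
        // _sum and _count cover all observations, not only the window.
        sum += value;
        count++;
    }

    // Nearest-rank phi-quantile over the current window; NaN when empty.
    double quantile(double phi) {
        if (window.isEmpty()) {
            return Double.NaN;
        }
        double[] sorted = window.stream()
                .mapToDouble(Double::doubleValue).sorted().toArray();
        int rank = (int) Math.ceil(phi * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    double sum() {
        return sum;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        WindowedSummary s = new WindowedSummary(5);
        for (int i = 1; i <= 10; i++) {
            s.observe(i);
        }
        System.out.println("quantile 0.5 = " + s.quantile(0.5));
        System.out.println("sum = " + s.sum() + ", count = " + s.count());
    }
}
```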

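2.3. Jobs and instances

In Prometheus terms, an endpoint that can be scraped is called an instance, and it usually corresponds to a single process. A collection of instances with the same purpose is called a job.

For example, a job with four instances: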

job: api-server
- instance 1: 1.2.3.4:5670
- instance 2: 1.2.3.4:5671
- instance 3: 5.6.7.8:5670
- instance 4: 5.6.7.8:5671

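When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series to identify the scraped target:

job: the configured job name that the target belongs to
instance: the <host>:<port> part of the target's URL that was scraped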
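The relationship between a target URL and these two labels can be sketched as follows. This is an illustration only: TargetLabels is a hypothetical helper, the URL is taken from the example job above with an assumed /metrics path, and a URL without an explicit port is not handled.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not how Prometheus is implemented.
class TargetLabels {

    // Derives the labels attached to every series scraped from a target.
    // Assumes the URL carries an explicit port.
    static Map<String, String> of(String jobName, String targetUrl) {
        URI uri = URI.create(targetUrl);
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("job", jobName);
        labels.put("instance", uri.getHost() + ":" + uri.getPort());
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(of("api-server", "http://1.2.3.4:5670/metrics"));
    }
}
```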

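3. Installation and configuration

Prometheus collects metrics from targets by scraping their metrics HTTP endpoints. Since Prometheus exposes data about itself in the same way, it can also scrape and monitor its own health.

The prometheus.yml shipped with the download already contains a job that scrapes Prometheus itself, so it can be run without changing the configuration: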

# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path)

./prometheus --config.file=prometheus.yml

Then open localhost:9090 in a browser.

Visit localhost:9090/metrics to view the individual metrics.

An example

Enter the following expression and click "Execute":

prometheus_target_interval_length_seconds

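This should return several different time series (along with the latest value of each), all with the metric name prometheus_target_interval_length_seconds but with different labels.

The expression browser presents the metric graphically; the same data can also be seen at localhost:9090/metrics.

If we are only interested in the 99th percentile latency, we can use this query: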

prometheus_target_interval_length_seconds{quantile="0.99"}

To count the number of returned time series, write the query like this:

count(prometheus_target_interval_length_seconds)
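The same expressions can also be sent to the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query with the expression in the query parameter. Because PromQL contains characters such as braces and quotes, the expression has to be URL-encoded first. The sketch below only builds the URL and does not send the request; QueryUrlSketch is a hypothetical helper and the base URL assumes the local server from this section.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; it builds the URL but sends nothing.
class QueryUrlSketch {

    // Builds an instant-query URL for the Prometheus HTTP API.
    static String instantQueryUrl(String baseUrl, String promql) {
        try {
            return baseUrl + "/api/v1/query?query=" + URLEncoder.encode(promql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:9090";
        System.out.println(instantQueryUrl(base,
                "prometheus_target_interval_length_seconds{quantile=\"0.99\"}"));
        System.out.println(instantQueryUrl(base,
                "count(prometheus_target_interval_length_seconds)"));
    }
}
```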

Next, let's use the Node Exporter to add a few more targets:

tar -xzvf node_exporter-*.*.tar.gz
cd node_exporter-*.*

# Start 3 example targets in separate terminals:
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082

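Next, configure Prometheus to scrape these three new targets.

First, define a job named 'node' that scrapes data from the three endpoints. Suppose the first two endpoints are production targets and the third is a non-production (canary) instance. To tell them apart, we give the two groups different labels: in this example, group="production" is added to the first group of targets and group="canary" to the second.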

scrape_configs:
  - job_name:       'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'

3.1. Configuration

To see all available command-line flags, run:

./prometheus -h

The configuration file is written in YAML and is specified with the --config.file flag.

The main structure of the configuration file is as follows:

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]

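4. Scraping a Spring Boot application

Prometheus expects to scrape or poll individual application instances for metrics. Spring Boot provides an actuator endpoint at /actuator/prometheus that presents the metrics in a format Prometheus can scrape.

To expose metrics in that format, add a dependency on micrometer-registry-prometheus. When the project inherits from spring-boot-starter-parent, the version is managed by Spring Boot and the <version> element can be omitted, as in the pom.xml further down.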

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.6.4</version>
</dependency>

Below is an example prometheus.yml:

scrape_configs:
  - job_name: 'spring'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['HOST:PORT']

Next, create a project named prometheus-example.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.3</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.cjs.example</groupId>
    <artifactId>prometheus-example</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>prometheus-example</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
            <scope>runtime</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

application.yml

spring:
  application:
    name: prometheus-example
management:
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    tags:
      application: ${spring.application.name}

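Do not forget this setting: management.metrics.tags.application=${spring.application.name}. It adds an application tag to every metric, which dashboards can use to filter by application.

Spring Boot Actuator ships with many built-in endpoints; for details see: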

https://docs.spring.io/spring-boot/docs/2.4.3/reference/html/production-ready-features.html#production-ready-endpoints

Start the project and open the /actuator/prometheus endpoint in a browser.
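The response is plain text in the Prometheus exposition format: comment lines starting with # HELP and # TYPE, followed by one line per sample. The sketch below renders a single counter in that shape to show where the application tag configured above ends up as a label. It is a hand-written illustration and not Micrometer code; the class ExpositionSketch and the metric demo_requests_total are made up for this example, and real output contains many more metrics.

```java
// Hypothetical illustration of the text format; not Micrometer's renderer.
class ExpositionSketch {

    // Renders one counter with an "application" label in exposition format.
    static String renderCounter(String name, String help, String application, double value) {
        return "# HELP " + name + " " + help + "\n"
             + "# TYPE " + name + " counter\n"
             + name + "{application=\"" + application + "\"} " + value + "\n";
    }

    public static void main(String[] args) {
        System.out.print(renderCounter(
                "demo_requests_total",   // made-up metric name
                "Total requests handled",
                "prometheus-example",    // value of spring.application.name
                42.0));
    }
}
```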

Configure Prometheus to scrape the application:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  
  - job_name: 'springboot-prometheus'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['192.168.100.93:8080']

Restart Prometheus so that the new configuration takes effect:

./prometheus --config.file=prometheus.yml

4.1. Grafana

https://grafana.com/docs/

https://grafana.com/tutorials/

Download and extract:

wget https://dl.grafana.com/oss/release/grafana-7.4.3.linux-amd64.tar.gz
tar -zxvf grafana-7.4.3.linux-amd64.tar.gz

Start Grafana from the extracted directory:

./bin/grafana-server web

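Open http://localhost:3000 in a browser.

The default credentials are admin/admin.

After the first login, we change the password to admin1234.

First configure a data source; it must be selected later when adding a dashboard.

Grafana provides many official dashboard templates that can be used directly.

First, find the template we want.

For example, here we simply picked one.

A template can be imported either by downloading its JSON file or by entering its template ID. Here we enter the template ID.

The result is immediate: a polished dashboard appears right away.

Let's add one more dashboard (ID: 12856).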
