Prometheus Deployment (Part 1)

阿新 • Published: 2021-11-22

Prometheus Deployment

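Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. Prometheus joined the Cloud Native Computing Foundation in 2016 as its second hosted project, after Kubernetes.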

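Official website: https://prometheus.io

An exporter is a component that collects monitoring data and exposes it in the format Prometheus expects, giving Prometheus an interface to scrape.

An exporter exposes the monitoring data it collects through an HTTP endpoint. The Prometheus server scrapes that endpoint to obtain the data. Each exporter is responsible for a different kind of workload.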
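To make the exporter idea concrete, here is a minimal sketch of an HTTP endpoint that serves one gauge in the Prometheus text exposition format, using only the Python standard library. The metric name demo_uptime_seconds and port 9200 are assumptions made for this illustration; a real exporter would normally be built on an official client library.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.monotonic()  # reference point for the demo uptime gauge


def render_metrics() -> str:
    """Render one gauge in the Prometheus text exposition format."""
    uptime = time.monotonic() - START
    return (
        "# HELP demo_uptime_seconds Seconds since this sketch was started.\n"
        "# TYPE demo_uptime_seconds gauge\n"
        f"demo_uptime_seconds {uptime}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9200) -> None:
    """Serve the endpoint until interrupted (port 9200 is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding localhost:9200 as a target in a scrape job would let the Prometheus server pull this gauge like any other exporter.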

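Prometheus              An open-source systems monitoring and alerting framework, inspired by Google's Borgmon monitoring system

AlertManager            Handles alerts sent by client applications such as the Prometheus server. It deduplicates and groups alerts and routes them to the correct receiver integration, and it also handles silencing and inhibition of alerts

Node_Exporter           An exporter that monitors the resource usage of each node; deploy it on every node that Prometheus monitors

PushGateway             A push gateway that receives data pushed from nodes and exposes it to the Prometheus server

Documentation: https://prometheus.io/docs/introduction/overview/

Download the Prometheus components:

https://prometheus.io/download/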

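Introduction to Prometheus

Features of Prometheus:

1. A multi-dimensional data model (time series identified by a metric name and key/value pairs)

2. PromQL, a flexible query and aggregation language

3. Local on-disk storage, with optional integration with remote storage systems

4. Time series collection through a pull model over HTTP

5. Push mode through the Pushgateway (an optional intermediary gateway for Prometheus)

6. Targets discovered through dynamic service discovery or static configuration

7. Support for multiple kinds of graphs and dashboards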

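Components of Prometheus:

1. The Prometheus server, which scrapes and stores time series data

2. Client libraries, for instrumenting application code

3. A push gateway, for supporting short-lived jobs

4. Special-purpose exporters for services such as HAProxy, StatsD, and Graphite

5. An alertmanager to handle alerts

6. Various support tools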

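[Figure: Prometheus architecture diagram]

When to use Prometheus:

Prometheus works well for recording any purely numeric time series. It fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.

Prometheus is designed for reliability, to be the system you go to during an outage so that you can quickly diagnose problems. Each Prometheus server is standalone and does not depend on network storage or other remote services, so you can rely on it when other parts of your infrastructure are broken.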

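Prometheus Concepts

Data model:

Prometheus stores all data as time series: streams of timestamped values that belong to the same metric name and the same set of labels (key/value pairs).

- Metric names and labels:

Every time series is uniquely identified by its metric name and a set of labels (key/value pairs).

The metric name specifies the general feature of the system being measured (for example, http_requests_total is the total number of HTTP requests received). It may contain ASCII letters and digits, as well as underscores and colons, and it must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.

Note: colons are reserved for user-defined recording rules and should not be used by exporters.

Labels give Prometheus its multi-dimensional data model: for the same metric name, any combination of labels identifies a particular dimensional instance of that metric (for example, all HTTP requests that used the POST method on the /api/tracks handler). The query language filters and aggregates based on these dimensions. Changing any label value, including adding or removing a label, creates a new time series.

Label names may contain ASCII letters, digits, and underscores, and they must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names that begin with a double underscore (__) are reserved for internal use.

Label values may contain any Unicode characters. A label with an empty value is considered equivalent to a label that does not exist.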
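The naming rules above can be checked mechanically. The following sketch applies the two regular expressions quoted in the text; the function names and the allow_reserved parameter are hypothetical helpers introduced for this illustration, not part of any Prometheus library.

```python
import re

# Patterns copied from the naming rules quoted above.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """True if the whole string matches the metric name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name: str, allow_reserved: bool = False) -> bool:
    """True if the name matches the label pattern and is not reserved.

    Names starting with "__" are for internal use, so they are rejected
    unless allow_reserved is set.
    """
    if LABEL_NAME_RE.fullmatch(name) is None:
        return False
    return allow_reserved or not name.startswith("__")
```

For example, is_valid_metric_name("http_requests_total") passes, while "http-requests" fails because a hyphen is not in the allowed character set.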

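Notation:

Given a metric name and a set of labels, time series are usually identified using this notation:
<metric name>{<label name>=<label value>, ...}
For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" can be written as:
api_http_requests_total{method="POST", handler="/messages"}
This is the same notation that OpenTSDB uses.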
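As a small illustration of how a metric name plus a label set identifies one series, the sketch below formats the notation from a dictionary. The series_id helper is hypothetical; it sorts labels by name so that the same label set always produces the same string, and it does not escape special characters in label values.

```python
def series_id(metric: str, labels: dict) -> str:
    """Format a series as <metric name>{<label name>="<label value>", ...}."""
    # A label with an empty value counts as absent, so drop it.
    present = {k: v for k, v in labels.items() if v != ""}
    if not present:
        return metric
    # Sort by label name so label order does not affect the identity.
    # Note: values are not escaped in this sketch.
    pairs = ", ".join(f'{k}="{v}"' for k, v in sorted(present.items()))
    return f"{metric}{{{pairs}}}"
```

Changing method from "POST" to "GET" yields a different string, which mirrors the rule that any change to a label value creates a new time series.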

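Metric types:

Counter             A value that only increases monotonically, or resets to zero on restart. Use it for the number of requests served, tasks completed, errors, and so on

Gauge               A value that can go up and down arbitrarily. Use it for measurements such as temperature or current memory usage

Histogram           Samples observations (typically request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values

                        Cumulative counters for the observation buckets: <basename>_bucket{le="<upper inclusive bound>"}
                        Total sum of all observed values: <basename>_sum
                        Count of observed events: <basename>_count

Summary             Samples observations (typically request durations or response sizes). It provides a count of observations and a sum of all observed values, and it also calculates configurable quantiles over a sliding time window
                        Streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events: <basename>{quantile="φ"}
                        Total sum of all observed values: <basename>_sum
                        Count of observed events: <basename>_count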
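The cumulative bucket behaviour of a histogram is easier to see in code. The toy class below is an assumption-laden sketch, not the client library implementation: it keeps one cumulative counter per upper bound plus an implicit +Inf bucket, and reports the _bucket, _sum, and _count series described above.

```python
import math


class Histogram:
    """Toy histogram mirroring the _bucket, _sum, and _count series."""

    def __init__(self, upper_bounds):
        # The +Inf bucket is always present and catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        self.sum += value
        self.count += 1
        # Buckets are cumulative: bump every bucket whose inclusive
        # upper bound is at least the observed value.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[i] += 1

    def samples(self, basename: str):
        """Return (series, value) pairs using the naming scheme above."""
        out = []
        for bound, n in zip(self.upper_bounds, self.bucket_counts):
            le = "+Inf" if math.isinf(bound) else str(bound)
            out.append((f'{basename}_bucket{{le="{le}"}}', n))
        out.append((f"{basename}_sum", self.sum))
        out.append((f"{basename}_count", self.count))
        return out
```

Because the buckets are cumulative, the +Inf bucket always equals the _count value, and each bucket is at least as large as the one before it.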

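Jobs and instances:
In Prometheus terms, an endpoint that can be scraped is called an instance, and it usually corresponds to a single process. A collection of instances with the same purpose (for example, a process replicated for scalability or reliability) is called a job.

When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series to identify the target:

job: the name of the job that the target belongs to

instance: the <host>:<port> part of the target's URL

If either of these labels is already present in the scraped data, the behavior depends on the honor_labels configuration option.

For each scrape of an instance, Prometheus stores a sample (a single value at a specific point in time in a time series) in each of the following time series:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy (reachable), 0 if the scrape failed

scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: the duration of the scrape

scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: the number of samples remaining after metric relabeling was applied

scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: the number of samples the target exposed

The up time series is very useful for monitoring instance availability.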
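The two automatic labels and the up series can be sketched in a few lines. Both helpers below are hypothetical names for this illustration: target_labels derives the instance label from a target URL (assuming the URL carries no user credentials), and availability turns a list of up samples into a success ratio.

```python
from urllib.parse import urlsplit


def target_labels(job: str, target_url: str) -> dict:
    """Derive the job and instance labels for a scrape target."""
    # netloc is the <host>:<port> part of the URL.
    return {"job": job, "instance": urlsplit(target_url).netloc}


def availability(up_samples) -> float:
    """Fraction of scrapes that succeeded, given up values of 1 or 0."""
    if not up_samples:
        raise ValueError("need at least one sample")
    return sum(up_samples) / len(up_samples)
```

With four scrapes of which one failed, availability([1, 1, 0, 1]) gives 0.75, which is the kind of figure the up series is typically used to derive.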

https://blog.csdn.net/miss1181248983/article/details/107112451/

Deploying Prometheus

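Disable the firewall and SELinux on all nodes:
systemctl stop firewalld && systemctl disable firewalld
setenforce 0  # switches SELinux to permissive mode until the next reboot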

Download Prometheus:

# A newer release; if you use it, adjust the file names below to match
# wget https://github.com/prometheus/prometheus/releases/download/v2.29.2/prometheus-2.29.2.linux-amd64.tar.gz
wget https://github.com/prometheus/prometheus/releases/download/v2.13.1/prometheus-2.13.1.linux-amd64.tar.gz

tar -zxvf prometheus-2.13.1.linux-amd64.tar.gz
# Install under /opt/prometheus, the path that the systemd unit's ExecStart expects
mv prometheus-2.13.1.linux-amd64 /opt/prometheus
cd /opt/prometheus
vi prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  # Scrape node exporter metrics
  - job_name: 'node'
    static_configs:
    - targets: ['172.16.76.246:9100','172.16.245.197:9100','172.16.245.198:9100','172.16.76.245:9100','172.16.76.243:9100','172.16.76.244:9100','172.16.245.196:9100','172.16.76.247:9100','172.16.76.248:9100','172.16.76.249:9100']
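Long target lists like the one in the node job are easy to mistype. As a hedged sketch, the hypothetical helpers below build the host:port strings from a list of addresses and render the matching scrape job fragment; the default port 9100 matches the node exporter targets used in this configuration.

```python
def node_targets(hosts, port: int = 9100):
    """Turn a list of host addresses into host:port scrape targets."""
    return [f"{host}:{port}" for host in hosts]


def render_node_job(hosts, job_name: str = "node", port: int = 9100) -> str:
    """Render a scrape_configs entry with the same layout as the config above."""
    targets = ", ".join(f"'{t}'" for t in node_targets(hosts, port))
    return (
        f"  - job_name: '{job_name}'\n"
        f"    static_configs:\n"
        f"    - targets: [{targets}]\n"
    )
```

Printing render_node_job(["172.16.76.246", "172.16.245.197"]) produces a fragment that can be pasted under scrape_configs in prometheus.yml.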

Install Prometheus:

# -M: do not create a home directory for the user
useradd -s /sbin/nologin -M prometheus
mkdir -p /opt/prometheus/data
chown -R prometheus:prometheus /opt/prometheus/
vi /usr/lib/systemd/system/prometheus.service

# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data
Restart=on-failure
[Install]
WantedBy=multi-user.target

Start Prometheus:

systemctl daemon-reload

systemctl enable prometheus && systemctl start prometheus && systemctl status prometheus
netstat -nltp | grep prometheus

tcp6       0      0 :::9090                 :::*                    LISTEN      27238/prometheus

# View the logs
journalctl -u prometheus -fn 200

Open in a browser:

http://121.40.xxx.xxx:9090/service-discovery
http://121.40.xxx.xxx:9090/targets
http://121.40.xxx.xxx:9090/graph
http://121.40.xxx.xxx:9090/metrics
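Besides the browser pages, the server can be checked from a script through the HTTP API's instant query endpoint, /api/v1/query. The sketch below asks for the up series and maps each instance to its value; the helper names are hypothetical, and the base URL passed to query_up is whatever address your server listens on.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the given PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_instant_vector(payload: dict) -> dict:
    """Map each instance label to its float value from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each result carries its labels in "metric" and [timestamp, "value"] in "value".
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


def query_up(base_url: str) -> dict:
    """Fetch the current up value for every scraped instance."""
    url = build_query_url(base_url, "up")
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_instant_vector(json.load(resp))
```

Running query_up("http://localhost:9090") on the Prometheus host should return 1.0 for every reachable target and 0.0 for targets whose last scrape failed.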
