
Prometheus Monitoring in Practice

Deploying Prometheus

Use Prometheus + Grafana to monitor service targets such as servers and databases (MySQL, MongoDB, etc.).

Preparation

Downloads

#  Prometheus Server
https://prometheus.io/download/

wget -c https://github.com/prometheus/prometheus/releases/download/v2.20.0/prometheus-2.20.0.linux-amd64.tar.gz &

# Alertmanager (alert notification management component)
wget -c https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz &

# exporter components
wget -c https://github.com/prometheus/consul_exporter/releases/download/v0.7.1/consul_exporter-0.7.1.linux-amd64.tar.gz &
wget -c https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz &
wget -c https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz &

Installing Prometheus

Prometheus can be installed from the traditional binary package or via Docker.

Binary package installation

mkdir -p /ups/app/monitor/
# Unpack
tar -xf prometheus-*.linux-amd64.tar.gz -C /ups/app/monitor/
# Rename the directory and create a version-independent symlink
cd /ups/app/monitor/
mv prometheus-*.linux-amd64 prometheus-2.20.0
ln -s prometheus-2.20.0 prometheus

# Create the directory layout
mkdir -p prometheus/{bin,logs,config/rules,data}
cd prometheus/config && mkdir -p targets/{node,redis,postgresql,mysql}
# Create the service user
# groupadd -g 2000 prometheus
useradd -r -M -c "Prometheus Server" -d /ups/app/monitor/ -s /sbin/nologin prometheus
# Change ownership
chown -R prometheus.prometheus /ups/app/monitor/prometheus-2.20.0
# Reorganize the directory structure
cd /ups/app/monitor/prometheus
mv prometheus promtool tsdb bin/
mv prometheus.yml config/
Service startup flags
[root@progs prometheus]# ./bin/prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
      --config.file="prometheus.yml"  
                                 Prometheus configuration file path.
      --web.listen-address="0.0.0.0:9090"  
                                 Address to listen on for UI, API, and telemetry.
      --web.read-timeout=5m      Maximum duration before timing out read of the request, and closing idle connections.
      --web.max-connections=512  Maximum number of simultaneous connections.
      --web.external-url=<URL>   The URL under which Prometheus is externally reachable (for example, if Prometheus is served via a reverse
                                 proxy). Used for generating relative and absolute links back to Prometheus itself. If the URL has a path
                                 portion, it will be used to prefix all HTTP endpoints served by Prometheus. If omitted, relevant URL
                                 components will be derived automatically.
      --web.route-prefix=<path>  Prefix for the internal routes of web endpoints. Defaults to path of --web.external-url.
      --web.user-assets=<path>   Path to static asset directory, available at /user.
      --web.enable-lifecycle     Enable shutdown and reload via HTTP request.
      --web.enable-admin-api     Enable API endpoints for admin control actions.
      --web.console.templates="consoles"  
                                 Path to the console template directory, available at /consoles.
      --web.console.libraries="console_libraries"  
                                 Path to the console library directory.
      --web.page-title="Prometheus Time Series Collection and Processing Server"  
                                 Document title of Prometheus instance.
      --web.cors.origin=".*"     Regex for CORS origin. It is fully anchored. Example: 'https?://(domain1|domain2)\.com'
      --storage.tsdb.path="data/"  
                                 Base path for metrics storage.
      --storage.tsdb.retention=STORAGE.TSDB.RETENTION  
                                 [DEPRECATED] How long to retain samples in storage. This flag has been deprecated, use
                                 "storage.tsdb.retention.time" instead.
      --storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME  
                                 How long to retain samples in storage. When this flag is set it overrides "storage.tsdb.retention". If neither
                                 this flag nor "storage.tsdb.retention" nor "storage.tsdb.retention.size" is set, the retention time defaults
                                 to 15d. Units Supported: y, w, d, h, m, s, ms.
      --storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE  
                                 [EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. Units supported: KB, MB, GB, TB, PB.
                                 This flag is experimental and can be changed in future releases.
      --storage.tsdb.no-lockfile  
                                 Do not create lockfile in data directory.
      --storage.tsdb.allow-overlapping-blocks  
                                 [EXPERIMENTAL] Allow overlapping blocks, which in turn enables vertical compaction and vertical query merge.
      --storage.tsdb.wal-compression  
                                 Compress the tsdb WAL.
      --storage.remote.flush-deadline=<duration>  
                                 How long to wait flushing sample on shutdown or config reload.
      --storage.remote.read-sample-limit=5e7  
                                 Maximum overall number of samples to return via the remote read interface, in a single query. 0 means no
                                 limit. This limit is ignored for streamed response types.
      --storage.remote.read-concurrent-limit=10  
                                 Maximum number of concurrent remote read calls. 0 means no limit.
      --storage.remote.read-max-bytes-in-frame=1048576  
                                 Maximum number of bytes in a single frame for streaming remote read response types before marshalling. Note
                                 that client might have limit on frame size as well. 1MB as recommended by protobuf by default.
      --rules.alert.for-outage-tolerance=1h  
                                 Max time to tolerate prometheus outage for restoring "for" state of alert.
      --rules.alert.for-grace-period=10m  
                                 Minimum duration between alert and restored "for" state. This is maintained only for alerts with configured
                                 "for" time greater than grace period.
      --rules.alert.resend-delay=1m  
                                 Minimum amount of time to wait before resending an alert to Alertmanager.
      --alertmanager.notification-queue-capacity=10000  
                                 The capacity of the queue for pending Alertmanager notifications.
      --alertmanager.timeout=10s  
                                 Timeout for sending alerts to Alertmanager.
      --query.lookback-delta=5m  The maximum lookback duration for retrieving metrics during expression evaluations and federation.
      --query.timeout=2m         Maximum time a query may take before being aborted.
      --query.max-concurrency=20  
                                 Maximum number of queries executed concurrently.
      --query.max-samples=50000000  
                                 Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load
                                 more samples than this into memory, so this also limits the number of samples a query can return.
      --log.level=info           Only log messages with the given severity or above. One of: [debug, info, warn, error]
      --log.format=logfmt        Output format of log messages. One of: [logfmt, json]
Configure the systemd service
# Create the systemd unit
cat > /usr/lib/systemd/system/prometheus.service <<-EOF
[Unit]
Description=https://prometheus.io
After=network.target
#After=postgresql.service mariadb.service mysql.service
Wants=network-online.target

[Service]
User=prometheus
Group=prometheus

Type=simple

WorkingDirectory=/ups/app/monitor/prometheus/
# RuntimeDirectory=prometheus
# RuntimeDirectoryMode=0750
ExecStart=/ups/app/monitor/prometheus/bin/prometheus \
    --config.file=/ups/app/monitor/prometheus/config/prometheus.yml \
    --storage.tsdb.retention=30d \
    --storage.tsdb.path="/ups/app/monitor/prometheus/data/" \
    --web.console.templates=/ups/app/monitor/prometheus/consoles \
    --web.console.libraries=/ups/app/monitor/prometheus/console_libraries \
    --web.enable-lifecycle --web.enable-admin-api \
    --web.listen-address=:9090 
Restart=on-failure
# Sets open_files_limit
LimitNOFILE=10000
TimeoutStopSec=20

StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus

[Install]
WantedBy=multi-user.target
EOF
Redirect the service log to a dedicated file via rsyslog
cat > /etc/rsyslog.d/prometheus.conf <<-EOF
if \$programname == 'prometheus' then /ups/app/monitor/prometheus/logs/prometheusd.log
& stop
EOF
Configuration file

vi /ups/app/monitor/prometheus/config/prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - progs:9093  # port 9093 of the running Alertmanager node

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/alert_node.yml"
  - "rules/alert_mysql.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
    relabel_configs:
    - action: replace
      source_labels: ['__address__']  ## source label
      regex: (.*):(.*)                ## regex matched against the __address__ value
      replacement: $1                 ## reference to the captured group
      target_label: HOSTNAME          ## new label named HOSTNAME

  - job_name: 'MySQL'
    static_configs:
    - targets: ['localhost:9104']
    relabel_configs:
    - action: replace
      source_labels: ['__address__']  ## source label
      regex: (.*):(.*)                ## regex matched against the __address__ value
      replacement: $1                 ## reference to the captured group
      target_label: instance          ## new label named instance
Check the configuration file
cd /ups/app/monitor/prometheus
./bin/promtool check config config/prometheus.yml
Start the service
# Start in the foreground
./bin/prometheus --config.file=config/prometheus.yml
or
# Manage via systemd
systemctl daemon-reload

systemctl enable prometheus.service
systemctl start  prometheus.service
systemctl stop   prometheus.service
systemctl status prometheus.service
Reloading the Prometheus service

With the --web.enable-lifecycle startup flag, the configuration can be reloaded without restarting the service:

curl -X POST http://localhost:9090/-/reload
Verification
# Check the runtime environment by printing the version
./bin/prometheus --version

lsof -i :9090

# Open the web UI, default port 9090
http://192.168.10.181:9090

Docker installation

Install Docker
yum -y install docker
Run Prometheus as a container
Install from the Quay.io or Docker Hub image registries.
$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 quay.io/prometheus/prometheus

# Start with a bind-mounted prometheus.yml
docker run \
    -p 9090:9090 \
    -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

# Mount an entire configuration directory as a volume
docker run \
    -p 9090:9090 \
    -v /path/to/config:/etc/prometheus \
    prom/prometheus

Install via a Dockerfile
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/

# Build and run the image
docker build -t my-prometheus .
docker run -p 9090:9090 my-prometheus
Managing Prometheus with Docker
# List running containers
docker ps


Run docker start prometheus to start the service.

Run docker stats prometheus to view Prometheus resource usage.

Run docker stop prometheus to stop the service.

Configuration

When Prometheus starts, the --config.file flag specifies the configuration file to load; the default is prometheus.yml.

The configuration file can define global, alerting, rule_files, scrape_configs, remote_write, remote_read and other sections.

Global configuration

global holds the global defaults and mainly contains four properties (a minimal example follows the list):

  • scrape_interval: the default interval at which targets are scraped.
  • scrape_timeout: the timeout for scraping a single target.
  • evaluation_interval: the interval at which rules are evaluated.
  • external_labels: extra labels attached to scraped data before it is stored.
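A minimal global block might look like the following; the cluster and region label values are purely illustrative:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    cluster: demo
    region: dc1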

Alerting configuration

Alertmanager can be configured with -alertmanager.* command-line flags, but that approach is inflexible: it cannot be reloaded dynamically, nor can alert attributes be defined dynamically.

The alerting section addresses this and manages Alertmanager more flexibly. It mainly contains two parameters (a sketch follows the list):

  • alert_relabel_configs: rules for dynamically rewriting alert labels.
  • alertmanagers: configuration for dynamically discovering Alertmanager instances.
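A sketch of the section, assuming the Alertmanager on progs:9093 used elsewhere in this article; the dropped label name is illustrative:

alerting:
  alert_relabel_configs:
    - action: labeldrop        # drop a noisy label before alerts are routed
      regex: replica
  alertmanagers:
    - static_configs:
        - targets: ['progs:9093']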

Rule files

rule_files lists the rule files to load; multiple files as well as glob patterns are supported:

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

Scrape configuration

scrape_configs defines the endpoints to scrape. Each scrape configuration mainly contains the following parameters (an illustrative job follows the list):

  • job_name: name of the job.
  • honor_labels: resolves label conflicts in scraped data; when true the scraped labels win, otherwise the server-side labels win.
  • params: HTTP query parameters sent with each scrape request.
  • scrape_interval: scrape interval.
  • scrape_timeout: scrape timeout.
  • metrics_path: metrics path on the target.
  • scheme: protocol used for the scrape.
  • sample_limit: limit on the number of samples accepted per scrape; if a scrape exceeds it, that scrape is discarded and not stored. The default of 0 means no limit.
  • relabel_configs: relabeling rules applied to targets before scraping.
  • metric_relabel_configs: relabeling rules applied to scraped metrics before storage.
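As an illustration only (not part of the deployment described in this article), a job combining several of these parameters might look like this; the job name, target, and query parameter are hypothetical:

  - job_name: 'app_example'              # illustrative job
    scheme: http
    metrics_path: /metrics
    scrape_interval: 30s
    scrape_timeout: 10s
    honor_labels: true
    sample_limit: 10000
    params:
      debug: ['false']                   # sent as ?debug=false on every scrape
    static_configs:
      - targets: ['localhost:8080']
    metric_relabel_configs:
      - action: drop                     # drop noisy Go runtime series before storage
        source_labels: [__name__]
        regex: 'go_gc_.*'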

Remote write storage

remote_write configures writable remote storage and mainly contains the following parameters:

  • url: endpoint URL.
  • remote_timeout: request timeout.
  • write_relabel_configs: relabeling applied to scraped data before it is sent to remote storage.

Note: remote_write is experimental; use it with caution.

Remote read storage

remote_read configures readable remote storage and mainly contains the following parameters:

  • url: endpoint URL.
  • remote_timeout: request timeout.

Note: remote_read is experimental; use it with caution.
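A sketch of both sections, assuming a hypothetical remote storage endpoint:

remote_write:
  - url: "http://remote-storage.example.com/api/v1/write"
    remote_timeout: 30s
    write_relabel_configs:
      - action: drop                   # do not ship Go runtime series to remote storage
        source_labels: [__name__]
        regex: 'go_.*'

remote_read:
  - url: "http://remote-storage.example.com/api/v1/read"
    remote_timeout: 1m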

Service discovery

One of the most important concepts in the Prometheus configuration is the target, i.e. the data source. Targets can be configured statically or discovered dynamically, roughly in the following categories (a Consul-based sketch follows the list):

  • static_configs: static targets
  • dns_sd_configs: DNS service discovery
  • file_sd_configs: file-based service discovery
  • consul_sd_configs: Consul service discovery
  • serverset_sd_configs: Serverset service discovery
  • nerve_sd_configs: Nerve service discovery
  • marathon_sd_configs: Marathon service discovery
  • kubernetes_sd_configs: Kubernetes service discovery
  • gce_sd_configs: GCE service discovery
  • ec2_sd_configs: EC2 service discovery
  • openstack_sd_configs: OpenStack service discovery
  • azure_sd_configs: Azure service discovery
  • triton_sd_configs: Triton service discovery
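file_sd_configs is used extensively later in this article; for contrast, a minimal Consul-based job might look like this (the Consul address is illustrative):

  - job_name: 'consul_services'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: []          # an empty list means all registered services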

Sample configuration

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, scrape targets every 15 seconds.

rule_files:
  - "rules/node.rules"

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['127.0.0.1:9100', '127.0.0.12:9100']

  - job_name: 'mysqld'
    static_configs:
      - targets: ['127.0.0.1:9104']
      
  - job_name: 'memcached'
    static_configs:
      - targets: ['127.0.0.1:9150']

Deploying Grafana

Grafana is the web visualization component.

Downloads

# Grafana package
https://grafana.com/grafana/download
# grafana-dashboards package (Percona dashboards)
https://github.com/percona/grafana-dashboards/releases


# Standalone Linux Binaries(64 Bit)SHA256: b6cbc04505edb712f206228261d0ea5ab7e9c03e9f77d0d36930886c861366ed
wget https://dl.grafana.com/oss/release/grafana-7.1.1.linux-amd64.tar.gz
tar -xf grafana-7.1.1.linux-amd64.tar.gz

Installation

Binary package installation

mkdir -p /ups/app/monitor/
# Unpack
tar -xf grafana-*.linux-amd64.tar.gz -C /ups/app/monitor/

# Rename the directory
cd /ups/app/monitor/
mv grafana-7.1.1 grafana
mkdir -p /ups/app/monitor/grafana/logs

# Create the service user
# groupadd -g 2001 grafana
useradd -r -d /ups/app/monitor/grafana -c "Grafana Server" -M -s /sbin/nologin grafana

# Change ownership
chown -R grafana.grafana /ups/app/monitor/grafana
Configure the systemd service
# Create the systemd unit
cat > /usr/lib/systemd/system/grafana.service <<-EOF
[Unit]
Description=Grafana instance
Documentation=http://docs.grafana.org
Wants=network-online.target
After=network-online.target
#After=postgresql-12.service mysql3308.service mysql.service

[Service]
# EnvironmentFile=/etc/sysconfig/grafana-server
User=grafana
Group=grafana
Type=notify
Restart=on-failure
WorkingDirectory=/ups/app/monitor/grafana
RuntimeDirectory=grafana
RuntimeDirectoryMode=0750

# ExecStart=/ups/app/monitor/grafana/bin/grafana-server                               \
#                             --config=\${CONF_FILE}                                   \
#                             --pidfile=\${PID_FILE_DIR}/grafana-server.pid            \
#                             --packaging=rpm                                         \
#                             cfg:default.paths.logs=\${LOG_DIR}                       \
#                             cfg:default.paths.data=\${DATA_DIR}                      \
#                             cfg:default.paths.plugins=\${PLUGINS_DIR}                \
#                             cfg:default.paths.provisioning=\${PROVISIONING_CFG_DIR}  

ExecStart=/ups/app/monitor/grafana/bin/grafana-server
LimitNOFILE=10000
TimeoutStopSec=20

#StandardOutput=syslog
#StandardError=syslog
#SyslogIdentifier=grafana

[Install]
WantedBy=multi-user.target
EOF
Redirect logs to a dedicated file via rsyslog
cat > /etc/rsyslog.d/grafana.conf <<-EOF
if \$programname == 'grafana' then /ups/app/monitor/grafana/logs/grafana.log
& stop
EOF
Start the service
# Start in the foreground
/ups/app/monitor/grafana/bin/grafana-server &
or
# Manage via systemd
systemctl daemon-reload

systemctl enable  grafana.service
systemctl start   grafana.service
systemctl stop    grafana.service
systemctl restart grafana.service
systemctl status  grafana.service

Docker installation

docker run -d --name=grafana -p 3000:3000 grafana/grafana

Verification

# Open the web UI, default port 3000 (default account/password: admin/admin)
http://192.168.10.181:3000

Configuration files

Paths

  • Default configuration: $WORKING_DIR/conf/defaults.ini
  • Custom configuration: $WORKING_DIR/conf/custom.ini
  • The --config flag overrides the custom configuration file path (a sketch of custom.ini follows the list):
    • ./grafana-server --config /custom/config.ini --homepath /custom/homepath cfg:default.paths.logs=/custom/path
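A minimal custom.ini overriding the HTTP port and the data, log, and plugin paths might look like the following; the values are illustrative, and for the tarball install the file is read from conf/custom.ini under the Grafana home directory:

[server]
http_port = 3000

[paths]
data = /ups/app/monitor/grafana/data
logs = /ups/app/monitor/grafana/logs
plugins = /ups/app/monitor/grafana/data/plugins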

Adding plugins

Syntax

[root@progs bin]# ./grafana-cli --help
NAME:
   Grafana CLI - A new cli application

USAGE:
   grafana-cli [global options] command [command options] [arguments...]

VERSION:
   7.1.1

AUTHOR:
   Grafana Project <[email protected]>

COMMANDS:
   plugins  Manage plugins for grafana
   admin    Grafana admin commands
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --pluginsDir value       Path to the Grafana plugin directory (default: "/var/lib/grafana/plugins") [$GF_PLUGIN_DIR]
   --repo value             URL to the plugin repository (default: "https://grafana.com/api/plugins") [$GF_PLUGIN_REPO]
   --pluginUrl value        Full url to the plugin zip file instead of downloading the plugin from grafana.com/api [$GF_PLUGIN_URL]
   --insecure               Skip TLS verification (insecure) (default: false)
   --debug                  Enable debug logging (default: false)
   --configOverrides value  Configuration options to override defaults as a string. e.g. cfg:default.paths.log=/dev/null
   --homepath value         Path to Grafana install/home path, defaults to working directory
   --config value           Path to config file
   --help, -h               show help (default: false)
   --version, -v            print the version (default: false)

# List the plugins available in the remote repository
grafana-cli plugins list-remote

id: abhisant-druid-datasource version: 0.0.5
id: agenty-flowcharting-panel version: 0.9.0
id: aidanmountford-html-panel version: 0.0.1
id: akumuli-datasource version: 1.3.11
id: alexanderzobnin-zabbix-app version: 3.12.4
id: alexandra-trackmap-panel version: 1.2.5
id: andig-darksky-datasource version: 1.0.1
id: aquaqanalytics-kdbadaptor-datasource version: 1.0.1
id: ayoungprogrammer-finance-datasource version: 1.0.0
id: belugacdn-app version: 1.2.0
id: bessler-pictureit-panel version: 1.0.0
id: blackmirror1-singlestat-math-panel version: 1.1.7
id: blackmirror1-statusbygroup-panel version: 1.1.1
id: bosun-app version: 0.0.28
id: briangann-datatable-panel version: 1.0.2
id: briangann-gauge-panel version: 0.0.6
id: btplc-alarm-box-panel version: 1.0.8
id: btplc-peak-report-panel version: 0.2.4
id: btplc-status-dot-panel version: 0.2.4
id: btplc-trend-box-panel version: 0.1.9
id: camptocamp-prometheus-alertmanager-datasource version: 0.0.8
id: citilogics-geoloop-panel version: 1.1.1
id: cloudflare-app version: 0.1.4
id: cloudspout-button-panel version: 7.0.3
id: cognitedata-datasource version: 2.0.0
id: corpglory-progresslist-panel version: 1.0.5
id: dalmatinerdb-datasource version: 1.0.5
id: dalvany-image-panel version: 2.1.1
id: ddurieux-glpi-app version: 1.3.0
id: devicehive-devicehive-datasource version: 2.0.1
id: devopsprodigy-kubegraf-app version: 1.4.2
id: digiapulssi-breadcrumb-panel version: 1.1.6
id: digiapulssi-organisations-panel version: 1.3.0
id: digrich-bubblechart-panel version: 1.1.0
id: doitintl-bigquery-datasource version: 1.0.8
id: farski-blendstat-panel version: 1.0.2
id: fastweb-openfalcon-datasource version: 1.0.0
id: fatcloud-windrose-panel version: 0.7.0
id: fetzerch-sunandmoon-datasource version: 0.1.6
id: flant-statusmap-panel version: 0.2.0
id: foursquare-clouderamanager-datasource version: 0.9.2
id: fzakaria-simple-annotations-datasource version: 1.0.0
id: gnocchixyz-gnocchi-datasource version: 1.7.0
id: goshposh-metaqueries-datasource version: 0.0.3
id: grafana-azure-data-explorer-datasource version: 2.1.0
id: grafana-azure-monitor-datasource version: 0.3.0
id: grafana-clock-panel version: 1.1.1
id: grafana-googlesheets-datasource version: 1.0.0
id: grafana-image-renderer version: 2.0.0
id: grafana-influxdb-08-datasource version: 1.0.2
id: grafana-influxdb-flux-datasource version: 7.0.0
id: grafana-kairosdb-datasource version: 3.0.1
id: grafana-kubernetes-app version: 1.0.1
id: grafana-piechart-panel version: 1.5.0
id: grafana-polystat-panel version: 1.2.0
id: grafana-simple-json-datasource version: 1.4.0
id: grafana-strava-datasource version: 1.1.1
id: grafana-worldmap-panel version: 0.3.2
id: gretamosa-topology-panel version: 1.0.0
id: gridprotectionalliance-openhistorian-datasource version: 1.0.2
id: gridprotectionalliance-osisoftpi-datasource version: 1.0.4
id: hawkular-datasource version: 1.1.1
id: ibm-apm-datasource version: 0.9.0
id: instana-datasource version: 2.7.3
id: jasonlashua-prtg-datasource version: 4.0.3
id: jdbranham-diagram-panel version: 1.6.2
id: jeanbaptistewatenberg-percent-panel version: 1.0.6
id: kentik-app version: 1.3.4
id: larona-epict-panel version: 1.2.2
id: linksmart-hds-datasource version: 1.0.1
id: linksmart-sensorthings-datasource version: 1.3.0
id: logzio-datasource version: 5.0.0
id: macropower-analytics-panel version: 1.0.0
id: magnesium-wordcloud-panel version: 1.0.0
id: marcuscalidus-svg-panel version: 0.3.3
id: marcusolsson-hourly-heatmap-panel version: 0.4.1
id: marcusolsson-treemap-panel version: 0.2.0
id: michaeldmoore-annunciator-panel version: 1.0.5
id: michaeldmoore-multistat-panel version: 1.4.1
id: monasca-datasource version: 1.0.0
id: monitoringartist-monitoringart-datasource version: 1.0.0
id: moogsoft-aiops-app version: 8.0.0
id: mtanda-google-calendar-datasource version: 1.0.4
id: mtanda-heatmap-epoch-panel version: 0.1.7
id: mtanda-histogram-panel version: 0.1.6
id: mxswat-separator-panel version: 1.0.0
id: natel-discrete-panel version: 0.1.0
id: natel-influx-admin-panel version: 0.0.5
id: natel-plotly-panel version: 0.0.6
id: natel-usgs-datasource version: 0.0.2
id: neocat-cal-heatmap-panel version: 0.0.3
id: novalabs-annotations-panel version: 0.0.1
id: ns1-app version: 0.0.7
id: ntop-ntopng-datasource version: 1.0.0
id: opennms-helm-app version: 5.0.1
id: ovh-warp10-datasource version: 2.2.0
id: paytm-kapacitor-datasource version: 0.1.2
id: percona-percona-app version: 1.0.0
id: petrslavotinek-carpetplot-panel version: 0.1.1
id: pierosavi-imageit-panel version: 0.1.3
id: pr0ps-trackmap-panel version: 2.1.0
id: praj-ams-datasource version: 1.2.0
id: pue-solr-datasource version: 1.0.2
id: quasardb-datasource version: 3.8.2
id: rackerlabs-blueflood-datasource version: 0.0.2
id: radensolutions-netxms-datasource version: 1.2.2
id: raintank-snap-app version: 0.0.5
id: raintank-worldping-app version: 1.2.7
id: redis-datasource version: 1.1.2
id: ryantxu-ajax-panel version: 0.0.7-dev
id: ryantxu-annolist-panel version: 0.0.1
id: satellogic-3d-globe-panel version: 0.1.0
id: savantly-heatmap-panel version: 0.2.0
id: sbueringer-consul-datasource version: 0.1.5
id: scadavis-synoptic-panel version: 1.0.4
id: sidewinder-datasource version: 0.2.0
id: simpod-json-datasource version: 0.2.0
id: skydive-datasource version: 1.2.0
id: smartmakers-trafficlight-panel version: 1.0.0
id: sni-pnp-datasource version: 1.0.5
id: sni-thruk-datasource version: 1.0.3
id: snuids-radar-panel version: 1.4.4
id: snuids-trafficlights-panel version: 1.4.5
id: spotify-heroic-datasource version: 0.0.1
id: stagemonitor-elasticsearch-app version: 0.83.2
id: udoprog-heroic-datasource version: 0.1.0
id: vertamedia-clickhouse-datasource version: 2.0.2
id: vertica-grafana-datasource version: 0.1.0
id: vonage-status-panel version: 1.0.9
id: voxter-app version: 0.0.1
id: xginn8-pagerduty-datasource version: 0.2.1
id: yesoreyeram-boomtable-panel version: 1.3.0
id: yesoreyeram-boomtheme-panel version: 0.1.0
id: zuburqan-parity-report-panel version: 1.2.1

Install plugins

Install into the specified plugin directory:

./grafana-cli --pluginsDir /ups/app/monitor/grafana/data/plugins plugins install grafana-piechart-panel 
./grafana-cli --pluginsDir /ups/app/monitor/grafana/data/plugins plugins install grafana-polystat-panel
./grafana-cli --pluginsDir /ups/app/monitor/grafana/data/plugins plugins install digiapulssi-breadcrumb-panel 

Screenshots of the installation process (omitted)

Confirm the result

./bin/grafana-cli plugins ls

Importing dashboards

Import files through the web UI

Or provision a dashboard path on the backend

# 1. Unpack
unzip -qo grafana-dashboards-2.9.0.zip
cd grafana-dashboards-2.9.0
cp -r dashboards /ups/app/monitor/grafana/grafana-dashboards

# 2. Create the mysqld_export.yml provider file
cat > /ups/app/monitor/grafana/conf/provisioning/dashboards/mysqld_export.yml <<-EOF
apiVersion: 1

providers:
  - name: 'mysqld_exporter'
    orgId: 1
    folder: ''
    type: file
    options:
      path: /ups/app/monitor/grafana/grafana-dashboards
EOF

# 3. Restart the grafana service

Configure the Prometheus data source
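The data source can be added in the Grafana web UI (Configuration -> Data Sources -> Add data source -> Prometheus), or provisioned from a file. A minimal sketch of file-based provisioning; the file name is illustrative:

cat > /ups/app/monitor/grafana/conf/provisioning/datasources/prometheus.yml <<-EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://192.168.10.181:9090
    isDefault: true
EOF

Restart Grafana afterwards so the provisioned data source is picked up.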

Exporters

In Prometheus, the programs responsible for reporting metrics are collectively called exporters, and different exporters cover different workloads.

Software

Host monitoring (node_exporter)

Deployment

Binary installation
Install the software
# Create the service user
#groupadd -g 2000 prometheus
useradd -r -M -c "Prometheus agent" -d /ups/app/monitor/ -s /sbin/nologin prometheus

# Unpack
mkdir -p /ups/app/monitor/
tar -xf node_exporter-*.linux-amd64.tar.gz -C /ups/app/monitor/ --no-same-owner

# Rename the directory
cd /ups/app/monitor/
mv node_exporter-*.linux-amd64 node_exporter

# Change ownership
# chown -R prometheus.prometheus /ups/app/monitor/node_exporter
Configure the systemd service
# Create the systemd unit
cat > /usr/lib/systemd/system/node_exporter.service <<-EOF
[Unit]
Description=node exporter
Documentation=https://prometheus.io
After=network.target

[Service]
#User=prometheus
#Group=prometheus
Restart=on-failure
ExecStart=/ups/app/monitor/node_exporter/node_exporter --web.listen-address=:9100
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=node_exporter

[Install]
WantedBy=multi-user.target
EOF
  • Redirect logs to a dedicated file

    •   cat > /etc/rsyslog.d/node_exporter.conf <<-EOF
        if \$programname == 'node_exporter' then /ups/app/monitor/node_exporter/node.log
        & stop
        EOF
      
Start the service
# Manage via systemd
systemctl daemon-reload
systemctl restart node_exporter.service
systemctl status node_exporter.service

or

# Or start the agent directly
cd /ups/app/monitor/node_exporter
./node_exporter &
Docker installation
docker run -d -p 9100:9100 \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  --net="host" \
  quay.io/prometheus/node-exporter \
    --path.procfs=/host/proc \
    --path.sysfs=/host/sys \
    --collector.filesystem.ignored-mount-points="^/(sys|proc|dev|host|etc)($|/)"

Adding the targets to Prometheus

Centralized exporter configuration
  • Edit the Prometheus configuration file

Have Prometheus scrape node_exporter. Open prometheus.yml and append the following job to scrape_configs (file_sd_configs keeps the target list in a separate file):

# Append to prometheus.yml
cat >> /ups/app/monitor/prometheus/config/prometheus.yml <<-'EOF'

  - job_name: 'node_exporter'
    scrape_interval: 1s
    file_sd_configs:
      - files:
        - targets/node/nodes-instances.json
        refresh_interval: 10s
    relabel_configs:
    - action: replace
      source_labels: ['__address__']
      regex: (.*):(.*)
      replacement: $1
      target_label: hostname
    - action: labeldrop
      regex: __meta_filepath
EOF
  • Configure the host list JSON file

vi /ups/app/monitor/prometheus/config/targets/node/nodes-instances.json

[
  {
    "targets": [ "192.168.10.181:9100","192.168.10.182:9100", "192.168.10.190:9100","192.168.10.191:9100","192.168.10.192:9100"]
  }
]
Per-target exporter configuration

Each monitored host gets its own target file.

  • Edit the Prometheus configuration file
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - progs:9093  # port 9093 of the running Alertmanager node

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/alert_node.yml"
  - "rules/alert_mysql.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scrape_interval: 1s
    file_sd_configs:
      - files:
        - targets/node/*.yml
        refresh_interval: 10s
    relabel_configs:
    - action: replace
      source_labels: ['__address__']
      regex: (.*):(.*)
      replacement: $1
      target_label: hostname
    - action: labeldrop
      regex: __meta_filepath
  • Configure the per-host instance files
vi /ups/app/monitor/prometheus/config/targets/node/nodes1-instances.yml
[
  {
    "targets": ["192.168.10.181:9100"],
    "labels": { }
  }
]

vi /ups/app/monitor/prometheus/config/targets/node/nodes2-instances.yml
[
  {
    "targets": ["192.168.10.182:9100"],
    "labels": { }
  }
]
Restart Prometheus to load the configuration
# Check the configuration file
./bin/promtool check config config/prometheus.yml
# Restart the service
systemctl restart prometheus

Access

Open http://IP:9100/metrics in a browser.
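Or check from the command line (the address below is the node used in this article):

curl -s http://192.168.10.181:9100/metrics | head -n 20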

Collectors

Collectors enabled by default
Name | Description | OS
arp | ARP statistics from /proc/net/arp | Linux
conntrack | conntrack statistics from /proc/sys/net/netfilter/ | Linux
cpu | CPU statistics | Darwin, Dragonfly, FreeBSD, Linux
diskstats | Disk I/O statistics from /proc/diskstats | Linux
edac | Error detection and correction statistics | Linux
entropy | Available kernel entropy | Linux
exec | Execution statistics | Dragonfly, FreeBSD
filefd | File descriptor statistics from /proc/sys/fs/file-nr | Linux
filesystem | Filesystem statistics such as used disk space | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon | Hardware monitor and sensor data from /sys/class/hwmon/ | Linux
infiniband | Network statistics from the InfiniBand configuration | Linux
loadavg | System load | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm | Device statistics from /proc/mdstat | Linux
meminfo | Memory statistics | Darwin, Dragonfly, FreeBSD, Linux
netdev | Network interface traffic in bytes | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat | Network statistics from /proc/net/netstat, equivalent to netstat -s | Linux
sockstat | Socket statistics from /proc/net/sockstat | Linux
stat | Various statistics from /proc/stat, including boot time, forks, and interrupts | Linux
textfile | Metrics read from local text files in the directory given by --collector.textfile.directory (see the example after this table) | any
time | Current system time | any
uname | System information via the uname system call | any
vmstat | Statistics from /proc/vmstat | Linux
wifi | WiFi device statistics | Linux
xfs | XFS runtime statistics | Linux (kernel 4.4+)
zfs | ZFS performance statistics | Linux
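For example, the textfile collector can expose custom metrics written by scripts or cron jobs. A minimal sketch; the directory, file, and metric names are illustrative:

mkdir -p /ups/app/monitor/node_exporter/textfile
echo 'node_backup_last_success_timestamp_seconds 1596184800' \
  > /ups/app/monitor/node_exporter/textfile/backup.prom
./node_exporter --collector.textfile.directory=/ups/app/monitor/node_exporter/textfile &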
Collectors disabled by default
Name | Description | OS
bonding | Number of configured and active bonded network interfaces | Linux
buddyinfo | Memory fragmentation statistics from /proc/buddyinfo | Linux
devstat | Device statistics | Dragonfly, FreeBSD
drbd | Distributed Replicated Block Device (DRBD) statistics | Linux
interrupts | More detailed interrupt statistics | Linux, OpenBSD
ipvs | IPVS status from /proc/net/ip_vs and statistics from /proc/net/ip_vs_stats | Linux
ksmd | Kernel and system statistics from /sys/kernel/mm/ksm | Linux
logind | Session statistics from logind | Linux
meminfo_numa | Memory statistics from /proc/meminfo_numa | Linux
mountstats | Filesystem statistics from /proc/self/mountstats, including NFS client statistics | Linux
nfs | NFS statistics from /proc/net/rpc/nfs, equivalent to nfsstat -c | Linux
qdisc | Queueing discipline statistics | Linux
runit | runit status | any
supervisord | supervisord status | any
systemd | Unit and system state from systemd | Linux
tcpstat | TCP connection state from /proc/net/tcp and /proc/net/tcp6 | Linux

Monitoring MySQL

Install mysqld_exporter on the MySQL database server.

Install the exporter

# Create the service user
# groupadd -g 2000 prometheus
useradd -u 2000 -M -c "Prometheus agent" -s /sbin/nologin prometheus

# Unpack
mkdir -p /ups/app/monitor/
tar -xf mysqld_exporter-0.12.1.linux-amd64.tar.gz -C /ups/app/monitor/

# Rename the directory
cd /ups/app/monitor/
mv mysqld_exporter-0.12.1.linux-amd64 mysqld_exporter

# Change ownership
chown -R prometheus.prometheus /ups/app/monitor/mysqld_exporter
Create the MySQL monitoring user

Create the user on the MySQL instance to be monitored:

CREATE USER 'monitor'@'localhost' IDENTIFIED BY 'monitor';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'monitor'@'localhost';
CREATE USER 'monitor'@'192.168.10.%' IDENTIFIED BY 'monitor';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'monitor'@'192.168.10.%';
flush privileges;
Configure the client credentials file
cat > /ups/app/monitor/mysqld_exporter/.my.cnf <<EOF
[client]
user=monitor
password=monitor
port=3308
socket=/ups/app/mysql/mysql3308/logs/mysql3308.sock
host=progs
EOF

chmod 400 /ups/app/monitor/mysqld_exporter/.my.cnf
chown prometheus:prometheus /ups/app/monitor/mysqld_exporter/.my.cnf
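Optionally verify that the credentials file works before starting the exporter (assuming the mysql client is installed on this host):

mysql --defaults-file=/ups/app/monitor/mysqld_exporter/.my.cnf -e "SELECT VERSION();"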
Configure the systemd service
# Create the systemd unit
cat > /usr/lib/systemd/system/mysql_exporter.service <<-EOF
[Unit]
Description=mysqld exporter
Documentation=https://prometheus.io
After=network.target
After=postgresql-12.service mysql3308.service mysql.service

[Service]
Restart=on-failure
# ExecStart=/ups/app/monitor/mysqld_exporter/mysqld_exporter --config.my-cnf=/ups/app/monitor/mysqld_exporter/.my.cnf

ExecStart=/ups/app/monitor/mysqld_exporter/mysqld_exporter \
            --config.my-cnf=/ups/app/monitor/mysqld_exporter/.my.cnf \
            --collect.info_schema.innodb_tablespaces \
            --collect.info_schema.innodb_metrics  \
            --collect.perf_schema.tableiowaits \
            --collect.perf_schema.indexiowaits \
            --collect.perf_schema.tablelocks \
            --collect.engine_innodb_status \
            --collect.perf_schema.file_events \
            --collect.binlog_size \
            --collect.info_schema.clientstats \
            --collect.perf_schema.eventswaits

StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=mysqld_exporter

[Install]
WantedBy=multi-user.target
EOF
  • Redirect logs to a dedicated file

    •   cat > /etc/rsyslog.d/mysqld_exporter.conf <<-EOF
        if \$programname == 'mysqld_exporter' then /ups/app/monitor/mysqld_exporter/node.log
        & stop
        EOF
      
Start the service
# Manage via systemd
systemctl daemon-reload
systemctl restart mysql_exporter.service
systemctl status mysql_exporter.service

or

# Or start the exporter directly
./mysqld_exporter --config.my-cnf=/ups/app/monitor/mysqld_exporter/.my.cnf

# Default port: 9104
lsof -i :9104
netstat -tnlp|grep ':9104'
Verification

http://192.168.10.181:9104/metrics

Add the target to Prometheus (on the Prometheus server)

# Append to prometheus.yml
cat >> /ups/app/monitor/prometheus/config/prometheus.yml <<-EOF

  - job_name: 'MySQL'
    static_configs:
    - targets: ['progs:9104','192.168.10.181:9104']

EOF

Restart Prometheus
# Check the configuration file
./bin/promtool check config config/prometheus.yml
# Restart the service
systemctl restart prometheus
Verification
http://192.168.10.181:9090/targets

Monitoring PostgreSQL

Deployment

Download
wget -c https://github.com/wrouesnel/postgres_exporter/releases/download/v0.8.0/postgres_exporter_v0.8.0_linux-amd64.tar.gz
Installation
Binary package installation
  • Unpack
tar -xf postgres_exporter_v0.8.0_linux-amd64.tar.gz -C /ups/app/monitor
cd /ups/app/monitor
mv postgres_exporter* postgres_exporter
  • Configure the systemd service
# Create the systemd unit
cat > /usr/lib/systemd/system/postgres_exporter.service <<-EOF
[Unit]
Description=PostgreSQL Exporter
Documentation=https://github.com/wrouesnel/postgres_exporter
After=network.target

[Service]
User=postgres
Group=postgres
Restart=on-failure
# The DSN can also be a URI, e.g. postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable
Environment="DATA_SOURCE_NAME=user=postgres passfile=/home/postgres/.pgpass host=192.168.10.181 port=5432 sslmode=prefer"
ExecStart=/ups/app/monitor/postgres_exporter/postgres_exporter --web.listen-address=:9187 --extend.query-path=/ups/app/monitor/postgres_exporter/queries.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=postgres_exporter

[Install]
WantedBy=multi-user.target
EOF
  • Configure the custom query file

vi /ups/app/monitor/postgres_exporter/queries.yaml

pg_replication:
  query: "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as lag"
  master: true
  metrics:
    - lag:
        usage: "GAUGE"
        description: "Replication lag behind master in seconds"

pg_postmaster:
  query: "SELECT pg_postmaster_start_time as start_time_seconds from pg_postmaster_start_time()"
  master: true
  metrics:
    - start_time_seconds:
        usage: "GAUGE"
        description: "Time at which postmaster started"

pg_stat_user_tables:
  query: "SELECT current_database() datname, schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, n_mod_since_analyze, COALESCE(last_vacuum, '1970-01-01Z'), COALESCE(last_vacuum, '1970-01-01Z') as last_vacuum, COALESCE(last_autovacuum, '1970-01-01Z') as last_autovacuum, COALESCE(last_analyze, '1970-01-01Z') as last_analyze, COALESCE(last_autoanalyze, '1970-01-01Z') as last_autoanalyze, vacuum_count, autovacuum_count, analyze_count, autoanalyze_count FROM pg_stat_user_tables"
  metrics:
    - datname:
        usage: "LABEL"
        description: "Name of current database"
    - schemaname:
        usage: "LABEL"
        description: "Name of the schema that this table is in"
    - relname:
        usage: "LABEL"
        description: "Name of this table"
    - seq_scan:
        usage: "COUNTER"
        description: "Number of sequential scans initiated on this table"
    - seq_tup_read:
        usage: "COUNTER"
        description: "Number of live rows fetched by sequential scans"
    - idx_scan:
        usage: "COUNTER"
        description: "Number of index scans initiated on this table"
    - idx_tup_fetch:
        usage: "COUNTER"
        description: "Number of live rows fetched by index scans"
    - n_tup_ins:
        usage: "COUNTER"
        description: "Number of rows inserted"
    - n_tup_upd:
        usage: "COUNTER"
        description: "Number of rows updated"
    - n_tup_del:
        usage: "COUNTER"
        description: "Number of rows deleted"
    - n_tup_hot_upd:
        usage: "COUNTER"
        description: "Number of rows HOT updated (i.e., with no separate index update required)"
    - n_live_tup:
        usage: "GAUGE"
        description: "Estimated number of live rows"
    - n_dead_tup:
        usage: "GAUGE"
        description: "Estimated number of dead rows"
    - n_mod_since_analyze:
        usage: "GAUGE"
        description: "Estimated number of rows changed since last analyze"
    - last_vacuum:
        usage: "GAUGE"
        description: "Last time at which this table was manually vacuumed (not counting VACUUM FULL)"
    - last_autovacuum:
        usage: "GAUGE"
        description: "Last time at which this table was vacuumed by the autovacuum daemon"
    - last_analyze:
        usage: "GAUGE"
        description: "Last time at which this table was manually analyzed"
    - last_autoanalyze:
        usage: "GAUGE"
        description: "Last time at which this table was analyzed by the autovacuum daemon"
    - vacuum_count:
        usage: "COUNTER"
        description: "Number of times this table has been manually vacuumed (not counting VACUUM FULL)"
    - autovacuum_count:
        usage: "COUNTER"
        description: "Number of times this table has been vacuumed by the autovacuum daemon"
    - analyze_count:
        usage: "COUNTER"
        description: "Number of times this table has been manually analyzed"
    - autoanalyze_count:
        usage: "COUNTER"
        description: "Number of times this table has been analyzed by the autovacuum daemon"

pg_statio_user_tables:
  query: "SELECT current_database() datname, schemaname, relname, heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit, toast_blks_read, toast_blks_hit, tidx_blks_read, tidx_blks_hit FROM pg_statio_user_tables"
  metrics:
    - datname:
        usage: "LABEL"
        description: "Name of current database"
    - schemaname:
        usage: "LABEL"
        description: "Name of the schema that this table is in"
    - relname:
        usage: "LABEL"
        description: "Name of this table"
    - heap_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from this table"
    - heap_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in this table"
    - idx_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from all indexes on this table"
    - idx_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in all indexes on this table"
    - toast_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from this table's TOAST table (if any)"
    - toast_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in this table's TOAST table (if any)"
    - tidx_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from this table's TOAST table indexes (if any)"
    - tidx_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in this table's TOAST table indexes (if any)"
        
pg_database:
  query: "SELECT pg_database.datname, pg_database_size(pg_database.datname) as size FROM pg_database"
  master: true
  cache_seconds: 30
  metrics:
    - datname:
        usage: "LABEL"
        description: "Name of the database"
    - size_bytes:
        usage: "GAUGE"
        description: "Disk space used by the database"

pg_stat_statements:
  query: "SELECT t2.rolname, t3.datname, queryid, calls, total_time / 1000 as total_time_seconds, min_time / 1000 as min_time_seconds, max_time / 1000 as max_time_seconds, mean_time / 1000 as mean_time_seconds, stddev_time / 1000 as stddev_time_seconds, rows, shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, temp_blks_written, blk_read_time / 1000 as blk_read_time_seconds, blk_write_time / 1000 as blk_write_time_seconds FROM pg_stat_statements t1 join pg_roles t2 on (t1.userid=t2.oid) join pg_database t3 on (t1.dbid=t3.oid)"
  master: true
  metrics:
    - rolname:
        usage: "LABEL"
        description: "Name of user"
    - datname:
        usage: "LABEL"
        description: "Name of database"
    - queryid:
        usage: "LABEL"
        description: "Query ID"
    - calls:
        usage: "COUNTER"
        description: "Number of times executed"
    - total_time_seconds:
        usage: "COUNTER"
        description: "Total time spent in the statement, in milliseconds"
    - min_time_seconds:
        usage: "GAUGE"
        description: "Minimum time spent in the statement, in milliseconds"
    - max_time_seconds:
        usage: "GAUGE"
        description: "Maximum time spent in the statement, in milliseconds"
    - mean_time_seconds:
        usage: "GAUGE"
        description: "Mean time spent in the statement, in milliseconds"
    - stddev_time_seconds:
        usage: "GAUGE"
        description: "Population standard deviation of time spent in the statement, in milliseconds"
    - rows:
        usage: "COUNTER"
        description: "Total number of rows retrieved or affected by the statement"
    - shared_blks_hit:
        usage: "COUNTER"
        description: "Total number of shared block cache hits by the statement"
    - shared_blks_read:
        usage: "COUNTER"
        description: "Total number of shared blocks read by the statement"
    - shared_blks_dirtied:
        usage: "COUNTER"
        description: "Total number of shared blocks dirtied by the statement"
    - shared_blks_written:
        usage: "COUNTER"
        description: "Total number of shared blocks written by the statement"
    - local_blks_hit:
        usage: "COUNTER"
        description: "Total number of local block cache hits by the statement"
    - local_blks_read:
        usage: "COUNTER"
        description: "Total number of local blocks read by the statement"
    - local_blks_dirtied:
        usage: "COUNTER"
        description: "Total number of local blocks dirtied by the statement"
    - local_blks_written:
        usage: "COUNTER"
        description: "Total number of local blocks written by the statement"
    - temp_blks_read:
        usage: "COUNTER"
        description: "Total number of temp blocks read by the statement"
    - temp_blks_written:
        usage: "COUNTER"
        description: "Total number of temp blocks written by the statement"
    - blk_read_time_seconds:
        usage: "COUNTER"
        description: "Total time the statement spent reading blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)"
    - blk_write_time_seconds:
        usage: "COUNTER"
        description: "Total time the statement spent writing blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)"
  • Redirect logs to a dedicated file
cat > /etc/rsyslog.d/postgres_exporter.conf <<-EOF
if \$programname == 'postgres_exporter' then /ups/app/monitor/postgres_exporter/exporter.log
& stop
EOF
  • Start the service
# Manage via systemd
systemctl daemon-reload
systemctl restart postgres_exporter.service
systemctl status postgres_exporter.service


# Or start from the command line -- DSN format: postgresql://postgres:password@localhost:5432/postgres
export DATA_SOURCE_NAME="postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable"
export PG_EXPORTER_EXTEND_QUERY_PATH="/ups/app/monitor/postgres_exporter/queries.yaml"
./postgres_exporter &
Docker installation
docker run --net=host -e DATA_SOURCE_NAME="postgresql://postgres:password@localhost:5432/postgres?sslmode=disable" wrouesnel/postgres_exporter

Adding the target to Prometheus

Add the following job to the Prometheus configuration:

  - job_name: 'postgres_exporter'
    scrape_interval: 1s
    file_sd_configs:
      - files:
        - targets/postgresql/*.yml
    relabel_configs:
      - action: replace
        source_labels: ['__address__']
        regex: (.*):(.*):(.*)
        replacement: $2
        target_label: hostip
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.181:9187
Alert rules file

vi rules/alert_pg.yml

---
groups:
  - name: PostgreSQL
    rules:
    - alert: PostgreSQLMaxConnectionsReached
      expr: sum(pg_stat_activity_count) by (instance) > sum(pg_settings_max_connections) by (instance)
      for: 1m
      labels:
        severity: email
      annotations:
        summary: "{{ $labels.instance }} has maxed out Postgres connections."
        description: "{{ $labels.instance }} is exceeding the currently configured maximum Postgres connection limit (current value: {{ $value }}s). Services may be degraded - please take immediate action (you probably need to increase max_connections in the Docker image and re-deploy."

    - alert: PostgreSQLHighConnections
      expr: sum(pg_stat_activity_count) by (instance) > sum(pg_settings_max_connections * 0.8) by (instance)
      for: 10m
      labels:
        severity: email
      annotations:
        summary: "{{ $labels.instance }} is over 80% of max Postgres connections."
        description: "{{ $labels.instance }} is exceeding 80% of the currently configured maximum Postgres connection limit (current value: {{ $value }}s). Please check utilization graphs and confirm if this is normal service growth, abuse or an otherwise temporary condition or if new resources need to be provisioned (or the limits increased, which is mostly likely)."

    - alert: PostgreSQLDown
      expr: pg_up != 1
      for: 1m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL is not processing queries: {{ $labels.instance }}"
        description: "{{ $labels.instance }} is rejecting query requests from the exporter, and thus probably not allowing DNS requests to work either. User services should not be effected provided at least 1 node is still alive."

    - alert: PostgreSQLSlowQueries
      expr: avg(rate(pg_stat_activity_max_tx_duration{datname!~"template.*"}[2m])) by (datname) > 2 * 60
      for: 2m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL high number of slow on {{ $labels.cluster }} for database {{ $labels.datname }} "
        description: "PostgreSQL high number of slow queries {{ $labels.cluster }} for database {{ $labels.datname }} with a value of {{ $value }} "

    - alert: PostgreSQLQPS
      expr: avg(irate(pg_stat_database_xact_commit{datname!~"template.*"}[5m]) + irate(pg_stat_database_xact_rollback{datname!~"template.*"}[5m])) by (datname) > 10000
      for: 5m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL high number of queries per second {{ $labels.cluster }} for database {{ $labels.datname }}"
        description: "PostgreSQL high number of queries per second on {{ $labels.cluster }} for database {{ $labels.datname }} with a value of {{ $value }}"

    - alert: PostgreSQLCacheHitRatio
      expr: avg(rate(pg_stat_database_blks_hit{datname!~"template.*"}[5m]) / (rate(pg_stat_database_blks_hit{datname!~"template.*"}[5m]) + rate(pg_stat_database_blks_read{datname!~"template.*"}[5m]))) by (datname) < 0.98
      for: 5m
      labels:
        severity: email
      annotations:
        summary: "PostgreSQL low cache hit rate on {{ $labels.cluster }} for database {{ $labels.datname }}"
        description: "PostgreSQL low on cache hit rate on {{ $labels.cluster }} for database {{ $labels.datname }} with a value of {{ $value }}"
Privileges required to collect metrics as a non-superuser
DATA_SOURCE_NAME=postgresql://postgres_exporter:password@localhost:5432/postgres?sslmode=disable
-- To use IF statements, hence to be able to check if the user exists before
-- attempting creation, we need to switch to procedural SQL (PL/pgSQL)
-- instead of standard SQL.
-- More: https://www.postgresql.org/docs/9.3/plpgsql-overview.html
-- To preserve compatibility with <9.0, DO blocks are not used; instead,
-- a function is created and dropped.
CREATE OR REPLACE FUNCTION __tmp_create_user() returns void as $$
BEGIN
  IF NOT EXISTS (
          SELECT                       -- SELECT list can stay empty for this
          FROM   pg_catalog.pg_user
          WHERE  usename = 'postgres_exporter') THEN
    CREATE USER postgres_exporter;
  END IF;
END;
$$ language plpgsql;

SELECT __tmp_create_user();
DROP FUNCTION __tmp_create_user();

ALTER USER postgres_exporter WITH PASSWORD 'password';
ALTER USER postgres_exporter SET SEARCH_PATH TO postgres_exporter,pg_catalog;

-- If deploying as non-superuser (for example in AWS RDS), uncomment the GRANT
-- line below and replace <MASTER_USER> with your root user.
-- GRANT postgres_exporter TO <MASTER_USER>;
CREATE SCHEMA IF NOT EXISTS postgres_exporter;
GRANT USAGE ON SCHEMA postgres_exporter TO postgres_exporter;
GRANT CONNECT ON DATABASE postgres TO postgres_exporter;

CREATE OR REPLACE FUNCTION get_pg_stat_activity() RETURNS SETOF pg_stat_activity AS
$$ SELECT * FROM pg_catalog.pg_stat_activity; $$
LANGUAGE sql
VOLATILE
SECURITY DEFINER;

CREATE OR REPLACE VIEW postgres_exporter.pg_stat_activity
AS
  SELECT * from get_pg_stat_activity();

GRANT SELECT ON postgres_exporter.pg_stat_activity TO postgres_exporter;

CREATE OR REPLACE FUNCTION get_pg_stat_replication() RETURNS SETOF pg_stat_replication AS
$$ SELECT * FROM pg_catalog.pg_stat_replication; $$
LANGUAGE sql
VOLATILE
SECURITY DEFINER;

CREATE OR REPLACE VIEW postgres_exporter.pg_stat_replication
AS
  SELECT * FROM get_pg_stat_replication();

GRANT SELECT ON postgres_exporter.pg_stat_replication TO postgres_exporter;
Reload the configuration
curl -X POST http://localhost:9090/-/reload

Monitoring Redis

Deployment

Download
wget -c https://github.com/oliver006/redis_exporter/releases/download/v1.9.0/redis_exporter-v1.9.0.linux-amd64.tar.gz
Installation
Binary package installation
  • Unpack
tar -xf redis_exporter-v1.9.0.linux-amd64.tar.gz -C /ups/app/monitor/
cd /ups/app/monitor/
mv redis_exporter-* redis_exporter

  • Configure the systemd service
# Create the systemd unit
cat > /usr/lib/systemd/system/redis_exporter.service <<-EOF
[Unit]
Description=Redis Exporter
Documentation=https://github.com/oliver006/redis_exporter
After=network.target

[Service]
#User=prometheus
#Group=prometheus
Restart=on-failure
ExecStart=/ups/app/monitor/redis_exporter/redis_exporter -redis-only-metrics --web.listen-address=:9121
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=redis_exporter

[Install]
WantedBy=multi-user.target
EOF
  • Redirect logs to a dedicated file
cat > /etc/rsyslog.d/redis_exporter.conf <<-EOF
if \$programname == 'redis_exporter' then /ups/app/monitor/redis_exporter/exporter.log
& stop
EOF
  • Start the service
# Manage via systemd
systemctl daemon-reload
systemctl restart redis_exporter.service
systemctl status redis_exporter.service


# Or start from the command line
cd /ups/app/monitor/redis_exporter
./redis_exporter &
Docker installation
docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter

Adding the targets to Prometheus

Configure prometheus.yml

Add the Redis scrape jobs.

  • Centralized configuration
scrape_configs:
  - job_name: 'redis_exporter'
    file_sd_configs:
      - files:
        - targets/redis/redis-instances.json
    metrics_path: /scrape
    relabel_configs:
      - action: replace
        source_labels: ['__address__']
        regex: (.*):(.*):(.*)
        replacement: $2
        target_label: hostip
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.181:9121

  ## config for scraping the exporter itself
  - job_name: 'redis_exporter_single'
    static_configs:
      - targets:
        - 192.168.10.181:9121

Configure the Redis server JSON file

vi targets/redis/redis-instances.json

[
  {
    "targets": [ "redis://192.168.10.181:6379", "redis://192.168.10.151:6379"],
    "labels": { }
  }
]

URI format with a password: redis://:<<PASSWORD>>@<<HOSTNAME>>:<<PORT>>

  • Per-target configuration
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - progs:9093  # port 9093 of the running Alertmanager node

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/alert_node.yml"
  - "rules/alert_mysql.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scrape_interval: 1s
    file_sd_configs:
      - files:
        - targets/node/*.yml
        refresh_interval: 10s
    relabel_configs:
    - action: replace
      source_labels: ['__address__']
      regex: (.*):(.*)
      replacement: $1
      target_label: hostname
    - action: labeldrop
      regex: __meta_filepath

  - job_name: 'redis_exporter'
    scrape_interval: 1s
    file_sd_configs:
      - files:
        - targets/redis/*.yml
    metrics_path: /scrape
    relabel_configs:
      - action: replace
        source_labels: ['__address__']
        regex: (.*):(.*):(.*)
        replacement: $2
        target_label: hostip
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.181:9121

Configure the Redis server target files

vi targets/redis/redis1_exporter.yml
[
  {
    "targets": [ "redis://192.168.10.181:6379"],
    "labels": { }
  }
]

vi targets/redis/redis2_exporter.yml
[
  {
    "targets": [ "redis://192.168.10.151:6379"],
    "labels": { }
  }
]
Restart Prometheus to load the configuration
# Check the configuration file
./bin/promtool check config config/prometheus.yml
# Restart the service
systemctl restart prometheus

Alerting components

Alerting in Prometheus consists of two parts:

  • The Prometheus server evaluates the configured alerting rules and sends the resulting alerts to Alertmanager.
  • Alertmanager processes the received alerts: deduplication, noise reduction, grouping, and policy-based routing of notifications.

The main steps to use the alerting service are:

  • Download and configure Alertmanager.
  • Point Prometheus at Alertmanager via the alerting section of prometheus.yml (older 1.x releases used the -alertmanager.url flag).
  • Define alerting rules on the Prometheus server.

Install Alertmanager

Binary installation

mkdir -p /ups/app/monitor/
# Unpack
tar -xf alertmanager-0.21.0.linux-amd64.tar.gz -C /ups/app/monitor/ --no-same-owner
cd /ups/app/monitor/
mv alertmanager-0.21.0.linux-amd64/ alertmanager

# Create the service user
# groupadd -g 2000 prometheus
useradd -r -M -s /sbin/nologin -d /ups/app/monitor/alertmanager -c "Prometheus agent" prometheus

# Create directories
cd /ups/app/monitor/
mkdir -p alertmanager/{bin,logs,config,data}
cd alertmanager
mv alertmanager.yml config/
mv alertmanager amtool bin/

# Change ownership
chown -R prometheus.prometheus /ups/app/monitor/alertmanager
Configure the systemd service
# Create the systemd unit
cat > /usr/lib/systemd/system/alertmanager.service <<-EOF
[Unit]
Description=alertmanager
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/ups/app/monitor/alertmanager/bin/alertmanager \
        --config.file=/ups/app/monitor/alertmanager/config/alertmanager.yml \
        --web.listen-address=192.168.10.181:9093 \
        --cluster.listen-address=0.0.0.0:8001 \
        --storage.path=/ups/app/monitor/alertmanager/data \
        --log.level=info
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
Basic configuration

cat /ups/app/monitor/alertmanager/config/alertmanager.yml

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
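Before starting or restarting, the configuration can be validated with amtool, which ships in the same archive:

./bin/amtool check-config /ups/app/monitor/alertmanager/config/alertmanager.yml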
Start the service
# Manage via systemd
systemctl daemon-reload
systemctl enable alertmanager.service 
systemctl start alertmanager.service 
systemctl status alertmanager

Example

Receiving alerts through WeChat Work (企業微信)

Preparation
  • Register a WeChat Work (enterprise WeChat) account.
  • Create a third-party application: click the Create Application button and fill in the application details.
Detailed configuration
Prometheus configuration

vi /ups/app/monitor/prometheus/config/prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']

rules.yml configuration

cat > /ups/app/monitor/prometheus/config/rules.yml <<-'EOF'
groups:
- name: node
  rules:
  - alert: server_status
    expr: up{job="node"} == 0
    for: 15s
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
EOF
Alertmanager configuration
cat > /ups/app/monitor/alertmanager/config/alertmanager.yml <<-EOF
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'ww9e5158867cf67d24'
    to_party: '1'
    agent_id: '1000002'
    api_secret: 'eRDqnTEOtlk2DtPiaxOA2w5fFyNhpIPkdQU-6Ty94cI'
EOF

Parameter descriptions (a quick end-to-end test follows the list):

  • corp_id: the unique ID of the WeChat Work account, found under "My Company".
  • to_party: the department (group) that should receive the alerts.
  • agent_id: the ID of the third-party enterprise application, shown on the application's detail page.
  • api_secret: the secret of the third-party enterprise application, shown on the application's detail page.
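To verify the notification pipeline end to end, a test alert can be pushed directly to Alertmanager; a sketch using the (deprecated but still available) v1 API, with illustrative label values:

curl -XPOST http://localhost:9093/api/v1/alerts -H "Content-Type: application/json" \
  -d '[{"labels":{"alertname":"ManualTest","severity":"email"},"annotations":{"summary":"manual test alert"}}]'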

Appendix

References