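Prometheus Monitoring: Black-Box Probing with Blackbox_exporter [ICMP, TCP, HTTP (GET/POST), DNS, SSL certificate expiry]
阿新 • Published: 2018-12-25
Blackbox_exporter: actively probing host and service status
Blackbox_exporter is one of the official Prometheus exporters. It collects monitoring data by probing targets over HTTP, DNS, TCP, and ICMP.
Official GitHub: https://github.com/prometheus/blackbox_exporter
Deploying Blackbox_exporter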
$ cd /usr/local/blackbox_exporter/
$ wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.12.0/blackbox_exporter-0.12.0.linux-amd64.tar.gz
$ tar zxvf blackbox_exporter-0.12.0.linux-amd64.tar.gz
$ cd blackbox_exporter-0.12.0.linux-amd64
$ ll
total 15720
-rwxr-xr-x. 1 1000 1000 16074005 Feb 27 2018 blackbox_exporter
-rw-rw-r--. 1 1000 1000 932 Nov 21 16:05 blackbox.yml
-rw-rw-r--. 1 1000 1000 11357 Feb 27 2018 LICENSE
-rw-rw-r--. 1 1000 1000 94 Feb 27 2018 NOTICE
$ cp -r blackbox_exporter /usr/local/bin
$ cat /etc/supervisord.conf | grep blackbox -A 20
[program:blackbox_exporter]
command=/usr/local/bin/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox_exporter-0.12.0.linux-amd64/blackbox.yml
stdout_logfile=/tmp/prometheus/blackbox_exporter.log
autostart=true
autorestart=true
startsecs=5
priority=1
user=root
stopasgroup=true
killasgroup=true
$ supervisorctl status | grep blackbox
blackbox_exporter RUNNING pid 25343, uptime 0:19:25
The blackbox.yml file
- blackbox.yml defines the details of each probe module
- The Prometheus configuration file references a module by name and lists the target hosts to probe
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      # Set this if HTTP probes should use IPv4; IPv6 is still rarely used in China.
      preferred_ip_protocol: "ip4"
  # Module for POST requests. Each endpoint takes different parameters, so you can
  # define several modules, one per endpoint. This one is named http_post_2xx_query
  # and probes the query.action endpoint.
  http_post_2xx_query:
    prober: http
    timeout: 15s
    http:
      preferred_ip_protocol: "ip4"  # use IPv4
      method: POST
      headers:
        Content-Type: application/json  # request header
      body: '{"hmac":"","params":{"publicFundsKeyWords":"xxx"}}'  # request parameters
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
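The query_response lists in the TCP modules read like a small expect/send script. The following Python sketch illustrates that idea under a simplified model: each expect consumes incoming lines until one matches, and ${1} in send is replaced by the first capture group. The function name run_query_response and the sample server lines are hypothetical, and Python's re module stands in for the Go regular expressions the exporter uses, so treat it as an illustration rather than the exporter's implementation.

```python
import re
from typing import Dict, Iterable, List, Optional


def run_query_response(steps: List[Dict[str, str]],
                       server_lines: Iterable[str]) -> Optional[List[str]]:
    """Walk through expect/send steps; return the lines sent, or None on failure."""
    lines = iter(server_lines)
    sent: List[str] = []
    for step in steps:
        match = None
        if "expect" in step:
            pattern = re.compile(step["expect"])
            # Read incoming lines until one matches the expected pattern.
            for line in lines:
                match = pattern.search(line)
                if match:
                    break
            if match is None:
                return None  # input exhausted without a match: the probe fails
        if "send" in step:
            text = step["send"]
            if match is not None:
                # Expand ${n} references with the groups captured by "expect".
                text = re.sub(r"\$\{(\d+)\}",
                              lambda ref: match.group(int(ref.group(1))), text)
            sent.append(text)
    return sent


# Steps copied from the irc_banner module above.
IRC_STEPS = [
    {"send": "NICK prober"},
    {"send": "USER prober prober prober :prober"},
    {"expect": "PING :([^ ]+)", "send": "PONG ${1}"},
    {"expect": "^:[^ ]+ 001"},
]

# Hypothetical server output, for illustration only.
IRC_SERVER = [
    ":irc.example NOTICE * :looking up your hostname",
    "PING :abc123",
    ":irc.example 001 prober :Welcome",
]

print(run_query_response(IRC_STEPS, IRC_SERVER))
```

In this model the ssh_banner module is the degenerate case: a single expect step and nothing to send, so the probe succeeds as soon as a line starting with SSH-2.0- arrives.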
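Blackbox_exporter use cases
- HTTP tests
  - Define request header information
  - Check the HTTP status, HTTP response headers, and HTTP body content
- TCP tests
  - Monitor the port status of business components
  - Define and check application-layer protocols
- ICMP tests
  - Host liveness checks
- POST tests
  - Endpoint connectivity
- SSL certificate expiry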
HTTP tests
- Add the following block to the Prometheus configuration file
- It uses the http_2xx module from blackbox.yml
  - job_name: 'blackbox_http_2xx'
    scrape_interval: 45s
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    static_configs:
      - targets:
          - https://www.baidu.com/
          - 172.0.0.1:9090
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.XXX.XX.XX:9115  # The blackbox exporter's real hostname:port.
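The three relabel rules are what turn each listed target into a query parameter while the scrape itself goes to the exporter. A minimal Python sketch of the effect, assuming the exporter listens on 127.0.0.1:9115; the function name relabel_for_blackbox is hypothetical, and the parameter order in the URL Prometheus actually requests may differ.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def relabel_for_blackbox(target: str, exporter: str,
                         module: str) -> Tuple[Dict[str, str], str]:
    """Mimic the three relabel rules and build the resulting scrape URL."""
    # Before relabeling, __address__ holds the target from static_configs and
    # __param_module holds the value from "params".
    labels = {"__address__": target, "__param_module": module}
    # Rule 1: copy the target into the "target" URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed target visible as the "instance" label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the blackbox exporter.
    labels["__address__"] = exporter
    query = urlencode({"module": labels["__param_module"],
                       "target": labels["__param_target"]})
    return labels, "http://{}/probe?{}".format(labels["__address__"], query)


labels, url = relabel_for_blackbox("https://www.baidu.com/", "127.0.0.1:9115", "http_2xx")
print(labels["instance"])
print(url)
```

The resulting series carry the probed URL in instance, which is why dashboards and alerts show the target rather than the exporter's address.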
[Screenshot: http]
TCP tests
- Probe a service's port to determine whether the service is up; in my view this is much like using telnet
- Add the following block to the Prometheus configuration file
- It uses the tcp_connect module from blackbox.yml
  - job_name: 'blackbox_telnet_port'
    scrape_interval: 5s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['1x3.x1.xx.xx4:443']
        labels:
          group: 'xxx IDC data-center IP monitoring'
      - targets: ['10.xx.xx.xxx:443']
        labels:
          group: 'Process status of nginx(main) server'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.xxx.xx.xx:9115
[Screenshot: tcp_connect]
ICMP tests
- Add the following block to the Prometheus configuration file
- It uses the icmp module from blackbox.yml
  - job_name: 'blackbox00_ping_idc_ip'
    scrape_interval: 10s
    metrics_path: /probe
    params:
      module: [icmp]  # ping
    static_configs:
      - targets: ['1x.xx.xx.xx']
        labels:
          group: 'xx nginx virtual IP'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: ping
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 1x.xxx.xx.xx:9115
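The first rule's regex, (.*)(:80)?, looks as if it strips a trailing :80, but the greedy (.*) already consumes the whole value, so the port would be kept if a target carried one. A small Python sketch of the capture behaviour, assuming fully anchored matching as Prometheus applies to relabel regexes; relabel_replace and the 192.0.2.10 address are hypothetical, and Python's re stands in for RE2.

```python
import re
from typing import Optional


def relabel_replace(value: str, regex: str, replacement: str) -> Optional[str]:
    """Return the replacement with ${n} expanded, or None if the regex does not match."""
    match = re.fullmatch(regex, value)  # relabel regexes are anchored at both ends
    if match is None:
        return None  # no match: the rule would leave the labels unchanged
    return re.sub(r"\$\{(\d+)\}",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  replacement)


# Target without a port, as in the job above: copied through unchanged.
print(relabel_replace("192.0.2.10", r"(.*)(:80)?", "${1}"))
# Target with a port: the greedy group swallows ":80" as well.
print(relabel_replace("192.0.2.10:80", r"(.*)(:80)?", "${1}"))
# A lazy first group leaves ":80" to the optional second group.
print(relabel_replace("192.0.2.10:80", r"(.*?)(:80)?", "${1}"))
```

Because the targets in this job are bare IP addresses, the rule works as written; the lazy variant only matters if port suffixes ever appear in the target list.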
[Screenshot: icmp]
POST tests
- Probe a business API endpoint to determine whether it is up
- Add the following block to the Prometheus configuration file
- It uses the http_post_2xx_query module from blackbox.yml (which probes the query.action endpoint)
  - job_name: 'blackbox_http_2xx_post'
    scrape_interval: 10s
    metrics_path: /probe
    params:
      module: [http_post_2xx_query]
    static_configs:
      - targets:
          - https://xx.xxx.com/api/xx/xx/fund/query.action
        labels:
          group: 'Interface monitoring'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 1x.xx.xx.xx:9115  # The blackbox exporter's real hostname:port.
[Screenshot: POST]
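Inspecting a probe
To see what a probe does step by step, query the exporter directly with debug=true. Quote the URL so the shell does not treat & as a background operator:
curl 'http://172.16.10.65:9115/probe?target=prometheus.io&module=http_2xx&debug=true'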
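Alerting test
Whether an ICMP, TCP, HTTP, or POST probe is healthy can be read from the probe_success metric:
probe_success == 0 ## connectivity failure
probe_success == 1 ## connectivity OK
Alerting checks the same metric: when it equals 0, an alert fires.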
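To check probe_success by hand, the text returned by /probe (without debug=true) can be parsed line by line. A minimal sketch, assuming unlabelled samples of the form "name value" with no trailing timestamp; SAMPLE is a hypothetical response and parse_samples and probe_ok are illustrative helper names.

```python
from typing import Dict

# Hypothetical /probe response body, for illustration only.
SAMPLE = """\
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.25
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# TYPE probe_success gauge
probe_success 1
"""


def parse_samples(text: str) -> Dict[str, float]:
    """Parse 'name value' sample lines, skipping comments and blank lines."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def probe_ok(text: str) -> bool:
    """True when the response reports probe_success == 1."""
    return parse_samples(text).get("probe_success") == 1.0


print(probe_ok(SAMPLE))
```

A real client library parser would be the better choice for anything beyond a quick check, since it also handles labels, timestamps, and escaping.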
$ cat rules/blackbox-alert.rules
groups:
- name: blackbox_network_stats
  rules:
  - alert: blackbox_network_stats
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
      description: "This requires immediate action!"
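The for: 1m clause means a single failed probe does not fire the alert; the expression has to stay true for a full minute. A simplified Python model of that behaviour, assuming one rule evaluation per sample; the function name alert_states and the 15-second spacing are hypothetical, and real Prometheus evaluation has more detail than this.

```python
from typing import List, Tuple


def alert_states(samples: List[Tuple[int, int]], for_seconds: int = 60) -> List[str]:
    """Return the alert state after each (timestamp_seconds, probe_success) sample."""
    states: List[str] = []
    active_since = None  # time at which probe_success == 0 first held
    for timestamp, value in samples:
        if value == 0:
            if active_since is None:
                active_since = timestamp
            held_for = timestamp - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # any success resets the timer
            states.append("inactive")
    return states


# A short blip followed by a sustained outage, sampled every 15 seconds.
history = [(0, 1), (15, 0), (30, 0), (45, 1),
           (60, 0), (75, 0), (90, 0), (105, 0), (120, 0)]
for (timestamp, _), state in zip(history, alert_states(history)):
    print(timestamp, state)
```

In this model the blip at 15 to 30 seconds only reaches pending, while the outage that starts at 60 seconds begins firing at 120 seconds.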
Reference: https://www.tidb.cc/Monitor/170603-Blackbox_exporter.html#告警測試案例
Monitoring SSL certificate expiry
cat << 'EOF' > prometheus.yml
rule_files:
  - ssl_expiry.rules
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    static_configs:
      - targets:
          - example.com  # Target to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # Blackbox exporter.
EOF
cat << 'EOF' > ssl_expiry.rules
groups:
  - name: ssl_expiry.rules
    rules:
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 30
        for: 10m
EOF
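The alert expression is plain arithmetic on Unix timestamps: probe_ssl_earliest_cert_expiry is the expiry time in seconds, time() is the current time, and 86400 * 30 is 30 days in seconds. A small Python sketch of the same comparison; the function name cert_expiring_soon and the timestamps in the usage lines are hypothetical.

```python
from typing import Tuple

SECONDS_PER_DAY = 86400


def cert_expiring_soon(expiry_ts: float, now_ts: float,
                       threshold_days: int = 30) -> Tuple[bool, float]:
    """Return (alert condition, days remaining), mirroring the rule's expr."""
    remaining_seconds = expiry_ts - now_ts
    expiring = remaining_seconds < SECONDS_PER_DAY * threshold_days
    return expiring, remaining_seconds / SECONDS_PER_DAY


now = 1_000_000  # arbitrary "current time" for the example
print(cert_expiring_soon(now + 10 * SECONDS_PER_DAY, now))
print(cert_expiring_soon(now + 45 * SECONDS_PER_DAY, now))
```

Because the comparison is a strict less-than, a certificate with exactly 30 days left does not yet satisfy the condition, and the for: 10m clause then requires the condition to hold for ten minutes before the alert fires.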