Alertmanager DingTalk Alerts
阿新 • Published: 2021-11-23
Connecting DingTalk to the Prometheus Alertmanager webhook
- Create a custom DingTalk robot
In DingTalk, click your avatar, go to Robots --> Add Robot, and add a "Custom Robot". Follow the prompts to obtain the Webhook URL and the signing secret.
Test the DingTalk alert robot:
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
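If the robot was created with the signing option enabled, DingTalk expects two extra query parameters, timestamp and sign, on every request. The sketch below shows one way to compute them. It assumes openssl, base64 and sed are available, and the function name dingtalk_sign and all credential values are hypothetical placeholders rather than anything from the original steps.

```shell
# Compute the DingTalk "sign" query parameter for a robot that uses signing.
# Arguments: timestamp in milliseconds, robot secret.
dingtalk_sign() {
  timestamp="$1"
  secret="$2"
  # The string to sign is "<timestamp>\n<secret>", HMAC-SHA256 keyed with the secret.
  # The base64 result is then URL-encoded; base64 output only ever needs +, / and = escaped.
  printf '%s\n%s' "$timestamp" "$secret" \
    | openssl dgst -sha256 -hmac "$secret" -binary \
    | base64 \
    | sed -e 's/+/%2B/g' -e 's/\//%2F/g' -e 's/=/%3D/g'
}

# Usage (placeholder credentials, replace with your own):
# timestamp="$(date +%s)000"   # seconds padded to milliseconds
# sign="$(dingtalk_sign "$timestamp" "SECxxxxxxxx")"
# curl -H "Content-Type: application/json" \
#   -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' \
#   "https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx&timestamp=${timestamp}&sign=${sign}"
```

When the plugin is used, this signing is done for you from the secret in its configuration file; the sketch is only useful for testing the robot directly.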
- Deploy the DingTalk alert plugin
# Download the DingTalk alert plugin
cd /opt/alertmanager/
curl -O -L https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.0.0/prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz
# Extract and rename
tar -zxvf prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-2.0.0.linux-amd64/ prometheus-webhook-dingtalk
chown -R prometheus:prometheus *
- Edit the plugin's configuration file
cd prometheus-webhook-dingtalk
cp config.example.yml config.yml
chown -R prometheus:prometheus *
vim config.yml
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    # secret for signature
    secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
- Start the DingTalk alert plugin
cd /opt/alertmanager/prometheus-webhook-dingtalk
nohup ./prometheus-webhook-dingtalk --config.file=./config.yml &
- Configure the plugin to start at boot (systemd)
vim /usr/lib/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target
[Service]
Restart=on-failure
ExecStart=/opt/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/opt/alertmanager/prometheus-webhook-dingtalk/config.yml
[Install]
WantedBy=multi-user.target
# Start from the command line
systemctl daemon-reload
systemctl start prometheus-webhook-dingtalk
systemctl status prometheus-webhook-dingtalk
systemctl enable prometheus-webhook-dingtalk
ss -tnl | grep 8060
journalctl -u prometheus-webhook-dingtalk -fn 200
- Test
curl http://localhost:8060/dingtalk/webhook1/send -H 'Content-Type: application/json' -d '{"msgtype": "text","text": {"content": "monitoring alert"}}'
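In normal operation the plugin receives Alertmanager's webhook JSON rather than a raw DingTalk message, so a test closer to production might post a payload in that shape. The following is a minimal sketch, assuming the version 4 webhook format; the alert name, labels, annotations, timestamps and the helper name send_test_alert are illustrative placeholders.

```shell
# A minimal Alertmanager-style webhook payload (version 4 format).
# All label and annotation values are made-up test data.
payload='{
  "version": "4",
  "status": "firing",
  "receiver": "dingtalk",
  "groupLabels": {"alertname": "TestAlert"},
  "commonLabels": {"alertname": "TestAlert", "severity": "warning"},
  "commonAnnotations": {"summary": "prometheus alert test"},
  "externalURL": "http://localhost:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "TestAlert", "severity": "warning", "instance": "192.168.1.1:9080"},
      "annotations": {"summary": "prometheus alert test"},
      "startsAt": "2021-11-23T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://localhost:9090"
    }
  ]
}'

# Post the payload to the plugin; the target URL can be overridden as the first argument.
send_test_alert() {
  curl -sS -H 'Content-Type: application/json' -d "$payload" \
    "${1:-http://localhost:8060/dingtalk/webhook1/send}"
}

# Usage: send_test_alert
```

If the message arrives in the DingTalk group rendered through the plugin's template, the path from the plugin to DingTalk is working.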
- Configure Alertmanager to send alerts to the plugin (edit the Alertmanager configuration file)
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m  # after a notification has been sent, resend every 10 minutes while the alert keeps firing
  #receiver: 'web.hook'
  receiver: 'dingtalk'
#receivers:
#- name: 'web.hook'
#  webhook_configs:
#  - url: 'http://127.0.0.1:5001/'
receivers:
- name: 'dingtalk'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
    send_resolved: true
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
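After editing the Alertmanager configuration, it is usually worth validating it before asking the running process to reload. A minimal sketch, assuming amtool is on PATH and Alertmanager listens on its default port 9093. The config path, the environment variable names and the function name check_and_reload are assumptions and do not come from the steps above.

```shell
# Assumed locations; adjust them to your installation.
ALERTMANAGER_CONFIG="${ALERTMANAGER_CONFIG:-/opt/alertmanager/alertmanager.yml}"
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"
AMTOOL="${AMTOOL:-amtool}"

check_and_reload() {
  # Validate first; stop if the configuration does not pass the check.
  "$AMTOOL" check-config "$ALERTMANAGER_CONFIG" || return 1
  # Ask the running Alertmanager to re-read its configuration.
  curl -fsS -X POST "$ALERTMANAGER_URL/-/reload"
}

# Usage: check_and_reload
```

Restarting the Alertmanager service would also pick up the change; the reload endpoint just avoids the restart.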
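[Alert states]:
1. Inactive
The alert threshold has not been reached, i.e. the expr expression does not hold.
2. Pending
The threshold has been reached (the expr expression holds), but the condition has not yet lasted for the duration set in for.
3. Firing
The threshold has been reached and the condition has lasted for the full for duration.
Testing shows that once an alert has reached Firing, no second alert is generated for it until it is resolved. Repeated notifications for the same alert are governed by repeat_interval in the route, not by new alerts being created.
Example:
Suppose the service at 192.168.1.1:9080 has been down for more than 1 minute and a Firing alert has been generated. As long as that machine does not recover, no additional identical alert is generated.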
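The three states can be summarised as a small decision function. This is only an illustrative sketch of the rules described above, not Prometheus's own implementation; the function name alert_state and its arguments are hypothetical.

```shell
# alert_state EXPR_HOLDS ACTIVE_SECONDS FOR_SECONDS
#   EXPR_HOLDS      "true" if the rule's expr currently holds
#   ACTIVE_SECONDS  how long expr has held continuously
#   FOR_SECONDS     the rule's "for" duration in seconds
alert_state() {
  expr_holds="$1"
  active_seconds="$2"
  for_seconds="$3"
  if [ "$expr_holds" != "true" ]; then
    echo "inactive"
  elif [ "$active_seconds" -lt "$for_seconds" ]; then
    echo "pending"
  else
    echo "firing"
  fi
}

alert_state false 0 60   # prints: inactive
alert_state true 30 60   # prints: pending
alert_state true 90 60   # prints: firing
```

With a for of 1 minute, as in the example above, a service that has been down for 30 seconds is still Pending and becomes Firing once the minute has passed.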