Prometheus: WeChat Work Alerting, Inhibition, and Silences
阿新 • Published: 2022-12-13
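Create a WeChat Work application
Register for WeChat Work: go to https://work.weixin.qq.com/ and register an enterprise. The details can be arbitrary, and no verification is required.
Create an application
Create the alert rule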
vim /usr/local/prometheus-2.1/rule2.yml
groups:
- name: cluster
  rules:
  - alert: HIGHCPU
    expr: (1-irate(node_cpu_seconds_total{mode="idle",job="export_test2"}[1m]))*100 > 10
    for: 5s
    labels:
      for: 'highcpu'
    annotations:
      description: CPU MORE THAN 10%
      summary: 'cpu more than 10%'
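To make the rule expression concrete, here is a minimal Python sketch of the arithmetic it performs. The function names and sample values are hypothetical, and irate() is approximated as the per-second increase between the last two samples of the idle counter (counter resets are ignored).

```python
def idle_rate(prev_value, prev_ts, last_value, last_ts):
    """Per-second increase of the idle counter between two samples."""
    return (last_value - prev_value) / (last_ts - prev_ts)


def cpu_usage_percent(rate):
    """Mirror of (1 - irate(...mode="idle"...)) * 100 from the rule."""
    return (1 - rate) * 100


def should_alert(usage_percent, threshold=10):
    """Mirror of the '> 10' comparison in the rule."""
    return usage_percent > threshold


# Hypothetical samples: the idle counter grew by 12 s over a 15 s interval,
# so the core was idle 80% of the time and busy roughly 20%.
rate = idle_rate(1000.0, 0.0, 1012.0, 15.0)
usage = cpu_usage_percent(rate)
print(round(usage, 1), should_alert(usage))
```

The rule additionally requires the condition to hold for 5s (the for: field) before the alert fires; the sketch does not model that.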
Add the rule file above to the Prometheus configuration
vim /usr/local/prometheus-2.1/prometheus.yml
rule_files:
- "/usr/local/prometheus-2.1/rule.yml"
- "/usr/local/prometheus-2.1/rule2.yml" # add this rule file
Create the alerting policy
vim /usr/local/alertmanager-0.15.2/alertmanager.yml
global:
  wechat_api_corp_id: 'ww0cxxxxf5b5'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'K4jHxxxxxxxL4_4Xj-lvQ'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5s
  repeat_interval: 10s
  receiver: 'weixin'
  routes:
  - receiver: 'weixin'
    match:
      severity: 'critical'
  - receiver: 'weixin'
    match:
      for: 'highcpu'
receivers:
- name: 'weixin'
  wechat_configs:
  - send_resolved: true # also notify when the alert resolves
    to_party: '1'
    agent_id: '1000003'
    corp_id: 'ww0cxxxxf5b5'
    api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
    api_secret: 'K4jHxxxxxxxL4_4Xj-lvQ'
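corp_id: the enterprise ID, found in WeChat Work under My Enterprise --> Enterprise Info --> Enterprise ID
agent_id and api_secret: open the Prometheus application you created to see its AgentId and Secret
to_party: the ID of the department that receives the messages
api_url: the WeChat Work API address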
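For orientation, this is roughly how these four values are used: corp_id and api_secret are exchanged for an access token, and a text message addressed to to_party is then posted on behalf of agent_id. The sketch below only builds the two requests and sends nothing. The helper names are hypothetical, and the endpoint paths (gettoken, message/send) and JSON field names reflect my understanding of the public WeChat Work API, so check them against the official documentation before relying on them.

```python
import json
from urllib.parse import urlencode

API_URL = "https://qyapi.weixin.qq.com/cgi-bin/"


def token_url(corp_id, api_secret, api_url=API_URL):
    """URL for fetching an access token (assumed endpoint: gettoken)."""
    query = urlencode({"corpid": corp_id, "corpsecret": api_secret})
    return f"{api_url}gettoken?{query}"


def send_request(access_token, to_party, agent_id, content, api_url=API_URL):
    """URL and JSON body for a text message (assumed endpoint: message/send)."""
    url = f"{api_url}message/send?{urlencode({'access_token': access_token})}"
    body = {
        "toparty": to_party,   # department ID, as in to_party
        "agentid": agent_id,   # application ID, as in agent_id
        "msgtype": "text",
        "text": {"content": content},
    }
    return url, json.dumps(body, ensure_ascii=False)


# Placeholder credentials copied from the redacted config above.
print(token_url("ww0cxxxxf5b5", "K4jHxxxxxxxL4_4Xj-lvQ"))
url, body = send_request("ACCESS_TOKEN", "1", "1000003", "[FIRING:1] HIGHCPU")
print(url)
print(body)
```

Sending these two requests by hand (for example with curl) is a quick way to tell a credentials problem apart from an Alertmanager routing problem.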
Restart the prometheus and alertmanager services
Test
Drive up CPU usage on the monitored host
cat /dev/urandom | md5sum
WeChat Work receives the alert
[FIRING:1] HIGHCPU (0 highcpu node2 export_test2 idle)
CPU MORE THAN 10% cpu more than 10%
Alerts Firing:
Labels:
- alertname = HIGHCPU
- cpu = 0
- for = highcpu
- instance = node2
- job = export_test2
- mode = idle
Annotations:
- description = CPU MORE THAN 10%
- summary = cpu more than 10%
Source: http://centos1.com:9090/graph?g0.expr=%281+-+irate%28node_cpu_seconds_total%7Bjob%3D%22export_test2%22%2Cmode%3D%22idle%22%7D%5B1m%5D%29%29+%2A+100+%3E+10&g0.tab=1
AlertmanagerUrl:
http://centos1.com:9093/#/alerts?receiver=weixin
Once CPU usage recovers, a resolved notification arrives
[RESOLVED] HIGHCPU (0 highcpu node2 export_test2 idle)
CPU MORE THAN 10% cpu more than 10%
Alerts Resolved:
Labels:
- alertname = HIGHCPU
- cpu = 0
- for = highcpu
- instance = node2
- job = export_test2
- mode = idle
Annotations:
- description = CPU MORE THAN 10%
- summary = cpu more than 10%
Source: http://centos1.com:9090/graph?g0.expr=%281+-+irate%28node_cpu_seconds_total%7Bjob%3D%22export_test2%22%2Cmode%3D%22idle%22%7D%5B1m%5D%29%29+%2A+100+%3E+10&g0.tab=1
AlertmanagerUrl:
http://centos1.com:9093/#/alerts?receiver=weixin
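Note: if WeChat Work still receives no alerts and you are sure the configuration is correct, try registering a new WeChat Work enterprise.
Trying out inhibition rules
Note: the two alerts used here have no real logical relationship; they only serve to exercise the inhibition feature.
Add a new alert rule
vim /usr/local/prometheus-2.1/rule2.yml and append the following at the end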
- name: test
  rules:
  - alert: go_goroutines
    expr: go_goroutines{instance="node2",job="export_test2"} > 5
    for: 10s
    labels:
      severity: 'warning'
    annotations:
      description: go_goroutines > 5
Add a route for the new rule and an inhibition rule
vim /usr/local/alertmanager-0.15.2/alertmanager.yml
global:
  wechat_api_corp_id: 'ww0xxxxxxx5b5'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'K4jH8xxxxxxxxxxxXj-lvQ'
#templates:
# - '/alertmanager/template/wechat.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5s
  repeat_interval: 10s
  receiver: 'weixin'
  routes:
  - receiver: 'weixin'
    match:
      severity: 'critical'
  - receiver: 'weixin'
    match:
      for: 'highcpu'
  - receiver: 'weixin' # newly added route (three lines)
    match:
      severity: 'warning'
receivers:
- name: 'weixin'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: '1000003'
    corp_id: 'ww0xxxxxxxxx5b5'
    api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
    api_secret: 'K4jH8xxxxxxxxxxxxxxxxxxxXj-lvQ'
inhibit_rules: # newly added inhibition rule
- source_match:
    for: 'highcpu'
  target_match:
    severity: 'warning'
  equal: ['instance','job']
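Effect: when both the CPU alert and the go_goroutines alert meet their conditions, the CPU alert is sent and the go_goroutines alert is inhibited.
How it works: an alert that matches target_match (or target_match_re) is muted for as long as another alert that matches source_match (or source_match_re) is firing, provided both alerts carry identical values for every label listed in equal. Here the HIGHCPU alert (for: highcpu) is the source and the go_goroutines alert (severity: warning) is the target; both share the same instance and job, so the go_goroutines notification is suppressed.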
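The matching logic can be sketched in a few lines of Python. This is a simplified model for illustration, not Alertmanager's implementation: it handles exact-match matchers only (no regex variants), and the function names and the node3 alert are hypothetical.

```python
RULE = {
    "source_match": {"for": "highcpu"},
    "target_match": {"severity": "warning"},
    "equal": ["instance", "job"],
}


def matches(labels, matcher):
    """True if every name/value pair in the matcher is present in the labels."""
    return all(labels.get(name) == value for name, value in matcher.items())


def is_inhibited(target, firing_alerts, rule):
    """True if some other firing alert acts as a source that mutes the target."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is target or not matches(source, rule["source_match"]):
            continue
        # Source and target must agree on every label listed in 'equal'.
        if all(source.get(name) == target.get(name) for name in rule["equal"]):
            return True
    return False


cpu = {"alertname": "HIGHCPU", "for": "highcpu",
       "instance": "node2", "job": "export_test2"}
goroutines = {"alertname": "go_goroutines", "severity": "warning",
              "instance": "node2", "job": "export_test2"}
other_host = dict(goroutines, instance="node3")  # hypothetical second host

firing = [cpu, goroutines, other_host]
for alert in firing:
    print(alert["alertname"], alert["instance"], is_inhibited(alert, firing, RULE))
```

Only the go_goroutines alert on node2 is muted: the HIGHCPU alert does not match target_match, and the node3 alert differs from the source in its instance label.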
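Trying out silences
In the Alertmanager web UI, create a temporary silence for alerts labelled for: highcpu: Silences --> New Silence
Note: the Start and End fields are in UTC, which is 8 hours behind Beijing time.
Once the silence is added, alerts labelled for: highcpu are no longer delivered during the specified window.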
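Because the 8-hour offset is easy to get wrong, here is a small Python sketch that converts a Beijing-time window to UTC and assembles the data a silence needs. The dictionary layout (matchers, startsAt, endsAt, createdBy, comment) is my understanding of what Alertmanager's silences API accepts and should be verified for your version; the date, author, and comment are made-up examples.

```python
import json
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # UTC+8


def to_utc_rfc3339(local_dt):
    """Convert a timezone-aware datetime to a UTC timestamp string."""
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def silence_payload(start_local, hours, created_by, comment):
    """Silence for alerts labelled for=highcpu, lasting the given number of hours."""
    end_local = start_local + timedelta(hours=hours)
    return {
        "matchers": [{"name": "for", "value": "highcpu", "isRegex": False}],
        "startsAt": to_utc_rfc3339(start_local),
        "endsAt": to_utc_rfc3339(end_local),
        "createdBy": created_by,
        "comment": comment,
    }


# Example: silence from 18:00 to 20:00 Beijing time, i.e. 10:00 to 12:00 UTC.
start = datetime(2022, 12, 13, 18, 0, tzinfo=BEIJING)
payload = silence_payload(start, 2, "admin", "testing silences")
print(json.dumps(payload, indent=2))
```

The startsAt and endsAt values printed here are what should go into the Start and End fields of the New Silence form.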