Configuring email alerts with Prometheus + Alertmanager
阿新 • Published: 2021-11-29
Architecture diagram
1. Download the official release and install it (the tarball contains prebuilt binaries)
https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
Start-on-boot configuration (systemd unit):
root@ceph-teamplate:/apps# cat /lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network.target

[Service]
ExecStart=/apps/alertmanager/alertmanager
WorkingDirectory=/apps/alertmanager
Restart=on-failure

[Install]
WantedBy=multi-user.target
2. Change into the extracted alertmanager directory and edit alertmanager.yml to configure the alert settings. The contents of alertmanager.yml are as follows:
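root@blackbox_exporter-189:/apps/alertmanager# cat alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: '[email protected]'            # mailbox used to send the mail
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '[email protected]'   # mailbox address
  smtp_auth_password: 'xxxx'                # mailbox authorization code
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:                        # alert routing policy
  group_by: ['alertname']     # labels used to group alerts
  group_wait: 8s              # wait 8s after a new group's first alert so that related alerts are sent together
  group_interval: 3s          # wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 2m         # wait before re-sending a notification for alerts that are still firing; reduces duplicate mail (2 minutes here for testing)
  receiver: 'email'           # default receiver
  #routes:                    # child routes: which groups receive which messages
  #- receiver: mail
receivers:
- name: 'email'
  email_configs:
  - to: '[email protected]'   # mailbox that receives the alerts
    send_resolved: true
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']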
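Before relying on Alertmanager to deliver mail, it can help to confirm that the SMTP account itself works. The following is a minimal sketch, not part of the original setup: the function names build_test_message and send_test_message are hypothetical, and the addresses passed in are whatever is configured in alertmanager.yml. It uses implicit TLS (smtplib.SMTP_SSL), which is the usual mode on port 465 and, as far as I know, the reason smtp_require_tls (which governs STARTTLS) can stay false in the configuration above.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text message for checking the SMTP account."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Alertmanager SMTP test"
    msg.set_content("If this arrives, the SMTP settings are usable.")
    return msg


def send_test_message(msg: EmailMessage, host: str, port: int,
                      username: str, password: str) -> None:
    """Send msg over implicit TLS, the mode normally used on port 465."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(host, port, context=context, timeout=10) as smtp:
        # The password is the mailbox authorization code, as in
        # smtp_auth_password; it is an assumption that the provider
        # accepts it for a plain LOGIN/PLAIN authentication.
        smtp.login(username, password)
        smtp.send_message(msg)
```

To try it, build a message with the smtp_from address and the receiving address, then call send_test_message with 'smtp.qq.com', 465, and the same username and authorization code used in the global section. A login error here points at the mailbox settings rather than at Alertmanager.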
Check that the alertmanager.yml configuration is valid:
root@blackbox_exporter-189:/apps/alertmanager# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml' SUCCESS
Found:
- global config
- route
- 0 inhibit rules
- 1 receivers
- 0 templates
3. Open the Alertmanager web UI in a browser at http://IP:9093 (in this setup, http://192.168.192.189:9093)
4. In the Prometheus installation directory, edit the Prometheus configuration and uncomment the Alertmanager-related lines:
root@pro182:/apps/prometheus# cat prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.192.189:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/apps/prometheus/*.yaml"   # alert rule files
  # - "first_rules.yml"
  # - "second_rules.yml"
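Prometheus delivers firing alerts to the Alertmanager target listed under alerting over HTTP. To exercise the mail path without waiting for a rule to fire, one option is to post a hand-built alert to the same Alertmanager instance. The sketch below assumes Alertmanager's v2 HTTP API endpoint /api/v2/alerts; the helper names build_test_alert and post_alerts and the label values are illustrative, not taken from the original article.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname: str, severity: str, minutes: int = 5) -> list:
    """Return a one-element alert list in the shape /api/v2/alerts accepts."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": "manual test alert"},
        # RFC 3339 timestamps; the alert is treated as resolved after endsAt.
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(base_url: str, alerts: list) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, post_alerts("http://192.168.192.189:9093", build_test_alert("ManualTest", "warning")) should make the alert appear in the Alertmanager UI, and with the route shown earlier a mail would be expected once group_wait has elapsed.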
Write the alert rule file (here neicun.yaml, which matches the *.yaml pattern in rule_files)
(For testing, the threshold is set so that an alert fires whenever memory usage exceeds 1%.)
root@pro182:/apps/prometheus# cat neicun.yaml
groups:
- name: mem-rule
  rules:
  - alert: "Memory alert"
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 1
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "Service: {{$labels.alertname}} memory alert"
      description: "{{ $labels.alertname }} memory utilization is above 1%"
      value: "{{ $value }}"
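The rule expression computes used memory as a percentage of total memory, where "used" excludes free, buffer, and cache memory. The arithmetic can be checked by hand with a small sketch; the function names and the sample byte counts below are made up for illustration and do not come from a real host.

```python
GIB = 1024 ** 3


def memory_used_percent(total: int, free: int, buffers: int, cached: int) -> float:
    """Mirror the rule's expr: (total - (free + buffers + cached)) / total * 100."""
    used = total - (free + buffers + cached)
    return used / total * 100


def rule_matches(percent: float, threshold: float = 1.0) -> bool:
    """The rule's comparison: the expression matches when usage is above the threshold."""
    return percent > threshold


# Hypothetical host: 8 GiB total, 2 GiB free, 0.5 GiB buffers, 1.5 GiB cached.
percent = memory_used_percent(8 * GIB, 2 * GIB, GIB // 2, 3 * GIB // 2)
print(percent)                # 50.0
print(rule_matches(percent))  # True
```

With a 1% threshold almost any running host matches, which is why this value is only suitable for testing; the alert then fires once the expression has matched for the duration given in "for".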
Open http://192.168.192.189:9093/#/alerts in a browser; the alert is shown there as well.
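The alerts listed on that page can also be read from a script, which is convenient when checking the setup from a terminal. This is a rough sketch assuming Alertmanager's v2 HTTP API (GET /api/v2/alerts) and its JSON response shape, in which each alert has a labels map and a status.state field; fetch_alerts and summarize are hypothetical helper names.

```python
import json
import urllib.request


def fetch_alerts(base_url: str) -> list:
    """Fetch the current alerts from Alertmanager as parsed JSON."""
    url = base_url.rstrip("/") + "/api/v2/alerts"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(alerts: list) -> list:
    """Reduce each alert to an (alertname, state) pair for quick inspection."""
    return [
        (alert["labels"].get("alertname", ""), alert["status"]["state"])
        for alert in alerts
    ]
```

For instance, summarize(fetch_alerts("http://192.168.192.189:9093")) would return one pair per alert currently known to Alertmanager, with the state reported as given by the API.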