Deploying kube-prometheus and adding email alerting
阿新 • Published: 2018-01-14
This project comes from CoreOS and has been around for quite a while; it was fairly bare-bones when I first tried it, but it has matured a lot since then.
The project ships a one-click deployment script, so getting it running isn't hard, but in my experience truly mastering it and using it flexibly is another matter.
kube version: 1.9.1
OS version: debian stretch
1. Clone the project from GitHub.
# git clone https://github.com/coreos/prometheus-operator.git
2. Prepare the images. Some of them are hosted outside the GFW, so download them by whatever means you have.
quay.io/prometheus/alertmanager:v0.9.1
quay.io/coreos/configmap-reload:v0.0.1
grafana/grafana:4.5.2
quay.io/coreos/grafana-watcher:v0.0.8
quay.io/coreos/kube-state-metrics:v1.0.1
gcr.io/google_containers/addon-resizer:1.0
quay.io/prometheus/node-exporter:v0.15.0
quay.io/prometheus/prometheus:v2.0.0
quay.io/coreos/prometheus-config-reloader:v0.0.2
quay.io/coreos/prometheus-operator:v0.15.0
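Pulling all of these by hand is tedious; a small loop can do it. A minimal sketch that only prints the `docker pull` commands so you can review them first (pipe the output to `sh`, or drop the `echo`, to actually pull):

```shell
#!/usr/bin/env sh
# Print a docker pull command for each required image.
# To actually pull, pipe the output to sh: ./pull-images.sh | sh
for img in \
    quay.io/prometheus/alertmanager:v0.9.1 \
    quay.io/coreos/configmap-reload:v0.0.1 \
    grafana/grafana:4.5.2 \
    quay.io/coreos/grafana-watcher:v0.0.8 \
    quay.io/coreos/kube-state-metrics:v1.0.1 \
    gcr.io/google_containers/addon-resizer:1.0 \
    quay.io/prometheus/node-exporter:v0.15.0 \
    quay.io/prometheus/prometheus:v2.0.0 \
    quay.io/coreos/prometheus-config-reloader:v0.0.2 \
    quay.io/coreos/prometheus-operator:v0.15.0
do
    echo "docker pull $img"
done
```

If your nodes can't reach quay.io or gcr.io directly, you can pull through a mirror registry and retag instead; the image names and versions must still match what the manifests reference.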
3. Run the deployment script. The script reads as follows:
#!/usr/bin/env bash
if [ -z "${KUBECONFIG}" ]; then
    export KUBECONFIG=~/.kube/config
fi

# CAUTION - setting NAMESPACE will deploy most components to the given namespace
# however some are hardcoded to 'monitoring'. Only use if you have reviewed all manifests.
if [ -z "${NAMESPACE}" ]; then
    NAMESPACE=monitoring
fi

kubectl create namespace "$NAMESPACE"

kctl() {
    kubectl --namespace "$NAMESPACE" "$@"
}

kctl apply -f manifests/prometheus-operator

# Wait for CRDs to be ready.
printf "Waiting for Operator to register custom resource definitions..."
until kctl get customresourcedefinitions servicemonitors.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done
until kctl get customresourcedefinitions prometheuses.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done
until kctl get customresourcedefinitions alertmanagers.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done
until kctl get servicemonitors.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done
until kctl get prometheuses.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done
until kctl get alertmanagers.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done
echo "done!"

kctl apply -f manifests/node-exporter
kctl apply -f manifests/kube-state-metrics
kctl apply -f manifests/grafana/grafana-credentials.yaml
kctl apply -f manifests/grafana
find manifests/prometheus -type f ! -name prometheus-k8s-roles.yaml ! -name prometheus-k8s-role-bindings.yaml -exec kubectl --namespace "$NAMESPACE" apply -f {} \;
kubectl apply -f manifests/prometheus/prometheus-k8s-roles.yaml
kubectl apply -f manifests/prometheus/prometheus-k8s-role-bindings.yaml
kctl apply -f manifests/alertmanager/
As the script shows, it's actually quite straightforward, isn't it..
# cd contrib/kube-prometheus/
# hack/cluster-monitoring/deploy
namespace "monitoring" created
clusterrolebinding "prometheus-operator" created
clusterrole "prometheus-operator" created
serviceaccount "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Waiting for Operator to register custom resource definitions......done!
daemonset "node-exporter" created
service "node-exporter" created
clusterrolebinding "kube-state-metrics" created
clusterrole "kube-state-metrics" created
deployment "kube-state-metrics" created
rolebinding "kube-state-metrics" created
role "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created
secret "grafana-credentials" created
secret "grafana-credentials" unchanged
configmap "grafana-dashboards-0" created
deployment "grafana" created
service "grafana" created
servicemonitor "alertmanager" created
servicemonitor "prometheus-operator" created
prometheus "k8s" created
servicemonitor "kubelet" created
servicemonitor "prometheus" created
service "prometheus-k8s" created
servicemonitor "node-exporter" created
servicemonitor "kube-scheduler" created
servicemonitor "kube-controller-manager" created
servicemonitor "kube-state-metrics" created
configmap "prometheus-k8s-rules" created
serviceaccount "prometheus-k8s" created
servicemonitor "kube-apiserver" created
role "prometheus-k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
clusterrole "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
clusterrolebinding "prometheus-k8s" created
secret "alertmanager-main" created
service "alertmanager-main" created
alertmanager "main" created
4. With the images prepared in advance, everything comes up quickly.
# kubectl get po -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 11h
alertmanager-main-1 2/2 Running 0 11h
alertmanager-main-2 2/2 Running 0 11h
grafana-6b67b479d5-2hhnp 2/2 Running 0 11h
kube-state-metrics-6f7b5c94f-r8hm7 2/2 Running 0 11h
node-exporter-27744 1/1 Running 0 11h
node-exporter-9vhlv 1/1 Running 0 11h
node-exporter-rhjfb 1/1 Running 0 11h
node-exporter-xpqr8 1/1 Running 0 11h
prometheus-k8s-0 2/2 Running 0 11h
prometheus-k8s-1 2/2 Running 0 11h
prometheus-operator-8697c7fff9-mm8v5 1/1 Running 0 11h
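A quick way to confirm everything is up is to count the pods whose status is not yet Running. A minimal sketch, parsing a captured sample of the output above (on a live cluster, replace the `PODS` variable with the output of `kubectl get po -n monitoring --no-headers`):

```shell
#!/usr/bin/env sh
# Count pods whose STATUS column (3rd field) is not "Running".
# PODS is a captured sample here; on a real cluster use:
#   PODS=$(kubectl get po -n monitoring --no-headers)
PODS='alertmanager-main-0 2/2 Running 0 11h
grafana-6b67b479d5-2hhnp 2/2 Running 0 11h
prometheus-k8s-0 2/2 Running 0 11h'
NOT_RUNNING=$(printf '%s\n' "$PODS" | awk '$3 != "Running"' | wc -l | tr -d ' ')
echo "pods not Running: $NOT_RUNNING"
```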
This exposes three services:
- Prometheus UI on node port 30900
- Alertmanager UI on node port 30903
- Grafana on node port 30902
Each service can be reached on its corresponding node port.
5. Add monitoring for controller-manager and scheduler.
# kubectl apply -f manifests/k8s/ -n kube-system
This actually adds two services; note that the namespace is kube-system, not monitoring:
# kubectl get ep -n kube-system | grep discovery
kube-controller-manager-prometheus-discovery 192.168.5.104:10252,192.168.5.105:10252,192.168.5.42:10252 3d
kube-scheduler-prometheus-discovery 192.168.5.104:10251,192.168.5.105:10251,192.168.5.42:10251 3d
6. My Kubernetes cluster is highly available, with three apiservers, so there is a quirk to handle here. The apiserver is stateless, and the three endpoints conflict with one another; on Kubernetes 1.9 and later, this is fixed by passing the apiserver the flag --endpoint-reconciler-type=lease.
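For reference, a sketch of where the flag goes, assuming the apiserver runs as a static pod (the manifest path and the surrounding flags depend on how your cluster was installed):

```yaml
# e.g. /etc/kubernetes/manifests/kube-apiserver.yaml (location varies by install method)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --endpoint-reconciler-type=lease   # new in 1.9
    # ...existing flags unchanged
```

The kubelet restarts the static pod automatically when the manifest changes; do one apiserver at a time to keep the control plane available.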
By default it looks like this: [screenshot]
After the fix it looks like this: [screenshot]
7. Add the email alerting rules. The config is base64-encoded (not encrypted).
# vim manifests/alertmanager/alertmanager-config.yaml
apiVersion: v1
kind: Secret
metadata:
name: alertmanager-main
data:
alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KICBzbXRwX3NtYXJ0aG9zdDogIm1haWwub3VwZW5nLmNvbToyNSIKICBzbXRwX2Zyb206ICJuYWdpb3NfbW9uaXRvckBvdXBlbmcuY29tIgogIHNtdHBfYXV0aF91c2VybmFtZTogIm5hZ2lvc19tb25pdG9yQG91cGVuZy5jb20iCiAgc210cF9hdXRoX3Bhc3N3b3JkOiAiZGVsbGRlbGwiCnJvdXRlOgogIGdyb3VwX2J5OiBbJ2FsZXJ0bmFtZScsICdjbHVzdGVyJywgJ3NlcnZpY2UnXQogIGdyb3VwX3dhaXQ6IDMwcwogIGdyb3VwX2ludGVydmFsOiA1bQogIHJlcGVhdF9pbnRlcnZhbDogM2gKICByZWNlaXZlcjogdGVhbS1YLW1haWxzCiAgcm91dGVzOgogIC0gbWF0Y2hfcmU6CiAgICAgIGFsZXJ0bmFtZTogXihob3N0X2NwdV91c2FnZXxub2RlX2ZpbGVzeXN0ZW1fZnJlZXxob3N0X2Rvd24pJAogICAgcmVjZWl2ZXI6IHRlYW0tWC1tYWlscwogICAgcm91dGVzOgogICAgLSBtYXRjaDoKICAgICAgICBzZXZlcml0eTogY3JpdGljYWwKICAgICAgcmVjZWl2ZXI6IHRlYW0tWC1tYWlscwpyZWNlaXZlcnM6Ci0gbmFtZTogInRlYW0tWC1tYWlscyIKICBlbWFpbF9jb25maWdzOgogIC0gdG86ICJuaG9yaXpvbi1zYUBvdXBlbmcuY29tIgo=
To read the config, just base64-decode it. The default config looks like this:
# echo "the encoded content" | base64 -d
global:
  resolve_timeout: 5m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
  routes:
  - match:
      alertname: DeadMansSwitch
    receiver: 'null'
receivers:
- name: 'null'
Add your own email settings:
global:
  resolve_timeout: 5m
  smtp_smarthost: "mail.xxxx.com:25"
  smtp_from: "[email protected]"
  smtp_auth_username: "[email protected]"
  smtp_auth_password: "pass"
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 15m
  receiver: team-X-mails
  routes:
  - match_re:
      alertname: ^(host_cpu_usage|node_filesystem_free|host_down)$
    receiver: team-X-mails
    routes:
    - match:
        severity: critical
      receiver: team-X-mails
receivers:
- name: "team-X-mails"
  email_configs:
  - to: "[email protected]"
Once defined, base64-encode it, replace the old value, and then apply the config:
# kubectl apply -f manifests/alertmanager/alertmanager-config.yaml -n monitoring
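It is easy to break the YAML with a stray newline or bad indentation while re-encoding, so a round-trip check before pasting the value back into the Secret helps. A minimal sketch (the sample content stands in for your real edited alertmanager.yaml):

```shell
#!/usr/bin/env sh
# Encode the edited config, then decode it again to verify the round trip.
# /tmp/alertmanager.yaml stands in for your real edited config file.
printf 'global:\n  resolve_timeout: 5m\n' > /tmp/alertmanager.yaml
ENCODED=$(base64 -w0 < /tmp/alertmanager.yaml)   # -w0: no line wrapping (GNU coreutils)
echo "$ENCODED"
echo "$ENCODED" | base64 -d | diff - /tmp/alertmanager.yaml && echo "round-trip OK"
```

The -w0 flag matters: without it GNU base64 wraps output at 76 columns, and the wrapped string will not paste cleanly into the single-line Secret value.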
At this point the alert emails start arriving: [screenshot]
8. That completes the deployment. A few screenshots of the pages:
grafana
prometheus
alertmanager