cAdvisor Container Monitoring Rules

阿新 • Published: 2020-09-01

For further explanation, see the host monitoring rules: https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html

In the rules directory under the Prometheus main program directory, create a file named docker.yml, add the following content, and then restart Prometheus.

groups:
- name: Docker containers monitoring
  rules:
  - alert: ContainerKilled
    expr: time() - container_last_seen > 60
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container killed (instance {{ $labels.instance }})"
      description: "A container has disappeared\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerCpuUsage
    expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container CPU usage (instance {{ $labels.instance }})"
      description: "Container CPU usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerMemoryUsage
    expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container Memory usage (instance {{ $labels.instance }})"
      description: "Container Memory usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerVolumeUsage
    expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance))) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container Volume usage (instance {{ $labels.instance }})"
      description: "Container volume inode usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerVolumeIoUsage
    expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container Volume IO usage (instance {{ $labels.instance }})"
      description: "Container Volume IO usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerHighThrottleRate
    expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container high throttle rate (instance {{ $labels.instance }})"
      description: "Container is being throttled\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: PgbouncerActiveConnections
    expr: pgbouncer_pools_server_active_connections > 200
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "PGBouncer active connections (instance {{ $labels.instance }})"
      description: "PGBouncer pools are filling up\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: PgbouncerErrors
    expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "PGBouncer errors (instance {{ $labels.instance }})"
      description: "PGBouncer is logging errors. This may be due to a server restart or an admin typing commands at the pgbouncer console.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: PgbouncerMaxConnections
    expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "PGBouncer max connections (instance {{ $labels.instance }})"
      description: "The number of PGBouncer client connections has reached max_client_conn.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: SidekiqQueueSize
    expr: sidekiq_queue_size{} > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Sidekiq queue size (instance {{ $labels.instance }})"
      description: "Sidekiq queue {{ $labels.name }} is growing\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: SidekiqSchedulingLatencyTooHigh
    expr: max(sidekiq_queue_latency) > 120
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"
      description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ConsulServiceHealthcheckFailed
    expr: consul_catalog_service_node_healthy == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"
      description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ConsulMissingMasterNode
    expr: consul_raft_peers < 3
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Consul missing master node (instance {{ $labels.instance }})"
      description: "The number of Consul raft peers should be 3 in order to preserve quorum.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ConsulAgentUnhealthy
    expr: consul_health_node_status{status="critical"} == 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Consul agent unhealthy (instance {{ $labels.instance }})"
      description: "A Consul agent is down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
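
Prometheus only evaluates rule files that are listed under rule_files in its main configuration. The fragment below is a minimal sketch of that entry. It assumes the main configuration file is prometheus.yml and that the rules directory sits next to it. The glob pattern and the 15s interval are illustrative values, not taken from the original setup.

```yaml
# prometheus.yml (fragment; values are assumptions for illustration)
global:
  evaluation_interval: 15s   # how often alerting rules are evaluated

rule_files:
  - "rules/*.yml"            # resolved relative to the directory of prometheus.yml; matches rules/docker.yml
```

Before restarting, the rules file can be checked for syntax errors with promtool, which ships with Prometheus: `promtool check rules rules/docker.yml`.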
