Kafka Monitoring Tools Roundup

阿新 • Published: 2019-08-23

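Monitoring is essential for any big-data cluster. Diagnosing failures from logs alone is inefficient, so we need a complete set of metrics to manage a Kafka cluster. This article explains how Kafka monitoring works and surveys some commonly used third-party monitoring tools.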

1. Kafka Monitoring

We start with how Kafka monitoring works. Third-party tools are built on the same mechanism, and we can also build our own monitoring on top of it. The official documentation on monitoring is here:

http://kafka.apache.org/documentation/#monitoring

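On the broker side, Kafka reports metrics with Yammer Metrics, a Java metrics library. The Java clients use Kafka's own built-in metrics library instead.

Kafka exposes a large number of metrics by default, and all of them can be read remotely over JMX. To enable remote access, set JMX_PORT before starting the broker or client: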

JMX_PORT=9997 bin/kafka-server-start.sh config/server.properties

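Every Kafka metric is defined as a JMX MBean (managed bean), which represents one managed resource instance.

We can visualize the metrics with JConsole (Java Monitoring and Management Console), a JMX-based graphical monitoring and management tool:

Figure 2: JConsole

The Kafka metrics are then listed under the MBeans tab.

MBean names follow the pattern kafka.xxx:type=xxx,xxx=xxx: a domain, a colon, then comma-separated key=value properties.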
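To make the naming convention concrete, here is a minimal Python sketch that splits an MBean name into its domain and key properties. parse_mbean_name is a hypothetical helper written for this article, not part of Kafka or JMX. It ignores quoted values, which is enough for the Kafka names listed below. In Java you would use javax.management.ObjectName, which handles the full syntax.

```python
from __future__ import annotations


def parse_mbean_name(name: str) -> tuple[str, dict[str, str]]:
    """Split "domain:key1=v1,key2=v2" into (domain, {key: value}).

    Simplified: quoted values and commas inside values are not handled.
    """
    domain, sep, props = name.partition(":")
    if not sep or not domain:
        raise ValueError(f"not an MBean name: {name!r}")
    properties: dict[str, str] = {}
    for pair in props.split(","):
        key, eq, value = pair.partition("=")
        if not eq or not key:
            raise ValueError(f"malformed key property: {pair!r}")
        properties[key] = value
    return domain, properties


domain, props = parse_mbean_name(
    "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec"
)
print(domain, props["type"], props["name"])
# kafka.server BrokerTopicMetrics BytesInPerSec
```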

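The metrics fall into the categories below. There are many of them, so only a selection is shown here. See the official documentation for the complete list.

Graphing and alerting metrics:

kafka.server holds server-related metrics and kafka.network holds network-related metrics.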

Description	Mbean name	Normal value
Message in rate	kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
Byte in rate from clients	kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
Byte in rate from other brokers	kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec
Request rate	kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}
Error rate	kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=([-.\w]+),error=([-.\w]+)	Number of errors in responses counted per-request-type, per-error-code. If a response contains multiple errors, all are counted. error=NONE indicates successful responses.
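The ErrorsPerSec MBeans can be combined into a failure ratio. The sketch below estimates the share of error counts for one request type from each MBean's cumulative Count attribute. Collecting the counts (via JMX or JmxTool) is left out. error_ratio and the numbers are made up for illustration.

Two caveats apply. A response containing several errors is counted once per error, so the ratio is approximate. Count is also cumulative since broker start, so for a recent window you should compute the ratio from the difference between two snapshots.

```python
from __future__ import annotations


def error_ratio(counts: dict[str, int], request: str) -> float:
    """Share of error counts for `request` whose error code is not NONE.

    `counts` maps ErrorsPerSec MBean names to their cumulative Count attribute.
    """
    total = failed = 0
    for mbean, count in counts.items():
        domain, _, props = mbean.partition(":")
        kv = dict(p.split("=", 1) for p in props.split(","))
        if (domain != "kafka.network" or kv.get("name") != "ErrorsPerSec"
                or kv.get("request") != request):
            continue  # not an ErrorsPerSec MBean for this request type
        total += count
        if kv.get("error") != "NONE":  # error=NONE means a successful response
            failed += count
    return failed / total if total else 0.0


# Made-up counts, for illustration only.
counts = {
    "kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=Produce,error=NONE": 990,
    "kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=Produce,error=NOT_LEADER_FOR_PARTITION": 10,
    "kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=Fetch,error=NONE": 500,
}
print(error_ratio(counts, "Produce"))  # 0.01
print(error_ratio(counts, "Fetch"))    # 0.0
```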

Common monitoring metrics for producer/consumer/connect/streams:

Metrics shared by all Kafka clients while they are running.

Metric/Attribute name	Description	Mbean name
connection-close-rate	Connections closed per second in the window.	kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+)
connection-close-total	Total connections closed.	kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+)

Common per-broker metrics for producer/consumer/connect/streams:

Client-side metrics kept separately for each broker (node) the client talks to.

Metric/Attribute name	Description	Mbean name
outgoing-byte-rate	The average number of outgoing bytes sent per second for a node.	kafka.[producer|consumer|connect]:type=[consumer|producer|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
outgoing-byte-total	The total number of outgoing bytes sent for a node.	kafka.[producer|consumer|connect]:type=[consumer|producer|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
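Each *-rate attribute is averaged over the client's metrics sample window, while the matching *-total attribute is a cumulative counter. If you poll a total yourself, for example outgoing-byte-total, the rate between two polls is the difference in value divided by the elapsed time.

A minimal sketch, assuming you record a timestamp with each reading. Sample and rate_per_sec are hypothetical names, and the numbers are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    timestamp_ms: int  # when the attribute was read (epoch milliseconds)
    total: float       # value of a cumulative *-total attribute at that time


def rate_per_sec(prev: Sample, curr: Sample) -> float:
    """Average per-second rate of a cumulative counter between two readings."""
    elapsed_s = (curr.timestamp_ms - prev.timestamp_ms) / 1000
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.total - prev.total
    if delta < 0:
        # A cumulative counter only goes down if the client restarted and it was reset.
        raise ValueError("counter reset detected")
    return delta / elapsed_s


# Two illustrative readings of outgoing-byte-total taken 10 seconds apart.
print(rate_per_sec(Sample(0, 1_000_000), Sample(10_000, 6_000_000)))  # 500000.0
```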

Producer metrics:

Metrics collected while the producer is running.

Metric/Attribute name	Description	Mbean name
waiting-threads	The number of user threads blocked waiting for buffer memory to enqueue their records.	kafka.producer:type=producer-metrics,client-id=([-.\w]+)
buffer-total-bytes	The maximum amount of buffer memory the client can use (whether or not it is currently used).	kafka.producer:type=producer-metrics,client-id=([-.\w]+)
buffer-available-bytes	The total amount of buffer memory that is not being used (either unallocated or in the free list).	kafka.producer:type=producer-metrics,client-id=([-.\w]+)
bufferpool-wait-time	The fraction of time an appender waits for space allocation.	kafka.producer:type=producer-metrics,client-id=([-.\w]+)
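The buffer metrics above combine naturally into a simple health check. Buffer utilization is 1 - buffer-available-bytes / buffer-total-bytes. Any nonzero waiting-threads value means send() calls are blocking because the buffer is full.

The sketch below assumes the three attribute values have already been read into a dict. The function names and the 0.9 threshold are hypothetical choices, not Kafka settings. The example uses 33554432 bytes, which is the producer's default buffer.memory.

```python
from __future__ import annotations


def buffer_utilization(total_bytes: float, available_bytes: float) -> float:
    """Fraction of the producer buffer in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("buffer-total-bytes must be positive")
    return 1.0 - available_bytes / total_bytes


def producer_buffer_alerts(metrics: dict[str, float], threshold: float = 0.9) -> list[str]:
    """Return human-readable alerts; `threshold` is an arbitrary example value."""
    alerts = []
    used = buffer_utilization(metrics["buffer-total-bytes"], metrics["buffer-available-bytes"])
    if used >= threshold:
        alerts.append(f"buffer {used:.0%} full")
    if metrics["waiting-threads"] > 0:
        # Threads block in send() while the buffer has no free space.
        alerts.append(f"{int(metrics['waiting-threads'])} thread(s) blocked on buffer memory")
    return alerts


# Illustrative values: 32 MB buffer with only 1 MB left and two blocked threads.
print(producer_buffer_alerts({
    "buffer-total-bytes": 33554432,
    "buffer-available-bytes": 1048576,
    "waiting-threads": 2,
}))
# ['buffer 97% full', '2 thread(s) blocked on buffer memory']
```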

Consumer metrics:

Metrics collected while the consumer is running.

Metric/Attribute name	Description	Mbean name
commit-latency-avg	The average time taken for a commit request	kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
commit-latency-max	The max time taken for a commit request	kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
commit-rate	The number of commit calls per second	kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
commit-total	The total number of commit calls	kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)

Connect metrics (Mbean name: kafka.connect:type=connect-worker-metrics):

Attribute name	Description
connector-count	The number of connectors run in this worker.
connector-startup-attempts-total	The total number of connector startups that this worker has attempted.

Streams metrics:

Metric/Attribute name	Description	Mbean name
commit-latency-avg	The average execution time in ms for committing, across all running tasks of this thread.	kafka.streams:type=stream-metrics,client-id=([-.\w]+)
commit-latency-max	The maximum execution time in ms for committing across all running tasks of this thread.	kafka.streams:type=stream-metrics,client-id=([-.\w]+)
poll-latency-avg	The average execution time in ms for polling, across all running tasks of this thread.	kafka.streams:type=stream-metrics,client-id=([-.\w]+)

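These metrics cover most situations that come up when running Kafka. In addition, the kafka.log domain exposes log-related metrics. Each MBean has its own set of attributes.

From these attributes we can track many things: inbound and outbound byte rates, the ISR change rate, producer batch size and thread counts, consumer-side latency and throughput. JVM- and OS-level metrics also deserve attention, but general-purpose tools already exist for those and they are not covered here.

That covers the basics of how Kafka monitoring works. Most third-party tools add refinements on top of this layer. The sections below introduce several popular ones.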

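2. JmxTool

JmxTool is not a framework. It is a utility that ships with Kafka for viewing JMX metrics in real time.

Open a terminal in the Kafka installation directory and run bin/kafka-run-class.sh kafka.tools.JmxTool to print JmxTool's help message.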

For example, to monitor the inbound byte rate, run:

bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --attributes FifteenMinuteRate --reporting-interval 5000

The --date-format pattern uses java.text.SimpleDateFormat syntax. In that syntax, yyyy is the calendar year, while YYYY is the week-based year and can print the wrong year around New Year. The FifteenMinuteRate of BytesInPerSec is then printed to the console every 5 seconds:

$ bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --attributes FifteenMinuteRate --reporting-interval 5000
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://:9997/jmxrmi.
"time","kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:FifteenMinuteRate"
2018-08-10 14:52:15,784224.2587058166
2018-08-10 14:52:20,1003401.2319497257
2018-08-10 14:52:25,1125080.6160773218
2018-08-10 14:52:30,1593394.1860063889
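JmxTool writes plain CSV, so its output is easy to post-process. The minimal sketch below parses the sample output shown above and picks the highest reading. It assumes a single --attributes value (two columns) and a yyyy-MM-dd HH:mm:ss date format. parse_jmxtool_output is a hypothetical helper.

```python
from __future__ import annotations

import csv
from datetime import datetime


def parse_jmxtool_output(lines: list[str], fmt: str = "%Y-%m-%d %H:%M:%S") -> list[tuple[datetime, float]]:
    """Turn JmxTool CSV output (one attribute) into (timestamp, value) pairs.

    `fmt` must match JmxTool's --date-format; Java "yyyy-MM-dd HH:mm:ss"
    corresponds to Python "%Y-%m-%d %H:%M:%S".
    """
    samples = []
    for record in csv.reader(lines):
        # Skip blank lines, the "Trying to connect" banner and the CSV header.
        if len(record) != 2 or record[0] == "time":
            continue
        samples.append((datetime.strptime(record[0], fmt), float(record[1])))
    return samples


output = """Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://:9997/jmxrmi.
"time","kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:FifteenMinuteRate"
2018-08-10 14:52:15,784224.2587058166
2018-08-10 14:52:20,1003401.2319497257
2018-08-10 14:52:25,1125080.6160773218
2018-08-10 14:52:30,1593394.1860063889
""".splitlines()

samples = parse_jmxtool_output(output)
peak_time, peak = max(samples, key=lambda s: s[1])
print(len(samples), peak_time)  # 4 2018-08-10 14:52:30
```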

3. Kafka-Manager

A Kafka monitoring framework written in Scala and open-sourced by Yahoo in 2015. GitHub: https://github.com/yahoo/kafka-manager

Requirements:

Kafka 0.8.*.* or 0.9.*.* or 0.10.*.* or 0.11.*.*
Java 8+

Download kafka-manager.

Configure conf/application.conf:

kafka-manager.zkhosts="my.zookeeper.host.com:2181,other.zookeeper.host.com:2181"

Build the distribution package with sbt:

./sbt clean dist

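Start:

$ bin/kafka-manager
To specify the configuration file and port:
$ bin/kafka-manager -Dconfig.file=/path/to/application.conf -Dhttp.port=8080
With authentication (JAAS configuration):
$ bin/kafka-manager -Djava.security.auth.login.config=/path/to/my-jaas.conf

Then open localhost:8080 in a browser to see the monitoring pages: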

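Figure: topic

Figure: broker

The UI is clean and feature-rich, and the tool is free and open source, so it is recommended. Be aware, however, that the current version only supports Kafka 0.8.*.* through 0.11.*.*.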

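4. kafka-monitor

A Kafka monitoring framework open-sourced by LinkedIn. GitHub: https://github.com/linkedin/kafka-monitor

It builds with Gradle 2.0 or later and supports Java 7 and Java 8.

It supports Kafka 0.8 through 2.0. Check out the branch that matches your Kafka version.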

Usage:

Build:

$ git clone https://github.com/linkedin/kafka-monitor.git
$ cd kafka-monitor 
$ ./gradlew jar

Edit the configuration in config/kafka-monitor.properties:

"zookeeper.connect": "localhost:2181"

Start:

$ ./bin/kafka-monitor-start.sh config/kafka-monitor.properties
Monitor a single cluster:
$ ./bin/single-cluster-monitor.sh --topic test --broker-list localhost:9092 --zookeeper localhost:2181
Monitor multiple clusters:
$ ./bin/kafka-monitor-start.sh config/multi-cluster-monitor.properties

Then open localhost:8080 to see the monitoring page.

Figure: kafka-monitor

Other metrics can also be queried over HTTP through the Jolokia endpoint:

curl "localhost:8778/jolokia/read/kmf.services:type=produce-service,name=*/produce-availability-avg"
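A script can consume the same HTTP response that the curl command returns. The sketch below extracts one attribute per matched MBean from a Jolokia read response.

The response shape is an assumption based on Jolokia's documented format for wildcard reads: "value" maps each matched MBean name to its attributes. The sample body and the MBean name single-cluster-monitor are invented for illustration, and read_attribute_values is a hypothetical helper. Fetching the body is left out. Any HTTP client pointed at the URL above would do.

```python
from __future__ import annotations

import json


def read_attribute_values(body: str, attribute: str) -> dict[str, float]:
    """Map each matched MBean name to `attribute` in a Jolokia wildcard read response."""
    response = json.loads(body)
    if response.get("status") != 200:
        raise RuntimeError(f"Jolokia error: {response.get('error')}")
    values = {}
    for mbean, attrs in response["value"].items():
        # Assumed shape: {attribute: value} per MBean; a bare value is also accepted.
        values[mbean] = float(attrs[attribute] if isinstance(attrs, dict) else attrs)
    return values


# Hypothetical response body, written by hand for illustration.
sample = json.dumps({
    "request": {"type": "read", "mbean": "kmf.services:name=*,type=produce-service",
                "attribute": "produce-availability-avg"},
    "value": {"kmf.services:name=single-cluster-monitor,type=produce-service":
              {"produce-availability-avg": 1.0}},
    "timestamp": 1565000000,
    "status": 200,
})
for mbean, value in read_attribute_values(sample, "produce-availability-avg").items():
    print(mbean, value)
# kmf.services:name=single-cluster-monitor,type=produce-service 1.0
```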

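Overall, its web UI is fairly basic and it has relatively few users. Its HTTP interface, however, is very useful, and it supports a wide range of Kafka versions.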

5. Kafka Offset Monitor

Website: http://quantifind.github.io/KafkaOffsetMonitor/

GitHub: https://github.com/quantifind/KafkaOffsetMonitor

Usage: after downloading, run:

java -cp KafkaOffsetMonitor-assembly-0.3.0.jar:kafka-offset-monitor-another-db-reporter.jar \
     com.quantifind.kafka.offsetapp.OffsetGetterWeb \
     --zk zk-server1,zk-server2 \
     --port 8080 \
     --refresh 10.seconds \
     --retain 2.days \
     --pluginsArgs anotherDbHost=host1,anotherDbPort=555

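Then open localhost:8080:

Figure: offsetmonitor1

Figure: offsetmonitor2

This project focuses on offset monitoring and has a rich UI. However, it has not been updated since 2015 and cannot support recent Kafka versions. A maintained fork is available at https://github.com/Morningstar/kafka-offset-monitor.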
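Offset monitoring comes down mostly to consumer lag. For each partition, lag is the log-end offset minus the consumer group's committed offset.

The minimal sketch below works on hand-written offsets. Fetching them, whether from ZooKeeper or from the brokers, is left out. partition_lag and total_lag are hypothetical helpers. A partition with no committed offset is reported as unknown (None) rather than guessed.

```python
from __future__ import annotations


def partition_lag(log_end: dict[int, int], committed: dict[int, int]) -> dict[int, int | None]:
    """Lag per partition = log-end offset - committed offset (None if nothing committed)."""
    return {p: (end - committed[p] if p in committed else None) for p, end in log_end.items()}


def total_lag(lag: dict[int, int | None]) -> int:
    """Sum of the known lags; partitions without a committed offset are skipped."""
    return sum(v for v in lag.values() if v is not None)


# Made-up offsets for a three-partition topic.
lag = partition_lag({0: 1500, 1: 1200, 2: 900}, {0: 1450, 1: 1200})
print(lag, total_lag(lag))  # {0: 50, 1: 0, 2: None} 50
```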

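6. Cruise-control

In August 2017 LinkedIn open-sourced Cruise Control, a framework for monitoring large-scale clusters that includes a range of operations features. LinkedIn reportedly runs Kafka on more than 20,000 machines. The project is still being actively updated.

GitHub: https://github.com/linkedin/cruise-control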

Usage:

Download:
git clone https://github.com/linkedin/cruise-control.git && cd cruise-control/
Build:
./gradlew jar
Edit config/cruisecontrol.properties and set bootstrap.servers and zookeeper.connect for your cluster.
Start:
./gradlew jar copyDependantLibs
./kafka-cruise-control-start.sh [-jars PATH_TO_YOUR_JAR_1,PATH_TO_YOUR_JAR_2] config/cruisecontrol.properties [port]

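Once it is running, visit:

http://localhost:9090/kafkacruisecontrol/state

There is no web UI. Everything is exposed through a REST API.

The full list of endpoints is here: https://github.com/linkedin/cruise-control/wiki/REST-APIs

The framework is very flexible: you can retrieve whichever metrics suit your situation and use them to optimize your own clusters.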
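Since everything goes through REST, a few lines of Python are enough to script against Cruise Control. The sketch builds endpoint URLs and fetches JSON with the standard library.

Assumptions: Cruise Control listens on localhost:9090, as in the URL above. The json=true query parameter listed on the REST API wiki is supported by your version. cruise_control_url and get_json are hypothetical helpers. get_json needs a running server, so only the URL builder is exercised below.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cruise_control_url(endpoint: str, host: str = "localhost", port: int = 9090, **params: str) -> str:
    """Build a Cruise Control REST URL such as .../kafkacruisecontrol/state?json=true."""
    base = f"http://{host}:{port}/kafkacruisecontrol/{endpoint}"
    return f"{base}?{urlencode(params)}" if params else base


def get_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON body (requires a running Cruise Control)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(cruise_control_url("state", json="true"))
# http://localhost:9090/kafkacruisecontrol/state?json=true
```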

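7. Doctorkafka

DoctorKafka is a Kafka cluster auto-healing and workload-balancing tool open-sourced by Pinterest.

Pinterest is an image-sharing social site that uses Kafka as its central message transport for data ingestion, stream processing and similar workloads. As the user base grew, its Kafka clusters became larger and harder to manage, and they turned into a heavy burden for the operations team. Pinterest therefore built DoctorKafka for cluster auto-healing and workload balancing, and has since open-sourced it on GitHub.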

Usage:

Download:
git clone [git-repo-url] doctorkafka
cd doctorkafka
Build:
mvn package -pl kafkastats -am
Start:
java -server \
    -Dlog4j.configurationFile=file:./log4j2.xml \
    -cp lib/*:kafkastats-0.2.4.8.jar \
    com.pinterest.doctorkafka.stats.KafkaStatsMain \
        -broker 127.0.0.1 \
        -jmxport 9999 \
        -topic brokerstats \
        -zookeeper zookeeper001:2181/cluster1 \
        -uptimeinseconds 3600 \
        -pollingintervalinseconds 60 \
        -ostrichport 2051 \
        -tsdhostport localhost:18126 \
        -kafka_config /etc/kafka/server.properties \
        -producer_config /etc/kafka/producer.properties \
        -primary_network_ifacename eth0

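The UI looks like this:

Figure: DoctorKafka

After startup, DoctorKafka periodically checks the state of each cluster. When it detects a failed broker, it moves that broker's workload to brokers with enough spare bandwidth. If the cluster lacks the resources for the reassignment, it raises an alert. In short, it is a framework that keeps the cluster healthy automatically.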
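The reassignment behaviour described above can be illustrated with a simple greedy heuristic. This is not DoctorKafka's actual algorithm.

The hypothetical plan_reassignment function moves each partition of the failed broker, heaviest first, to the healthy broker with the most spare bandwidth that can absorb it. Whatever cannot be placed is returned so the caller can raise an alert. Partition names, broker ids and bandwidth figures are made up.

```python
from __future__ import annotations


def plan_reassignment(partitions: dict[str, float], spare: dict[int, float]) -> tuple[dict[str, int], list[str]]:
    """Greedy sketch of moving a failed broker's partitions.

    partitions: partition name -> bandwidth it needs (e.g. MB/s)
    spare:      healthy broker id -> spare bandwidth (same unit)
    Returns (assignment, unplaced); a non-empty `unplaced` is the alert case.
    """
    remaining = dict(spare)
    assignment: dict[str, int] = {}
    unplaced: list[str] = []
    # Place the heaviest partitions first, while the most capacity is left.
    for name, need in sorted(partitions.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [b for b, free in remaining.items() if free >= need]
        if not candidates:
            unplaced.append(name)
            continue
        target = max(candidates, key=lambda b: remaining[b])
        assignment[name] = target
        remaining[target] -= need
    return assignment, unplaced


print(plan_reassignment({"t-0": 30, "t-1": 20, "t-2": 50}, {1: 60, 2: 40}))
# ({'t-2': 1, 't-0': 2}, ['t-1'])
```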

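8. Burrow

Burrow is a framework open-sourced by LinkedIn specifically for monitoring consumer lag.

GitHub: https://github.com/linkedin/Burrow

With Burrow you do not need to preset lag thresholds. It evaluates each consumer dynamically based on its consumption progress.

Burrow can read committed offsets both from Kafka (the __consumer_offsets topic) and from ZooKeeper, so it works well with both new and old Kafka versions.

Burrow supports HTTP and email notifications.

By default Burrow only provides an HTTP endpoint that returns JSON. There is no web UI.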
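To show what "no preset thresholds" means, here is a simplified classifier loosely modeled on the consumer-lag evaluation rules in Burrow's documentation. It looks at a window of (committed offset, lag) samples for one partition.

- If the lag touches zero anywhere in the window, the status is OK.
- If the committed offset never moves while the lag persists, the status is STALL.
- If the consumer is committing but the lag never shrinks, the status is WARN.

Burrow itself has more statuses (for example STOP and REWIND) and a configurable window. The function and the UNKNOWN label are our own assumptions.

```python
from __future__ import annotations


def evaluate_partition(window: list[tuple[int, int]]) -> str:
    """Classify one partition from (committed_offset, lag) samples, oldest first."""
    if len(window) < 2:
        return "UNKNOWN"  # hypothetical label: not enough samples yet
    offsets = [offset for offset, _ in window]
    lags = [lag for _, lag in window]
    if any(lag == 0 for lag in lags):
        return "OK"  # the consumer caught up at some point in the window
    lag_never_shrinks = all(b >= a for a, b in zip(lags, lags[1:]))
    if len(set(offsets)) == 1 and lag_never_shrinks:
        return "STALL"  # committed offset frozen while lag persists
    if lag_never_shrinks:
        return "WARN"  # consuming, but falling behind
    return "OK"


print(evaluate_partition([(100, 5), (100, 8), (100, 12)]))  # STALL
print(evaluate_partition([(100, 5), (110, 8), (120, 12)]))  # WARN
print(evaluate_partition([(100, 12), (110, 8), (120, 5)]))  # OK
```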

Installation: clone the repository to a directory outside $GOPATH (alternatively, export GO111MODULE=on to enable Go modules), then build it:

$ git clone https://github.com/linkedin/Burrow.git
$ cd Burrow
$ go mod tidy
$ go install

Example:

List all monitored Kafka clusters:
curl -s http://localhost:8000/v3/kafka | jq
{
  "error": false,
  "message": "cluster list returned",
  "clusters": [
    "kafka",
    "kafka"
  ],
  "request": {
    "url": "/v3/kafka",
    "host": "kafka"
  }
}
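The same endpoint is easy to use from a script. The sketch below parses the sample response shown above and raises if Burrow reports an error.

cluster_names and fetch are hypothetical helpers. fetch assumes Burrow's HTTP server on localhost:8000, as in the curl example, and needs a running Burrow, so it is not called here.

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def cluster_names(body: str) -> list[str]:
    """Return the "clusters" list from a Burrow /v3/kafka response."""
    response = json.loads(body)
    if response.get("error"):
        raise RuntimeError(response.get("message", "Burrow returned an error"))
    return response["clusters"]


def fetch(url: str = "http://localhost:8000/v3/kafka", timeout: float = 10.0) -> str:
    """Fetch a Burrow endpoint (needs a running Burrow; not called below)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# The sample response shown above.
sample = """{"error": false, "message": "cluster list returned",
             "clusters": ["kafka", "kafka"],
             "request": {"url": "/v3/kafka", "host": "kafka"}}"""
print(cluster_names(sample))  # ['kafka', 'kafka']
```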

Other frameworks include kafka-web-console: https://github.com/claudemamo/kafka-web-console

kafkat: https://github.com/airbnb/kafkat

capillary: https://github.com/keenlabs/capillary

chaperone: https://github.com/uber/chaperone

There are many more, but choose according to the Kafka version you run.
