
Solving a Kafka problem: org.apache.kafka.common.errors.TimeoutException

A record of two errors I hit while using Kafka:
- 1. Caused by: java.nio.channels.UnresolvedAddressException: null
- 2. org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time

Both errors turn out to have the same root cause.

Reproducing the problem

A Java producer sending messages to the Kafka cluster fails with the following error:

2018-06-03 00:10:02.071  INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser     : Kafka version : 0.10.2.0
2018-06-03 00:10:02.071  INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser     : Kafka commitId : 576d93a8dc0cf421
2018-06-03 00:10:32.253 ERROR 80 --- [ad | producer-1] o.s.k.support.LoggingProducerListener    : Exception thrown when sending a message with key='test1' and payload='hello122' to topic t2:

org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time

The connection to the Kafka cluster is clearly timing out, but the message alone does not reveal the actual cause.
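For reference, the same behaviour can be reproduced with the raw Kafka client, outside of Spring. The sketch below is illustrative only (the ProducerRepro class name is made up; topic, key and payload are taken from the log above, and the broker address from the configuration shown further down). It also shows where the TimeoutException actually surfaces: send() returns immediately, and the error only appears once the returned future is resolved, roughly request.timeout.ms (30 s by default) after the batch was created.

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerRepro {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "39.108.61.252:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() only appends the record to an in-memory batch; blocking on the
            // future is what eventually reports the expired batch when the broker
            // cannot be reached.
            RecordMetadata meta =
                    producer.send(new ProducerRecord<>("t2", "test1", "hello122")).get();
            System.out.println("sent to " + meta.topic() + "-" + meta.partition()
                    + " @ offset " + meta.offset());
        } catch (ExecutionException e) {
            // With the broken name resolution described below, the cause here is the
            // same org.apache.kafka.common.errors.TimeoutException seen in the log.
            e.printStackTrace();
        }
    }
}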

Narrowing down the cause

  • Confirm that the Kafka cluster is actually running, including both ZooKeeper and the brokers. This can be checked with
ps -ef | grep java

In my case both ZooKeeper and the Kafka brokers had started successfully.

  • Check the Kafka configuration on the producer application's side.
    The Spring Boot configuration is:
spring.kafka.bootstrap-servers=39.108.61.252:9092
spring.kafka.consumer.group-id=springboot-group1
spring.kafka.consumer.auto-offset-reset=earliest

Nothing wrong here either (a raw-client sketch of what these properties amount to is shown below).
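For orientation, the three properties above correspond roughly to the following raw-client consumer setup. This is a sketch only: Spring Boot's auto-configuration also fills in deserializers and other defaults, and the ConsumerConfigSketch class name is made up for illustration.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "39.108.61.252:9092"); // spring.kafka.bootstrap-servers
        props.put("group.id", "springboot-group1");           // spring.kafka.consumer.group-id
        props.put("auto.offset.reset", "earliest");           // spring.kafka.consumer.auto-offset-reset
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("t2"));          // topic used in this post
            ConsumerRecords<String, String> records = consumer.poll(1000); // 0.10.x poll(long) API
            System.out.println("fetched " + records.count() + " record(s)");
        }
    }
}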

  • Check whether the firewall (or cloud security group) on the machine hosting the Kafka cluster has the relevant port open.
    On Windows, telnet can be used to check whether the port is open and being listened on.

Open a command prompt and run:

telnet 39.108.61.252 9092

The connection succeeds, so the port is reachable and the problem must lie on the application side (a Java alternative to telnet is sketched below).
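If no telnet client is installed, the same check can be done from Java with a plain socket. A small sketch, using this post's host and port; the PortCheck class name is just for illustration.

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) {
        try (Socket socket = new Socket()) {
            // Attempt a TCP connection with a 5 second timeout.
            socket.connect(new InetSocketAddress("39.108.61.252", 9092), 5000);
            System.out.println("port 9092 is reachable");
        } catch (Exception e) {
            System.out.println("cannot reach port 9092: " + e);
        }
    }
}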


  • Turn the log level up to debug to see the detailed error information.

In Spring Boot this is configured with:
logging.level.root=debug
Then restart the application.
The log now keeps printing the same error over and over,
as follows:

2018-06-03 00:22:37.703 DEBUG 5972 --- [       t1-0-C-1] org.apache.kafka.clients.NetworkClient   : Initialize connection to node 0 for sending metadata request
2018-06-03 00:22:37.703 DEBUG 5972 --- [       t1-0-C-1] org.apache.kafka.clients.NetworkClient   : Initiating connection to node 0 at izwz9c79fdwp9sb65vpyk3z:9092.
2018-06-03 00:22:37.703 DEBUG 5972 --- [       t1-0-C-1] org.apache.kafka.clients.NetworkClient   : Error connecting to node 0 at izwz9c79fdwp9sb65vpyk3z:9092:

java.io.IOException: Can't resolve address: izwz9c79fdwp9sb65vpyk3z:9092
    at org.apache.kafka.common.network.Selector.connect(Selector.java:182) ~[kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:629) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:57) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:768) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:684) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:347) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:203) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:138) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:216) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:193) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:275) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) [kafka-clients-0.10.2.0.jar:na]
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:558) [spring-kafka-1.2.2.RELEASE.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_144]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
Caused by: java.nio.channels.UnresolvedAddressException: null
    at sun.nio.ch.Net.checkAddress(Net.java:101) ~[na:1.8.0_144]
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622) ~[na:1.8.0_144]
    at org.apache.kafka.common.network.Selector.connect(Selector.java:179) ~[kafka-clients-0.10.2.0.jar:na]
    ... 17 common frames omitted

So the socket cannot be created because the address of the server Kafka is running on cannot be resolved.
From the log, the address being resolved is izwz9c79fdwp9sb65vpyk3z,
which is the instance name of the remote server (an Alibaba Cloud ECS instance). The configuration clearly specifies the IP address, yet internally the client ends up with this alias: the bootstrap address is only used for the initial connection, after which the broker returns cluster metadata containing the hostname it advertises. If the machine running the producer has no mapping from that alias to an IP, the name cannot be resolved and the connection fails with the errors above.
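This is easy to confirm on the producer's machine with a quick resolution check (a sketch; the hostname is the one from the debug log, and the ResolveCheck class name is made up):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    public static void main(String[] args) {
        try {
            InetAddress addr = InetAddress.getByName("izwz9c79fdwp9sb65vpyk3z");
            System.out.println("resolves to " + addr.getHostAddress());
        } catch (UnknownHostException e) {
            // Before the hosts-file fix below, this branch is taken: the hostname
            // advertised by the broker means nothing to the client machine.
            System.out.println("cannot resolve broker hostname: " + e.getMessage());
        }
    }
}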

Solution

On Windows, add a host mapping in

C:\Windows\System32\drivers\etc\hosts

39.108.61.252 izwz9c79fdwp9sb65vpyk3z
127.0.0.1 localhost

On Linux, edit /etc/hosts in the same way (e.g. vi /etc/hosts) and add the same entry.
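As an aside, if you control the broker, an alternative to adding hosts entries on every client machine is to make the broker advertise its public IP instead of its hostname. For Kafka 0.10.x this is set in config/server.properties (shown here with this post's IP; the broker needs a restart afterwards):

advertised.listeners=PLAINTEXT://39.108.61.252:9092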

Now restart the producer application.
Part of the log is shown below: the application starts successfully, and because the client keeps heartbeating and committing offsets in the background, the debug log scrolls continuously. At this point the log level can be switched back to info.

2018-06-03 00:29:46.543 DEBUG 12772 --- [       t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Group springboot-group1 committed offset 10 for partition t2-0
2018-06-03 00:29:46.543 DEBUG 12772 --- [       t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Completed auto-commit of offsets {t2-0=OffsetAndMetadata{offset=10, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.563 DEBUG 12772 --- [       t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Group springboot-group1 committed offset 0 for partition t1-0
2018-06-03 00:29:46.563 DEBUG 12772 --- [       t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Completed auto-commit of offsets {t1-0=OffsetAndMetadata{offset=0, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.672 DEBUG 12772 --- [       t2-0-C-1] o.a.k.c.consumer.internals.Fetcher       : Sending fetch for partitions [t2-0] to broker izwz9c79fdwp9sb65vpyk3z:9092 (id: 0 rack: null)
2018-06-03 00:29:46.867 DEBUG 12772 --- [       t1-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records
2018-06-03 00:29:46.872 DEBUG 12772 --- [       t2-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records

Verification

Produce some data again; this time it goes through successfully! A Spring-side sketch of this check is shown below.
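For completeness, here is a hedged Spring-side sketch of what that verification can look like in code (the SendVerifier class name and bean wiring are assumed; topic, key and payload are the ones from the log at the top). After the hosts fix, the success callback fires instead of the TimeoutException previously reported by LoggingProducerListener.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Component;
import org.springframework.util.concurrent.ListenableFutureCallback;

@Component
public class SendVerifier {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void verify() {
        // Asynchronous send; the callback reports either the assigned offset or the failure.
        kafkaTemplate.send("t2", "test1", "hello122")
            .addCallback(new ListenableFutureCallback<SendResult<String, String>>() {
                @Override
                public void onSuccess(SendResult<String, String> result) {
                    System.out.println("sent, offset=" + result.getRecordMetadata().offset());
                }

                @Override
                public void onFailure(Throwable ex) {
                    System.out.println("send failed: " + ex.getMessage());
                }
            });
    }
}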