kafka問題解決:org.apache.kafka.common.errors.TimeoutException
記錄使用kafka遇到的問題:
- 1.Caused by java.nio.channels.UnresolvedAddressException null
- 2.org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time
兩個報錯是同一個問題。
問題重現
java 生產者向kafka叢集生產訊息,報錯如下:
2018-06-03 00:10:02.071 INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka version : 0.10.2.0
2018-06-03 00:10:02.071 INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : 576d93a8dc0cf421
2018-06-03 00:10:32.253 ERROR 80 --- [ad | producer-1] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='test1' and payload='hello122' to topic t2:
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time
明顯連線kafka cluster超時,但是實際上並不知道具體是什麼原因。
排除找原因
- 確定kafka叢集已經啟動,包括zookeeper、kafka叢集。可以通過命令
ps -ef | grep java
個人排查,kafka叢集即zookeeper已經成功啟動
- 確定生產者程式一方的kafka配置是否有誤。
spring boot的配置如下
spring.kafka.bootstrap-servers=39.108.61.252:9092
spring.kafka.consumer.group-id=springboot-group1
spring.kafka.consumer.auto-offset-reset=earliest
自然也沒有問題
- 確定kafka叢集所在機器防火牆或者說安全組是否已經開放相關埠。
若是在windows上,開啟telnet
工具可以檢視是否埠開放被監聽
cmd進入命令列
telnet 39.108.61.252 9092
執行後成功連線,說明問題在程式所在的一方。
- 開啟debug日誌級別看錯誤詳細資訊
spring boot 的設定方式是:
logging.level.root=debug
然後重啟應用
發現後臺在不停的刷錯誤
如下:
2018-06-03 00:22:37.703 DEBUG 5972 --- [ t1-0-C-1] org.apache.kafka.clients.NetworkClient : Initialize connection to node 0 for sending metadata request
2018-06-03 00:22:37.703 DEBUG 5972 --- [ t1-0-C-1] org.apache.kafka.clients.NetworkClient : Initiating connection to node 0 at izwz9c79fdwp9sb65vpyk3z:9092.
2018-06-03 00:22:37.703 DEBUG 5972 --- [ t1-0-C-1] org.apache.kafka.clients.NetworkClient : Error connecting to node 0 at izwz9c79fdwp9sb65vpyk3z:9092:
java.io.IOException: Can't resolve address: izwz9c79fdwp9sb65vpyk3z:9092
at org.apache.kafka.common.network.Selector.connect(Selector.java:182) ~[kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:629) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:57) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:768) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:684) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:347) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:203) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:138) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:216) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:193) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:275) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030) [kafka-clients-0.10.2.0.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) [kafka-clients-0.10.2.0.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:558) [spring-kafka-1.2.2.RELEASE.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
Caused by: java.nio.channels.UnresolvedAddressException: null
at sun.nio.ch.Net.checkAddress(Net.java:101) ~[na:1.8.0_144]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622) ~[na:1.8.0_144]
at org.apache.kafka.common.network.Selector.connect(Selector.java:179) ~[kafka-clients-0.10.2.0.jar:na]
... 17 common frames omitted
可知建立socket時不能解析kafka所在伺服器地址
檢視日誌可知,解析的地址是izwz9c79fdwp9sb65vpyk3z
這個地址是遠端伺服器的例項名稱(阿里雲伺服器)。自己配置的明明是ip,程式內部卻去獲取他的別名,那如果生產者所在機器上沒有配置這個ip的別名,就不能解析到對應的ip,所以連線失敗報錯。
解決
windows則去新增一條host對映
C:\Windows\System32\drivers\etc\hosts
39.108.61.252 izwz9c79fdwp9sb65vpyk3z
127.0.0.1 localhost
linux則
vi /etc/hosts 修改方式一致
此時重啟生產者應用
日誌如下部分,成功啟動,後臺會一直在心跳檢測連線和更新offset,所以debug日誌一直在刷,此時就可以把日誌級別修改為info
2018-06-03 00:29:46.543 DEBUG 12772 --- [ t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Group springboot-group1 committed offset 10 for partition t2-0
2018-06-03 00:29:46.543 DEBUG 12772 --- [ t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Completed auto-commit of offsets {t2-0=OffsetAndMetadata{offset=10, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.563 DEBUG 12772 --- [ t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Group springboot-group1 committed offset 0 for partition t1-0
2018-06-03 00:29:46.563 DEBUG 12772 --- [ t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : Completed auto-commit of offsets {t1-0=OffsetAndMetadata{offset=0, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.672 DEBUG 12772 --- [ t2-0-C-1] o.a.k.c.consumer.internals.Fetcher : Sending fetch for partitions [t2-0] to broker izwz9c79fdwp9sb65vpyk3z:9092 (id: 0 rack: null)
2018-06-03 00:29:46.867 DEBUG 12772 --- [ t1-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records
2018-06-03 00:29:46.872 DEBUG 12772 --- [ t2-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records
驗證
進行生產資料,成功執行!