Kafka reports "Commit cannot be completed since the group has already rebalanced and assigned the partitions"

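Problem description:
In a message-processing program built on a newer Kafka version, the following error kept appearing whenever message volume was very high, and several consumers sharing the same group.id consumed the same messages more than once.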

2018-10-12 19:49:34,903 WARN [DESKTOP-8S2E5H7 id2-1-C-1] Caller+0 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$4.onComplete(ConsumerCoordinator.java:649)
Auto-commit of offsets {xxxTopic-5=OffsetAndMetadata{offset=359, metadata=''}} failed for group My-Group-Name: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

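Solution:
Analysis:
1. Following the error description, message processing was taking too long, so I optimized the processing code to shorten the overall handling time. The warning became less frequent but did not go away.
2. Also following the error description, I set max.poll.records to 200 (the default is 500) and raised the session timeout (session.timeout.ms=60000; I had taken the default to be 5000, i.e. 5 s, though the consumer default in Kafka clients from 0.10.1 through 2.x is 10000). Checking the logs showed further improvement, but the problem remained.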
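The settings from step 2 can be sketched as a plain java.util.Properties object. This is only a minimal illustration: the class and method names are hypothetical, the keys are written as string literals so the snippet compiles without the kafka-clients dependency, and the group name is taken from the log above. In a real program these properties would be passed to the consumer's constructor.

```java
import java.util.Properties;

// Hypothetical helper; not part of the original program.
final class ConsumerTuningStep2 {
    // Builds the consumer settings tried in step 2 of the analysis.
    static Properties step2Settings(String groupId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // The consumer in this post relies on auto-commit.
        props.setProperty("enable.auto.commit", "true");
        // Smaller batches per poll(): 200 instead of the default 500.
        props.setProperty("max.poll.records", "200");
        // Session timeout raised to 60 s.
        props.setProperty("session.timeout.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting key=value pairs (iteration order is not guaranteed).
        step2Settings("My-Group-Name")
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```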

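As for the duplicate consumption: when a message was delivered to consumer1 (group.id=abc), processing took too long. The consumer was configured for auto-commit, and because processing did not finish within the default auto-commit interval, the auto-commit failed. Kafka therefore treated the message as not successfully consumed, and consumer2 (group.id=abc, another consumer instance in the same group) received the message and consumed it again. This can be verified by checking the lag of the group on that topic in Kafka.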
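Lag here means the log end offset of a partition minus the offset the group has committed for it. The kafka-consumer-groups.sh tool with --describe reports both values and the lag per partition. The arithmetic can be sketched as follows; the class and method names are hypothetical, and the end offset of 400 in main is a made-up value paired with the committed offset 359 from the log above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating how consumer-group lag is derived.
final class LagSketch {
    // Lag per partition = log end offset - committed offset, floored at zero.
    // Partitions with no committed offset are treated as committed at 0.
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> endOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = new HashMap<>();
        Map<Integer, Long> committed = new HashMap<>();
        end.put(5, 400L);        // assumed log end offset for partition 5
        committed.put(5, 359L);  // committed offset seen in the log above
        System.out.println(lagPerPartition(end, committed));
    }
}
```

A lag that keeps growing, or a committed offset that stops advancing while messages are still being processed, would be consistent with commits failing as described.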

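The final fix was to increase auto.commit.interval.ms. The default is 5000; after raising it to 7000, the warning essentially disappeared under the same Kafka message volume.
Why change this parameter? Because, as I see it, the root cause of the warning is that message processing takes too long, so the offset commit cannot be completed within the configured auto-commit interval.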
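As a rough sketch, the final change amounts to one extra property on top of whatever consumer configuration is already in use. The helper below is hypothetical and uses only java.util.Properties; it copies the existing settings rather than mutating them, so the old and new configurations can be compared side by side. Note that the log message itself names max.poll.interval.ms as the limit that was exceeded, so that setting may also be worth reviewing.

```java
import java.util.Properties;

// Hypothetical helper; the property key is the standard consumer config name.
final class AutoCommitIntervalSketch {
    // Returns a copy of base with auto.commit.interval.ms set to intervalMs.
    static Properties withAutoCommitInterval(Properties base, long intervalMs) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("auto.commit.interval.ms", Long.toString(intervalMs));
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("enable.auto.commit", "true");
        // 7000 ms is the value that worked in this post (default: 5000 ms).
        Properties tuned = withAutoCommitInterval(base, 7000L);
        System.out.println(tuned.getProperty("auto.commit.interval.ms"));
    }
}
```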

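Summary:
This is just how I resolved the problem in my own case. It is a personal workaround, not an officially provided solution, and is for reference only.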