Kafka Study Notes, Part 7 (Producer and Consumer Programming in Practice)

阿新 • Published: 2019-01-11

The sample code is written against kafka-clients V0.10.0.0.

I. Writing the producer

Step 1: Use the ./kafka-topics.sh command to create the topic and set its number of partitions

./kafka-topics.sh --create --zookeeper "172.16.49.173:2181" --topic "producer_test" --partitions 10 --replication-factor 3

Step 2: Implement the org.apache.kafka.clients.producer.Partitioner interface to define custom message partitioning

import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyPartition implements Partitioner {
    private static Logger LOG = LoggerFactory.getLogger(MyPartition.class);
    public MyPartition() {
        // No initialization needed.
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // TODO Auto-generated method stub

    }

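    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        int partitionNum = 0;
        // A record without a key goes to partition 0 instead of causing a NullPointerException.
        if (key != null) {
            try {
                // A numeric key is used directly as the partition number.
                partitionNum = Integer.parseInt((String) key);
            } catch (Exception e) {
                // Any other key falls back to its hash code.
                partitionNum = key.hashCode();
            }
        }
        // Take the modulo first, then the absolute value, so the result is always a valid partition index.
        int target = Math.abs(partitionNum % numPartitions);
        LOG.info("the message sendTo topic:" + topic + " and the partitionNum:" + target);
        return target;
    }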

    @Override
    public void close() {
        // TODO Auto-generated method stub

    }

}
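The partition choice made by the class above can be tried without a broker. The following is a minimal sketch, not the author's code: PartitionChoiceSketch and choosePartition are hypothetical names, and the method only mirrors the parse-then-hash logic for String keys.

```java
// Standalone sketch of the partition choice used in MyPartition (hypothetical helper, no Kafka dependency).
final class PartitionChoiceSketch {

    // Returns the partition index for a non-null String key.
    static int choosePartition(String key, int numPartitions) {
        int n;
        try {
            // A key that parses as an int is used as the partition number.
            n = Integer.parseInt(key);
        } catch (NumberFormatException e) {
            // Non-numeric keys, and numeric keys outside the int range, use the hash code.
            n = key.hashCode();
        }
        // Modulo first, then abs: Math.abs(Integer.MIN_VALUE) is still negative,
        // but the remainder always lies strictly between -numPartitions and numPartitions.
        return Math.abs(n % numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(choosePartition("7", 10));
        System.out.println(choosePartition("2223132132", 10));
    }
}
```

Note that the key "2223132132" used by the producer in step 3 is larger than Integer.MAX_VALUE, so Integer.parseInt rejects it and the hash-code branch decides the partition.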

Step 3: Write the producer

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PartitionTest {
    private static Logger LOG = LoggerFactory.getLogger(PartitionTest.class);

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        Properties props = new Properties();
        // bootstrap.servers is a comma-separated list of host:port pairs.
        props.put("bootstrap.servers", "172.16.49.173:9092,172.16.49.173:9093");

        props.put("retries", 0);
        // props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        // props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("partitioner.class", "com.goodix.kafka.MyPartition");
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        ProducerRecord<String, String> record = new ProducerRecord<String, String>("producer_test", "2223132132",
                "test23_60");
        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception e) {
                // TODO Auto-generated method stub
                if (e != null)
                    LOG.error("the producer has a error:" + e.getMessage());
                else {
                    LOG.info("The offset of the record we just sent is: " + metadata.offset());
                    LOG.info("The partition of the record we just sent is: " + metadata.partition());
                }

            }
        });
        // close() blocks until previously sent records have completed,
        // so the callback above runs before the program exits.
        producer.close();

    }

}

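Note: create the topic and its partitions with the command first. Otherwise, when the custom partitioner returns a partition number greater than 1, sending a message to Kafka fails with the error "expired due to timeout while requesting metadata from brokers".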

II. Writing a consumer with the old consumer High Level API

Step 1: Write the class that actually processes the messages

import java.io.UnsupportedEncodingException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.message.MessageAndMetadata;


public class Consumerwork implements Runnable {
    private static Logger LOG = LoggerFactory.getLogger(Consumerwork.class);
     @SuppressWarnings("rawtypes")
    private KafkaStream m_stream;
     private int m_threadNumber;
     @SuppressWarnings("rawtypes")
    public Consumerwork(KafkaStream a_stream,int a_threadNumber) {
        // TODO Auto-generated constructor stub
         m_threadNumber = a_threadNumber;
         m_stream = a_stream;
    }

    @SuppressWarnings("unchecked")
    @Override
    public void run() {
        // TODO Auto-generated method stub
        ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
           while (it.hasNext())
                try {
                    MessageAndMetadata<byte[], byte[]> thisMetadata=it.next();
                    String jsonStr = new String(thisMetadata.message(),"utf-8") ;
                    LOG.info("Thread " + m_threadNumber + ": " +jsonStr);
                    LOG.info("partition:"+thisMetadata.partition()+",offset:"+thisMetadata.offset());
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                } catch (UnsupportedEncodingException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
    }
}

Step 2: Write the main class that starts the consumer

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Scanner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
public class ConsumerGroup {
    private final ConsumerConnector consumer;
    private final String topic;
    private ExecutorService executor;
    private static Logger LOG = LoggerFactory.getLogger(ConsumerGroup.class);
    public ConsumerGroup(String a_zookeeper, String a_groupId, String a_topic) {
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(createConsumerConfig(a_zookeeper, a_groupId));
        this.topic = a_topic;
    }
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter the ZooKeeper cluster address (e.g. zk1:2181,zk2:2181,zk3:2181):");
        String zooKeeper = sc.nextLine();
        System.out.println("Enter the consumer group name:");
        String groupId = sc.nextLine();
        System.out.println("Enter the topic to consume:");
        String topic = sc.nextLine();
        System.out.println("Enter the number of consumer threads:");
        int threads = sc.nextInt();
        LOG.info("Starting consumer kafka messages with zk:" + zooKeeper + " and the topic is " + topic);
        ConsumerGroup example = new ConsumerGroup(zooKeeper, groupId, topic);
        example.run(threads);

        try {
            Thread.sleep(1000);
        } catch (InterruptedException ie) {

        }
        // example.shutdown();
    }

    private void shutdown() {
        if (consumer != null)
            consumer.shutdown();
        // Only wait for the pool if run() actually created it.
        if (executor != null) {
            executor.shutdown();
            try {
                if (!executor.awaitTermination(5000, TimeUnit.MILLISECONDS)) {
                    LOG.info("Timed out waiting for consumer threads to shut down, exiting uncleanly");
                }
            } catch (InterruptedException e) {
                LOG.info("Interrupted during shutdown, exiting uncleanly");
            }
        }
    }

    private void run(int a_numThreads) {
        // TODO Auto-generated method stub
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(a_numThreads));
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

        // now launch all the threads
        //
        executor = Executors.newFixedThreadPool(a_numThreads);

        // now create an object to consume the messages
        //
        int threadNumber = 0;
        LOG.info("the streams size is "+streams.size());
        for (final KafkaStream stream : streams) {
            executor.submit(new com.goodix.kafka.oldconsumer.Consumerwork(stream, threadNumber));
    //      consumer.commitOffsets();
            threadNumber++;
        }

    }

    private ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
        // TODO Auto-generated method stub
        Properties props = new Properties();
        props.put("zookeeper.connect", a_zookeeper);
        props.put("group.id", a_groupId);
        props.put("zookeeper.session.timeout.ms", "60000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");
        props.put("auto.offset.reset", "smallest");
//      props.put("rebalance.max.retries", "5");
//      props.put("rebalance.backoff.ms", "15000");
        return new ConsumerConfig(props);
    }

}
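The run method above hands one task per stream to Executors.newFixedThreadPool. As a rough illustration of how such a pool behaves, here is a minimal sketch with no Kafka dependency; FixedPoolSketch and distinctWorkerThreads are hypothetical names introduced only for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: submit more tasks than the pool has threads and count the worker threads that ran them.
final class FixedPoolSketch {

    static int distinctWorkerThreads(int poolSize, int taskCount) {
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Each task records the name of the thread that executed it.
            pool.submit(() -> {
                workerNames.add(Thread.currentThread().getName());
            });
        }
        pool.shutdown();
        try {
            // Wait for the queued tasks to finish before counting.
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return workerNames.size();
    }

    public static void main(String[] args) {
        System.out.println("worker threads used: " + distinctWorkerThreads(3, 10));
    }
}
```

However many tasks are submitted, the number of worker threads never exceeds the pool size; the remaining tasks wait in the pool's queue. In the consumer above each task loops over its stream indefinitely, which is why the pool is sized to the number of streams.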

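1. topicCountMap.put(topic, new Integer(a_numThreads)) tells Kafka how many threads will process messages for the topic.

(1). The thread count should be less than or equal to the topic's number of partitions, because threads beyond that number never receive any messages. You can inspect the partitions with ./kafka-topics.sh --describe --zookeeper "172.16.49.173:2181" --topic "producer_test".
(2). Kafka decides which partitions each thread consumes according to the strategy named by partition.assignment.strategy. It is not configured here, so the default range strategy applies, which gives each thread a contiguous block of partitions of as equal a size as possible. For example, with 10 partitions and 3 threads, thread 1 consumes 0,1,2,3, thread 2 consumes 4,5,6, and thread 3 consumes 7,8,9. The alternative is roundrobin; the official documentation lists two preconditions for using it, so it is generally best left unset.
(3). In my tests, consumerMap.get(topic).size() appeared to be the number of partitions of the topic that currently hold data (the connector creates one stream per requested thread, so this may simply have matched the thread count).
(4). A stream carries the messages of one or more partitions, which may live on one or more servers. Each stream is processed by a single thread, so the client can choose the number of streams that suits its needs. In short, one stream may aggregate messages from partitions on several servers, but each partition is delivered to only one stream.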

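2. Executors.newFixedThreadPool(a_numThreads) creates a thread pool of fixed size: each submitted task starts a new thread until the pool reaches its maximum size. Once the pool is at its maximum it stays that size, and if a thread ends because of an exception during execution, the pool adds a new thread to replace it.

3. props.put("auto.offset.reset", "smallest") makes a consumer group that has no committed offset start from the smallest available offset. If it is not set, the default is largest, in which case the consumer does not receive messages that producers wrote before it started.

4. Using the old consumer API requires both kafka_2.11 and kafka-clients as dependencies.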

<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>0.10.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.10.0.0</version>
</dependency>
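The range strategy described in note 1 (2) above can be reproduced with a few lines of arithmetic. This is a minimal sketch of that arithmetic only, not Kafka's own code; RangeAssignmentSketch and assign are hypothetical names, and threads and partitions are simply numbered from 0.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of range assignment: contiguous blocks, with the first few threads taking one extra partition.
final class RangeAssignmentSketch {

    // Returns, for each thread index, the list of partition numbers it would consume.
    static List<List<Integer>> assign(int numPartitions, int numThreads) {
        int perThread = numPartitions / numThreads;  // partitions every thread gets
        int extra = numPartitions % numThreads;      // the first `extra` threads get one more
        List<List<Integer>> result = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            int start = perThread * t + Math.min(t, extra);
            int count = perThread + (t < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                partitions.add(p);
            }
            result.add(partitions);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(10, 3)); // [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
        System.out.println(assign(2, 3));  // [[0], [1], []]
    }
}
```

The second call shows why the thread count should not exceed the partition count: with 2 partitions and 3 threads, the third thread is assigned nothing and stays idle.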

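III. Writing a consumer with the old SimpleConsumer API

This is a lower-level and more complex API.

When to use it

Because this API leaves many more things for you to control and is more complex, the official documentation lists some scenarios it is suited to. These can also be read as things the High Level Consumer API cannot do:

1. Reading a message multiple times
2. Processing only one of a topic's partitions in a process
3. Managing transactions to make sure each message is processed only once

What you have to handle

1. You must keep track of the offsets in your program
2. You must find out which broker is the lead broker for the given topic and partition
3. You must handle broker changes

Steps for using SimpleConsumer

First, you must know which partition of which topic to read
Then, find the lead broker for that partition, and through it the brokers that hold replicas of the partition
Next, build the request yourself and fetch the data
Finally, detect and handle changes of the lead broker

Example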

package com.goodix.kafka.oldconsumer;
import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.javaapi.*;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SimpleExample {
    private static Logger LOG = LoggerFactory.getLogger(SimpleExample.class);
    public static void main(String args[]) {
        SimpleExample example = new SimpleExample();
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter the broker IP address (e.g. 172.16.49.173)");
        String brokerIp = sc.nextLine();
        List<String> seeds = new ArrayList<String>();
        seeds.add(brokerIp);
        System.out.println("Enter the broker port (e.g. 9092)");
        int port = Integer.parseInt( sc.nextLine());
        System.out.println("Enter the topic to read (e.g. test)");
        String topic = sc.nextLine();
        System.out.println("Enter the partition to read (e.g. 0)");
        int partition = Integer.parseInt( sc.nextLine());
        System.out.println("Enter the maximum number of messages to read (e.g. 10000)");
        long maxReads = Long.parseLong( sc.nextLine());

        try {
            example.run(maxReads, topic, partition, seeds, port);
        } catch (Exception e) {
            LOG.error("Oops:" + e);
             e.printStackTrace();
        }
    }

    private List<String> m_replicaBrokers = new ArrayList<String>();

    public SimpleExample() {
        m_replicaBrokers = new ArrayList<String>();
    }

    public void run(long a_maxReads, String a_topic, int a_partition, List<String> a_seedBrokers, int a_port) throws Exception {
        // Find the metadata for the topic and partition we are interested in
        PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition);
        if (metadata == null) {
            LOG.error("Can't find metadata for Topic and Partition. Exiting");
            return;
        }
        if (metadata.leader() == null) {
            LOG.error("Can't find Leader for Topic and Partition. Exiting");
            return;
        }
        String leadBroker = metadata.leader().host();
        String clientName = "Client_" + a_topic + "_" + a_partition;
        SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
        long readOffset = getLastOffset(consumer,a_topic, a_partition, kafka.api.OffsetRequest.EarliestTime(), clientName);

        int numErrors = 0;
        while (a_maxReads > 0) {
            if (consumer == null) {
                consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
            }
            FetchRequest req = new FetchRequestBuilder()
                    .clientId(clientName)
                    .addFetch(a_topic, a_partition, readOffset, 100000) // Note: this fetchSize of 100000 might need to be increased if large batches are written to Kafka
                    .build();
            FetchResponse fetchResponse = consumer.fetch(req);

            if (fetchResponse.hasError()) {
                numErrors++;
                // Something went wrong!
                short code = fetchResponse.errorCode(a_topic, a_partition);
                LOG.error("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
                if (numErrors > 5) break;
                if (code == ErrorMapping.OffsetOutOfRangeCode())  {
                    // We asked for an invalid offset. For simple case ask for the last element to reset
                    readOffset = getLastOffset(consumer,a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
                    continue;
                }
                consumer.close();
                consumer = null;
                leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
                continue;
            }
            numErrors = 0;

            long numRead = 0;
            for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
                long currentOffset = messageAndOffset.offset();
                if (currentOffset < readOffset) {
                    LOG.error("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
                    continue;
                }
                readOffset = messageAndOffset.nextOffset();
                ByteBuffer payload = messageAndOffset.message().payload();

                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                LOG.info("the messag's offset is :"+String.valueOf(messageAndOffset.offset()) + " and the value is :" + new String(bytes, "UTF-8"));
                numRead++;
                a_maxReads--;
            }

            if (numRead == 0) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        if (consumer != null) consumer.close();
    }

    public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                     long whichTime, String clientName) {
        TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
        OffsetResponse response = consumer.getOffsetsBefore(request);

        if (response.hasError()) {
            LOG.error("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition) );
            return 0;
        }
        long[] offsets = response.offsets(topic, partition);
        return offsets[0];
    }
    /**
     * Finds the new leader broker after a failure.
     * Delegates to findLeader, which asks each known broker for the topic's metadata and scans the
     * partition metadata until it finds the requested partition. The leader broker is then read from
     * PartitionMetadata.leader().host().
     * @param a_oldLeader
     * @param a_topic
     * @param a_partition
     * @param a_port
     * @return
     * @throws Exception
     */
    private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
        for (int i = 0; i < 3; i++) {
            boolean goToSleep = false;
            PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
            if (metadata == null) {
                goToSleep = true;
            } else if (metadata.leader() == null) {
                goToSleep = true;
            } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
                // first time through if the leader hasn't changed give ZooKeeper a second to recover
                // second time, assume the broker did recover before failover, or it was a non-Broker issue
                //
                goToSleep = true;
            } else {
                return metadata.leader().host();
            }
            if (goToSleep) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        LOG.error("Unable to find new leader after Broker failure. Exiting");
        throw new Exception("Unable to find new leader after Broker failure. Exiting");
    }
    /**
     * 
     * @param a_seedBrokers
     * @param a_port
     * @param a_topic
     * @param a_partition
     * @return
     */
    private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
        PartitionMetadata returnMetaData = null;

        loop:
        for (String seed : a_seedBrokers) { // try each seed broker in turn
            SimpleConsumer consumer = null;
            try {
                // Create a SimpleConsumer connected to this seed broker
                consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
                List<String> topics = Collections.singletonList(a_topic);
                TopicMetadataRequest req = new TopicMetadataRequest(topics);
                // Send the TopicMetadataRequest
                kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);
                // Extract the topic metadata from the response
                List<TopicMetadata> metaData = resp.topicsMetadata();
                // Scan the metadata of every partition
                for (TopicMetadata item : metaData) {
                    for (PartitionMetadata part : item.partitionsMetadata()) {
                        // Is this the partition we are looking for?
                        if (part.partitionId() == a_partition) {
                            returnMetaData = part;
                            // Found it: stop searching
                            break loop;
                        }
                    }
                }
            } catch (Exception e) {
                LOG.info("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                        + ", " + a_partition + "] Reason: " + e);
            } finally {
                if (consumer != null) consumer.close();
            }
        }
        if (returnMetaData != null) {
            m_replicaBrokers.clear();
            for (kafka.cluster.BrokerEndPoint replica : returnMetaData.replicas()) {
                m_replicaBrokers.add(replica.host());
            }
        }
        return returnMetaData;
    }
}

IV. Using the new consumer API

(1) Committing offsets automatically

Properties props = new Properties();
// Kafka broker address(es). You do not need to list every broker in the cluster; one or a few are enough.
props.put("bootstrap.servers", "172.16.49.173:9092");
// Consumer group name; required.
props.put("group.id", a_groupId);
// Commit offsets automatically, at the frequency set by auto.commit.interval.ms.
props.put("enable.auto.commit", "true");
// Offset auto-commit interval in milliseconds.
props.put("auto.commit.interval.ms", "1000");
// When this group.id has no committed offset, start from the earliest offset.
// If unset, the default is latest, and the consumer only receives messages produced after it starts.
props.put("auto.offset.reset", "earliest");
// Session timeout: how long the consumer may go without a heartbeat before it is considered dead.
props.put("session.timeout.ms", "30000");
// Deserializer classes for keys and values.
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// Subscribe to the topic.
consumer.subscribe(Arrays.asList("topic_test"));
while (true) {
    // Wait up to 100 ms for records; the argument is a timeout, not a record count.
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}

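Points to note:

group.id: must be set.
auto.offset.reset: to receive messages that producers wrote before the consumer started, set it to earliest; if you only need messages produced after the consumer starts, it can be left unset.
enable.auto.commit (default true): to commit offsets manually, set it to false and call consumer.commitSync() at an appropriate point. Otherwise nothing is ever committed, and every restart of the consumer begins again from the start (when auto.offset.reset=earliest).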
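How auto.offset.reset interacts with committed offsets can be summarized as a small decision function. This is a simplified model under stated assumptions, not Kafka's implementation: OffsetResetSketch and startingOffset are hypothetical names, earliest and latest stand for the first available offset and the offset of the next message to be written, and the none policy is left out.

```java
// Simplified model of where a consumer starts reading (hypothetical helper, no Kafka dependency).
final class OffsetResetSketch {

    // committed is null when the group has never committed an offset for the partition.
    static long startingOffset(Long committed, String policy, long earliest, long latest) {
        // A committed offset that still exists on the broker always wins.
        if (committed != null && committed >= earliest && committed <= latest) {
            return committed;
        }
        // Otherwise the reset policy decides. "smallest" and "largest" are the old consumer's names.
        switch (policy) {
            case "earliest":
            case "smallest":
                return earliest;
            case "latest":
            case "largest":
                return latest;
            default:
                throw new IllegalArgumentException("unsupported policy in this sketch: " + policy);
        }
    }

    public static void main(String[] args) {
        System.out.println(startingOffset(null, "earliest", 5, 20));
        System.out.println(startingOffset(12L, "earliest", 5, 20));
    }
}
```

The second call illustrates the point made above about restarts: once an offset has been committed, earliest no longer sends the consumer back to the beginning.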

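(2) Controlling offset commits yourself

Often we want a message to count as consumed only after it has been received and run through some processing logic. Controlling the offset commit yourself makes this possible.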
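The effect of committing only after processing can be illustrated without a broker. The sketch below is a simulation under simple assumptions: CommitAfterProcessingSketch is a hypothetical class, the list stands in for one partition, and a "crash" is modelled by returning before the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulation: process a batch first, commit afterwards, and see what a restart re-reads.
final class CommitAfterProcessingSketch {
    private final List<String> log;     // stand-in for the messages of one partition
    private long committedOffset = 0;   // offset of the next message to read after a restart

    CommitAfterProcessingSketch(List<String> log) {
        this.log = log;
    }

    long committedOffset() {
        return committedOffset;
    }

    // Consumes from the committed offset in batches. crashAtOffset simulates a failure
    // just before that message is processed; pass -1 for a run without failure.
    void consume(List<String> processed, int batchSize, long crashAtOffset) {
        long position = committedOffset;
        while (position < log.size()) {
            long batchEnd = Math.min(position + batchSize, log.size());
            for (long offset = position; offset < batchEnd; offset++) {
                if (offset == crashAtOffset) {
                    return; // crashed: the current batch is never committed
                }
                processed.add(log.get((int) offset));
            }
            committedOffset = batchEnd; // commit only after the whole batch was processed
            position = batchEnd;
        }
    }

    public static void main(String[] args) {
        CommitAfterProcessingSketch sim =
                new CommitAfterProcessingSketch(Arrays.asList("a", "b", "c", "d", "e"));
        List<String> processed = new ArrayList<>();
        sim.consume(processed, 2, 3);   // first run fails before offset 3
        sim.consume(processed, 2, -1);  // the restart resumes from the last committed offset
        System.out.println(processed + " committed=" + sim.committedOffset());
    }
}
```

In this run the message at offset 2 is processed twice, once before the failure and once after the restart. Committing after processing therefore never skips a message but can repeat one, so the processing logic should tolerate duplicates.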

Example 1: Committing offsets in batches

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;



/**
 * Commits offsets manually in batches.
 * @author lxh
 *
 */
public class ManualOffsetConsumer {
    private static Logger LOG = LoggerFactory.getLogger(ManualOffsetConsumer.class);
    public ManualOffsetConsumer() {
        // TODO Auto-generated constructor stub
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Kafka broker address
        props.put("bootstrap.servers", "172.16.49.173:9092");
        // Consumer group name
        props.put("group.id","manual_g1");
        // Disable auto-commit so that offsets are committed only by the code below
        props.put("enable.auto.commit", "false");
        // When this group.id has no committed offset, start from the earliest offset.
        // If unset, the default is latest, and the consumer only receives messages produced after it starts.
        props.put("auto.offset.reset", "earliest");
        // Session timeout
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String ,String> consumer = new KafkaConsumer<String ,String>(props);
        consumer.subscribe(Arrays.asList("producer_test"));
        final int minBatchSize = 5;  // number of buffered records that triggers a commit
        List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
         while (true) {
             ConsumerRecords<String, String> records = consumer.poll(100);
             for (ConsumerRecord<String, String> record : records) {
                 LOG.info("consumer message values is "+record.value()+" and the offset is "+ record.offset());
                 buffer.add(record);
             }
             if (buffer.size() >= minBatchSize) {
                 LOG.info("now commit offset");
                 consumer.commitSync();
                 buffer.clear();
             }
         }
    }

}

Example 2: Committing offsets manually after finishing each partition

package com.goodix.kafka;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Commits the offset manually after finishing each partition.
 * @author lxh
 *
 */
public class ManualCommitPartion {
    private static Logger LOG = LoggerFactory.getLogger(ManualCommitPartion.class);
    public ManualCommitPartion() {
        // TODO Auto-generated constructor stub
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Kafka broker address
        props.put("bootstrap.servers", "172.16.49.173:9092");
        // Consumer group name
        props.put("group.id","manual_g2");
        // Disable auto-commit so that offsets are committed only by the code below
        props.put("enable.auto.commit", "false");
        // When this group.id has no committed offset, start from the earliest offset.
        // If unset, the default is latest, and the consumer only receives messages produced after it starts.
        props.put("auto.offset.reset", "earliest");
        // Session timeout
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String ,String> consumer = new KafkaConsumer<String ,String>(props);
        consumer.subscribe(Arrays.asList("producer_test"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
            for (TopicPartition partition : records.partitions()) {
                List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
                for (ConsumerRecord<String, String> record : partitionRecords) {
                    LOG.info("now consumer the message it's offset is :"+record.offset() + " and the value is :" + record.value());
                }
                long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
                LOG.info("now commit the partition[ "+partition.partition()+"] offset");
                consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
            }
        }
    }

}
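Example 2 commits lastOffset + 1 because a committed offset names the next message to read, not the last one processed. The following minimal sketch isolates that calculation; CommitOffsetSketch and offsetsToCommit are hypothetical names, and plain integers and longs stand in for TopicPartition and OffsetAndMetadata.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: derive the offset to commit for each partition from the offsets processed in one poll.
final class CommitOffsetSketch {

    // Maps each partition number to the offset at which the next read should start.
    static Map<Integer, Long> offsetsToCommit(Map<Integer, List<Long>> processedOffsets) {
        Map<Integer, Long> toCommit = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Long>> entry : processedOffsets.entrySet()) {
            List<Long> offsets = entry.getValue();
            if (offsets.isEmpty()) {
                continue; // nothing processed for this partition, so nothing to commit
            }
            long lastProcessed = offsets.get(offsets.size() - 1);
            // Committing lastProcessed itself would cause that message to be read again after a restart.
            toCommit.put(entry.getKey(), lastProcessed + 1);
        }
        return toCommit;
    }

    public static void main(String[] args) {
        Map<Integer, List<Long>> processed = new LinkedHashMap<>();
        processed.put(0, Arrays.asList(5L, 6L, 7L));
        processed.put(1, Arrays.asList(42L));
        System.out.println(offsetsToCommit(processed)); // {0=8, 1=43}
    }
}
```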

(3) Consuming messages from specific partitions

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Consumes messages from specified partitions.
 * @author lxh
 *
 */
public class ManualPartion {
    private static Logger LOG = LoggerFactory.getLogger(ManualPartion.class);
    public ManualPartion() {
        // TODO Auto-generated constructor stub
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Kafka broker address
        props.put("bootstrap.servers", "172.16.49.173:9092");
        // Consumer group name
        props.put("group.id", "manual_g4");
        // Commit offsets automatically, at the frequency set by auto.commit.interval.ms
        props.put("enable.auto.commit", "true");
        // Offset auto-commit interval in milliseconds
        props.put("auto.commit.interval.ms", "1000");
        // When this group.id has no committed offset, start from the earliest offset.
        // If unset, the default is latest, and the consumer only receives messages produced after it starts.
        props.put("auto.offset.reset", "earliest");
        // Session timeout
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        TopicPartition partition0 = new TopicPartition("producer_test", 0);
        TopicPartition partition1 = new TopicPartition("producer_test", 1);
        KafkaConsumer<String ,String> consumer = new KafkaConsumer<String ,String>(props);
        consumer.assign(Arrays.asList(partition0, partition1));
        while (true) {
              ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
              for (ConsumerRecord<String, String> record : records)
                  System.out.printf("offset = %d, key = %s, value = %s  \r\n", record.offset(), record.key(), record.value());
              try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

}

Summary

The new consumer API only requires the kafka-clients dependency.
The new consumer API is easier to understand and easier to use.

 
 
              
           
              
              
            