Kafka data storage format

Reposted from: http://www.hemingliang.site/308.html

Viewing a topic's data distribution

[[email protected] kafka_2.10-0.10.2.1]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
[2017-06-22 15:01:02,628] WARN Connected to an old server; r-o mode will be unavailable (org.apache.zookeeper.ClientCnxnSocket)
Topic:test      PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: test     Partition: 0    Leader: 1       Replicas: 1     Isr: 1

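Leader: the broker id that hosts the leader replica of the partition
Replicas: the brokers that hold a replica of the partition
Isr: the in-sync replicas, i.e. the broker ids that are eligible to become leader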

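As shown above, the partition of test lives on the broker with id 1. Go into kafka_2.10-0.10.2.1/kafka-logs, which is the directory configured by log.dirs in server.properties.

That directory contains a test-0 directory. Log directories are named <topic name>-<partition number>. Inside test-0 the contents are as follows: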
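
The naming rule can be reversed to recover the topic and partition from a directory name. A minimal sketch; parse_log_dir is a hypothetical helper written for illustration, not part of Kafka, and it assumes the partition number is everything after the last hyphen (topic names may contain hyphens themselves):

```python
from typing import Tuple


def parse_log_dir(name: str) -> Tuple[str, int]:
    """Split a partition directory name such as 'test-0' into (topic, partition)."""
    # Split on the LAST hyphen only, because the topic name may contain hyphens.
    topic, sep, partition = name.rpartition("-")
    if not sep or not topic or not partition.isdigit():
        raise ValueError(f"not a topic-partition directory name: {name!r}")
    return topic, int(partition)


if __name__ == "__main__":
    print(parse_log_dir("test-0"))
```

For the directory above this yields the topic test and partition 0.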

[[email protected] kafka-logs]$ cd test-0/
[[email protected] test-0]$ ls
00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex

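The data files consist of a .index file, a .log file, and a .timeindex file.

Their contents can be inspected with kafka-run-class.sh, found in the bin directory of the Kafka installation.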

Viewing the log file

[[email protected] test-0]$ ../../bin/kafka-run-class.sh  kafka.tools.DumpLogSegments --files 00000000000000000000.log  --print-data-log
Dumping 00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 CreateTime: 1498104812192 isvalid: true payloadsize: 11 magic: 1 compresscodec: NONE crc: 3271928089 payload: hello world
offset: 1 position: 45 CreateTime: 1498104813269 isvalid: true payloadsize: 14 magic: 1 compresscodec: NONE crc: 242183772 payload: hello everyone
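
The position column can be cross-checked by hand. A minimal sketch, assuming the uncompressed magic-1 on-disk layout (an 8-byte offset and a 4-byte size prefix, followed by the crc, magic, attributes, timestamp, key length and value length fields) and a null key. OVERHEAD_FIELDS and entry_size are illustrative names, not part of Kafka:

```python
# Fixed per-entry overhead in bytes for one uncompressed magic-1 message.
# Assumption: field sizes follow the Kafka 0.10 (magic 1) message format.
OVERHEAD_FIELDS = {
    "offset": 8,
    "message_size": 4,
    "crc": 4,
    "magic": 1,
    "attributes": 1,
    "timestamp": 8,
    "key_length": 4,    # always present; a null key contributes no key bytes
    "value_length": 4,
}


def entry_size(payload_size: int, key_size: int = 0) -> int:
    """Bytes one log entry occupies on disk: fixed overhead plus key and payload."""
    return sum(OVERHEAD_FIELDS.values()) + key_size + payload_size


if __name__ == "__main__":
    # The first message in the dump has payloadsize 11 and starts at position 0.
    print(entry_size(11))
```

With payloadsize 11 the result is 45, which matches the position reported for offset 1 in the dump above.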

Viewing the index file

[[email protected] test-0]$ ../../bin/kafka-run-class.sh  kafka.tools.DumpLogSegments --files 00000000000000000000.index  --print-data-log  
Dumping 00000000000000000000.index
offset: 0 position: 0

Viewing the timeindex file

[[email protected] test-0]$ ../../bin/kafka-run-class.sh  kafka.tools.DumpLogSegments --files 00000000000000000000.timeindex  --print-data-log  
Dumping 00000000000000000000.timeindex
timestamp: 1498104813269 offset: 1
Found timestamp mismatch in :/home/hadoop/apps/kafka_2.10-0.10.2.1/kafka-logs/test-0/00000000000000000000.timeindex
  Index timestamp: 0, log timestamp: 1498104812192
Found out of order timestamp in :/home/hadoop/apps/kafka_2.10-0.10.2.1/kafka-logs/test-0/00000000000000000000.timeindex
  Index timestamp: 0, Previously indexed timestamp: 1498104813269

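The index file and the log file together form a segment. Segment files are named as follows: the first segment of a partition starts at 0, and each subsequent segment is named after its base offset, that is, the offset of the first message it contains (one past the last offset of the previous segment). The value is a 64-bit long, so it has at most 19 decimal digits, and the file name is left-padded with zeros to 20 digits. The log.segment.bytes setting caps the size of a single log file; once a log file reaches that size, a new segment is created.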
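
The naming rule can be sketched in a few lines. This is a minimal illustration, assuming the file name is the segment's base offset zero-padded to 20 digits as in the listing above. segment_file_name and find_segment are hypothetical helpers, and the base offsets in the example are made up:

```python
from bisect import bisect_right
from typing import List


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Build a segment file name from its base offset, zero-padded to 20 digits."""
    if not 0 <= base_offset < 2**63:
        raise ValueError("base offset must fit in a signed 64-bit long")
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets: List[int], target_offset: int) -> int:
    """Return the base offset of the segment that would hold target_offset.

    base_offsets must be sorted ascending; the owning segment is the one
    with the largest base offset that is <= target_offset.
    """
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("target offset is before the first segment")
    return base_offsets[i]


if __name__ == "__main__":
    bases = [0, 368769, 737337]  # made-up base offsets for illustration
    print(segment_file_name(0))
    print(segment_file_name(find_segment(bases, 400000), ".index"))
```

Because the names sort in offset order, locating the segment for a given offset reduces to a binary search over the base offsets.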