
Configuring Flume to Monitor File Content

Application Scenario

In the earlier post "Hadoop完全分散式安裝Flume" (installing Flume on a fully distributed Hadoop cluster), we tested Flume monitoring a directory: whenever a file was added to the directory, Flume immediately collected it. That is one use case. But what if we want to collect the contents of a single file? For example, suppose there is a file under a Linux directory to which content is continuously appended; how do we get each new line written to HDFS in real time?
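A simple way to reproduce this scenario is to append lines to a file in a loop. The path /opt/log/exec.text matches the configuration used below; the log text itself is just an arbitrary example:

# mkdir -p /opt/log
# while true; do echo "app log $(date)" >> /opt/log/exec.text; sleep 1; done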

Procedure

In the "Hadoop完全分散式安裝Flume" post, the agent monitored a directory: any file added under the Linux directory was automatically collected into an HDFS directory. To instead monitor a specific file, so that data appended to it is collected as it arrives, modify flume-conf.properties as follows; everything else stays the same.

# cd /opt/flume1.7.0/conf
# vim flume-conf.properties

# flume-conf.properties: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/log/exec.text
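# tail -F (unlike -f) re-opens the file by name, so collection continues across log rotation;
# note that the exec source offers no delivery guarantee if the agent stops unexpectedly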
a1.sources.r1.fileHeader = true
a1.sources.r1.deserializer.outputCharset = UTF-8
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop0:9000/log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.maxOpenFiles = 1
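# rollCount = 0 and rollInterval = 0 disable event-count and time-based rolling,
# so HDFS files roll only when they reach rollSize bytes (~1 MB here)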
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 1000000
a1.sinks.k1.hdfs.batchSize = 100000
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100000
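# transactionCapacity must be at least the sink's hdfs.batchSize (100000 here);
# a memory channel is fast, but buffered events are lost if the agent crashes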
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Save the file, return to the Flume installation directory, and start the agent:

# cd /opt/flume1.7.0/
# bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console
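To verify the whole pipeline, append a line to the monitored file while the agent is running, then read the collected data back from the /log path used in the sink configuration:

# echo "hello flume $(date +%s)" >> /opt/log/exec.text
# hdfs dfs -ls /log
# hdfs dfs -cat /log/*

Keep in mind that with hdfs.batchSize set to 100000, events are flushed to HDFS in large batches, so a single test line may not appear immediately; temporarily lowering hdfs.batchSize makes the output visible right away while testing.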