Configuring Flume to Monitor File Contents
Application Scenario
In the earlier post on installing Flume on a fully distributed Hadoop cluster, we tested Flume monitoring a directory: whenever a new file was added to the directory, Flume immediately collected it. That is one use case, but what if we want to collect the contents of a file? For example, suppose there is a file in a Linux directory to which I keep appending content: how can those appends be written to HDFS in real time?
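A quick way to simulate this kind of append-only workload, assuming /opt/log/exec.text as the monitored file (the same path used in the configuration below):
# while true; do echo "log line $(date)" >> /opt/log/exec.text; sleep 1; done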
Procedure
In that post, Flume watched a directory: whenever files appeared under the Linux directory, they were automatically collected into an HDFS directory. To monitor the contents of a specific file instead, picking up new data as it is appended, change flume-conf.properties as follows and leave everything else unchanged:
# cd /opt/flume1.7.0/conf
# vim flume-conf.properties
# flume-conf.properties: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/log/exec.text
# Note: fileHeader and deserializer.outputCharset belong to the spooling
# directory source; the exec source silently ignores them, so they are omitted
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop0:9000/log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.maxOpenFiles = 1
# Roll output files by size only (~1 MB); rollCount = 0 and rollInterval = 0
# disable rolling by event count and by time
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 1000000
a1.sinks.k1.hdfs.batchSize = 100000
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
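One caveat: the memory channel buffers events in RAM, so any events still sitting in the channel are lost if the agent crashes. If durability matters more than throughput, a file channel can be substituted for c1; a minimal sketch, where the checkpoint and data paths are illustrative choices, not requirements:
a1.channels.c1.type = file
# both directories below are illustrative; any writable local paths work
a1.channels.c1.checkpointDir = /opt/flume1.7.0/checkpoint
a1.channels.c1.dataDirs = /opt/flume1.7.0/data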
# cd /opt/flume1.7.0/
# bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console
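With the agent running, append a few lines to the monitored file and confirm they arrive in HDFS. The commands below assume the paths from the configuration above; FlumeData is the HDFS sink's default file prefix, and the file currently being written carries a .tmp suffix until it rolls:
# echo "hello flume $(date)" >> /opt/log/exec.text
# hdfs dfs -ls /log
# hdfs dfs -cat /log/FlumeData.*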