Flume Installation, Configuration, and Debugging
Installation package: apache-flume-1.6.0-bin.tar.gz
1. Linux VM: CentOS 7.0; server CPU: dual-core i5 or better; memory: 2 GB or more
2. JDK 1.7.0 or later, Hadoop 2.7.1
3. Cluster machines (all three run Flume, with the Hadoop 2.7.1 files under /soft):
Master1   192.168.114.38
Slave1    192.168.114.39
Slave2    192.168.114.40
1. Extract the software:
Place the package in the /data directory, extract it to the /soft directory, and copy the Hadoop installation files from the cluster into /soft as well.
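For example (a sketch; the Hadoop files are assumed to be copied from an existing cluster node, written here as the placeholder namenode-host):
cd /data
tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /soft/
# copy the Hadoop 2.7.1 installation from an existing cluster node (placeholder host)
scp -r root@namenode-host:/soft/hadoop-2.7.1 /soft/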
2. Add environment variables:
On all machines, run vim /etc/profile and add the following variables to the file:
export HADOOP_HOME=/soft/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export FLUME_HOME=/soft/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin
Save and exit, then apply the changes: source /etc/profile
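To confirm the variables took effect, both commands below should print version information (flume-ng version also verifies the Flume installation itself):
hadoop version
flume-ng version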
3. Copy Flume to the other nodes:
cd /soft/
scp -r apache-flume-1.6.0-bin root@192.168.114.39:/soft/
scp -r apache-flume-1.6.0-bin root@192.168.114.40:/soft/
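Equivalently, as a loop (this assumes passwordless SSH for root has been set up, as is typical for Hadoop clusters):
for host in 192.168.114.39 192.168.114.40; do
  scp -r /soft/apache-flume-1.6.0-bin root@$host:/soft/
done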
4. Configure the agent startup files
On the Master1 node, rename flume-conf.properties.template in Flume's conf directory to agent0.conf.
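One way to do this (cp keeps the original template as a backup; mv would rename it outright):
cd /soft/apache-flume-1.6.0-bin/conf
cp flume-conf.properties.template agent0.conf
vim agent0.conf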
Change its contents to the following:
agent0.sources = source1
agent0.channels = memoryChannel
agent0.sinks = sink1
agent0.sources.source1.type = avro
agent0.sources.source1.bind = 192.168.114.38
agent0.sources.source1.port = 23004
agent0.sources.source1.channels = memoryChannel
agent0.sources.source1.interceptors = i1
agent0.sources.source1.interceptors.i1.type = timestamp
agent0.channels.memoryChannel.type = memory
agent0.channels.memoryChannel.capacity = 2000
agent0.channels.memoryChannel.keep-alive = 100
agent0.sinks.sink1.type = hdfs
agent0.sinks.sink1.hdfs.path = hdfs://192.168.114.20:8020/input/%y-%m-%d
agent0.sinks.sink1.hdfs.fileType = DataStream
agent0.sinks.sink1.hdfs.writeFormat = TEXT
agent0.sinks.sink1.hdfs.rollInterval = 1
agent0.sinks.sink1.hdfs.rollSize = 400000
agent0.sinks.sink1.hdfs.rollCount = 100
agent0.sinks.sink1.channel = memoryChannel
agent0.sinks.sink1.hdfs.filePrefix = events-
(Note: if rollCount is not set, it defaults to 10 events; an uploaded file with more than 10 lines would then be split into multiple HDFS files, so increase it appropriately.
Before choosing the source1 port, check whether it is already in use: netstat -tunlp | grep 23004)
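Before starting agent0 it is also worth confirming that the HDFS NameNode named in the sink path is reachable from Master1 (the HDFS sink creates the dated subdirectories itself):
hdfs dfs -ls hdfs://192.168.114.20:8020/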
On the Slave1 node, rename flume-conf.properties.template in Flume's conf directory to agent1.conf:
vim agent1.conf
Change its contents to the following:
agent1.sources = source1
agent1.channels = Channel1
agent1.sinks = sink1
#agent1.sources.source1.type = spooldir
agent1.sources.source1.type = exec
#agent1.sources.source1.spoolDir = /usr/local/flumelog
agent1.sources.source1.command = tail -F /usr/local/flumelog/flume_test1.txt
agent1.sources.source1.channels = Channel1
agent1.channels.Channel1.type = file
agent1.channels.Channel1.checkpointDir = /usr/local/tmp/checkpoint
agent1.channels.Channel1.dataDirs = /usr/local/tmp/datadir
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = 192.168.114.38
agent1.sinks.sink1.port = 23004
agent1.sinks.sink1.channel = Channel1
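On Slave1, the tailed file should exist before testing. tail -F will keep retrying until the file appears, and the file channel creates its checkpoint/data directories on first start, but creating everything up front avoids confusion:
mkdir -p /usr/local/flumelog
touch /usr/local/flumelog/flume_test1.txt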
On the Slave2 node, rename flume-conf.properties.template in Flume's conf directory to agent2.conf:
vim agent2.conf
Change its contents to the following:
agent2.sources = source1
agent2.channels = Channel1
agent2.sinks = sink1
agent2.sources.source1.type = spooldir
agent2.sources.source1.spoolDir = /usr/local/flumelog
agent2.sources.source1.channels = Channel1
agent2.channels.Channel1.type = file
agent2.channels.Channel1.checkpointDir = /usr/local/tmp/checkpoint
agent2.channels.Channel1.dataDirs = /usr/local/tmp/datadir
agent2.sinks.sink1.type = avro
agent2.sinks.sink1.hostname = 192.168.114.38
agent2.sinks.sink1.port = 23004
agent2.sinks.sink1.channel = Channel1
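On Slave2, create the spooling directory before starting. Note that the spooldir source expects files to be complete and unmodified once placed there; processed files are renamed with a .COMPLETED suffix by default:
mkdir -p /usr/local/flumelog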
5. Start Flume
Start each agent from the Flume directory on its own machine. Start the agent on master1 first; its avro source must be listening before the slaves' avro sinks try to connect, otherwise they will report exceptions. Also make sure Hadoop is already running.
Start each agent from /soft/apache-flume-1.6.0-bin/:
master1: flume-ng agent --conf ./conf/ -f ./conf/agent0.conf -n agent0 -Dflume.root.logger=INFO,console
slave1: flume-ng agent --conf ./conf/ -f ./conf/agent1.conf -n agent1 -Dflume.root.logger=INFO,console
slave2: flume-ng agent --conf ./conf/ -f ./conf/agent2.conf -n agent2 -Dflume.root.logger=INFO,console
(Note: the name passed to -n, e.g. agent0, must match the agent0 property prefix in agent0.conf; likewise for agent1 and agent2.)
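For long-running use, each agent can instead be started in the background, for example on master1 (the log file name here is arbitrary):
cd /soft/apache-flume-1.6.0-bin
nohup bin/flume-ng agent --conf ./conf/ -f ./conf/agent0.conf -n agent0 > /tmp/flume-agent0.log 2>&1 &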
Once the services on all three machines are running, add new files to the configured spoolDir on slave2, or append lines to the tailed file on slave1; the Flume cluster collects this data on master1 and writes it into HDFS.
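A minimal end-to-end test (the file names used here are examples):
# on slave1: append to the tailed file
echo "hello from slave1" >> /usr/local/flumelog/flume_test1.txt
# on slave2: drop a finished file into the spooling directory
cp /etc/hostname /usr/local/flumelog/test-$(date +%s).log
# on master1: check that events-* files appear under today's date in HDFS
hdfs dfs -ls hdfs://192.168.114.20:8020/input/$(date +%y-%m-%d)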