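Flume: Real-Time Data Collection Across Servers
The overall architecture is shown in the figure below. There are two servers. Data moving between servers is most commonly sent over avro or Thrift. Here we use an avro sink on server A paired with an avro source on server B.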
I. Flume Configuration
1. On server A, create aserver.conf
# Server A (192.168.116.10)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Source: tail the monitored file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/tmp/flume/1.log
a1.sources.r1.shell = /bin/sh -c
# Sink: send events over avro to server B
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.116.11
a1.sinks.k1.port = 44444
# Channel: buffer events in memory
a1.channels.c1.type = memory
# Wire source, channel, and sink together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2. On server B, create bserver.conf
# Server B (192.168.116.11)
b1.sources = r2
b1.sinks = k2
b1.channels = c2
# Source: avro, receives the events sent by A's avro sink
b1.sources.r2.type = avro
b1.sources.r2.bind = 192.168.116.11
b1.sources.r2.port = 44444
#b1.sources.r2.interceptors = i1
#b1.sources.r2.interceptors.i1.type = timestamp
# Sink: print events to the console log
b1.sinks.k2.type = logger
# Channel: buffer events in memory
b1.channels.c2.type = memory
# Wire source, channel, and sink together
b1.sources.r2.channels = c2
b1.sinks.k2.channel = c2
II. Testing
1. Start the agent on server B (bserver.conf) first
flume-ng agent -n b1 -c /usr/local/src/apache-flume-1.6.0-bin/conf -f /usr/local/src/apache-flume-1.6.0-bin/conf/bserver.conf -Dflume.root.logger=INFO,console
2. Then start the agent on server A (aserver.conf)
flume-ng agent -n a1 -c /usr/local/src/apache-flume-1.6.0-bin/conf -f /usr/local/src/apache-flume-1.6.0-bin/conf/aserver.conf -Dflume.root.logger=INFO,console
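Append some lines to the monitored file /usr/tmp/flume/1.log on server A.
The console on server B (where the logger sink runs) now shows the collected content.
To collect into HDFS instead, change the sink on B to an hdfs sink.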
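A minimal sketch of that change in bserver.conf follows. It assumes a NameNode at the hypothetical address hdfs://192.168.116.11:9000 and a hypothetical target directory /flume/events. Adjust both to your cluster, and note that the HDFS sink also needs the Hadoop client libraries on Flume's classpath. The %Y%m%d escape in the path requires a timestamp on every event. That is why the sketch enables the timestamp interceptor that is commented out in bserver.conf. Setting hdfs.useLocalTimeStamp = true on the sink is an alternative.

```properties
# Replace the logger sink on B with an HDFS sink (sketch, not a tuned setup)
b1.sinks.k2.type = hdfs
# Hypothetical NameNode address and directory; %Y%m%d gives one directory per day
b1.sinks.k2.hdfs.path = hdfs://192.168.116.11:9000/flume/events/%Y%m%d
# Write plain text lines instead of the default SequenceFile format
b1.sinks.k2.hdfs.fileType = DataStream
b1.sinks.k2.hdfs.writeFormat = Text
# The date escapes in hdfs.path need a "timestamp" header on each event
b1.sources.r2.interceptors = i1
b1.sources.r2.interceptors.i1.type = timestamp
```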
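III. Pitfalls
1. Startup order: always start server B before server A. A's avro sink connects to the avro source on B, so that source must already be listening on port 44444. Otherwise A's sink cannot connect and logs connection errors.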
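2. Starting the agent on B may fail with an error like this:
org.jboss.netty.channel.ChannelException: Failed to bind to: master/192.168.116.10:44444
Caused by: java.net.BindException: Cannot assign requested address
This means the IP address is configured incorrectly. The avro source's bind must be server B's address (192.168.116.11), not server A's (192.168.116.10).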
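3. If both agents start but nothing is printed, the Flume configuration may be wrong. For example, the avro source and the avro sink use different keys for the IP address: the sink calls it hostname and the source calls it bind. This one tripped me up for a long time before I noticed it.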
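The two ends of the avro link are shown side by side below, using the values from aserver.conf and bserver.conf. The address keys differ, but both must point at server B, and the ports must match. The 0.0.0.0 line is an alternative not used in the original setup. It makes the avro source listen on all of B's network interfaces.

```properties
# aserver.conf on A: the avro SINK names the remote host with "hostname"
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.116.11
a1.sinks.k1.port = 44444

# bserver.conf on B: the avro SOURCE names a local address with "bind"
b1.sources.r2.type = avro
b1.sources.r2.bind = 192.168.116.11
# Alternative: listen on every interface of B
# b1.sources.r2.bind = 0.0.0.0
b1.sources.r2.port = 44444
```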