21. Flume Overview and Enterprise Development Cases

阿新 • Published: 2020-08-07

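I. Flume Overview

1.1 What Is Flume

Flume, originally developed by Cloudera, is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting large volumes of log data. It is based on a streaming architecture and is flexible and simple.

Flume's primary use is reading data from a server's local disk in real time and writing it to HDFS.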

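1.2 Advantages of Flume

It can be integrated with a wide range of storage systems.
When data arrives faster than the destination store can absorb it, Flume buffers the data, reducing pressure on HDFS.
Flume transactions are channel-based and use two transactions (sender + receiver) to make sure messages are delivered reliably.

Flume uses two independent transactions: one moves events from the source to the channel, the other moves them from the channel to the sink. The source considers data read only after every event in its transaction has been committed to the channel. Likewise, events are removed from the channel only after the sink has successfully written them out.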

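1.3 Flume Architecture

Put transaction flow:

doPut: first write the batch into the temporary buffer putList.
doCommit: check whether the channel's in-memory queue has enough room to merge the batch in.
doRollback: if the channel's in-memory queue lacks space, roll the data back.

Take transaction flow:

doTake: first move the data into the temporary buffer takeList.
doCommit: if all data was sent successfully, clear takeList.
doRollback: if an exception occurs while sending, roll back by returning the data in takeList to the channel's in-memory queue.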
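The put/take steps above can be modelled with a toy channel. This is a minimal Python sketch under stated assumptions, not Flume's `MemoryChannel` source: `SketchMemoryChannel` and `ChannelFullError` are hypothetical names, events are plain strings, and a "sink" is any callable that raises on failure.

```python
from collections import deque


class ChannelFullError(Exception):
    """Raised when committing a put batch would overflow the channel."""


class SketchMemoryChannel:
    """Hypothetical toy channel modelling putList/takeList transactions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def put_transaction(self, batch):
        """Source -> channel: doPut, then doCommit or doRollback."""
        put_list = list(batch)                      # doPut: stage the batch in putList
        if len(self.queue) + len(put_list) > self.capacity:
            put_list.clear()                        # doRollback: drop the staged batch
            raise ChannelFullError("not enough room in the channel queue")
        self.queue.extend(put_list)                 # doCommit: merge into the queue

    def take_transaction(self, max_events, deliver):
        """Channel -> sink: doTake, then doCommit or doRollback."""
        take_list = []
        while self.queue and len(take_list) < max_events:
            take_list.append(self.queue.popleft())  # doTake: move events to takeList
        try:
            deliver(list(take_list))                # the sink writes a copy of the batch
        except Exception:
            # doRollback: return the events to the head of the queue, order preserved
            self.queue.extendleft(reversed(take_list))
            raise
        take_list.clear()                           # doCommit: discard takeList
```

A put that would overflow `capacity` leaves the queue untouched, and a take whose `deliver` call raises puts the events back at the head of the queue in their original order, which is what the two doRollback steps describe.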

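The components of the Flume architecture are described in detail below.

① Agent

An Agent is a JVM process that moves data, in the form of events, from a source to a destination.
An Agent consists of three main parts: Source, Channel, and Sink.

② Source

The Source is the component that receives data into the Flume Agent. Sources can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy.

③ Channel

A Channel is a buffer that sits between the Source and the Sink, which lets the Source and the Sink operate at different rates. Channels are thread-safe and can handle writes from several Sources and reads from several Sinks at the same time.

The two most commonly used built-in channels are Memory Channel and File Channel.

Memory Channel: an in-memory queue. It suits scenarios where data loss is acceptable. If data loss matters, do not use Memory Channel, because a process crash, machine failure, or restart loses whatever it has buffered.

File Channel: writes all events to disk, so no data is lost if the process shuts down or the machine goes down.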

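④ Sink

A Sink continuously polls the Channel for events, removes them in batches, and writes each batch to a storage or indexing system, or sends it on to another Flume Agent.

Sinks are fully transactional. Before removing a batch from the Channel, each Sink starts a transaction with the Channel. Once the batch has been written to the storage system or to the next Flume Agent, the Sink commits the transaction, and the Channel then deletes those events from its internal buffer.

Sink destinations include hdfs, logger, avro, thrift, ipc, file, null, HBase, solr, and custom sinks.

⑤ Event

The Event is the basic unit of data transfer in Flume; data travels from source to destination in the form of events. An Event consists of optional headers and a body that carries the data as a byte array. The headers are a HashMap of key-value string pairs.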
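The Event structure described above can be pictured as a small record. In this sketch `SketchEvent` is a hypothetical Python stand-in for Flume's Java `Event` interface: a `body` of raw bytes plus a `headers` map of string keys to string values that is empty by default.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SketchEvent:
    """Hypothetical stand-in for a Flume Event."""
    body: bytes                                            # payload, always raw bytes
    headers: Dict[str, str] = field(default_factory=dict)  # optional string key/value pairs


# A line received by a source becomes the body; headers carry metadata about it.
event = SketchEvent(body="Hello Flume".encode("utf-8"), headers={"host": "hadoop100"})
plain = SketchEvent(body=b"no headers")
```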

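1.4 Flume Topologies

① Chained Flume Agents

This pattern connects several Flume agents in sequence, from the first Source all the way to the storage system that the final Sink writes to. Avoid chaining too many agents: a long chain slows down transfer, and if any agent in the chain goes down, the whole pipeline is affected.

② Single source, multiple channels and sinks

Flume can route an event stream to one or more destinations. In this pattern the source's data is replicated into several Channels, each holding the same data, and each Sink can deliver it to a different destination.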
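The replicating behaviour described above can be sketched in a few lines. Assumptions: channels are plain Python lists keyed by name and event bodies are strings. `replicate` mirrors `selector.type = replicating`; `multiplex` is included only for contrast and models Flume's other built-in selector, `multiplexing`, which routes by the value of one header (`selector.header`, `selector.mapping.*`, `selector.default`). Both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def replicate(events: List[str], channel_names: List[str]) -> Dict[str, List[str]]:
    """Replicating selector sketch: every channel receives every event."""
    channels: Dict[str, List[str]] = {name: [] for name in channel_names}
    for event in events:
        for queue in channels.values():
            queue.append(event)              # each channel gets its own copy
    return channels


def multiplex(events: List[Tuple[Dict[str, str], str]], header: str,
              mapping: Dict[str, List[str]], default: List[str]) -> Dict[str, List[str]]:
    """Multiplexing selector sketch, for contrast: route by one header value."""
    channels: Dict[str, List[str]] = {}
    for headers, body in events:
        targets = mapping.get(headers.get(header, ""), default)  # unmapped -> default
        for name in targets:
            channels.setdefault(name, []).append(body)
    return channels
```

With replication, c1 and c2 in case 2.4 both see the full stream, so one copy can go to HDFS and the other to the local file system.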

③ Flume load balancing

Flume can group several Sinks logically into a sink group and distribute data across them, which addresses both load balancing and failover.
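The load-balancing sink group used in case 2.5 (`processor.type = load_balance`, `processor.selector = round_robin`, `processor.backoff = true`) can be approximated as follows. This is a hedged sketch, not Flume's `LoadBalancingSinkProcessor`: `RoundRobinSinkGroup` is a hypothetical class, sinks are plain callables that raise on failure, and the backoff formula (2^n seconds after n consecutive failures, capped at `max_timeout_ms`, in the spirit of `processor.selector.maxTimeOut`) is an assumption chosen for illustration. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time


class RoundRobinSinkGroup:
    """Hypothetical load_balance + round_robin + backoff sink group sketch."""

    def __init__(self, sinks, max_timeout_ms=30000, clock=time.monotonic):
        self.sinks = list(sinks)              # callables that raise on failure
        self.next_index = 0                   # where the next round-robin pass starts
        self.max_timeout_ms = max_timeout_ms  # cap on backoff, like selector.maxTimeOut
        self.clock = clock                    # injectable time source (seconds)
        self.failures = [0] * len(self.sinks)
        self.blocked_until = [0.0] * len(self.sinks)

    def process(self, batch):
        """Deliver one batch; return the index of the sink that accepted it."""
        n = len(self.sinks)
        for step in range(n):
            i = (self.next_index + step) % n
            if self.clock() < self.blocked_until[i]:
                continue                      # sink is still backing off
            try:
                self.sinks[i](batch)
            except Exception:
                self.failures[i] += 1
                # assumed backoff: 2**failures seconds, capped at max_timeout_ms
                penalty_ms = min(1000 * 2 ** self.failures[i], self.max_timeout_ms)
                self.blocked_until[i] = self.clock() + penalty_ms / 1000.0
                continue                      # fail over to the next sink
            self.failures[i] = 0
            self.next_index = (i + 1) % n     # rotate for the next batch
            return i
        raise RuntimeError("all sinks failed or are backing off")
```

A failing sink is skipped for its backoff window and the batch falls through to the next sink, which is how one sink group provides both load balancing and failover.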

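④ Flume Agent aggregation

This is the most common pattern and a very practical one. Web applications are often spread across hundreds of servers, sometimes thousands or tens of thousands, and the logs they produce are cumbersome to handle. This combination solves the problem well: each server runs a Flume agent that collects its logs and sends them to a central collecting agent, which then uploads them to hdfs, hive, hbase, jms, etc. for log analysis.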

1.5 Flume Agent Internals

1.6 Installing Flume

1. Extract apache-flume-1.7.0-bin.tar.gz into the /opt/module/ directory

[root@hadoop100 software]$ tar -zxf apache-flume-1.7.0-bin.tar.gz -C /opt/module/

2. Rename flume-env.sh.template under conf to flume-env.sh and set JAVA_HOME

[root@hadoop100 conf]$ mv flume-env.sh.template flume-env.sh
[root@hadoop100 conf]$ vi flume-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144

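II. Enterprise Development Cases

2.1 Monitoring Port Data

Requirement: first start a Flume job that listens on local port 44444 (the server side); then use the netcat tool to send messages to local port 44444 (the client side); finally, Flume displays the received data on the console in real time.

Steps:

① Create the Flume Agent configuration file flume-netcat-logger.conf

Create a job folder under the Flume directory and enter it

[root@hadoop100 flume]# mkdir job
[root@hadoop100 flume]# cd job/

Create the Flume Agent configuration file flume-netcat-logger.conf

[root@hadoop100 job]# vim flume-netcat-logger.conf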
# Name the components on this agent
# a1: the name of the agent
# r1: the source (input) of a1
a1.sources = r1
# k1: the sink (output destination) of a1
a1.sinks = k1
# c1: the channel (buffer) of a1
a1.channels = c1

# Describe/configure the source
# The source of a1 is of type netcat (listens on a port)
a1.sources.r1.type = netcat
# Host that a1 listens on
a1.sources.r1.bind = localhost
# Port that a1 listens on
a1.sources.r1.port = 44444

# Describe the sink
# The sink of a1 is of type logger (prints to the console)
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
# The channel of a1 is of type memory
a1.channels.c1.type = memory
# Total capacity of the channel: 1000 events
a1.channels.c1.capacity = 1000
# Maximum number of events the channel takes from the source or gives to the sink in one transaction
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
# Connect r1 to c1
a1.sources.r1.channels = c1
# Connect k1 to c1
a1.sinks.k1.channel = c1
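Flume loads agent configuration files with `java.util.Properties` semantics, where `#` or `!` starts a comment only at the beginning of a line; a trailing `# ...` after a value becomes part of the value (for example, `netcat #comment` would be read as the source type). That is why comments in these files belong on their own lines. The Python sketch below reproduces just that one rule to make the pitfall visible; `parse_properties` is a hypothetical helper that ignores other `Properties` features such as backslash line continuation, escape sequences, and whitespace used as a key separator.

```python
def parse_properties(text):
    """Tiny subset of java.util.Properties line parsing (sketch)."""
    props = {}
    for raw in text.splitlines():
        line = raw.lstrip()
        if not line or line[0] in "#!":
            continue                          # only whole-line comments are skipped
        for i, ch in enumerate(line):
            if ch in "=:":                    # first '=' or ':' separates key and value
                key, value = line[:i].strip(), line[i + 1:].lstrip()
                break
        else:
            key, value = line.strip(), ""     # key with no value
        props[key] = value                    # a trailing "# ..." stays in the value
    return props
```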

② Start Flume listening on the port

Form 1:

[root@hadoop100 flume]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

Form 2:

[root@hadoop100 flume]$ bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

Parameters:

--conf conf/: the Flume configuration directory is conf/
--name a1: names the agent a1
--conf-file job/flume-netcat-logger.conf: the configuration file Flume reads for this run is flume-netcat-logger.conf in the job folder.
-Dflume.root.logger=INFO,console: -D overrides the flume.root.logger property at runtime, setting the log level to INFO and sending the logs to the console. Common log levels are DEBUG, INFO, WARN, and ERROR.

③ Use the netcat tool to send content to port 44444

[root@hadoop100 flume]$ nc localhost 44444
Hello Flume
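The `nc localhost 44444` step above can also be scripted. A minimal Python sketch, assuming the netcat source keeps its default `ack-every-event = true` so that it answers each newline-terminated event with `OK`; `send_lines` is a hypothetical helper, and the host and port come from flume-netcat-logger.conf.

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send newline-terminated lines like `nc` and collect one ack line per event."""
    acks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            sock.sendall((line + "\n").encode("utf-8"))  # one line = one Flume event
            acks.append(reader.readline().rstrip("\n"))  # wait for the source's reply
        reader.close()
    return acks


if __name__ == "__main__":
    print(send_lines("localhost", 44444, ["Hello Flume"]))
```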

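④ Watch the received data in the Flume console

2.2 Reading a Local File into HDFS in Real Time

Requirement: monitor the Hive log in real time and upload it to HDFS

Steps:

① To write data to HDFS, Flume must have the Hadoop-related jars

Copy commons-configuration-1.6.jar, hadoop-auth-2.7.2.jar, hadoop-common-2.7.2.jar, hadoop-hdfs-2.7.2.jar, commons-io-2.4.jar, and htrace-core-3.1.0-incubating.jar into the /opt/module/flume/lib folder.

② Create the file flume-file-hdfs.conf

Its content is as follows: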

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/hive-1.2.1/logs/hive.log
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop100:9000/flume/%Y%m%d/%H
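# Prefix of uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# Whether to round down the timestamp used for time-based folders
a2.sinks.k2.hdfs.round = true
# How many time units per new folder
a2.sinks.k2.hdfs.roundValue = 1
# The time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour
# Use the local time instead of the event's timestamp header
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# Events written to a file before it is flushed to HDFS (must not exceed the channel's transactionCapacity)
a2.sinks.k2.hdfs.batchSize = 100
# File type: DataStream writes uncompressed data (CompressedStream supports compression)
a2.sinks.k2.hdfs.fileType = DataStream
# Seconds before a new file is rolled
a2.sinks.k2.hdfs.rollInterval = 60
# File size in bytes that triggers a roll
a2.sinks.k2.hdfs.rollSize = 134217700
# 0 = rolling does not depend on the number of events
a2.sinks.k2.hdfs.rollCount = 0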

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

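Note: every time-related escape sequence requires a header with the key "timestamp" in the Event headers, unless hdfs.useLocalTimeStamp is set to true, in which case the agent's local time is used instead. (Another way to add the header automatically is the TimestampInterceptor.) This configuration therefore sets a2.sinks.k2.hdfs.useLocalTimeStamp = true.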
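How `hdfs.round`, `hdfs.roundValue`, and `hdfs.roundUnit` shape the directory chosen for a path like `/flume/%Y%m%d/%H` can be sketched as: take the event time (the agent's current time when `hdfs.useLocalTimeStamp = true`, otherwise the `timestamp` header in epoch milliseconds), round it down to a multiple of `roundValue` units, then expand the escape sequences. Assumptions: the hypothetical `bucket_path` only handles escapes that Python's `strftime` also understands and works in UTC, whereas Flume uses `hdfs.timeZone` (the agent's local zone by default).

```python
from datetime import datetime, timezone

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}  # roundUnit values


def bucket_path(template, ts_millis, round_value=1, round_unit="hour"):
    """Round the event time down, then expand strftime-style escapes (UTC sketch)."""
    step = round_value * UNIT_SECONDS[round_unit]   # bucket width in seconds
    seconds = ts_millis // 1000                     # event timestamps are epoch milliseconds
    rounded = seconds - seconds % step              # round *down* to the bucket start
    return datetime.fromtimestamp(rounded, tz=timezone.utc).strftime(template)
```

With `roundValue = 10` and `roundUnit = minute`, an event at 08:37:15 lands in the 08:30 bucket, so a new folder is created every ten minutes; with the settings above (1 hour) it lands in the 08 folder.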

③ Run the monitoring configuration

[root@hadoop100 flume]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf

④ Start Hadoop and Hive, then run Hive operations to generate logs

[root@hadoop100 hadoop-2.7.2]$ sbin/start-dfs.sh
[root@hadoop101 hadoop-2.7.2]$ sbin/start-yarn.sh

[root@hadoop100 hive]$ bin/hive
hive (default)>

⑤ View the files on HDFS

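2.3 Reading Directory Files into HDFS in Real Time

Requirement: use Flume to monitor the files in an entire directory

Steps:

① Create the configuration file flume-dir-hdfs.conf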

a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
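# Directory to monitor
a3.sources.r3.spoolDir = /opt/module/flume-1.7.0/upload
# Suffix appended to files that have been fully ingested
a3.sources.r3.fileSuffix = .COMPLETED
# Add a header holding the absolute path of the source file
a3.sources.r3.fileHeader = true
# Ignore (do not upload) any file whose name ends in .tmp
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
# HDFS path that files are uploaded to
a3.sinks.k3.hdfs.path = hdfs://hadoop100:9000/flume/upload/%Y%m%d/%H
# Prefix of uploaded files
a3.sinks.k3.hdfs.filePrefix = upload-
# Whether to round down the timestamp used for time-based folders
a3.sinks.k3.hdfs.round = true
# How many time units per new folder
a3.sinks.k3.hdfs.roundValue = 1
# The time unit used for rounding
a3.sinks.k3.hdfs.roundUnit = hour
# Use the local time instead of the event's timestamp header
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# Number of events written to a file before it is flushed to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# File type: DataStream writes uncompressed data (CompressedStream supports compression)
a3.sinks.k3.hdfs.fileType = DataStream
# Seconds before a new file is rolled
a3.sinks.k3.hdfs.rollInterval = 60
# Roll when a file reaches about 128 MB
a3.sinks.k3.hdfs.rollSize = 134217700
# 0 = rolling does not depend on the number of events
a3.sinks.k3.hdfs.rollCount = 0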

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
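The three roll settings in the HDFS sink configurations act as independent triggers, and a value of 0 disables a trigger. A minimal sketch of that decision, assuming sizes in bytes and times in seconds; `should_roll` is a hypothetical function, not the HDFS sink's code (Flume applies `rollInterval` with a timer and checks the size and count thresholds after writes).

```python
def should_roll(opened_at_s, now_s, bytes_written, events_written,
                roll_interval=60, roll_size=134217700, roll_count=0):
    """Return True if any enabled trigger says the current file should be closed."""
    if roll_interval > 0 and now_s - opened_at_s >= roll_interval:
        return True                     # hdfs.rollInterval: file has been open long enough
    if roll_size > 0 and bytes_written >= roll_size:
        return True                     # hdfs.rollSize: file is big enough
    if roll_count > 0 and events_written >= roll_count:
        return True                     # hdfs.rollCount: enough events written
    return False
```

134217700 is 28 bytes less than 128 MiB (134217728 bytes), so rollSize roughly matches one default HDFS block; because the size is checked after a write, a file can still end slightly above the threshold.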

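② Start the directory-monitoring command

[root@hadoop100 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/flume-dir-hdfs.conf

Notes:

Do not create a file inside the monitored directory and keep modifying it there
Files that have been fully uploaded are renamed with the .COMPLETED suffix
The monitored directory is scanned for changes every 500 milliseconds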
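The spooling-directory notes above can be illustrated with one polling pass. This Python sketch is an assumption-laden model, not the Spooling Directory Source: `spool_once` is hypothetical, treats each line as an event, skips names that fully match `ignorePattern` (here `([^ ]*\.tmp)`) or already end in `fileSuffix`, and renames each consumed file with `.COMPLETED`. The real source expects files to be immutable once they appear; if a file is written to after being placed in the directory, it logs an error and stops processing, which is why files should be finished elsewhere and then moved in.

```python
import os
import re

IGNORE = re.compile(r"([^ ]*\.tmp)")  # same pattern as a3.sources.r3.ignorePattern


def spool_once(spool_dir, deliver, suffix=".COMPLETED", ignore=IGNORE):
    """One polling pass: read finished files, hand lines to `deliver`, mark them done."""
    done = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if not os.path.isfile(path) or name.endswith(suffix) or ignore.fullmatch(name):
            continue                               # skip dirs, finished and ignored files
        with open(path, encoding="utf-8") as f:
            for line in f:
                deliver(line.rstrip("\n"))         # each line becomes one event body
        os.rename(path, path + suffix)             # e.g. test.txt -> test.txt.COMPLETED
        done.append(name)
    return done
```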

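③ Add a file to the upload folder (write it outside the monitored directory, then move it in)

[root@hadoop100 flume]$ mkdir upload
[root@hadoop100 flume]$ vim test.txt
123
456
[root@hadoop100 flume]$ mv test.txt upload/

④ View the data on HDFS

⑤ View the upload folder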

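2.4 Single Source, Multiple Outputs (Channel Selector)

Requirement: Flume-1 monitors a file for changes and passes the changes to Flume-2, which stores them in HDFS. At the same time, Flume-1 passes the changes to Flume-3, which writes them to the local file system.

Steps:

① Preparation

Create the group1 folder under /opt/module/flume/job:
[root@hadoop100 job]# mkdir group1/

Create the flume3 folder under /opt/module/data/:
[root@hadoop100 data]# mkdir flume3

② Create flume-file-flume.conf

Configure one source that reads the log file, plus two channels and two sinks that feed flume-flume-hdfs and flume-flume-dir respectively.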

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Replicate the data flow to all channels
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive-1.2.1/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
# An avro sink acts as a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop100
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop100
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Note: Avro is a language-neutral data serialization and RPC framework created by Doug Cutting, the creator of Hadoop.

③ Create flume-flume-hdfs.conf

Configure a Source that receives the upstream Flume's output, with a Sink that writes to HDFS.

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
# An avro source acts as a data-receiving service
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop100
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hadoop100:9000/flume2/%Y%m%d/%H
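# Prefix of uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Whether to round down the timestamp used for time-based folders
a2.sinks.k1.hdfs.round = true
# How many time units per new folder
a2.sinks.k1.hdfs.roundValue = 1
# The time unit used for rounding
a2.sinks.k1.hdfs.roundUnit = hour
# Use the local time instead of the event's timestamp header
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events written to a file before it is flushed to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type: DataStream writes uncompressed data (CompressedStream supports compression)
a2.sinks.k1.hdfs.fileType = DataStream
# Seconds before a new file is rolled
a2.sinks.k1.hdfs.rollInterval = 600
# Roll when a file reaches about 128 MB
a2.sinks.k1.hdfs.rollSize = 134217700
# 0 = rolling does not depend on the number of events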
a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

④ Create flume-flume-dir.conf

Configure a Source that receives the upstream Flume's output, with a Sink that writes to a local directory.

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop100
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/module/data/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

⑤ Run the configuration files

Start the downstream agents (a3, a2) before a1, because a1's avro sinks connect to the ports they listen on.

[root@hadoop100 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group1/flume-flume-dir.conf

[root@hadoop100 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group1/flume-flume-hdfs.conf

[root@hadoop100 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group1/flume-file-flume.conf

⑥ Run a Hive command

[root@hadoop100 hive]$ bin/hive
hive (default)> select * from stu;

⑦ Check the data

HDFS:

Local:

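2.5 Single Source, Multiple Outputs (Sink Group)

Requirement: Flume-1 monitors data entered on a port in real time and distributes it round-robin to Flume-2 and Flume-3, which print it to their consoles.

Steps:

① Preparation

Create the group2 folder under /opt/module/flume/job

[root@hadoop100 job]# mkdir group2

② Create flume-netcat-flume.conf

Configure one netcat source, one channel, and two sinks that feed flume-flume-console1 and flume-flume-console2 respectively.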

# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

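# Sink processor: distribute events across the sinks in g1
a1.sinkgroups.g1.processor.type = load_balance
# Back off (temporarily skip) sinks that fail
a1.sinkgroups.g1.processor.backoff = true
# Pick sinks in round-robin order
a1.sinkgroups.g1.processor.selector = round_robin
# Upper limit, in milliseconds, on the exponential backoff of a failed sink
a1.sinkgroups.g1.processor.selector.maxTimeOut = 10000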

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop100
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop100
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

③ Create flume-flume-console1.conf and flume-flume-console2.conf

Configure a Source that receives the upstream Flume's output, with a Sink that prints to the local console.

flume-flume-console1.conf:

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop100
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

flume-flume-console2.conf:

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop100
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

④ Run the configuration files

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group2/flume-netcat-flume.conf

⑤ Check the console logs printed by Flume-2 and Flume-3

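2.6 Aggregating Multiple Sources

Requirement:

Flume-1 on hadoop100 monitors the file /opt/module/hive-1.2.1/logs/hive.log;
Flume-2 on hadoop101 monitors the data stream on a port;
Flume-1 and Flume-2 send their data to Flume-3 on hadoop102, which prints the final data to the console.

Steps:

① Preparation
Create the group3 folder under /opt/module/flume/job. Each agent runs on its own host, so the configuration files below must also be present under job/group3 on hadoop101 and hadoop102.

[root@hadoop100 job]# mkdir group3

② Create flume1-logger-flume.conf

hadoop100: configure a Source that monitors the hive.log file and a Sink that sends the data to the next-tier Flume.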

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive-1.2.1/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

③ Create flume2-netcat-flume.conf

hadoop101: configure a Source that monitors the data stream on port 44444 and a Sink that sends the data to the next-tier Flume.

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop101
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop102
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

④ Create flume3-flume-logger.conf

hadoop102: configure a Source that receives the data streams sent by flume1 and flume2, and a Sink that prints the merged data to the console.

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

⑤ Run the configuration files

[root@hadoop102 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console

[root@hadoop101 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf
