Flume Example: Local File to HDFS

阿新 • Published: 2020-09-08

1) Requirement: monitor the Hive log in real time and upload it to HDFS.

2) Requirements analysis:

3) Implementation steps:

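To write data to HDFS, Flume must have the relevant Hadoop jars available.

Copy the following jars into the /opt/module/flume/lib directory:

commons-configuration-1.6.jar
hadoop-auth-2.7.2.jar
hadoop-common-2.7.2.jar
hadoop-hdfs-2.7.2.jar
commons-io-2.4.jar
htrace-core-3.1.0-incubating.jar

Note: the jars marked in red in the original are required by Flume version 1.99; other versions do not need them.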
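The copy step can be scripted. The following is a minimal sketch, assuming the six jars sit somewhere under the Hadoop installation directory. The helper name `copy_hadoop_jars` and both directory arguments are illustrative assumptions, not part of the original steps.

```shell
#!/usr/bin/env bash
# Sketch: copy the Hadoop jars listed above into Flume's lib directory.
# copy_hadoop_jars is a hypothetical helper; both arguments are assumptions.
copy_hadoop_jars() {
  local src_dir="$1" dest_dir="$2" jar found missing=0
  for jar in \
    commons-configuration-1.6.jar \
    hadoop-auth-2.7.2.jar \
    hadoop-common-2.7.2.jar \
    hadoop-hdfs-2.7.2.jar \
    commons-io-2.4.jar \
    htrace-core-3.1.0-incubating.jar
  do
    # Take the first file with this exact name anywhere under src_dir.
    found="$(find "$src_dir" -type f -name "$jar" 2>/dev/null | head -n 1)"
    if [ -n "$found" ]; then
      cp "$found" "$dest_dir/"
    else
      echo "not found: $jar" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every jar was found and copied.
  [ "$missing" -eq 0 ]
}
```

A call might look like `copy_hadoop_jars /opt/module/hadoop-2.7.2 /opt/module/flume/lib`. The Hadoop path here is a guess based on the directory name in the shell prompts later in this article, so adjust it to the actual installation.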

Create the file flume-file-hdfs.conf:

[jason@hadoop102 job]$ vim flume-file-hdfs.conf

Add the following content:

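# Name the components on this agent
# (comments go on their own lines: a properties file treats a trailing # as part of the value)

# Define the source
a2.sources = r2

# Define the sink
a2.sinks = k2

# Define the channel
a2.channels = c2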

 

# Describe/configure the source

# Source type exec: run a command and read its output
a2.sources.r2.type = exec

a2.sources.r2.command = tail -F /opt/module/hive/logs/hive.log

# Shell used to run the command (absolute path)
a2.sources.r2.shell = /bin/bash -c

 

# Describe the sink

a2.sinks.k2.type = hdfs

a2.sinks.k2.hdfs.path = hdfs://hadoop102:9000/flume/%Y%m%d/%H

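# Prefix for uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-

# Whether to roll directories by time (round the timestamp down)
a2.sinks.k2.hdfs.round = true

# Number of time units per new directory
a2.sinks.k2.hdfs.roundValue = 1

# Time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour

# Whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true

# Number of events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000

# File type; DataStream writes uncompressed output (compressed types are also supported)
a2.sinks.k2.hdfs.fileType = DataStream

# How often to roll to a new file, in seconds
a2.sinks.k2.hdfs.rollInterval = 600

# File size that triggers a roll, in bytes
a2.sinks.k2.hdfs.rollSize = 134217700

# 0 means rolling does not depend on the number of events
a2.sinks.k2.hdfs.rollCount = 0

# Minimum number of block replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1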

 

# Use a channel which buffers events in memory

a2.channels.c2.type = memory

a2.channels.c2.capacity = 1000

a2.channels.c2.transactionCapacity = 100

 

# Bind the source and sink to the channel

a2.sources.r2.channels = c2

a2.sinks.k2.channel = c2
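Two of the sink settings are easier to read with concrete numbers. The escape sequences in hdfs.path are filled in from the event timestamp (the agent's local time here, since useLocalTimeStamp is true), and with round = true, roundValue = 1 and roundUnit = hour that timestamp is rounded down to the hour. The sketch below only imitates the resulting path with `date`; `bucket_path` is a hypothetical helper, not Flume code. It also compares rollSize with 128 MiB, the default HDFS block size in Hadoop 2.x.

```shell
#!/usr/bin/env bash
# bucket_path: hypothetical helper that mimics how /flume/%Y%m%d/%H is expanded
# for the current local hour.
bucket_path() {
  date +"/flume/%Y%m%d/%H"
}
bucket_path

# rollSize is given in bytes; compare it with 128 MiB.
block_size=$((128 * 1024 * 1024))
roll_size=134217700
echo "128 MiB = $block_size bytes"
echo "rollSize is $((block_size - roll_size)) bytes below 128 MiB"
```

The last line prints a difference of 28 bytes, so a file rolls just before it would reach 128 MiB.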

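Note: to read a file on a Linux system, the command has to follow Linux command conventions. Because the Hive log lives on the Linux file system, the source type is exec (short for execute), which means Flume runs a Linux command to read the file.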
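As a rough illustration of how the shell and command settings combine, the exec source effectively runs the configured shell with the command as its argument and turns each line of output into an event. The sketch below uses a temporary file and `tail -n 2` instead of `tail -F` so that it terminates; the file and its contents are made up for the example.

```shell
#!/usr/bin/env bash
# Stand-in for hive.log: a temporary file with two lines (illustrative only).
log_file="$(mktemp)"
printf 'line 1\nline 2\n' > "$log_file"

# Equivalent in spirit to shell = /bin/bash -c plus command = tail ...
# tail -n 2 is used here instead of tail -F so the command exits.
out="$(/bin/bash -c "tail -n 2 $log_file")"
echo "$out"

rm -f "$log_file"
```

With `tail -F` the command keeps running and follows the file by name, so new lines keep arriving even after the log file is rotated.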

4) Run the monitoring configuration

[jason@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf
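The value passed to --name has to match the prefix used in the configuration file (a2 here). One way to check which agent names a file defines is sketched below; `agent_names` is a hypothetical helper and assumes each key is written as agent.property = value.

```shell
#!/usr/bin/env bash
# agent_names: hypothetical helper that lists the distinct agent prefixes
# (the text before the first dot) used in a Flume properties file.
agent_names() {
  # Drop comment and blank lines, keep the prefix of each key, de-duplicate.
  grep -Ev '^[[:space:]]*(#|$)' "$1" \
    | sed -nE 's/^[[:space:]]*([^.[:space:]=]+)\..*$/\1/p' \
    | sort -u
}
```

Running `agent_names job/flume-file-hdfs.conf` against the file created earlier should list only a2.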

5) Start Hadoop and Hive, then run operations in Hive to generate log entries

[jason@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

[jason@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

 

[jason@hadoop102 hive]$ bin/hive

hive (default)>

6) View the files on HDFS
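Besides the NameNode web UI, the output can be listed from the command line. This is a minimal sketch, assuming the `hdfs` command is on the PATH (otherwise run bin/hdfs from the Hadoop directory); `list_flume_output` is a hypothetical helper name, and /flume comes from the hdfs.path setting above.

```shell
#!/usr/bin/env bash
# list_flume_output: hypothetical helper that recursively lists what the
# sink has written under /flume.
list_flume_output() {
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs command not found on PATH" >&2
    return 1
  fi
  hdfs dfs -ls -R /flume
}
```

Files that are still being written carry a .tmp suffix by default and lose it when the file is rolled.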
