Example: monitoring logs with Flume, cleaning the log data with regular expressions, and collecting it into HBase in real time
By 阿新 · Published 2019-01-05
Today I studied basic Flume usage and, along the way, thought about how to clean standard-format log data in real time and store it centrally.
This post shows how to use regular expressions to clean data in real time and store it in HBase. For the simpler case that does not split the data into columns, I will just paste the configuration first.
1. Using Flume's HBase sink (AsyncHBaseSink with SimpleAsyncHbaseEventSerializer)
The configuration is as follows:
###define agent
a5_hbase.sources = r5
a5_hbase.channels = c5
a5_hbase.sinks = k5

#define sources
a5_hbase.sources.r5.type = exec
a5_hbase.sources.r5.command = tail -f /opt/module/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log
a5_hbase.sources.r5.checkperiodic = 50

#define channels
a5_hbase.channels.c5.type = file
a5_hbase.channels.c5.checkpointDir = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/checkpoint
a5_hbase.channels.c5.dataDirs = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/data

#define sinks
a5_hbase.sinks.k5.type = org.apache.flume.sink.hbase.AsyncHBaseSink
a5_hbase.sinks.k5.table = flume_table5
a5_hbase.sinks.k5.columnFamily = hivelog_info
a5_hbase.sinks.k5.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
a5_hbase.sinks.k5.serializer.payloadColumn = hiveinfo

#bind
a5_hbase.sources.r5.channels = c5
a5_hbase.sinks.k5.channel = c5
This tails updates to hive.log and stores each event in the hiveinfo column. (Note that the table flume_table5 with column family hivelog_info must already exist in HBase; the sink does not create it.) To see exactly what values are stored and how each value is formatted, run a few HQL statements and then inspect the table.
2. Using the RegexHbaseEventSerializer to parse log data into columns
The log format is shown below; I extracted two records to experiment with:
"27.38.5.159" "-" "31/Aug/2015:00:04:37 +0800" "GET /course/view.php?id=27 HTTP/1.1" "303" "440" - "http://www.ibeifeng.com/user.php?act=mycourse" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" "-" "learn.ibeifeng.com"
"27.38.5.159" "-" "31/Aug/2015:00:04:37 +0800" "GET /login/index.php HTTP/1.1" "303" "465" - "http://www.ibeifeng.com/user.php?act=mycourse" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" "-" "learn.ibeifeng.com"
For this log format, the regular expression is embedded in the following configuration:
###define agent
a6_hive.sources = r6
a6_hive.channels = c6
a6_hive.sinks = k6

#define sources
a6_hive.sources.r6.type = exec
a6_hive.sources.r6.command = tail -f /opt/module/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log
a6_hive.sources.r6.checkperiodic = 50

#define channels
a6_hive.channels.c6.type = file
a6_hive.channels.c6.checkpointDir = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/checkpoint
a6_hive.channels.c6.dataDirs = /opt/module/cdh/flume-1.5.0-cdh5.3.6/flume_file/data

#define sinks
a6_hive.sinks.k6.type = org.apache.flume.sink.hbase.HBaseSink
#a6_hive.sinks.k6.type = hbase
a6_hive.sinks.k6.table = flume_table_regx2
a6_hive.sinks.k6.columnFamily = log_info
a6_hive.sinks.k6.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a6_hive.sinks.k6.serializer.regex = \\"(.*?)\\"\\ \\"(.*?)\\"\\ \\"(.*?)\\"\\ \\"(.*?)\\"\\ \\"(.*?)\\"\\ \\"(.*?)\\"\\ (.*?)\\ \\"(.*?)\\"\\ \\"(.*?)\\"\\ \\"(.*?)\\"\\ \\"(.*?)\\"
a6_hive.sinks.k6.serializer.colNames = ip,x1,date_now,web,statu1,statu2,user,web2,type,user2,web3

#bind
a6_hive.sources.r6.channels = c6
a6_hive.sinks.k6.channel = c6
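To sanity-check that the regex really splits a record into the eleven columns named in colNames, here is a small Python sketch. Python's re syntax is close enough to Java's for this pattern; note that in the properties file each backslash is doubled, so \\" is read by Flume as \", i.e. a literal quote.

```python
import re

# The same pattern the Flume config expresses, with the doubled
# backslashes already resolved: eleven capture groups, all quoted
# except the seventh (the bare "-" field).
pattern = re.compile(
    r'"(.*?)" "(.*?)" "(.*?)" "(.*?)" "(.*?)" "(.*?)" (.*?) '
    r'"(.*?)" "(.*?)" "(.*?)" "(.*?)"'
)

# Column names exactly as in a6_hive.sinks.k6.serializer.colNames.
col_names = ["ip", "x1", "date_now", "web", "statu1", "statu2",
             "user", "web2", "type", "user2", "web3"]

# One of the two sample records from above.
line = ('"27.38.5.159" "-" "31/Aug/2015:00:04:37 +0800" '
        '"GET /login/index.php HTTP/1.1" "303" "465" - '
        '"http://www.ibeifeng.com/user.php?act=mycourse" '
        '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" '
        '"-" "learn.ibeifeng.com"')

m = pattern.match(line)
row = dict(zip(col_names, m.groups()))
print(row["ip"])    # 27.38.5.159
print(row["web3"])  # learn.ibeifeng.com
```

If the match succeeds and the eleven groups line up with the column names, the same record will land in HBase as eleven qualifiers under the log_info column family.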
First start this Flume agent and open the monitoring console:
bin/flume-ng agent \
--name a6_hive \
--conf conf \
--conf-file conf/a6_hive.conf \
-Dflume.root.logger=DEBUG,console
Then append a record to hive.log:
echo '"27.38.5.159" "-" "31/Aug/2015:00:04:37 +0800" "GET /login/index.php HTTP/1.1" "303" "465" - "http://www.ibeifeng.com/user.php?act=mycourse" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" "-" "learn.ibeifeng.com"' >> /opt/module/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log
On the monitoring console we can watch the data being parsed.
Then query the HBase table. The resulting data meets our requirements; all that remains is to analyze the columns with functions to get the results we want, which can of course be done in Hive. See the following for reference:
https://blog.csdn.net/maketubu7/article/details/80513072
That's all.