Flume hive sink: pitfall notes
阿新 • Published: 2019-01-02
1. Hive sink overview
Compared with the HDFS sink, the hive sink can ingest data into a Hive table in near real time. With the HDFS sink you have to build a Hive external table mapped to the HDFS path, and the latency is much higher.
2. Caveats
1. The Hive table must be bucketed and stored as ORC.
2. The Hive column names in the Flume config must all be lowercase, i.e. everything in the fieldnames property must be lowercase.
3. Partitions must be created manually, i.e. set autoCreatePartitions = false.
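A related requirement the list above implies but does not spell out: the hive sink writes through Hive's streaming/ACID path, which only works when transaction support is enabled on the Hive side. The property names below are standard Hive ACID settings, but treat this as a sketch and verify the values against your Hive version:

```
-- Typical Hive settings required before transactional (ACID) tables
-- can accept streaming writes; verify for your Hive version.
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;
```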
3. Configuring the hive sink
```
a1.sinks.k2.type = hive
a1.sinks.k2.channel = c2
# URL of the Hive metastore
a1.sinks.k2.hive.metastore = thrift://192.168.3.150:9083
# Hive database name
a1.sinks.k2.hive.database = test
# Hive table name
a1.sinks.k2.hive.table = ods_table
# Hive partition values, comma separated; %Y expands to e.g. 2018, %y to 18
a1.sinks.k2.hive.partition = %Y-%m-%d
# Auto-creation of partitions must be disabled here, otherwise the sink errors out; create partitions manually instead
a1.sinks.k2.autoCreatePartitions = false
# Whether to use the local time instead of the timestamp from the event header
a1.sinks.k2.useLocalTimeStamp = false
#a1.sinks.k2.round = true
#a1.sinks.k2.roundValue = 1
#a1.sinks.k2.roundUnit = minute
a1.sinks.k2.serializer = DELIMITED
# Important: remember to escape the delimiter
a1.sinks.k2.serializer.delimiter = "\\001"
#a1.sinks.k2.serializer.serdeSeparator = "\\001"
# The Hive column names configured in Flume must all be lowercase. The Hive table must be bucketed and stored as ORC.
a1.sinks.k2.serializer.fieldnames = dstype,id,type,lastuploadtime
```
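The snippet above only defines the sink. For completeness, here is a minimal source and channel sketch that could feed it; the source command and channel capacities are hypothetical placeholders, not from the original post:

```
# Hypothetical source/channel to pair with sink k2 (names r2/c2 assumed)
a1.sources = r2
a1.channels = c2
a1.sinks = k2

# Tail an application log as the event source (path is a placeholder)
a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /var/log/app/data.log
a1.sources.r2.channels = c2

# In-memory channel buffering events between source and sink
a1.channels.c2.type = memory
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000
```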
4. Hive table DDL
```
create table test.ods_table(
  dsType string,
  id string,
  type string,
  lastUploadTime string
)
partitioned by (dt string)
clustered by (id) into 2 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');

alter table test.ods_table add if not exists partition (dt='2018-05-18');
```
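Because autoCreatePartitions is off, each day's partition must exist before Flume writes to it. A minimal shell sketch of a daily pre-creation step (the table name comes from the post; running it from cron and the `hive -e` invocation are assumptions about your deployment):

```shell
#!/bin/sh
# Pre-create today's partition so the hive sink never hits a missing one.
DT=$(date +%Y-%m-%d)   # same format as the %Y-%m-%d partition config
SQL="alter table test.ods_table add if not exists partition (dt='${DT}')"
echo "$SQL"
# In a real deployment, execute it, e.g.:  hive -e "$SQL"
```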