Hbase中的列式表對映到hive的外表
在做資料ETL中,可能原始資料在列式儲存Hbase中,這個時候,如果我們想清洗資料,可以考慮把Hbase表對映為Hive的外表,然後使用Hive的HQL來清除處理資料,具體過程參考下面例子:
步驟
1,建立Hbase表
2,對映Hive表
步驟一
說明:cf 列簇名,只put了少量測試列
create ‘cofeed_info’,{NAME => ‘cf’, REPLICATION_SCOPE => 1}
put ‘cofeed_info’, ‘100001’, ‘cf:id’, ‘101’
put ‘cofeed_info’, ‘100001’, ‘cf:title’, ‘這是測試用的資料’
put ‘cofeed_info’, ‘100001’, ‘cf:insert_time’, ‘45679848161564’
步驟二
說明:雖然很多列在Hbase表中當前並沒有,不過沒關係,:key表示rowkey
CREATE EXTERNAL TABLE cofeed_info
(
rowkey string,
id string,
title string,
tourl string,
content string,
data_provider string,
b_class string,
b_catogory string,
source string,
insert_time timestamp,
dt string
) STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’ WITH
SERDEPROPERTIES (“hbase.columns.mapping”=
“:key,
cf:id,
cf:title,
cf:tourl,
cf:content,
cf:data_provider,
cf:b_class,
cf:b_catogory,
cf:source,
cf:insert_time,
cf:dt”) TBLPROPERTIES (“hbase.table.name” = “cofeed_info”);
結果
hive> desc cofeed_info;
OK
rowkey string from deserializer
id string from deserializer
title string from deserializer
tourl string from deserializer
content string from deserializer
data_provider string from deserializer
b_class string from deserializer
b_catogory string from deserializer
source string from deserializer
insert_time timestamp from deserializer
dt string from deserializer
說明:Hbase中沒有的列簇為NULL了.
hive> select * from cofeed_info;
OK
100001 101 這是測試用的資料 NULL NULL NULL NULL NULL NULL NULL NULL