Hive: creating a table over HDFS data, a problem when attaching partitions, and how it was solved
阿新 • Published: 2018-11-22
1. Create the temporary table:
CREATE EXTERNAL TABLE IF NOT EXISTS tmp.tmp_tb_jinritoutiao_log
(
content string COMMENT 'JSON content'
)
COMMENT 'Jinri Toutiao video content'
PARTITIONED BY (`day` string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/datastream/portal/jinritoutiao/video/';
2. Load the HDFS data
alter table tmp.tmp_tb_jinritoutiao_log add partition(day='20180810') location '/data/jinritoutiao/video/2018-08-10';
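Problem: the first attempt to load failed with:
ValidationFailureSemanticException table is not partitioned but partition spec exists
The message says the table is not a partitioned table, even though the DDL clearly declares the day partition, and I could not work out why. After many attempts, the problem finally went away once day was wrapped in backticks in the table definition: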
PARTITIONED BY (`day` string)
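One way to investigate this kind of error, offered here as a sketch rather than as what was actually done at the time: because the DDL uses IF NOT EXISTS, Hive silently keeps any table that already exists under the same name, so an earlier unpartitioned version of the table would survive a rerun of the CREATE statement. Whether that was the cause in this case is an assumption. The statements below show what the metastore really holds for the table.

```sql
-- Print the DDL that the metastore actually stores for the table.
SHOW CREATE TABLE tmp.tmp_tb_jinritoutiao_log;

-- For a partitioned table, the "# Partition Information" section lists day.
DESCRIBE FORMATTED tmp.tmp_tb_jinritoutiao_log;

-- If the stored definition has no partition column, drop it and rerun the
-- CREATE statement from step 1. For an EXTERNAL table this normally removes
-- only the metadata and leaves the HDFS files in place.
DROP TABLE IF EXISTS tmp.tmp_tb_jinritoutiao_log;
```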
3. Add the existing data to the corresponding partition
alter table tmp.tmp_tb_jinritoutiao_log add partition(day='20180810') location '/datastream/portal/jinritoutiao/video/2018-08-10';
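A minimal sketch of how the partition could be attached idempotently and then checked. It reuses the table name, partition value, and path from the statement above; the LIMIT 5 sample size is an arbitrary choice for illustration.

```sql
-- IF NOT EXISTS makes the statement safe to rerun if the partition is already attached.
ALTER TABLE tmp.tmp_tb_jinritoutiao_log ADD IF NOT EXISTS
  PARTITION (day='20180810')
  LOCATION '/datastream/portal/jinritoutiao/video/2018-08-10';

-- List the partitions the metastore knows about; day=20180810 should appear.
SHOW PARTITIONS tmp.tmp_tb_jinritoutiao_log;

-- Read a few raw rows to confirm the files under that location are readable.
SELECT content
FROM tmp.tmp_tb_jinritoutiao_log
WHERE day = '20180810'
LIMIT 5;
```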
4. Create a new table as required, parse the JSON column of the log into separate fields, and insert them into the new table
CREATE EXTERNAL TABLE IF NOT EXISTS tmp.tmp_jinritoutiao_video
(
id string comment '',
class string comment '',
userId string comment ''
)
partitioned by (day string comment 'partition column')
STORED AS ORC
location '/user/portal/tmp_jinritoutiao_video';
insert overwrite table tmp.tmp_jinritoutiao_video partition (day='20180810')
select
get_json_object(content,'$.id') as id,
get_json_object(content,'$.class') as class,
-- the log table only has the content column; the JSON key name userId is assumed to match the target column
get_json_object(content,'$.userId') as userId
from tmp.tmp_tb_jinritoutiao_log where day='20180810' limit 10;
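As a possible alternative, json_tuple can pull several top-level keys out of the JSON string in a single pass instead of calling get_json_object once per field. The sketch below assumes that id, class, and userId are all top-level keys in content, which the log format shown here does not confirm; the aliases l and t are only illustrative.

```sql
-- Quick check of the path syntax on a literal; this returns video.
SELECT get_json_object('{"id":"1","class":"video"}', '$.class');

-- Extract the three fields with one json_tuple call per row.
SELECT t.id, t.`class`, t.userId
FROM tmp.tmp_tb_jinritoutiao_log l
LATERAL VIEW json_tuple(l.content, 'id', 'class', 'userId') t AS id, `class`, userId
WHERE l.day = '20180810'
LIMIT 10;

-- After the insert, confirm that rows landed in the target partition.
SELECT count(*)
FROM tmp.tmp_jinritoutiao_video
WHERE day = '20180810';
```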
5. Done