Hive Project in Practice, Part 3
阿新 • Published: 2018-12-09
Creating the tables
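Four tables are needed in total. Why four, when there are only two data files? The final tables are stored in ORC format rather than the default textfile format, and a plain text file cannot be loaded directly into an ORC table: load data only moves the file into place without converting it. Data has to be written into an ORC table through a query, that is, by inserting rows selected from another table. So each data file gets two tables: the textfile tables user and video, and the ORC tables user_orc and video_orc. Note the naming: as the column lists below show, user holds the video records and video holds the uploader records.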
1. First create the textfile tables
create table user(
videoId string,
uploader string,
age int,
category array<string>,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as textfile;
create table video(
uploader string,
videos int,
friends int)
row format delimited
fields terminated by "\t"
stored as textfile;
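Load the data into the two tables from HDFS:
load data inpath '<HDFS path of the data file for this table>' into table user;
load data inpath '<HDFS path of the data file for this table>' into table video;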
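A quick sanity check after loading can catch delimiter problems early. The sketch below is only an illustration and not part of the original walkthrough. Two things are worth knowing when running it: load data inpath (without local) moves the source file into the table's warehouse directory, so it will no longer be at its original HDFS location; and the backticks around user are an assumption about your Hive version, since recent releases treat user as a reserved keyword and may reject the unquoted name.

```sql
-- Row counts should match the number of lines in each source file.
select count(*) from `user`;
select count(*) from video;

-- Spot-check that the "&"-separated fields were parsed into arrays.
select videoId, category, size(relatedId) as related_count
from `user`
limit 5;
```

If category comes back as a single-element array containing the whole "a&b" string, the collection items terminated by clause did not match the data.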
2. Create the two ORC tables
create table user_orc(
videoId string,
uploader string,
age int,
category array<string>,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>)
clustered by (uploader) into 8 buckets
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;
create table video_orc(
uploader string,
videos int,
friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as orc;
Populate the two ORC tables from the textfile tables:
insert into table user_orc select * from user;
insert into table video_orc select * from video;
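Because both ORC tables are declared with clustered by, it may be worth confirming that the inserts actually produced bucketed ORC data. On Hive releases before 2.0, bucketing is only enforced on insert when the property below is switched on; from 2.0 onward the property was removed and bucketing is always enforced. The following is a sketch under that assumption, with the set statement meant to be run before the inserts above on an older installation.

```sql
-- Only needed on Hive versions earlier than 2.0 (assumption: check your version).
set hive.enforce.bucketing=true;

-- Inspect the table metadata; the output should list the ORC input/output
-- formats and the bucket count (8 for user_orc, 24 for video_orc).
describe formatted user_orc;
describe formatted video_orc;
```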
The data is now loaded into the two ORC tables, and a simple query confirms it:
select * from user_orc limit 10;
select * from video_orc limit 10;
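Beyond select *, two slightly more targeted checks can exercise the bucketing and the array columns. These queries are illustrative only; the column aliases are made up for the example.

```sql
-- Read a single bucket out of the 24 that video_orc is clustered into.
select * from video_orc tablesample(bucket 1 out of 24 on uploader) limit 10;

-- Index into the array column: first category of the ten most-viewed rows.
select videoId, category[0] as first_category, views
from user_orc
order by views desc
limit 10;
```

Sampling on the same column and bucket count used in clustered by lets Hive read just the matching bucket file instead of scanning the whole table.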