
Hive Project in Practice, Part 3

Creating the tables

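Four tables are needed in total, even though there are only two data files. The reason is that the final tables are stored in the ORC format rather than the default textfile format. A load data statement normally just moves the file into the table's directory without converting it, so a plain-text file cannot be loaded directly into an ORC table. Instead, the data is first loaded into a textfile table and then written into the ORC table with an insert ... select query. That gives four tables: two textfile tables, user and video, and two ORC tables, user_orc and video_orc.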

1. First, create the textfile tables

-- One row per video. USER is a reserved keyword in Hive 1.2 and later,
-- so the table name is quoted with backticks.
create table `user`(
videoId string,
uploader string,
age int,
category array<string>,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as textfile;

-- One row per uploader.
create table video(
uploader string,
videos int,
friends int)
row format delimited
fields terminated by "\t"
stored as textfile;
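
As a rough illustration of how the two delimiters in the table definition map onto a line of the data file, the sketch below shows a made-up row in comments and then inspects the resulting schema. The sample values are hypothetical and are not taken from the real data set.

```sql
-- Hypothetical input line (<TAB> stands for a tab character):
--   vid001<TAB>someUploader<TAB>100<TAB>Music&Comedy<TAB>120<TAB>5000<TAB>4.5<TAB>30<TAB>12<TAB>vid002&vid003
-- The tab separates the columns; "&" separates the elements of the
-- array<string> columns (category and relatedId).

-- Check that the columns and their types were created as expected.
describe `user`;
describe video;
```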

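Load the data into both tables from HDFS:

load data inpath '<HDFS path of the data file>' into table `user`;
-- Repeat for the second data file.
load data inpath '<HDFS path of the second data file>' into table video;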
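
A few variations on the load statement may be useful here. This is only a sketch: both paths are hypothetical placeholders, and which form fits depends on where the data files actually live.

```sql
-- LOCAL copies the file from the client's local filesystem.
-- Without LOCAL, the file is moved from its HDFS location into the table directory.
load data local inpath '/path/to/local/data_file' into table `user`;

-- OVERWRITE replaces any rows already in the table instead of appending.
load data inpath '/path/on/hdfs/data_file' overwrite into table video;

-- Quick sanity check that rows arrived.
select count(*) from `user`;
select count(*) from video;
```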

2. Create the two ORC tables

-- ORC is a binary columnar format, so the delimiter clauses below
-- do not affect how the data is stored.
create table user_orc(
videoId string,
uploader string,
age int,
category array<string>,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>)
clustered by (uploader) into 8 buckets
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;

create table video_orc(
uploader string,
videos int,
friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as orc;
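
To confirm that the tables really use ORC and the intended bucketing, the table metadata can be inspected. The second statement is an optional sketch of choosing the compression codec explicitly; the table name video_orc_snappy is hypothetical and is not used elsewhere in this walkthrough.

```sql
-- The output should list the ORC input/output formats,
-- the bucket column (uploader) and the number of buckets.
describe formatted user_orc;

-- Hypothetical variant: ORC compresses with ZLIB by default;
-- the orc.compress table property selects another codec such as SNAPPY.
create table video_orc_snappy(
uploader string,
videos int,
friends int)
stored as orc
tblproperties ("orc.compress"="SNAPPY");
```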

Load data into the two ORC tables:

insert into table user_orc select * from `user`;
insert into table video_orc select * from video;
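
Because both ORC tables are bucketed, older Hive versions need bucketing to be enforced for the insert to produce the declared number of bucket files. The following is a sketch of an alternative form of the same load, assuming a Hive version before 2.0 for the set statement. It uses insert overwrite so that rerunning it does not duplicate rows.

```sql
-- Only for Hive versions before 2.0; later versions always enforce
-- bucketing and no longer have this setting.
set hive.enforce.bucketing=true;

-- Listing the columns explicitly guards against a column-order mismatch
-- between the source and target tables.
insert overwrite table video_orc
select uploader, videos, friends from video;
```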

The data is now loaded into both ORC tables, and a simple query confirms it:

select * from user_orc limit 10;
select * from video_orc limit 10;
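
Beyond select *, a few slightly more targeted checks can confirm that the array columns were parsed correctly and that no rows were lost in the copy. This is a minimal sketch; the aliases first_category, related_count, c and cat are names chosen here for illustration.

```sql
-- Index into an array column and count its elements.
select videoId, category[0] as first_category, size(relatedId) as related_count
from user_orc
limit 10;

-- Expand the category array into one row per category.
select videoId, cat
from user_orc lateral view explode(category) c as cat
limit 10;

-- The textfile table and its ORC copy should report the same row count.
select count(*) from `user`;
select count(*) from user_orc;
```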