Comparing rcfile, orcfile and parquetfile in Hive
阿新 · Published: 2019-02-16
I. Create one table in each of the three storage formats:
create table rcfile (name string, age int, addr string, desc string) row format delimited fields terminated by ',' stored as rcfile;
create table orcfile (name string, age int, addr string, desc string) row format delimited fields terminated by ',' stored as orcfile;
create table parquetfile (name string, age int, addr string, desc string) row format delimited fields terminated by ',' stored as parquetfile;
II. Generate 10 million rows of comma-separated data with a shell script, then load them into the textfile table with LOAD DATA ... OVERWRITE.
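The generator script itself is not included in the post. As a minimal sketch, assuming bash and awk are available, something like the following writes rows in the same four-column name,age,addr,desc layout. The function name gen_rows, the file path, the random seed and the value patterns are all hypothetical, not the author's script; the commented-out LOAD DATA line also assumes lijie.textfile is a comma-delimited text table with the same four columns.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the author's (unpublished) generator script.
# gen_rows N FILE writes N comma-separated rows: name,age,addr,desc
gen_rows() {
  local n="$1" out="$2"
  # awk is far faster than a bash while-loop for millions of lines
  awk -v n="$n" 'BEGIN {
    srand(42)                          # fixed seed so runs are repeatable
    for (i = 1; i <= n; i++)
      printf "name%d,%d,addr%d,desc%d\n", i, int(rand() * 100), i % 1000, i
  }' > "$out"
}

# Example usage (hypothetical path; assumes the textfile table already exists):
# gen_rows 10000000 /tmp/data.txt
# hive -e "load data local inpath '/tmp/data.txt' overwrite into table lijie.textfile;"
```

Using awk inside a function keeps generation fast and lets the row count be chosen per run, for example a few rows for a smoke test before producing the full data set.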
III. Insert the data into each of the three tables:
insert into rcfile select * from lijie.textfile;
insert into orcfile select * from lijie.textfile;
insert into parquetfile select * from lijie.textfile;
IV. Run the tests
(xxfile stands for each of the three tables in turn; a conclusion of "A < B" means A took less time than B.)
1. select * from xxfile
   rcfile       Time taken: 47.604 seconds, Fetched: 13756317 row(s)
   orcfile      Time taken: 2.563 seconds, Fetched: 13756317 row(s)
   parquetfile  Time taken: 43.454 seconds, Fetched: 13756317 row(s)
   Conclusion: orcfile < parquetfile < rcfile
2. select name,addr from xxfile
   rcfile       Time taken: 36.937 seconds, Fetched: 13756317 row(s)
   orcfile      Time taken: 2.514 seconds, Fetched: 13756317 row(s)
   parquetfile  Time taken: 43.454 seconds, Fetched: 13756317 row(s)
   Conclusion: orcfile < rcfile < parquetfile
3. select max(name) from xxfile
   rcfile       Time taken: 34.375 seconds
   orcfile      Time taken: 30.073 seconds
   parquetfile  Time taken: 38.352 seconds
   Conclusion: orcfile < rcfile < parquetfile
4. select count(1) from xxfile
   rcfile       Time taken: 32.261 seconds
   orcfile      Time taken: 28.959 seconds
   parquetfile  Time taken: 32.265 seconds
   Conclusion: orcfile < rcfile ≈ parquetfile
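To check each per-query conclusion mechanically, the timings can be sorted fastest-first. This is a minimal sketch; the function name rank_query is hypothetical, and the numbers are the "Time taken" values reported in the tests above.

```shell
#!/usr/bin/env bash
# rank_query "<format> <seconds>" ... prints the formats fastest-first on one line.
rank_query() {
  printf '%s\n' "$@" \
    | LC_ALL=C sort -k2,2n \
    | LC_ALL=C awk '{ printf "%s%s", (NR > 1 ? " " : ""), $1 } END { print "" }'
}

rank_query "rcfile 47.604" "orcfile 2.563"  "parquetfile 43.454"  # 1. select *
rank_query "rcfile 36.937" "orcfile 2.514"  "parquetfile 43.454"  # 2. select name,addr
rank_query "rcfile 34.375" "orcfile 30.073" "parquetfile 38.352"  # 3. select max(name)
rank_query "rcfile 32.261" "orcfile 28.959" "parquetfile 32.265"  # 4. select count(1)
```

orcfile comes first in every query. Query 1 is the only one where parquetfile beats rcfile, and in query 4 the two differ by only 0.004 seconds, which is why they are treated as equal there.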
V. Summary
Total rows: 13,756,317
Columns: name, age, addr, desc
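orcfile was the fastest format in all four queries. rcfile was slightly faster than parquetfile overall: faster in queries 2 and 3, essentially tied in query 4, and slower in query 1.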
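One rough way to put a number on "slightly faster" is to sum the four timings per format. This is a minimal sketch: the function name sum_times is hypothetical, the values are copied from the tests in section IV, and an unweighted sum counts every query equally, which is only one possible way to aggregate and is not the author's method.

```shell
#!/usr/bin/env bash
# sum_times prints "<format> <total seconds>" for the four queries, fastest total first.
sum_times() {
  printf '%s\n' \
    "rcfile 47.604" "rcfile 36.937" "rcfile 34.375" "rcfile 32.261" \
    "orcfile 2.563" "orcfile 2.514" "orcfile 30.073" "orcfile 28.959" \
    "parquetfile 43.454" "parquetfile 43.454" "parquetfile 38.352" "parquetfile 32.265" \
  | LC_ALL=C awk '{ total[$1] += $2 } END { for (f in total) printf "%s %.3f\n", f, total[f] }' \
  | LC_ALL=C sort -k2,2n
}

sum_times
# orcfile 64.109
# rcfile 151.177
# parquetfile 157.525
```

On this measure parquetfile's total is about 4% higher than rcfile's, consistent with "slightly"; orcfile's total is less than half of either.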