hadoop hive 壓縮引數測試
hive的壓縮一般分為三類
(1)從hive輸出層面的壓縮
-- Hive-level switches: compress data passed between intermediate MR stages,
-- and compress the final query output.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
(2)從mapreduce層面
-- MapReduce-level switches: map-output (shuffle) compression and
-- job final-output compression, each with its codec.
SET mapreduce.map.output.compress=true;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
(3)從hive表結構層面
比如說hive表的格式:text,rcfile,orc等。
建表
-- DDL for the five test tables. All share the same 12 string columns and a
-- `dt` string partition; only the storage format and location differ.
-- Base HDFS directory: /tmp/lgh/compression
-- NOTE(fix): the original paste had a stray "/tmp/lgh/compression " fragment
-- glued in front of the first CREATE TABLE, which made it invalid SQL; removed.
CREATE TABLE c_text (
    `t` string, `cip` string, `u` string, `ur` string,
    `ar` string, `ua` string, `pvid` string, `ut` string,
    `tt` string, `tp` string, `tu` string, `cp` string
)
PARTITIONED BY (`dt` string)
STORED AS textfile
LOCATION '/tmp/lgh/compression/c_text';

CREATE TABLE c_text_compress (
    `t` string, `cip` string, `u` string, `ur` string,
    `ar` string, `ua` string, `pvid` string, `ut` string,
    `tt` string, `tp` string, `tu` string, `cp` string
)
PARTITIONED BY (`dt` string)
STORED AS textfile
LOCATION '/tmp/lgh/compression/c_text_compress';

CREATE TABLE c_orc (
    `t` string, `cip` string, `u` string, `ur` string,
    `ar` string, `ua` string, `pvid` string, `ut` string,
    `tt` string, `tp` string, `tu` string, `cp` string
)
PARTITIONED BY (`dt` string)
STORED AS orc
LOCATION '/tmp/lgh/compression/c_orc';

CREATE TABLE c_orc_compress (
    `t` string, `cip` string, `u` string, `ur` string,
    `ar` string, `ua` string, `pvid` string, `ut` string,
    `tt` string, `tp` string, `tu` string, `cp` string
)
PARTITIONED BY (`dt` string)
STORED AS orc
LOCATION '/tmp/lgh/compression/c_orc_compress';

CREATE TABLE c_rcfile_compress (
    `t` string, `cip` string, `u` string, `ur` string,
    `ar` string, `ua` string, `pvid` string, `ut` string,
    `tt` string, `tp` string, `tu` string, `cp` string
)
PARTITIONED BY (`dt` string)
STORED AS rcfile
LOCATION '/tmp/lgh/compression/c_rcfile_compress';
一.完全不使用任何壓縮,textfile格式
-- Case 1: no compression at any layer; TEXTFILE destination.
-- Codec settings are inert here because the corresponding switches are false.
SET hive.exec.compress.intermediate=false;
SET hive.exec.compress.output=false;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
-- Load one day of logs into the uncompressed text table.
INSERT OVERWRITE TABLE c_text PARTITION (dt='20170505')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
二.完全不使用任何壓縮,orc格式
-- Case 2: no compression at any layer; ORC destination.
SET hive.exec.compress.intermediate=false;
SET hive.exec.compress.output=false;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
-- Load one day of logs into the ORC table (format's own default applies).
INSERT OVERWRITE TABLE c_orc PARTITION (dt='20170505')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
三.使用hive壓縮,textfile
-- Case 3: Hive-level compression only; TEXTFILE destination.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
-- Partition suffix '_hive' marks the Hive-compression variant.
INSERT OVERWRITE TABLE c_text_compress PARTITION (dt='20170505_hive')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
四.使用mapreduce壓縮,textfile
-- Case 4: MapReduce-level compression only; TEXTFILE destination.
SET hive.exec.compress.intermediate=false;
SET hive.exec.compress.output=false;
SET mapreduce.map.output.compress=true;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
-- Partition suffix '_mr' marks the MapReduce-compression variant.
INSERT OVERWRITE TABLE c_text_compress PARTITION (dt='20170505_mr')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
五.同時使用hive和mapreduce壓縮,textfile
-- Case 5: both Hive-level and MapReduce-level compression; TEXTFILE destination.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapreduce.map.output.compress=true;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
-- Partition suffix '_mr_hive' marks the combined variant.
INSERT OVERWRITE TABLE c_text_compress PARTITION (dt='20170505_mr_hive')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
六.使用hive壓縮,orc格式(預設的zlib壓縮,hive設定的壓縮不生效)
-- Case 6: Hive-level compression on, ORC destination with explicit ZLIB
-- (ORC's default codec; the Hive-level switches do not affect ORC output).
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
SET orc.compress=ZLIB;
INSERT OVERWRITE TABLE c_orc_compress PARTITION (dt='20170505_hive')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
七.orc格式+NONE
-- Case 7: ORC destination with compression explicitly disabled (orc.compress=NONE).
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
SET orc.compress=NONE;
INSERT OVERWRITE TABLE c_orc_compress PARTITION (dt='20170505_none')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
八.orc格式+SNAPPY
-- Case 8: ORC destination with SNAPPY compression (orc.compress=SNAPPY).
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
SET orc.compress=SNAPPY;
INSERT OVERWRITE TABLE c_orc_compress PARTITION (dt='20170505_snappy')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
九.rcfile使用hive壓縮
-- Case 9: RCFILE destination with Hive-level compression enabled.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
INSERT OVERWRITE TABLE c_rcfile_compress PARTITION (dt='20170505_hive')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
十.rcfile不使用hive壓縮
-- Case 10: RCFILE destination with all compression disabled (baseline).
SET hive.exec.compress.intermediate=false;
SET hive.exec.compress.output=false;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapreduce.output.fileoutputformat.compress=false;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.LzoCodec;
INSERT OVERWRITE TABLE c_rcfile_compress PARTITION (dt='20170505_none')
SELECT t, cip, u, ur, ar, ua, pvid, ut, tt, tp, tu, cp
FROM logs.logservice
WHERE dt='20170505';
結果:
hadoop fs -du -h /tmp/lgh/compression/c_text/
7.3 G /tmp/lgh/compression/c_text/dt=20170505
hadoop fs -du -h /tmp/lgh/compression/c_text_compress/
2.8 G /tmp/lgh/compression/c_text_compress/dt=20170505_hive
7.3 G /tmp/lgh/compression/c_text_compress/dt=20170505_mr
2.8 G /tmp/lgh/compression/c_text_compress/dt=20170505_mr_hive
hadoop fs -du -h /tmp/lgh/compression/c_orc/
688.3 M /tmp/lgh/compression/c_orc/dt=20170505
hadoop fs -du -h /tmp/lgh/compression/c_orc_compress/
688.3 M /tmp/lgh/compression/c_orc_compress/dt=20170505_hive
2.4 G /tmp/lgh/compression/c_orc_compress/dt=20170505_none
974.5 M /tmp/lgh/compression/c_orc_compress/dt=20170505_snappy
hadoop fs -du -h /tmp/lgh/compression/c_rcfile_compress/
2.1 G /tmp/lgh/compression/c_rcfile_compress/dt=20170505_hive
7.2 G /tmp/lgh/compression/c_rcfile_compress/dt=20170505_none
結論:
1.hive的壓縮引數對orc格式沒有效果,對text,rcfile格式起作用(原因:ORC 寫入端自行管理壓縮,由 orc.compress 等表屬性控制,因此會忽略 hive.exec.compress.output 這類輸出層的壓縮設定)
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
2.mapreduce的壓縮引數對hive任務沒有起作用,以下引數只在單純的mapreduce作業中生效
set mapreduce.map.output.compress=true;
set mapreduce.output.fileoutputformat.compress=true;
3.hive對於orc的壓縮格式,可以設定orc.compress引數或者hive.exec.orc.default.compress來實現。(可選值有NONE, ZLIB, SNAPPY)
測試結果
格式 | 大小 |
---|---|
text | 7.3GB |
lzo | 2.8GB |
orc+none | 2.4GB |
orc+snappy | 974.5MB |
orc+zlib | 688.3MB |
rcfile+none | 7.2GB |
rcfile+lzo | 2.1GB |
gzip | 1.8GB |