Hive compression: final results and intermediate results
阿新 • Published: 2019-02-06
1. Hive compression
hive> set mapred.output.compress=true;
hive> set mapred.compress.map.output=true;
hive> set hive.exec.compress.output=true;
hive> set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
hive> set hive.exec.compress.intermediate=true;
hive> set io.compression.codecs=org.apache.hadoop.io.compress.BZip2Codec;
hive> SET io.seqfile.compression.type=BLOCK;
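With these settings, the data files of the resulting Hive table carry the .bz2 suffix.
One odd observation: the true/false (boolean) parameters take effect when they are set inside the SQL script, whereas mapred.map.output.compression.codec does not; it had to be configured in Hive's XML configuration file.
2. MapReduce compression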
conf.setBoolean("mapred.output.compress", true);
conf.setClass("mapred.output.compression.codec", BZip2Codec.class, CompressionCodec.class);
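The compressed output files carry the codec's suffix.
3. A table that Hive has written with compression can still be queried with SQL and Python scripts as usual; the data is decompressed automatically.
Notes:
Enable compression of the final result data: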
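The automatic decompression is consistent with how Hadoop's text input handling picks a codec from the file suffix. The sketch below only mimics that idea in plain Python; it is not Hive's implementation. The helper names (`open_by_suffix`, `read_rows`, `demo`) and the file name `000000_0.bz2` are made up for illustration, and the `\x01` separator assumes Hive's default field delimiter.

```python
import bz2
import gzip
import os
import tempfile

# Suffix-to-opener table: a simplified stand-in for a codec lookup by extension.
OPENERS = {".bz2": bz2.open, ".gz": gzip.open}


def open_by_suffix(path):
    """Open a text file, decompressing it when the suffix names a known codec."""
    suffix = os.path.splitext(path)[1]
    opener = OPENERS.get(suffix, open)  # fall back to a plain uncompressed open
    return opener(path, "rt", encoding="utf-8")


def read_rows(path, sep="\x01"):
    """Return the rows of a delimited text file as lists of fields."""
    with open_by_suffix(path) as handle:
        return [line.rstrip("\n").split(sep) for line in handle]


def demo():
    """Write a tiny .bz2 file and read it back without decompressing by hand."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "000000_0.bz2")  # hypothetical table file name
        with bz2.open(path, "wt", encoding="utf-8") as handle:
            handle.write("1\x01alice\n2\x01bob\n")
        return read_rows(path)


if __name__ == "__main__":
    print(demo())
```

The caller never names the codec; the suffix alone decides how the file is read, which is why the suffix on the table files matters.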
<property>
<name>hive.exec.compress.output</name>
<value>true</value>
<description> This controls whether the final outputs of a query (to a local/hdfs file or a hive table) is compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>
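Whether intermediate result data is compressed: when a single SQL statement generates several MapReduce jobs, this setting compresses the output of the earlier jobs. It does not compress the output of the last job, which is controlled by hive.exec.compress.output.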
<property>
<name>hive.exec.compress.intermediate</name>
<value>true</value>
<description> This controls whether intermediate files produced by hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>
Codec used for the intermediate result data:
<property>
<name>hive.intermediate.compression.codec</name>
<value>org.apache.hadoop.io.compress.LzoCodec</value>
</property>
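The division of labour between the two switches can be pictured with a small model. This is a simplified sketch of the behaviour described above, assuming a query compiles to a linear chain of MapReduce jobs; `stage_compression` is a hypothetical helper, not part of Hive.

```python
def stage_compression(num_jobs, compress_intermediate, compress_output):
    """Return one flag per MapReduce job: is that job's output compressed?

    Jobs before the last one write intermediate files, so they follow
    hive.exec.compress.intermediate; the last job writes the final result,
    so it follows hive.exec.compress.output.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    flags = [compress_intermediate] * (num_jobs - 1)
    flags.append(compress_output)
    return flags


if __name__ == "__main__":
    # Three jobs, intermediate compression on, final output compression off.
    print(stage_compression(3, True, False))
```

With only hive.exec.compress.intermediate enabled, every job except the last reports compressed output, which matches the note that the final job's output stays uncompressed.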