Hive Compression Formats
阿新 • Published: 2019-02-08
Intermediate compression applies to the data passed between a job's map tasks and reduce tasks. For intermediate compression, it is best to choose a codec with a low CPU cost.

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
  <description>
    This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress*
  </description>
</property>

Hadoop has a default compression codec (DefaultCodec), but you can switch to a different one by setting the mapred.map.output.compression.codec property. This property can be set in either mapred-site.xml or hive-site.xml. SnappyCodec is a good choice because its CPU overhead is low.

<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>
    The compression codec used for map outputs when map-output compression is enabled.
  </description>
</property>
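
The same two settings can also be applied to a single Hive session instead of site-wide. The following is a minimal sketch, assuming a Hive CLI or Beeline session on a cluster where the Snappy native library is available. On Hadoop 2 and later, the codec property is also known as mapreduce.map.output.compress.codec, and the older name used here is kept as a deprecated alias.

```sql
-- Enable compression of Hive's intermediate files for this session only.
SET hive.exec.compress.intermediate=true;

-- Use Snappy for map output (property name as used in this article).
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Print the current value to confirm the setting took effect.
SET hive.exec.compress.intermediate;
```

Session-level values override the XML files only for the current session, which makes it easy to compare codecs on one query before changing hive-site.xml or mapred-site.xml.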