Tez Optimization Parameter Settings

Tez Memory Tuning

1. AM and Container Size Settings

tez.am.resource.memory.mb

Description: Set tez.am.resource.memory.mb to the same value as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.

hive.tez.container.size

Description: Set hive.tez.container.size to the same value as, or a small multiple (1 or 2 times) of, the YARN minimum container size yarn.scheduler.minimum-allocation-mb, but never more than yarn.scheduler.maximum-allocation-mb.
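As a rough illustration of the sizing rule above, the following Python sketch picks a container size as a small multiple of the YARN minimum allocation and rejects anything above the YARN maximum. The function name and the example values (2048 MB minimum, 8192 MB maximum) are hypothetical and not taken from any real cluster.

```python
def pick_container_size_mb(min_alloc_mb: int, max_alloc_mb: int, multiple: int = 1) -> int:
    """Candidate value for hive.tez.container.size, in MB.

    min_alloc_mb stands for yarn.scheduler.minimum-allocation-mb and
    max_alloc_mb for yarn.scheduler.maximum-allocation-mb.
    """
    if multiple < 1:
        raise ValueError("multiple must be at least 1")
    size_mb = min_alloc_mb * multiple
    if size_mb > max_alloc_mb:
        raise ValueError("container size exceeds yarn.scheduler.maximum-allocation-mb")
    return size_mb


# Hypothetical cluster: 2048 MB minimum and 8192 MB maximum allocation.
print(pick_container_size_mb(2048, 8192, multiple=2))  # 4096
```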

2. AM and Container JVM Settings

tez.am.launch.cmd-opts

Default: 80% * tez.am.resource.memory.mb (heap size)

Description: Usually does not need to be adjusted.

hive.tez.java.opts

Default: 80% * hive.tez.container.size (heap size)

Description: Hortonworks recommends "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC".

tez.container.max.java.heap.fraction

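Default: 0.8

Description: The fraction of the container memory that a task or the AM uses as its JVM heap (-Xmx). Adjusting this parameter is recommended; choose the value according to the actual workload.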
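To make the 80% figures concrete, here is a small sketch that derives a heap size from a container size and a heap fraction. The function name is hypothetical, and truncating to a whole number of megabytes is an assumption of this sketch rather than documented behaviour.

```python
def derived_xmx_mb(container_mb: int, heap_fraction: float = 0.8) -> int:
    """Heap size in MB implied by a container size and a heap fraction.

    heap_fraction plays the role of tez.container.max.java.heap.fraction.
    Truncation to whole megabytes is an assumption of this sketch.
    """
    if not 0.0 < heap_fraction <= 1.0:
        raise ValueError("heap_fraction must be in (0, 1]")
    return int(container_mb * heap_fraction)


# A 4096 MB container with the default fraction of 0.8.
print(f"-Xmx{derived_xmx_mb(4096)}m")  # -Xmx3276m
```

The remaining 20% or so is left for non-heap memory, which is why the fraction may need lowering for workloads that use a lot of off-heap memory.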

3. Hive In-Memory Map Join Settings

tez.runtime.io.sort.mb

Default: 100

Description: Amount of memory needed for sorting output. Recommended value: 40% * hive.tez.container.size, generally no more than 2 GB.

hive.auto.convert.join.noconditionaltask

Default: true

Description: Whether to merge multiple map joins into one. Use the default.

hive.auto.convert.join.noconditionaltask.size

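Default:

Description: When multiple map joins are converted into one, the maximum combined file size of all the small tables. This value only limits the size of the input table files; it does not represent the actual size of the hash table during the map join. Recommended value: 1/3 * hive.tez.container.size.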

tez.runtime.unordered.output.buffer.size-mb

Default: 100

Description: Size of the buffer to use if not writing directly to disk. Recommended value: 10% * hive.tez.container.size.
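The three recommended ratios in this section can be worked out from a single container size. The sketch below is only an illustration: the function name is hypothetical, the 2048 MB cap mirrors the "no more than 2 GB" guidance, and it assumes hive.auto.convert.join.noconditionaltask.size is expressed in bytes while the other two values are in MB.

```python
def map_join_memory_settings(container_mb: int) -> dict:
    """Suggested values derived from hive.tez.container.size (in MB)."""
    return {
        # 40% of the container, capped at 2 GB as the text suggests.
        "tez.runtime.io.sort.mb": min(container_mb * 40 // 100, 2048),
        # One third of the container; assumed to be expressed in bytes.
        "hive.auto.convert.join.noconditionaltask.size": container_mb * 1024 * 1024 // 3,
        # 10% of the container.
        "tez.runtime.unordered.output.buffer.size-mb": container_mb * 10 // 100,
    }


for name, value in map_join_memory_settings(4096).items():
    print(f"set {name}={value};")
```

Integer arithmetic is used so that the results do not depend on floating-point rounding.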

4. Container Reuse Settings

tez.am.container.reuse.enabled

Default: true

Description: Switch that enables or disables container reuse.

Mapper/Reducer Tuning

1. Number of Mappers

tez.grouping.min-size

Default: 50*1024*1024

Description: Lower bound on the size (in bytes) of a grouped split, to avoid generating too many small splits.

tez.grouping.max-size

Default: 1024*1024*1024

Description: Upper bound on the size (in bytes) of a grouped split, to avoid generating excessively large splits.
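Because every grouped split must fall between the lower bound (tez.grouping.min-size) and the upper bound (tez.grouping.max-size), the two settings give a rough range for the number of mappers. The sketch below computes only that range. It is a simplification with a hypothetical function name; the actual grouping also depends on factors such as cluster capacity and data locality, which are ignored here.

```python
import math


def grouped_split_bounds(total_input_bytes: int,
                         min_size: int = 50 * 1024 * 1024,
                         max_size: int = 1024 * 1024 * 1024) -> tuple:
    """Rough (fewest, most) number of grouped splits for a given input size."""
    # No group may exceed max_size, so at least this many groups are needed.
    fewest = max(1, math.ceil(total_input_bytes / max_size))
    # No group should be smaller than min_size, so at most this many fit.
    most = max(1, total_input_bytes // min_size)
    return fewest, most


# 10 GiB of input with the default bounds.
print(grouped_split_bounds(10 * 1024 ** 3))  # (10, 204)
```

Raising tez.grouping.min-size lowers the upper end of the range (fewer, larger mappers), while lowering tez.grouping.max-size raises the lower end.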

2. Number of Reducers

hive.tez.auto.reducer.parallelism

Default: false

Description: Turn on Tez's auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

Setting this to true is recommended.

hive.tez.min.partition.factor

Default: 0.25

Description: When auto reducer parallelism is enabled, this factor is used to put a lower limit on the number of reducers that Tez specifies.

hive.tez.max.partition.factor

Default: 2.0

Description: When auto reducer parallelism is enabled, this factor is used to over-partition data in shuffle edges.

hive.exec.reducers.bytes.per.reducer

Default: 256,000,000

Description: Size per reducer. The default before Hive 0.14.0 is 1 GB; that is, if the input size is 10 GB, then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB; that is, if the input size is 1 GB, then 4 reducers will be used.

The number of reducers is determined by the following formula:

Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate / hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
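The formula can be evaluated directly. In the sketch below the function name is hypothetical, and rounding up with ceil is an assumption, since the formula itself does not say how fractions are handled. The defaults are the bracketed values from the formula and the 256,000,000 bytes per reducer given above.

```python
import math


def estimate_reducers(stage_estimate_bytes: int,
                      bytes_per_reducer: int = 256_000_000,
                      max_reducers: int = 1009,
                      max_partition_factor: float = 2.0) -> int:
    """Max(1, Min(max_reducers, estimate / bytes_per_reducer)) x factor."""
    # Rounding up is an assumption; the formula leaves rounding unspecified.
    by_size = math.ceil(stage_estimate_bytes / bytes_per_reducer)
    bounded = max(1, min(max_reducers, by_size))
    return math.ceil(bounded * max_partition_factor)


# A reducer stage estimated at 10 GB (10,000,000,000 bytes).
print(estimate_reducers(10_000_000_000))  # 80
```

Here 10,000,000,000 / 256,000,000 rounds up to 40, which is below the cap of 1009, and the over-partitioning factor of 2 doubles it to 80.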

3. Shuffle Settings

tez.shuffle-vertex-manager.min-src-fraction

Default: 0.25

Description: The fraction of source tasks which should complete before tasks for the current vertex are scheduled.

tez.shuffle-vertex-manager.max-src-fraction

Default: 0.75

Description: Once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. The number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction.

Example:

hive.exec.reducers.bytes.per.reducer=1073741824; // 1 GB

tez.shuffle-vertex-manager.min-src-fraction=0.25

tez.shuffle-vertex-manager.max-src-fraction=0.75

This indicates that the decision will be made between 25% of mappers finishing and 75% of mappers finishing, provided there is at least 1 GB of data being output (i.e., if 25% of mappers don't send 1 GB of data, we will wait until at least 1 GB is sent out).
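The linear scaling between tez.shuffle-vertex-manager.min-src-fraction and tez.shuffle-vertex-manager.max-src-fraction can be pictured with a short sketch. This is a simplified model of the description, not the Tez implementation; the function name is hypothetical, and treating the fraction as exactly zero at the minimum threshold is an assumption.

```python
def fraction_to_schedule(completed_src_fraction: float,
                         min_src_fraction: float = 0.25,
                         max_src_fraction: float = 0.75) -> float:
    """Share of the current vertex's tasks that may be scheduled."""
    if completed_src_fraction < min_src_fraction:
        return 0.0  # too few source tasks have finished
    if completed_src_fraction >= max_src_fraction:
        return 1.0  # everything on the current vertex can be scheduled
    # Linear interpolation between the two thresholds.
    span = max_src_fraction - min_src_fraction
    return (completed_src_fraction - min_src_fraction) / span


for done in (0.1, 0.5, 0.9):
    print(done, fraction_to_schedule(done))
```

With the defaults, half of the tasks on the current vertex become eligible once half of the source tasks have completed.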