Tez Tuning Parameter Settings

阿新 • Published: 2019-02-06

Tez memory tuning

1. AM and container size settings

tez.am.resource.memory.mb

Description: Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.

hive.tez.container.size

Description: Set hive.tez.container.size to be the same as, or a small multiple (1 or 2 times) of, the YARN container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb.
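The sizing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `tez_memory_settings` and the example YARN limits (4096 MB and 32768 MB) are assumptions, not values from this article.

```python
def tez_memory_settings(yarn_min_mb: int, yarn_max_mb: int, multiple: int = 1) -> dict:
    """Derive AM and container sizes from the YARN allocation limits.

    yarn_min_mb / yarn_max_mb stand for yarn.scheduler.minimum-allocation-mb
    and yarn.scheduler.maximum-allocation-mb. `multiple` is the small
    multiplier (1 or 2) applied to the minimum container size.
    """
    if multiple < 1:
        raise ValueError("multiple must be at least 1")
    container_mb = yarn_min_mb * multiple
    if container_mb > yarn_max_mb:
        # The container must never exceed the YARN maximum allocation.
        raise ValueError("hive.tez.container.size would exceed the YARN maximum")
    return {
        "tez.am.resource.memory.mb": yarn_min_mb,
        "hive.tez.container.size": container_mb,
    }


# Hypothetical cluster limits, used only for illustration.
for key, value in tez_memory_settings(4096, 32768, multiple=2).items():
    print(f"set {key}={value};")
```

The printed lines are in the form accepted by a Hive session, so they can be pasted in after checking them against the real cluster limits.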

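2. AM and container JVM settings

tez.am.launch.cmd-opts

Default: 80% * tez.am.resource.memory.mb

Description: Usually does not need to be changed.

hive.tez.java.opts

Default: 80% * hive.tez.container.size

Description: Hortonworks recommends "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC".

tez.container.max.java.heap.fraction

Default: 0.8

Description: The fraction of the task/AM container memory used for the JVM heap (Xmx). Adjusting this parameter is recommended; choose the value according to the actual workload.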
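As a rough illustration of the 80% rule, the heap size can be computed from the container size and the heap fraction. The helper name `heap_mb` is hypothetical, and rounding down to whole megabytes is an assumption of this sketch, not necessarily what Tez does internally.

```python
def heap_mb(container_mb: int, heap_fraction: float = 0.8) -> int:
    """Return the JVM heap size (MB) for a container of the given size.

    heap_fraction plays the role of tez.container.max.java.heap.fraction.
    The result is rounded down to a whole number of megabytes.
    """
    if not 0.0 < heap_fraction <= 1.0:
        raise ValueError("heap_fraction must be in (0, 1]")
    return int(container_mb * heap_fraction)


# Example: the -Xmx value for a 4096 MB container with the default fraction.
print(f"-Xmx{heap_mb(4096)}m")
```

Lowering the fraction leaves more of the container for off-heap memory, which is the usual reason to tune it for a specific workload.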

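3. Hive memory and map join settings

tez.runtime.io.sort.mb

Default: 100

Description: Amount of memory needed for sorting output. Recommended: 40% * hive.tez.container.size, generally no more than 2 GB.

hive.auto.convert.join.noconditionaltask

Default: true

Description: Whether to merge multiple map joins into one. Use the default.

hive.auto.convert.join.noconditionaltask.size

Default:

Description: When multiple map joins are converted into one, the maximum combined file size of all the small tables. This value only limits the size of the input table files; it does not represent the actual size of the hash table during the map join. Recommended: 1/3 * hive.tez.container.size.

tez.runtime.unordered.output.buffer.size-mb

Default: 100

Description: Size of the buffer to use if not writing directly to disk. Recommended: 10% * hive.tez.container.size.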
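The three recommendations in this subsection all scale with hive.tez.container.size, so they can be derived together. The sketch below is an assumption-laden illustration: the function name `map_join_memory_settings` is hypothetical, values are rounded down, and the one-third value is converted to bytes because hive.auto.convert.join.noconditionaltask.size is specified in bytes while the container size is in MB.

```python
def map_join_memory_settings(container_mb: int) -> dict:
    """Derive the recommended memory settings from hive.tez.container.size (MB)."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    return {
        # 40% of the container, capped at 2 GB (2048 MB).
        "tez.runtime.io.sort.mb": min(container_mb * 40 // 100, 2048),
        # One third of the container, expressed in bytes.
        "hive.auto.convert.join.noconditionaltask.size": container_mb * 1024 * 1024 // 3,
        # 10% of the container.
        "tez.runtime.unordered.output.buffer.size-mb": container_mb * 10 // 100,
    }


# Example with a 4096 MB container (an illustrative value).
for key, value in map_join_memory_settings(4096).items():
    print(f"set {key}={value};")
```

Integer arithmetic is used throughout so the results do not depend on floating-point rounding.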

4. Container reuse settings

tez.am.container.reuse.enabled

Default: true

Description: Switch that enables container reuse.

Mapper/Reducer tuning

1. Number of mappers

tez.grouping.min-size

Default: 50*1024*1024

Description: Lower bound on the size (in bytes) of a grouped split, to avoid generating too many small splits.

tez.grouping.max-size

Default: 1024*1024*1024

Description: Upper bound on the size (in bytes) of a grouped split, to avoid generating excessively large splits.
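Taken together, the lower bound (tez.grouping.min-size in Tez) and the upper bound tez.grouping.max-size limit how many grouped splits, and therefore mappers, a given input can produce. The following sketch only computes that range. The helper name `mapper_count_bounds` is hypothetical, and the real count Tez picks within the range depends on cluster capacity and other settings not covered here.

```python
def mapper_count_bounds(
    input_bytes: int,
    min_size: int = 50 * 1024 * 1024,
    max_size: int = 1024 * 1024 * 1024,
) -> tuple:
    """Return (fewest, most) grouped splits allowed by the size bounds."""
    if input_bytes <= 0 or min_size <= 0 or max_size < min_size:
        raise ValueError("sizes must be positive and max_size >= min_size")
    # Ceiling division: every split is at most max_size / at least min_size.
    fewest = -(-input_bytes // max_size)
    most = -(-input_bytes // min_size)
    return fewest, most


# Example: 10 GiB of input with the default bounds.
fewest, most = mapper_count_bounds(10 * 1024 ** 3)
print(f"between {fewest} and {most} mappers")
```

Raising the lower bound reduces the maximum number of mappers, while lowering the upper bound forces more of them.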

2. Number of reducers

hive.tez.auto.reducer.parallelism

Default: false

Description: Turn on Tez' auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

Setting this to true is recommended.

hive.tez.min.partition.factor

Default: 0.25

Description: When auto reducer parallelism is enabled, this factor will be used to put a lower limit on the number of reducers that Tez specifies.

hive.tez.max.partition.factor

Default: 2.0

Description: When auto reducer parallelism is enabled, this factor will be used to over-partition data in shuffle edges.

hive.exec.reducers.bytes.per.reducer

Default: 256,000,000

Description: Size per reducer. The default before Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used.

The number of reducers is determined by the following formula:

Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate / hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
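The formula above translates directly into code. In this sketch the function name `estimate_reducers` is hypothetical, and rounding the final result up to a whole number is an assumption, since the formula itself does not say how fractions are handled.

```python
import math


def estimate_reducers(
    stage_bytes: int,
    bytes_per_reducer: int = 256_000_000,
    max_reducers: int = 1009,
    max_partition_factor: float = 2.0,
) -> int:
    """Evaluate Max(1, Min(max, estimate / bytes_per_reducer)) x factor."""
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    base = max(1, min(max_reducers, stage_bytes / bytes_per_reducer))
    # Rounding up is an assumption of this sketch.
    return math.ceil(base * max_partition_factor)


# Example: a reducer stage estimated at 10 GB (10,000,000,000 bytes).
print(estimate_reducers(10_000_000_000))
```

Because the partition factor is applied after the cap, the result can exceed hive.exec.reducers.max, exactly as the formula is written.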

3. Shuffle settings

tez.shuffle-vertex-manager.min-src-fraction

Default: 0.25

Description: The fraction of source tasks which should complete before tasks for the current vertex are scheduled.

tez.shuffle-vertex-manager.max-src-fraction

Default: 0.75

Description: Once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. The number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction.

Example:

hive.exec.reducers.bytes.per.reducer=1073741824; // 1 GB

tez.shuffle-vertex-manager.min-src-fraction=0.25;

tez.shuffle-vertex-manager.max-src-fraction=0.75;

This indicates that the decision will be made between 25% of mappers finishing and 75% of mappers finishing, provided there's at least 1 GB of data being output (i.e. if 25% of mappers don't send 1 GB of data, we will wait until at least 1 GB is sent out).
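The linear scaling between the two fractions can be illustrated with a small interpolation function. This is a simplified model only: the name `schedulable_fraction` is hypothetical, and the sketch ignores the data-volume condition (the 1 GB threshold) described in the example.

```python
def schedulable_fraction(
    completed: float,
    min_src_fraction: float = 0.25,
    max_src_fraction: float = 0.75,
) -> float:
    """Fraction of the current vertex's tasks that may be scheduled.

    `completed` is the fraction of source tasks that have finished.
    Nothing is scheduled below min_src_fraction, everything at or above
    max_src_fraction, with linear scaling in between.
    """
    if completed < min_src_fraction:
        return 0.0
    if completed >= max_src_fraction:
        return 1.0
    return (completed - min_src_fraction) / (max_src_fraction - min_src_fraction)


# Example: progress of the source (mapper) tasks versus schedulable reducers.
for done in (0.10, 0.25, 0.50, 0.75):
    print(f"{done:.2f} of sources done: {schedulable_fraction(done):.2f} schedulable")
```

Narrowing the gap between the two fractions makes reducers start in a burst, while widening it spreads their start over more of the map phase.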
