Getting Started with Big Data: Introduction and Usage of the Hive Execution Engine Tez
1. Preface
Hive's default execution engine is MR (MapReduce). To speed up queries, we can switch to the Tez engine. For why this improves performance, refer to the figure below:
When Hive compiles a query directly into MR programs, suppose there are four MR jobs with dependencies among them. In the figure above, green marks the Reduce Tasks, and the cloud shapes mark write barriers, where intermediate results must be persisted to HDFS.
Tez can merge multiple dependent jobs into a single job, so HDFS is written only once and there are fewer intermediate stages, which greatly improves query performance.
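As a concrete illustration, a query like the following compiles to several dependent MR jobs (roughly one per shuffle stage) under the MR engine, while Tez runs it as a single DAG. The table and column names here are made up for illustration, not from the original setup:

```sql
select d.dept_name, count(*) as cnt
from employee e
join dept d on e.dept_id = d.dept_id
group by d.dept_name
order by cnt desc;
```

Under MR, the join, the aggregation, and the final sort each become a separate job with HDFS writes in between; under Tez they become vertices of one DAG connected by shuffle edges, without persisting intermediate results to HDFS.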
2. Preparing the Installation Package
1) Download the Tez release package: http://tez.apache.org
2) Copy apache-tez-0.9.1-bin.tar.gz to the /opt/module directory on hadoop102
[[email protected] module]$ ls
apache-tez-0.9.1-bin.tar.gz
3) Extract apache-tez-0.9.1-bin.tar.gz
[[email protected] module]$ tar -zxvf apache-tez-0.9.1-bin.tar.gz
4) Rename the directory
[[email protected] module]$ mv apache-tez-0.9.1-bin/ tez-0.9.1
3. Configuring Tez in Hive
1) Go to Hive's configuration directory: /opt/module/hive/conf
[[email protected] conf]$ pwd
/opt/module/hive/conf
2) In hive-env.sh, add the Tez environment variable and the dependency-jar variables
[[email protected] conf]$ vim hive-env.sh
Add the following configuration:
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/module/hadoop-2.7.2
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/module/hive/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/tez-0.9.1    # the directory where Tez was extracted
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
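The jar-collecting loops above can be sanity-checked locally against a mock directory layout. The jar names below are made up purely for the check; note that each append prefixes a `:`, which is why `$TEZ_JARS` can be concatenated directly after the hadoop-lzo jar in HIVE_AUX_JARS_PATH:

```shell
#!/bin/sh
# Build a mock Tez layout: a couple of jars at the top level and one in lib/.
TEZ_HOME=$(mktemp -d)
mkdir "$TEZ_HOME/lib"
touch "$TEZ_HOME/tez-api-0.9.1.jar" "$TEZ_HOME/tez-dag-0.9.1.jar"
touch "$TEZ_HOME/lib/commons-io-2.4.jar"

# Same logic as in hive-env.sh: accumulate a ':'-separated jar list.
TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
    TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

# The list starts with ':', so it appends cleanly to another path.
echo "$TEZ_JARS"
```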
3) Add the following to hive-site.xml to change the Hive execution engine:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
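If you prefer not to change hive-site.xml globally, the same property can be toggled per session from the Hive CLI with the standard `set` command, which is handy for testing:

```sql
hive (default)> set hive.execution.engine=tez;
hive (default)> set hive.execution.engine;   -- prints the current value
```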
4. Configuring Tez
1) Create a tez-site.xml file under Hive's /opt/module/hive/conf directory
[[email protected] conf]$ pwd
/opt/module/hive/conf
[[email protected] conf]$ vim tez-site.xml
Add the following content:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
<name>tez.lib.uris.classpath</name>
<value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
</configuration>
5. Uploading Tez to the Cluster
1) Upload /opt/module/tez-0.9.1 to the /tez path on HDFS
[[email protected] conf]$ hadoop fs -mkdir /tez
[[email protected] conf]$ hadoop fs -put /opt/module/tez-0.9.1/ /tez
[[email protected] conf]$ hadoop fs -ls /tez
/tez/tez-0.9.1
6. Testing
1) Start Hive
[[email protected] hive]$ bin/hive
2) Create a test table
hive (default)> create table student(
id int,
name string);
3) Insert a row into the table
hive (default)> insert into student values(1,"zhangsan");
4) If no errors are reported, the setup succeeded
hive (default)> select * from student;
1 zhangsan
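Note that a plain `SELECT *` is typically served by Hive's fetch task and may not launch a Tez DAG at all; the `INSERT` above is what actually exercised the engine. A query with an aggregation is a stronger smoke test (a generic example, not from the original write-up):

```sql
hive (default)> select name, count(*) from student group by name;
```

The console output for this query should show a Tez session and DAG progress rather than MapReduce job stages.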
7. Summary
1) Problem: when running on Tez, the container is detected using too much memory and is killed by the NodeManager:
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103
For more detailed output, check application tracking page: http://hadoop103:8088/cluster/app/application_1546781144082_0005 Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
This happens when a container running on a worker node tries to use more memory than allowed and is killed by the NodeManager.
[Excerpt] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
Solutions:
Option 1: disable the virtual memory check (the approach used here). Modify yarn-site.xml:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
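A middle ground, rather than disabling the check entirely, is to raise YARN's virtual-to-physical memory ratio (`yarn.nodemanager.vmem-pmem-ratio`, default 2.1), also in yarn-site.xml. The value 4 below is an illustrative choice: with the 1 GB container from the log above it would allow 4 GB of virtual memory, comfortably above the 2.6 GB that was observed:

```xml
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
</property>
```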
Option 2: in mapred-site.xml, set the memory for Map and Reduce tasks as follows (tune the values to your machines' memory and workload; the heap sizes in *.java.opts must stay below the corresponding container sizes in *.memory.mb):
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>