Integrating Spark 2.x with HDP 2.4
Integration steps
1. Download the Spark 2.3 package from the official site: http://spark.apache.org/downloads.html
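If the machine has internet access, one way to fetch the package is directly from the Apache archive (a sketch; the exact mirror URL may differ):
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz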
2. Upload the Spark 2.3 package to the machine where it will be installed, then unpack and rename it:
cd /usr/hdp/2.4.0.0-169
tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz
mv spark-2.3.0-bin-hadoop2.7 spark2
3. Change the owner and group of the unpacked files (run inside the spark2 directory): chown -R root:root *
4. Create symlinks under /usr/hdp/current pointing to the actual spark2 directory (the spark-env.sh configuration below relies on these paths):
ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-client
ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-historyserver
ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-thriftserver
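To confirm the links were created correctly, a quick sanity check (each entry should point at /usr/hdp/2.4.0.0-169/spark2):
ls -l /usr/hdp/current/ | grep spark2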
5. Enter the spark2 directory and prepare the configuration files under conf:
cd conf
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf
6. Edit spark-env.sh (vi spark-env.sh) and append the following at the end of the file:
# Alternate conf dir. (Default: ${SPARK_HOME}/conf)
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}
# Where log files are stored. (Default: ${SPARK_HOME}/logs)
#export SPARK_LOG_DIR=${SPARK_HOME:-/usr/hdp/current/spark2-historyserver}/logs
export SPARK_LOG_DIR=/var/log/spark2
# Where the pid file is stored. (Default: /tmp)
export SPARK_PID_DIR=/var/run/spark2
# Memory for Master, Worker and history server (default: 1024MB)
export SPARK_DAEMON_MEMORY=1024m
# A string representing this instance of spark.(Default: $USER)
SPARK_IDENT_STRING=$USER
# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/current/hadoop-client/conf}
# The java implementation to use.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_60
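spark-env.sh above points the log and pid locations at /var/log/spark2 and /var/run/spark2, which are not created automatically. A minimal sketch to prepare them, assuming the daemons run as root as set in step 3:
mkdir -p /var/log/spark2 /var/run/spark2
chown root:root /var/log/spark2 /var/run/spark2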
7. Edit spark-defaults.conf (vi spark-defaults.conf) and append the following at the end of the file:
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
spark.eventLog.dir hdfs:///spark2-history
spark.eventLog.enabled true
# Required: setting this parameter to 'false' turns off ATS timeline server for Spark
spark.hadoop.yarn.timeline-service.enabled false
spark.driver.extraJavaOptions -Dhdp.version=2.4.0.0-169
spark.yarn.am.extraJavaOptions -Dhdp.version=2.4.0.0-169
spark.history.fs.logDirectory hdfs:///spark2-history
#spark.history.kerberos.keytab none
#spark.history.kerberos.principal none
#spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
#spark.history.ui.port 18080
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 200
spark.yarn.executor.memoryOverhead 200
#spark.yarn.historyServer.address sandbox.hortonworks.com:18080
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
spark.ui.port 4041
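Note that spark.eventLog.dir and spark.history.fs.logDirectory both point at hdfs:///spark2-history, which must exist in HDFS before the first job is submitted. A sketch, assuming the hdfs superuser is available and jobs may run as different users (hence the wide permissions):
su - hdfs -c "hdfs dfs -mkdir -p /spark2-history"
su - hdfs -c "hdfs dfs -chmod 777 /spark2-history"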
8. In the Ambari UI, adjust the YARN parameters:
yarn.scheduler.maximum-allocation-mb = 2500MB
yarn.nodemanager.resource.memory-mb = 7800MB
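Once Ambari pushes the new configuration out, the values can be verified on any node by inspecting the active client config (assuming the standard HDP location /etc/hadoop/conf):
grep -A1 "yarn.scheduler.maximum-allocation-mb" /etc/hadoop/conf/yarn-site.xml
grep -A1 "yarn.nodemanager.resource.memory-mb" /etc/hadoop/conf/yarn-site.xml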
9. Test the HDP + Spark 2 integration by submitting test jobs:
export SPARK_MAJOR_VERSION=2
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn --deploy-mode client \
--num-executors 3 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
examples/jars/spark-examples*.jar 10
./bin/spark-submit \
--class org.apache.spark.examples.SparkTC \
--master yarn --deploy-mode client \
--num-executors 3 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
examples/jars/spark-examples*.jar 10
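A successful SparkPi run prints a line like "Pi is roughly 3.14..." near the end of the driver output. To browse finished jobs in the history UI configured above, start the Spark 2 history server (it serves on port 18080 by default):
cd /usr/hdp/2.4.0.0-169/spark2
./sbin/start-history-server.sh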
10. Check the job's execution status on the YARN page in Ambari.
PS:
If you run into HDFS write-permission problems, you can switch to a user with write access, or disable permission checking:
dfs.permissions.enabled=false
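Disabling dfs.permissions.enabled turns off permission checking cluster-wide, so a narrower alternative is to grant access only where Spark needs it, or to submit as a user that already has access. A sketch (HADOOP_USER_NAME works on non-Kerberos clusters only):
su - hdfs -c "hdfs dfs -chmod -R 777 /spark2-history"
export HADOOP_USER_NAME=hdfs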
Over
2018.6.11