
Integrating Spark 2.x with HDP 2.4


Integration steps

1. Download Spark 2.3 from the Apache Spark downloads page: http://spark.apache.org/downloads.html
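If the target machine has direct internet access, the same tarball can also be pulled straight from the Apache archive (the URL assumes the Spark 2.3.0 build for Hadoop 2.7):

wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz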

2. Upload the Spark 2.3 package to the machine where it will be installed.
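A minimal upload sketch, assuming the tarball sits in the local working directory and the target host is named hdp-node1 (a hypothetical hostname):

scp spark-2.3.0-bin-hadoop2.7.tgz root@hdp-node1:/usr/hdp/2.4.0.0-169/

Then, on the target machine: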

cd  /usr/hdp/2.4.0.0-169

tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz

mv spark-2.3.0-bin-hadoop2.7  spark2

3. Change the owner and group of the spark2 tree to root:

chown -R root:root /usr/hdp/2.4.0.0-169/spark2

4. Create symlinks pointing at the actual spark2 directory. Note that ln -s takes the target first and the link name second; placing the links under /usr/hdp/current matches the paths referenced in spark-env.sh in step 6.

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-client

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-historyserver

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-thriftserver
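A quick check that the new links resolve (each should point at the spark2 install directory):

ls -l /usr/hdp/current | grep spark2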

5. Go into spark2 and set up the configuration files under the conf directory:

cd conf

cp spark-env.sh.template spark-env.sh

cp spark-defaults.conf.template spark-defaults.conf

6. Edit spark-env.sh (vi spark-env.sh) and append the following at the end of the file:

# Alternate conf dir. (Default: ${SPARK_HOME}/conf)

export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}

# Where log files are stored. (Default: ${SPARK_HOME}/logs)

#export SPARK_LOG_DIR=${SPARK_HOME:-/usr/hdp/current/spark2-historyserver}/logs

export SPARK_LOG_DIR=/var/log/spark2

# Where the pid file is stored. (Default: /tmp)

export SPARK_PID_DIR=/var/run/spark2

# Memory for Master, Worker and history server (default: 1024MB)

export SPARK_DAEMON_MEMORY=1024m

# A string representing this instance of spark. (Default: $USER)

SPARK_IDENT_STRING=$USER

# The scheduling priority for daemons. (Default: 0)

SPARK_NICENESS=0

export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/current/hadoop-client/conf}

# The java implementation to use.

export JAVA_HOME=/usr/jdk64/jdk1.8.0_60
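A quick sanity check that the paths configured above resolve on this host (a sketch; run it from the spark2 directory):

. conf/spark-env.sh
echo "$SPARK_CONF_DIR" "$HADOOP_CONF_DIR"
ls "$JAVA_HOME/bin/java"    # should print the path to the java binary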

7. Edit spark-defaults.conf (vi spark-defaults.conf) and append the following at the end of the file:

spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native

spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native

spark.eventLog.dir hdfs:///spark2-history

spark.eventLog.enabled true

# Required: setting this parameter to 'false' turns off ATS timeline server for Spark

spark.hadoop.yarn.timeline-service.enabled false

spark.driver.extraJavaOptions -Dhdp.version=2.4.0.0-169

spark.yarn.am.extraJavaOptions -Dhdp.version=2.4.0.0-169

spark.history.fs.logDirectory hdfs:///spark2-history

#spark.history.kerberos.keytab none

#spark.history.kerberos.principal none

#spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider

#spark.history.ui.port 18080

spark.yarn.containerLauncherMaxThreads 25

spark.yarn.driver.memoryOverhead 200

spark.yarn.executor.memoryOverhead 200

#spark.yarn.historyServer.address sandbox.hortonworks.com:18080

spark.yarn.max.executor.failures 3

spark.yarn.preserve.staging.files false

spark.yarn.queue default

spark.yarn.scheduler.heartbeat.interval-ms 5000

spark.yarn.submit.file.replication 3

spark.ui.port 4041
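The HDFS directory named in spark.eventLog.dir and spark.history.fs.logDirectory above must exist before any job runs. A minimal sketch; the spark:hadoop owner is an assumption matching common HDP defaults:

hdfs dfs -mkdir -p /spark2-history
hdfs dfs -chown spark:hadoop /spark2-history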

8. In the Ambari UI, adjust the YARN memory parameters below, then restart the affected services:

yarn.scheduler.maximum-allocation-mb = 2500MB

yarn.nodemanager.resource.memory-mb = 7800MB
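After restarting YARN, the effective values can be verified in the rendered yarn-site.xml on a NodeManager host (the path assumes the standard HDP client-config location):

grep -A1 'yarn.scheduler.maximum-allocation-mb' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.nodemanager.resource.memory-mb' /etc/hadoop/conf/yarn-site.xml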

9. Test the Spark 2 integration on HDP by submitting the bundled example jobs. Run the following from the spark2 directory:

export SPARK_MAJOR_VERSION=2

./bin/spark-submit \

    --class org.apache.spark.examples.SparkPi \

    --master yarn \

    --deploy-mode client \

    --num-executors 3 \

    --driver-memory 512m \

    --executor-memory 512m \

    --executor-cores 1 \

    examples/jars/spark-examples*.jar 10    

./bin/spark-submit \

    --class org.apache.spark.examples.SparkTC \

    --master yarn \

    --deploy-mode client \

    --num-executors 3 \

    --driver-memory 512m \

    --executor-memory 512m \

    --executor-cores 1 \

    examples/jars/spark-examples*.jar 10
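In client deploy mode the SparkPi result ("Pi is roughly 3.14...") prints directly on the submitting console. Aggregated executor logs can be pulled afterwards with yarn logs; the application ID below is hypothetical, substitute the one printed by spark-submit:

yarn logs -applicationId application_1528700000000_0001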

10. Check the job's run status on the YARN page in Ambari.


PS:

If you run into HDFS write-permission errors, either switch to a user that has write access, or disable HDFS permission checking by setting:

dfs.permissions.enabled=false
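Disabling permission checking cluster-wide is heavy-handed; a narrower fix is to give the submitting user a writable HDFS home directory (the username sparkuser is hypothetical):

hdfs dfs -mkdir -p /user/sparkuser
hdfs dfs -chown sparkuser:hdfs /user/sparkuser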

Over

                                        2018.6.11