[Original] Big Data Basics: Spark (9) Spark deployment modes yarn/mesos
1 Download https://spark.apache.org/downloads.html
$ wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
2 Extract
$ tar xvf spark-2.4.0-bin-hadoop2.7.tgz
$ cd spark-2.4.0-bin-hadoop2.7
3 Set the SPARK_HOME environment variable
$ export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7
4 Launch
Using spark-sql as an example
4.1 spark on yarn
Only the HADOOP_CONF_DIR environment variable needs to be set
$ bin/spark-sql --master yarn
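Before launching, it can help to confirm that HADOOP_CONF_DIR really points at the Hadoop client configuration. The following is a minimal sketch, not part of Spark: check_hadoop_conf_dir is a hypothetical helper, /etc/hadoop/conf is an assumed default location, and using the presence of yarn-site.xml as the test is also an assumption.

```shell
# Hypothetical helper: succeeds only if the directory exists and contains yarn-site.xml.
check_hadoop_conf_dir() {
  dir="$1"
  [ -n "$dir" ] && [ -d "$dir" ] && [ -f "$dir/yarn-site.xml" ]
}

# /etc/hadoop/conf is an assumed fallback; use your cluster's actual path.
conf_dir="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
if check_hadoop_conf_dir "$conf_dir"; then
  echo "HADOOP_CONF_DIR looks usable: $conf_dir"
else
  echo "yarn-site.xml not found under: $conf_dir"
fi
```

If the check fails, export HADOOP_CONF_DIR with the correct directory before running bin/spark-sql --master yarn.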
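More options (these are spark-submit options; --deploy-mode cluster applies to jobs submitted with spark-submit, not to the interactive spark-sql shell, which runs in client mode)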
--deploy-mode cluster
--driver-memory 4g
--driver-cores 1
--executor-memory 2g
--executor-cores 1
--num-executors 1
--queue thequeue
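Put together, the options above might be combined as in the sketch below. It only prints the command (a dry run) so it can be reviewed before use. The SPARK_HOME fallback path and the queue name thequeue are placeholders, and the variable name yarn_cmd is made up for this example.

```shell
# Assemble the options listed above into one command line (printed, not executed).
# --deploy-mode is left out because spark-sql is an interactive shell and runs in client mode.
spark_home="${SPARK_HOME:-/path/to/spark-2.4.0-bin-hadoop2.7}"   # placeholder fallback
yarn_cmd="$spark_home/bin/spark-sql --master yarn \
--driver-memory 4g --driver-cores 1 \
--executor-memory 2g --executor-cores 1 \
--num-executors 1 --queue thequeue"
echo "$yarn_cmd"
```

Once the printed line looks right, it can be pasted into the terminal and run as is.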
4.2 spark on mesos
$ bin/spark-sql --master mesos://zk://172.19.28.186:2181,172.19.28.188:2181,172.19.28.190:2181/mesos
More options
--deploy-mode cluster
--supervise
--executor-memory 20G
--executor-cores 1
--total-executor-cores 100
Note that there is no --num-executors option in this mode; the executor count is set indirectly: number of executors = --total-executor-cores / --executor-cores
Executor memory: spark.executor.memory
Executor cores: spark.executor.cores
Number of executors: spark.cores.max/spark.executor.cores
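The relationship above is plain integer division. A small sketch with the values from the options listed earlier (num_executors is a hypothetical helper name, and the result is the count implied by the settings, which the cluster may or may not be able to grant):

```shell
# executors = total executor cores / cores per executor (integer division)
num_executors() {
  total_cores="$1"
  executor_cores="$2"
  echo $(( total_cores / executor_cores ))
}

num_executors 100 1   # --total-executor-cores 100, --executor-cores 1; prints 100
```

With --executor-cores 4 and the same total, the same calculation would give 25 executors.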
Note: spark on yarn may fail at startup with the following error
19/02/25 17:54:20 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
The nodemanager log shows the cause
2019-02-25 17:54:19,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=48342,containerID=container_1551078668160_0012_02_000001] is running beyond virtual memory limits. Current usage: 380.9 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Adjust the yarn-site.xml configuration, using either of the following
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
or
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
</property>
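To see why the second setting helps: the virtual memory limit of a container is its physical memory allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, whose default is 2.1. A rough sketch of that arithmetic for the 1 GB container in the log above (vmem_limit_mb is a hypothetical helper name):

```shell
# Virtual memory limit (MB) = container physical memory (MB) * vmem-pmem-ratio
vmem_limit_mb() {
  pmem_mb="$1"
  ratio="$2"
  awk -v p="$pmem_mb" -v r="$ratio" 'BEGIN { printf "%.1f\n", p * r }'
}

vmem_limit_mb 1024 2.1   # default ratio: prints 2150.4, the "2.1 GB" limit in the log
vmem_limit_mb 1024 4     # ratio 4: prints 4096.0, above the 2.5 GB the container used
```

The first setting instead turns the virtual memory check off entirely, so no limit is enforced on virtual memory at all.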