Spark 2.1 cluster installation
Cluster plan
cancer01 master/worker
cancer02 worker
cancer03 worker
cancer04 worker
cancer05 worker
Preparation
su hadoop
Install Scala
On every node:
cd /usr/local
wget http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
tar zxf scala-2.11.8.tgz
mv scala-2.11.8 scala
chown -R hadoop:hadoop scala
vim /etc/profile
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
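To confirm the Scala install took effect on each node, a quick sanity check (assuming the /etc/profile change above has been sourced in the current shell):

```shell
# Should resolve to /usr/local/scala/bin/scala
which scala

# Prints the installed Scala version (writes to stderr)
scala -version
```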
Install Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.1-bin-hadoop2.7.tgz
tar zxf spark-2.0.1-bin-hadoop2.7.tgz
mv spark-2.0.1-bin-hadoop2.7 /usr/local/spark
chown -R hadoop:hadoop spark
vim /etc/profile
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
Configuration
cd /usr/local/spark/conf
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=192.168.11.134
export SPARK_MASTER_PORT=12345
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
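Standalone mode also needs the worker hosts listed in conf/slaves so start-slaves.sh knows where to launch workers; a sketch matching the plan above (assuming the cancer0X hostnames resolve on every node):

```shell
# conf/slaves — copy from conf/slaves.template, one worker hostname per line
cancer01
cancer02
cancer03
cancer04
cancer05
```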
Copy to the other nodes
Create the /usr/local/spark directory on cancer02|03|04|05, then:
scp -r spark [email protected]:/usr/local/
scp -r spark [email protected]:/usr/local/
scp -r spark [email protected]:/usr/local/
scp -r spark [email protected]:/usr/local/
Startup
$HADOOP_HOME/sbin/start-all.sh
$SPARK_HOME/sbin/start-all.sh
or
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slaves.sh
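After the start scripts run, one way to check that the daemons actually came up (jps ships with the JDK; the web UI port assumes Spark's default 8080 unless SPARK_MASTER_WEBUI_PORT is set in spark-env.sh):

```shell
# On the master node (cancer01): expect Master (and Worker) processes
jps | grep -E 'Master|Worker'

# On each worker node: expect a Worker process
jps | grep Worker

# The master web UI lists every registered worker
curl -s http://192.168.11.134:8080 | grep -c Worker
```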
Verification
Run the bundled examples:
./bin/run-example SparkPi 2>&1 | grep "Pi is roughly"
./bin/spark-submit examples/src/main/python/pi.py 2>&1 | grep "Pi is roughly"
Interactive shell (Scala / Python)
./bin/spark-shell
Scala example:
val textFile = sc.textFile("file:///usr/local/spark/README.md");
textFile.count();
textFile.first();
val linesWithSpark = textFile.filter(line => line.contains("Spark"));
linesWithSpark.count();
textFile.filter(line => line.contains("Spark")).count();
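A Python counterpart of the Scala sample above, submitted non-interactively via spark-submit (the script name and /tmp location are illustrative, not part of the original setup):

```shell
# Write a small PySpark job equivalent to the Scala shell session
cat > /tmp/readme_count.py <<'EOF'
from pyspark import SparkContext

sc = SparkContext(appName="ReadmeCount")
textFile = sc.textFile("file:///usr/local/spark/README.md")
print(textFile.count())                                        # total lines
print(textFile.filter(lambda line: "Spark" in line).count())   # lines containing "Spark"
sc.stop()
EOF

./bin/spark-submit /tmp/readme_count.py
```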
Configure conf/spark-env.sh (sample from a CDH deployment)
export SPARK_HOME=/var/lib/myspark/spark
export JAVA_HOME=/usr/java/jdk1.7.0_80
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
SPARK_MASTER_HOST=10.20.24.199
# Web UI port
SPARK_MASTER_WEBUI_PORT=28686
# Spark local scratch directory
SPARK_LOCAL_DIRS=/hadoopdata1/sparkdata/local
# Worker directory
SPARK_WORKER_DIR=/hadoopdata1/sparkdata/work
# Driver memory
SPARK_DRIVER_MEMORY=4G
# CPU cores per worker
SPARK_WORKER_CORES=16
# Worker memory
SPARK_WORKER_MEMORY=64g
# Spark log directory
SPARK_LOG_DIR=/var/lib/myspark/spark/logs