cool-2018-10-21: Setting up a Spark 1.5 cluster on CentOS 7
Setting up the Spark cluster
Prerequisite: the Hadoop environment variables are already configured.
Stop the firewall and synchronize the clocks (note: a stock CentOS 7 install uses firewalld, so systemctl stop firewalld may be needed instead of the iptables service):
service iptables stop
ntpdate 0.asia.pool.ntp.org
Extract the Spark tarball:
tar -zxvf spark-1.5.1-bin-hadoop2.4.tgz
Move it to /home:
mv spark-1.5.1-bin-hadoop2.4 /home/spark-1.5
cd /home/spark-1.5/conf
cp spark-env.sh.template spark-env.sh
Copy the following into spark-env.sh.
YARN cluster:
Required configuration items:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/home/spark-1.5
export SPARK_JAR=/home/spark-1.5/lib/spark-assembly-1.5.1-hadoop2.4.0.jar
export PATH=$SPARK_HOME/bin:$PATH
Start zkServer on all nodes:
zkServer.sh start
On hadoop1: be sure to run Hadoop's start-all.sh (HDFS and YARN together), because Spark also relies on YARN for computation:
start-all.sh
On the Spark master node (hadoop1):
cd /home/spark-1.5
./sbin/start-all.sh    # this starts the whole Spark cluster
YARN cluster:
client mode:
The result is printed in the Xshell session:
cd /home/spark-1.5
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 1G --num-executors 1 ./lib/spark-examples-1.5.1-hadoop2.4.0.jar 100
YARN cluster:
cluster mode (this is the mode I actually ran):
The result is visible in the YARN web UI at hadoop1:8088!
cd /home/spark-1.5
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 50M --num-executors 1 ./lib/spark-examples-1.5.1-hadoop2.4.0.jar 100
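Both YARN submits above run the SparkPi example that ships with Spark; it estimates π by Monte Carlo sampling. A minimal sketch of the idea (close to the bundled source, but not a verbatim copy):
import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkPi")
    val sc = new SparkContext(conf)
    // The "100" passed on the command line above becomes the number of slices (tasks).
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Sample n random points in the square [-1,1] x [-1,1]; the fraction that
    // lands inside the unit circle approximates pi / 4.
    val count = sc.parallelize(1 to n, slices).map { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    sc.stop()
  }
}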
Open the Scala shell (the --master value can also be changed to local):
./bin/spark-shell --master yarn-client
spark-shell is itself a Spark application.
scala>
Read a text file from the local Linux filesystem and run a WordCount example:
val lines = sc.textFile("file:///usr/nginx-1.8/html/index.html")
val words = lines.flatMap(_.split(" "))
val pairs = words.map((_,1))
val results = pairs.reduceByKey(_+_)
A method that takes no arguments can be called without parentheses. Avoid the following call on large datasets, because it moves all the data to the driver:
results.collect
Write the results out to HDFS instead:
results.saveAsTextFile("hdfs://hadoop1:8020/20180527")
Inspect the output in HDFS:
hdfs dfs -cat /20180527/part-00000
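The same WordCount can also be packaged as a standalone application and run with spark-submit rather than typed into the shell. A minimal sketch, with the hypothetical object name WordCount and the input/output paths passed as arguments:
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    sc.textFile(args(0))            // input path, e.g. an HDFS or file:// URL
      .flatMap(_.split(" "))        // split each line into words
      .map((_, 1))                  // pair every word with an initial count of 1
      .reduceByKey(_ + _)           // sum the counts per word
      .saveAsTextFile(args(1))      // output directory, e.g. on HDFS
    sc.stop()
  }
}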
Local single-machine mode:
The result is printed in the Xshell session:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[1] ./lib/spark-examples-1.5.1-hadoop2.4.0.jar 100
Create a Kafka topic and write some data into it (these kafka-*.sh commands are run from the Kafka installation directory):
./bin/kafka-topics.sh --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --topic wordcount --replication-factor 1 --partitions 1 --create
Produce data:
./bin/kafka-console-producer.sh --topic wordcount --broker-list hadoop1:9092,hadoop2:9092,hadoop3:9092,hadoop4:9092
List the topics:
./bin/kafka-topics.sh --list --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181
Command line for submitting the streaming job on the cluster:
./bin/spark-submit --class com.spark.study.streaming.WindowBasedTopWord sparkstreaming.jar
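The source of com.spark.study.streaming.WindowBasedTopWord is not included in these notes. A minimal sketch of what such a job might look like, assuming it consumes the wordcount topic created above via Spark 1.5's receiver-based KafkaUtils.createStream and ranks words over a sliding window (the consumer group name and window sizes here are illustrative):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object WindowBasedTopWord {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowBasedTopWord")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver-based stream: ZooKeeper quorum, consumer group,
    // and a map of topic -> number of receiver threads.
    val kafkaStream = KafkaUtils.createStream(
      ssc, "hadoop1:2181,hadoop2:2181,hadoop3:2181",
      "wordcount-group", Map("wordcount" -> 1))

    // Count words over a 60-second window that slides every 10 seconds.
    val windowedCounts = kafkaStream
      .flatMap(_._2.split(" "))     // use the message values of the (key, message) pairs
      .map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(10))

    // Print the top 3 words of each window on the driver.
    windowedCounts.foreachRDD { rdd =>
      rdd.map(_.swap).sortByKey(ascending = false).take(3).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
Packaged into sparkstreaming.jar, it would be submitted exactly as above.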
Standalone cluster mode: configuring HA
Required configuration items:
1. the slaves file
2. spark-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
export SPARK_MASTER_IP=hadoop1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.25.151:2181,192.168.25.152:2181,192.168.25.153:2181,192.168.25.154:2181"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/home/spark-1.5
export SPARK_JAR=/home/spark-1.5/lib/spark-assembly-1.5.1-hadoop2.4.0.jar
export PATH=$SPARK_HOME/bin:$PATH
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/home/apache-mahout-distribution-0.10.2/lib/*"
Start the ZooKeeper cluster first.
On hadoop1:
cd /home/spark-1.5
./sbin/start-all.sh    # this starts the whole Spark cluster
The master web UI is then visible in a browser at hadoop1:8080.
HA test
Start a standby master on any one of the hadoop2-hadoop4 nodes:
cd /home/spark-1.5
./sbin/start-master.sh
Standalone cluster:
client mode:
The result is printed in the Xshell session:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://hadoop1:7077 --executor-memory 1G --total-executor-cores 1 ./lib/spark-examples-1.5.1-hadoop2.4.0.jar 100
Standalone cluster:
cluster mode:
The result is visible at hadoop1:8080!
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://hadoop1:7077 --deploy-mode cluster --supervise --executor-memory 1G --total-executor-cores 1 ./lib/spark-examples-1.5.1-hadoop2.4.0.jar 100