Spark Notes (1): Standalone Spark Installation, Fully Distributed Cluster and HA Deployment, plus Compiling Spark from Source
阿新 • Published 2018-04-24
Tags: Big Data, Spark

[TOC]
Standalone (single-node) Spark installation
1. Install Scala

   Unpack:

       tar -zxvf soft/scala-2.10.5.tgz -C app/

   Rename:

       mv scala-2.10.5/ scala

   Add it to the environment variables:

       export SCALA_HOME=/home/uplooking/app/scala
       export PATH=$PATH:$SCALA_HOME/bin

   Although Spark ships with its own Scala, installing Scala separately is still recommended.

2. Install single-node Spark

   Unpack:

       tar -zxvf soft/spark-1.6.2-bin-hadoop2.6.tgz -C app/

   Rename:

       mv spark-1.6.2-bin-hadoop2.6/ spark

   Add it to the environment variables:

       export SPARK_HOME=/home/uplooking/app/spark
       export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Test: run a simple Spark program in spark-shell:

    spark-shell
    sc.textFile("./hello").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect.foreach(println)
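The same word count can also be packaged as a standalone application instead of being typed into spark-shell. Below is a minimal sketch, assuming spark-core 1.6.2 is on the classpath; the object name and the `./hello` input file are illustrative, not from the original notes:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal word-count sketch, equivalent to the spark-shell one-liner above.
object WordCount {
  def main(args: Array[String]): Unit = {
    // local[*] runs the job in-process, matching the single-node setup.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    sc.textFile("./hello")            // read the sample input file
      .flatMap(_.split(" "))          // split each line into words
      .map((_, 1))                    // pair every word with a count of 1
      .reduceByKey(_ + _)             // sum the counts per word
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```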
Fully distributed installation
Modify spark-env.sh:

    cd /home/uplooking/app/spark/conf
    cp spark-env.sh.template spark-env.sh
    vi spark-env.sh

    export JAVA_HOME=/opt/jdk
    export SCALA_HOME=/home/uplooking/app/scala
    export SPARK_MASTER_IP=uplooking01
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_CORES=1
    export SPARK_WORKER_INSTANCES=1
    export SPARK_WORKER_MEMORY=1g
    export HADOOP_CONF_DIR=/home/uplooking/app/hadoop/etc/hadoop

Modify the slaves configuration file by adding two lines:

    uplooking02
    uplooking03

Deploy to uplooking02 and uplooking03 (Scala must be installed on both machines first):

    scp -r /home/uplooking/app/scala uplooking@uplooking02:/home/uplooking/app
    scp -r /home/uplooking/app/scala uplooking@uplooking03:/home/uplooking/app

    scp -r /home/uplooking/app/spark uplooking@uplooking02:/home/uplooking/app
    scp -r /home/uplooking/app/spark uplooking@uplooking03:/home/uplooking/app

Copy the environment variables to uplooking02 and uplooking03 (remember to source the file afterwards):

    scp /home/uplooking/.bash_profile uplooking@uplooking02:/home/uplooking
    scp /home/uplooking/.bash_profile uplooking@uplooking03:/home/uplooking

Startup

To avoid a conflict with Hadoop's start/stop-all.sh scripts, rename spark/sbin/start-all.sh and stop-all.sh first:

    mv start-all.sh start-spark-all.sh
    mv stop-all.sh stop-spark-all.sh

Start the cluster:

    sbin/start-spark-all.sh

This starts a Master process on the configured master node uplooking01, and a Worker process on each of the configured slave nodes uplooking02 and uplooking03.

A simple verification: start spark-shell and run a word count (a programmatic version is sketched after this section):

    bin/spark-shell
    scala> sc.textFile("hdfs://ns1/data/hello").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect.foreach(println)

Spark executes this program very quickly and produces the expected result.

Ports worth knowing:

- 8080 --> the Spark cluster web UI, roughly a combination of Hadoop's 50070 and 8088
- 4040 --> the Spark application UI
- 7077 --> the master RPC port, the counterpart of Hadoop's 9000
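In spark-shell the master can be passed on the command line, but a standalone application targets the cluster by setting the master URL on its SparkConf. A minimal sketch, assuming the master is reachable at uplooking01:7077 as configured in spark-env.sh above:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the same word count, but submitted to the standalone cluster.
object ClusterWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ClusterWordCount")
      // spark://host:port matches SPARK_MASTER_IP and SPARK_MASTER_PORT above.
      .setMaster("spark://uplooking01:7077")
    val sc = new SparkContext(conf)

    // hdfs://ns1/data/hello is the sample input used in the verification step.
    sc.textFile("hdfs://ns1/data/hello")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```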
ZooKeeper-based HA configuration
This is best done while the cluster is stopped.

First, comment out two lines in spark-env.sh:

    #export SPARK_MASTER_IP=uplooking01
    #export SPARK_MASTER_PORT=7077

Second, add one line to spark-env.sh:

    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=uplooking01:2181,uplooking02:2181,uplooking03:2181 -Dspark.deploy.zookeeper.dir=/spark"

What the properties mean:

- spark.deploy.recoveryMode: set to ZOOKEEPER
- spark.deploy.zookeeper.url: the ZooKeeper URL
- spark.deploy.zookeeper.dir: the ZooKeeper directory where recovery state is stored, /spark by default

Restart the cluster:

- Run start-spark-all.sh on any Spark node.
- Manually start another Master process on one of the other nodes, e.g. on uplooking02: sbin/start-master.sh
- In a browser, open uplooking01:8080 and uplooking02:8080; one Master reports Status: ALIVE and the other Status: STANDBY.

To verify HA, manually kill the Master process on the ALIVE node; after a short while the standby Master (here on uplooking02) changes from STANDBY to ALIVE. A client-side sketch for this setup follows this section.

Note: if you run start-spark-all.sh on uplooking02, then uplooking02 becomes a Master and uplooking01 does not, because the configuration no longer pins the master (only the slaves are listed). start-spark-all.sh also invokes start-master.sh, which is why a Master is started on whichever machine you run it from.
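Once HA is enabled, a driver should list every Master in its master URL so it can fail over automatically when leadership changes. A minimal sketch, assuming Masters running on uplooking01 and uplooking02 as set up above:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: with an HA pair of Masters, list both in the master URL.
// The driver registers with whichever Master is currently ALIVE and
// reconnects to the new leader if a failover happens.
object HaWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("HaWordCount")
      .setMaster("spark://uplooking01:7077,uplooking02:7077")
    val sc = new SparkContext(conf)

    // Any action will do; count() just exercises the cluster end to end.
    println(sc.textFile("hdfs://ns1/data/hello").count())

    sc.stop()
  }
}
```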
Compiling Spark from source
After installing Maven, configure the local repository for Spark's dependencies (otherwise the build downloads everything from the Internet, which is very slow),
then run the following command in the Spark source directory:
mvn -Pyarn -Dhadoop.version=2.6.4 -Dyarn.version=2.6.4 -DskipTests clean package
On success, the build output ends like this:
......
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 3.617 s]
[INFO] Spark Project Test Tags ............................ SUCCESS [ 17.419 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 12.102 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 11.878 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 7.324 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 16.326 s]
[INFO] Spark Project Core ................................. SUCCESS [04:31 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 11.671 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 55.420 s]
[INFO] Spark Project Streaming ............................ SUCCESS [02:03 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:40 min]
[INFO] Spark Project SQL .................................. SUCCESS [03:38 min]
[INFO] Spark Project ML Library ........................... SUCCESS [03:56 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 15.726 s]
[INFO] Spark Project Hive ................................. SUCCESS [02:30 min]
[INFO] Spark Project Docker Integration Tests ............. SUCCESS [ 11.961 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 42.913 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 8.391 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 42.013 s]
[INFO] Spark Project Assembly ............................. SUCCESS [02:06 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 19.155 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 22.164 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 26.228 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 3.838 s]
[INFO] Spark Project External MQTT ........................ SUCCESS [ 33.132 s]
[INFO] Spark Project External MQTT Assembly ............... SUCCESS [ 7.937 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [ 17.900 s]
[INFO] Spark Project External Kafka ....................... SUCCESS [ 37.597 s]
[INFO] Spark Project Examples ............................. SUCCESS [02:39 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 10.556 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 31:22 min
[INFO] Finished at: 2018-04-24T18:33:58+08:00
[INFO] Final Memory: 89M/1440M
[INFO] ------------------------------------------------------------------------
The compiled assembly can then be found in the following directory:
[uplooking@uplooking01 scala-2.10]$ pwd
/home/uplooking/compile/spark-1.6.2/assembly/target/scala-2.10
[uplooking@uplooking01 scala-2.10]$ ls -lh
total 135M
-rw-rw-r-- 1 uplooking uplooking 135M Apr 24 18:28 spark-assembly-1.6.2-hadoop2.6.4.jar
A corresponding (stock) assembly jar also ships in the lib directory of the installed Spark:
[uplooking@uplooking01 lib]$ ls -lh
total 291M
-rw-r--r-- 1 uplooking uplooking 332K Jun 22 2016 datanucleus-api-jdo-3.2.6.jar
-rw-r--r-- 1 uplooking uplooking 1.9M Jun 22 2016 datanucleus-core-3.2.10.jar
-rw-r--r-- 1 uplooking uplooking 1.8M Jun 22 2016 datanucleus-rdbms-3.2.9.jar
-rw-r--r-- 1 uplooking uplooking 6.6M Jun 22 2016 spark-1.6.2-yarn-shuffle.jar
-rw-r--r-- 1 uplooking uplooking 173M Jun 22 2016 spark-assembly-1.6.2-hadoop2.6.0.jar
-rw-r--r-- 1 uplooking uplooking 108M Jun 22 2016 spark-examples-1.6.2-hadoop2.6.0.jar
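After swapping the freshly compiled spark-assembly-1.6.2-hadoop2.6.4.jar into lib/ in place of the stock assembly, a quick spark-shell session can confirm that the new build is the one actually running. A hedged sanity check (the expected outputs follow directly from the version built and from simple arithmetic):

```scala
// Inside bin/spark-shell, after replacing the assembly jar in lib/:
scala> sc.version               // the version of the running Spark build
res0: String = 1.6.2

scala> sc.parallelize(1 to 100).sum()   // trivial job to prove jobs still run
res1: Double = 5050.0
```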