Spark 2.3.1 Installation Process
阿新 • Published: 2018-12-14
1. Install Scala
Download Scala and extract it to the path
/usr/local/scala
Add the installation path to the /etc/profile file
vim /etc/profile
Add the following:
export SCALA_HOME=/usr/local/scala/scala-2.12.7
export PATH=$PATH:$SCALA_HOME/bin
Reload the file so the changes take effect:
source /etc/profile
Installation is complete. Verify that it succeeded:
scala -version
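Scala must be installed and configured on every node. You can distribute it to the other nodes together with Spark once Spark is installed.
2. Install Spark
Extract Spark to the following path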
/usr/local/spark
Edit the /etc/profile file and add:
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
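After re-sourcing /etc/profile, it can help to confirm that both variables resolve to real installations before going further. The following is a minimal sketch; `check_home` is a hypothetical helper written for this illustration, not something shipped with Scala or Spark.

```shell
# Hypothetical helper: report whether the variable named in $1 is set
# and points to a directory that contains a bin/ subdirectory.
check_home() {
  eval "dir=\${$1:-}"              # read the value of the variable whose name is $1
  if [ -n "$dir" ] && [ -d "$dir/bin" ]; then
    echo "$1 OK: $dir"
  else
    echo "$1 missing or invalid: '$dir'"
    return 1
  fi
}

# Example usage after `source /etc/profile`; a failure is reported, not fatal.
check_home SCALA_HOME || true
check_home SPARK_HOME || true
```

A "missing or invalid" line usually means the export line has a typo or the archive was extracted to a different directory than the one named in /etc/profile.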
Configure the files in the conf directory. Enter the directory /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf:
cd /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf
Create the spark-env.sh file:
cp spark-env.sh.template spark-env.sh
Edit the spark-env.sh file:
vim spark-env.sh
Add the following:
export JAVA_HOME=/usr/local/jdk-11
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/usr/local/scala/scala-2.12.7
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.7
export SPARK_MASTER_IP=master
export SPARK_EXECUTOR_MEMORY=1G
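Shell assignments are sensitive to whitespace: a space after `=` assigns an empty value and treats the rest as a separate word. The sketch below scans a file for that pattern; `find_spaced_assignments` is a hypothetical helper, and the sample file exists only for the demonstration.

```shell
# Hypothetical lint: print (with line numbers) assignments of the form
# NAME= value or export NAME= value, where whitespace follows the '='.
find_spaced_assignments() {
  grep -nE '^[[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=[[:space:]]+[^[:space:]#]' "$1"
}

# Demo on a throwaway file containing one bad line and one good line.
sample=$(mktemp)
printf '%s\n' \
  'export JAVA_HOME= /usr/local/jdk-11' \
  'export SPARK_EXECUTOR_MEMORY=1G' > "$sample"
find_spaced_assignments "$sample"   # prints: 1:export JAVA_HOME= /usr/local/jdk-11
rm -f "$sample"
```

To check the real file, pass its path instead, for example `find_spaced_assignments "$SPARK_HOME/conf/spark-env.sh"`. No output means no such line was found.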
Create the slaves file:
cp slaves.template slaves
Edit the slaves file: delete its existing contents and replace them with:
slaver1
slaver2
slaver3
slaver4
slaver5
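When the worker hostnames follow a numbered pattern like the one above, the file can also be generated rather than typed. A minimal sketch, assuming the hostnames really are slaver1 through slaverN; `worker_hosts` is a hypothetical helper.

```shell
# Hypothetical helper: print slaver1 .. slaverN, one hostname per line.
worker_hosts() {
  i=1
  while [ "$i" -le "$1" ]; do
    echo "slaver$i"
    i=$((i + 1))
  done
}

# Preview the list for five workers.
worker_hosts 5
# To write the file instead:
#   worker_hosts 5 > /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf/slaves
```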
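Configuration is complete. Distribute Spark to the other nodes, and complete the /etc/profile configuration on each of them:
scp -r /usr/local/spark slaver1:/usr/local/
scp -r /usr/local/spark slaver2:/usr/local/
scp -r /usr/local/spark slaver3:/usr/local/
scp -r /usr/local/spark slaver4:/usr/local/
scp -r /usr/local/spark slaver5:/usr/local/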
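The repeated copy commands can be expressed as a loop. This is a sketch under the assumption that the workers are named slaver1 to slaver5, as in the slaves file, and that passwordless SSH as root is already set up. `distribute` and `DRY_RUN` are hypothetical names; by default the sketch only prints the commands.

```shell
# DRY_RUN=1 (the default here) prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: copy directory $1 into /usr/local/ on every remaining argument (a host).
distribute() {
  src="$1"
  shift
  for host in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "scp -r $src $host:/usr/local/"
    else
      scp -r "$src" "$host:/usr/local/"
    fi
  done
}

distribute /usr/local/spark slaver1 slaver2 slaver3 slaver4 slaver5
```

The same function can distribute the Scala directory by passing /usr/local/scala as the first argument.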
Note: I ran into a small problem. Spark failed to start, with the following error:
[root@slaver1 logs]# cat spark-root-org.apache.spark.deploy.worker.Worker-1-slaver1.out
Spark Command: /usr/local/jdk-11/bin/java -cp /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf/:/usr/local/spark/spark-2.3.1-bin-hadoop2.7/jars/*:/usr/local/hadoop/hadoop-2.8.5/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
========================================
2018-10-08 19:22:49 INFO Worker:2611 - Started daemon with process name: [email protected]
2018-10-08 19:22:49 INFO SignalUtils:54 - Registered signal handler for TERM
2018-10-08 19:22:49 INFO SignalUtils:54 - Registered signal handler for HUP
2018-10-08 19:22:49 INFO SignalUtils:54 - Registered signal handler for INT
2018-10-08 19:22:50 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[main,5,main]
java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2467)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2467)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2467)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:220)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:784)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:755)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
at java.base/java.lang.String.substring(String.java:1874)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
... 15 more
2018-10-08 19:22:50 INFO ShutdownHookManager:54 - Shutdown hook called
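Solution: reinstall the JDK. The trace shows Hadoop's Shell class calling substring(0, 3) on a 2-character string, which is consistent with the version string "11" reported by JDK 11. A JDK whose version string uses the older 1.x form, such as JDK 8, should avoid this failure.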
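The "begin 0, end 3, length 2" message in the trace suggests a quick pre-flight check on the Java version string before starting the workers. The sketch below only classifies a version string by its form; `is_legacy_java_version` is a hypothetical helper, and the two sample strings are illustrative values.

```shell
# Hypothetical helper: succeed if the version string uses the legacy
# "1.x" scheme (e.g. 1.8.0_181), fail for new-style strings such as "11".
is_legacy_java_version() {
  case "$1" in
    1.[0-9]*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Classify two sample version strings.
for v in 1.8.0_181 11; do
  if is_legacy_java_version "$v"; then
    echo "$v: legacy 1.x version string"
  else
    echo "$v: new-style version string"
  fi
done

# On a real node, the string can be read from the JDK in use, for example:
#   "$JAVA_HOME/bin/java" -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p'
```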
Installation complete
3. Testing
- Access the web UI. If port 8080 might already be in use, you can change the port:
cd /usr/local/spark/spark-2.3.1-bin-hadoop2.7/sbin
vim start-master.sh
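In that file, change the web UI port (I used 8888). Then open the Master machine in a browser. In my Spark cluster the Master's IP address is 192.168.144.130 and the web UI is on port 8888, so the URL is: http://192.168.144.130:8888/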
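An alternative to editing the start script, which is an assumption here and not the method used in this post, is to set `SPARK_MASTER_WEBUI_PORT` in conf/spark-env.sh. The standalone-mode documentation lists this variable, and start-master.sh falls back to 8080 only when it is empty. A minimal sketch, reusing the address and port given in the text:

```shell
# Assumption: this line is added to conf/spark-env.sh on the master
# instead of editing sbin/start-master.sh.
export SPARK_MASTER_WEBUI_PORT=8888

# Build the web UI URL from the master address given in the text.
MASTER_IP=192.168.144.130
echo "http://${MASTER_IP}:${SPARK_MASTER_WEBUI_PORT}/"
```

The master has to be restarted for the new port to take effect.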
- Run the example program that ships with Spark to estimate pi
cd /usr/local/spark/spark-2.3.1-bin-hadoop2.7
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.3.1.jar
Result
[root@slaver1 spark-2.3.1-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.3.1.jar
2018-10-08 19:52:00 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-10-08 19:52:01 INFO SparkContext:54 - Running Spark version 2.3.1
2018-10-08 19:52:01 INFO SparkContext:54 - Submitted application: Spark Pi
2018-10-08 19:52:01 INFO SecurityManager:54 - Changing view acls to: root
2018-10-08 19:52:01 INFO SecurityManager:54 - Changing modify acls to: root
2018-10-08 19:52:01 INFO SecurityManager:54 - Changing view acls groups to:
2018-10-08 19:52:01 INFO SecurityManager:54 - Changing modify acls groups to:
2018-10-08 19:52:01 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2018-10-08 19:52:02 INFO Utils:54 - Successfully started service 'sparkDriver' on port 35012.
2018-10-08 19:52:02 INFO SparkEnv:54 - Registering MapOutputTracker
2018-10-08 19:52:03 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-10-08 19:52:03 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-10-08 19:52:03 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-10-08 19:52:03 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-6e399656-ed6d-40aa-9c10-0259128b94e4
2018-10-08 19:52:03 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-10-08 19:52:03 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-10-08 19:52:03 INFO log:192 - Logging initialized @5541ms
2018-10-08 19:52:03 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-10-08 19:52:04 INFO Server:414 - Started @5888ms
2018-10-08 19:52:04 INFO AbstractConnector:278 - Started [email protected]{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-10-08 19:52:04 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/jobs,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/jobs/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/jobs/job,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/jobs/job/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/stages,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/stages/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/stages/stage,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/stages/stage/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/stages/pool,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/stages/pool/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/storage,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/storage/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/storage/rdd,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/environment,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/environment/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/executors,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/executors/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/executors/threadDump,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/static,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/api,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO ContextHandler:781 - Started [email protected]{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://slaver1:4040
2018-10-08 19:52:04 INFO SparkContext:54 - Added JAR file:/usr/local/spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar at spark://slaver1:35012/jars/spark-examples_2.11-2.3.1.jar with timestamp 1538999524828
2018-10-08 19:52:05 INFO Executor:54 - Starting executor ID driver on host localhost
2018-10-08 19:52:05 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41601.
2018-10-08 19:52:05 INFO NettyBlockTransferService:54 - Server created on slaver1:41601
2018-10-08 19:52:05 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-10-08 19:52:05 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:05 INFO BlockManagerMasterEndpoint:54 - Registering block manager slaver1:41601 with 366.3 MB RAM, BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:05 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:05 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:06 INFO ContextHandler:781 - Started [email protected]{/metrics/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:07 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2018-10-08 19:52:07 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
2018-10-08 19:52:07 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2018-10-08 19:52:07 INFO DAGScheduler:54 - Parents of final stage: List()
2018-10-08 19:52:07 INFO DAGScheduler:54 - Missing parents: List()
2018-10-08 19:52:07 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2018-10-08 19:52:09 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
2018-10-08 19:52:09 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1181.0 B, free 366.3 MB)
2018-10-08 19:52:09 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on slaver1:41601 (size: 1181.0 B, free: 366.3 MB)
2018-10-08 19:52:09 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-10-08 19:52:09 INFO DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
2018-10-08 19:52:09 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-10-08 19:52:09 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7853 bytes)
2018-10-08 19:52:09 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-10-08 19:52:09 INFO Executor:54 - Fetching spark://slaver1:35012/jars/spark-examples_2.11-2.3.1.jar with timestamp 1538999524828
2018-10-08 19:52:10 INFO TransportClientFactory:267 - Successfully created connection to slaver1/192.168.144.131:35012 after 141 ms (0 ms spent in bootstraps)
2018-10-08 19:52:10 INFO Utils:54 - Fetching spark://slaver1:35012/jars/spark-examples_2.11-2.3.1.jar to /tmp/spark-c0f10a32-518d-466e-81e7-6a287f0a86ba/userFiles-5881d268-d800-4d3a-9b90-d8929ceb9aaa/fetchFileTemp909421448294962008.tmp
2018-10-08 19:52:10 INFO Executor:54 - Adding file:/tmp/spark-c0f10a32-518d-466e-81e7-6a287f0a86ba/userFiles-5881d268-d800-4d3a-9b90-d8929ceb9aaa/spark-examples_2.11-2.3.1.jar to class loader
2018-10-08 19:52:10 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
2018-10-08 19:52:10 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7853 bytes)
2018-10-08 19:52:10 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-10-08 19:52:10 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 1143 ms on localhost (executor driver) (1/2)
2018-10-08 19:52:10 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
2018-10-08 19:52:10 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 84 ms on localhost (executor driver) (2/2)
2018-10-08 19:52:10 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-10-08 19:52:10 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.866 s
2018-10-08 19:52:10 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.180690 s
Pi is roughly 3.133915669578348
2018-10-08 19:52:11 INFO AbstractConnector:318 - Stopped [email protected]{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-10-08 19:52:11 INFO SparkUI:54 - Stopped Spark web UI at http://slaver1:4040
2018-10-08 19:52:11 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-10-08 19:52:11 INFO MemoryStore:54 - MemoryStore cleared
2018-10-08 19:52:11 INFO BlockManager:54 - BlockManager stopped
2018-10-08 19:52:11 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-10-08 19:52:11 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-10-08 19:52:11 INFO SparkContext:54 - Successfully stopped SparkContext
2018-10-08 19:52:11 INFO ShutdownHookManager:54 - Shutdown hook called
2018-10-08 19:52:11 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-c0f10a32-518d-466e-81e7-6a287f0a86ba
2018-10-08 19:52:11 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-24283d1b-7be7-4383-b6a8-8f58f8abed50
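The only line of the output above that carries the result is "Pi is roughly ...", so it can be convenient to filter for it. A minimal sketch; `pi_result` is a hypothetical helper, and the demo input is two lines copied from the output above.

```shell
# Hypothetical helper: read spark-submit output on stdin and print
# only the line carrying the Pi estimate.
pi_result() {
  grep '^Pi is roughly'
}

# Demo on two lines taken from the run shown above.
printf '%s\n' \
  '2018-10-08 19:52:10 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.180690 s' \
  'Pi is roughly 3.133915669578348' | pi_result

# Real usage, assuming the default log4j setup that sends logs to stderr:
#   ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local \
#     examples/jars/spark-examples_2.11-2.3.1.jar 2>/dev/null | pi_result
```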