Spark 2.3.1 Installation Process

1. Install Scala

Download Scala and extract it to the following path:

/usr/local/scala

Add the installation path to the /etc/profile file:

vim /etc/profile

Add the following lines:

export SCALA_HOME=/usr/local/scala/scala-2.12.7
export PATH=$PATH:$SCALA_HOME/bin

Reload the file so the changes take effect:

source /etc/profile

Installation is complete. Verify that it succeeded:

scala -version

Scala must be installed and configured on every node. You can distribute it to the other nodes together with Spark once Spark is installed.
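Because the same profile edit has to be repeated on every node, it can help to script it so that running it twice does not duplicate lines. The following is a minimal sketch, assuming bash; `append_once` is a hypothetical helper name, not part of Scala or Spark.

```shell
#!/usr/bin/env bash
# append_once FILE LINE (hypothetical helper)
# Append LINE to FILE only if FILE does not already contain it verbatim.
append_once() {
  local file="$1" line="$2"
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root), with the same lines as above:
#   append_once /etc/profile 'export SCALA_HOME=/usr/local/scala/scala-2.12.7'
#   append_once /etc/profile 'export PATH=$PATH:$SCALA_HOME/bin'
```

The single quotes keep `$PATH` and `$SCALA_HOME` unexpanded, so they are written literally and resolved only when the profile is sourced.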

2. Install Spark

Extract the Spark package to the following path:

/usr/local/spark

Edit the /etc/profile file and add:

export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

Configure the files in the conf directory. First change to /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf:

cd /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf

Create the spark-env.sh file from its template:

cp spark-env.sh.template spark-env.sh

Edit the spark-env.sh file:

vim spark-env.sh

Add the following:

export JAVA_HOME=/usr/local/jdk-11
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/usr/local/scala/scala-2.12.7
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.7
export SPARK_MASTER_IP=master
export SPARK_EXECUTOR_MEMORY=1G
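A mistyped path in spark-env.sh tends to surface only later as a startup failure, so a quick check that every configured directory exists can save time. This is a small sketch, assuming bash; `missing_dirs` is a hypothetical helper, not a Spark tool.

```shell
#!/usr/bin/env bash
# missing_dirs DIR... (hypothetical helper)
# Print every argument that is not an existing directory.
# The exit status is 0 only when none are missing.
missing_dirs() {
  local rc=0 d
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      printf 'missing: %s\n' "$d"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, from the conf directory:
#   . ./spark-env.sh
#   missing_dirs "$JAVA_HOME" "$HADOOP_HOME" "$HADOOP_CONF_DIR" "$SCALA_HOME" "$SPARK_HOME"
```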

Create the slaves file from its template:

cp slaves.template slaves

Edit the slaves file, delete its existing contents, and replace them with:

slaver1
slaver2
slaver3
slaver4
slaver5

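Configuration is complete. Distribute Spark to the other nodes (run these from /usr/local), then complete the /etc/profile configuration on each of them:

scp -r spark slaver1:/usr/local/
scp -r spark slaver2:/usr/local/
scp -r spark slaver3:/usr/local/
scp -r spark slaver4:/usr/local/
scp -r spark slaver5:/usr/local/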
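The repeated copy commands can also be driven from the slaves file, so the host list lives in one place. The following is a sketch under stated assumptions: bash, working scp access to every worker, and a hypothetical function `distribute` with a hypothetical `COPY_CMD` override for dry runs.

```shell
#!/usr/bin/env bash
# distribute SRC DEST HOSTS_FILE (hypothetical helper)
# Copy SRC to DEST on every host listed in HOSTS_FILE, one host per line.
# COPY_CMD defaults to "scp -r"; set COPY_CMD=echo to print what would be copied.
distribute() {
  local src="$1" dest="$2" hosts_file="$3" host
  while IFS= read -r host; do
    # Skip blank lines and comment lines.
    case "$host" in
      ''|'#'*) continue ;;
    esac
    # COPY_CMD is left unquoted on purpose so "scp -r" splits into command and flag.
    # stdin is redirected so the copy command cannot consume the rest of the host list.
    ${COPY_CMD:-scp -r} "$src" "$host:$dest" < /dev/null
  done < "$hosts_file"
}

# Dry run against the slaves file edited above:
#   COPY_CMD=echo distribute /usr/local/spark /usr/local/ \
#     /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf/slaves
```

Redirecting stdin from /dev/null matters in a `while read` loop: without it, ssh-based commands may read the remaining host names themselves and the loop would stop after the first host.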

Note: I ran into a small problem. Spark failed to start, with the following error in the worker log:

[root@slaver1 logs]# cat spark-root-org.apache.spark.deploy.worker.Worker-1-slaver1.out
Spark Command: /usr/local/jdk-11/bin/java -cp /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf/:/usr/local/spark/spark-2.3.1-bin-hadoop2.7/jars/*:/usr/local/hadoop/hadoop-2.8.5/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
========================================
2018-10-08 19:22:49 INFO  Worker:2611 - Started daemon with process name: [email protected]
2018-10-08 19:22:49 INFO  SignalUtils:54 - Registered signal handler for TERM
2018-10-08 19:22:49 INFO  SignalUtils:54 - Registered signal handler for HUP
2018-10-08 19:22:49 INFO  SignalUtils:54 - Registered signal handler for INT
2018-10-08 19:22:50 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[main,5,main]
java.lang.ExceptionInInitializerError
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
	at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2467)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2467)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2467)
	at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:220)
	at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:784)
	at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:755)
	at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
	at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
	at java.base/java.lang.String.substring(String.java:1874)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
	... 15 more
2018-10-08 19:22:50 INFO  ShutdownHookManager:54 - Shutdown hook called

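Solution: reinstall the JDK. The trace fails in Hadoop's Shell class on a substring(0, 3) call against a two-character string, which is consistent with the JDK 11 used here reporting its version as "11". Java 11 support was only added in Spark 3.0, so installing JDK 8 and pointing JAVA_HOME at it is the likely fix.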
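The "begin 0, end 3, length 2" message in the trace points at a Java version string only two characters long, so it is worth checking which Java each node actually runs before starting the workers. The following is a minimal sketch, assuming bash and a `java` binary on the PATH; `java_major` and `current_java_version` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# java_major VERSION (hypothetical helper)
# Print the major version number of a java.version string:
#   "1.8.0_181"      -> 8    (legacy 1.x scheme)
#   "11" or "11.0.2" -> 11   (scheme used since Java 9)
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  # Keep only the leading run of digits.
  printf '%s\n' "${v%%[!0-9]*}"
}

# current_java_version (hypothetical helper)
# Read the quoted version from `java -version`, which prints to stderr.
current_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Usage:
#   [ "$(java_major "$(current_java_version)")" -eq 8 ] || echo "expected Java 8 on this node"
```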

Installation complete.

3. Testing

  • Access the web UI. If port 8080 might already be in use, you can change the UI port:
cd /usr/local/spark/spark-2.3.1-bin-hadoop2.7/sbin
vim start-master.sh

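In this script, change the default web UI port (8080) to the port you want. Then open the Master machine in a browser. In my Spark cluster the Master's IP address is 192.168.144.130 and the UI port is 8888, so the URL is http://192.168.144.130:8888/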
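As an alternative to editing start-master.sh directly, the standalone master also reads the SPARK_MASTER_WEBUI_PORT environment variable, which can be set in conf/spark-env.sh. A minimal fragment, assuming the same port 8888 as in the URL above:

```shell
# conf/spark-env.sh
# Port for the standalone master's web UI (the default is 8080).
# 8888 matches the port used in this post.
export SPARK_MASTER_WEBUI_PORT=8888
```

The master has to be restarted (sbin/stop-master.sh, then sbin/start-master.sh) for the new port to take effect. Keeping the setting in spark-env.sh means the change survives replacing the sbin scripts.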

  • Run the Pi-estimation example program that ships with Spark:
cd /usr/local/spark/spark-2.3.1-bin-hadoop2.7
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.3.1.jar

Result:

[root@slaver1 spark-2.3.1-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.3.1.jar
2018-10-08 19:52:00 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-10-08 19:52:01 INFO  SparkContext:54 - Running Spark version 2.3.1
2018-10-08 19:52:01 INFO  SparkContext:54 - Submitted application: Spark Pi
2018-10-08 19:52:01 INFO  SecurityManager:54 - Changing view acls to: root
2018-10-08 19:52:01 INFO  SecurityManager:54 - Changing modify acls to: root
2018-10-08 19:52:01 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-10-08 19:52:01 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-10-08 19:52:01 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2018-10-08 19:52:02 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 35012.
2018-10-08 19:52:02 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-10-08 19:52:03 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-10-08 19:52:03 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-10-08 19:52:03 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-10-08 19:52:03 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-6e399656-ed6d-40aa-9c10-0259128b94e4
2018-10-08 19:52:03 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-10-08 19:52:03 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-10-08 19:52:03 INFO  log:192 - Logging initialized @5541ms
2018-10-08 19:52:03 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-10-08 19:52:04 INFO  Server:414 - Started @5888ms
2018-10-08 19:52:04 INFO  AbstractConnector:278 - Started [email protected]{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-10-08 19:52:04 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/jobs,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/jobs/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/jobs/job,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/jobs/job/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/stages,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/stages/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/stages/stage,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/stages/stage/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/stages/pool,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/stages/pool/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/storage,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/storage/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/storage/rdd,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/environment,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/environment/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/executors,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/executors/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/executors/threadDump,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/static,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/api,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  ContextHandler:781 - Started [email protected]{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-10-08 19:52:04 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://slaver1:4040
2018-10-08 19:52:04 INFO  SparkContext:54 - Added JAR file:/usr/local/spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar at spark://slaver1:35012/jars/spark-examples_2.11-2.3.1.jar with timestamp 1538999524828
2018-10-08 19:52:05 INFO  Executor:54 - Starting executor ID driver on host localhost
2018-10-08 19:52:05 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41601.
2018-10-08 19:52:05 INFO  NettyBlockTransferService:54 - Server created on slaver1:41601
2018-10-08 19:52:05 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-10-08 19:52:05 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:05 INFO  BlockManagerMasterEndpoint:54 - Registering block manager slaver1:41601 with 366.3 MB RAM, BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:05 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:05 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, slaver1, 41601, None)
2018-10-08 19:52:06 INFO  ContextHandler:781 - Started [email protected]{/metrics/json,null,AVAILABLE,@Spark}
2018-10-08 19:52:07 INFO  SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2018-10-08 19:52:07 INFO  DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
2018-10-08 19:52:07 INFO  DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2018-10-08 19:52:07 INFO  DAGScheduler:54 - Parents of final stage: List()
2018-10-08 19:52:07 INFO  DAGScheduler:54 - Missing parents: List()
2018-10-08 19:52:07 INFO  DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2018-10-08 19:52:09 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
2018-10-08 19:52:09 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1181.0 B, free 366.3 MB)
2018-10-08 19:52:09 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on slaver1:41601 (size: 1181.0 B, free: 366.3 MB)
2018-10-08 19:52:09 INFO  SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-10-08 19:52:09 INFO  DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
2018-10-08 19:52:09 INFO  TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-10-08 19:52:09 INFO  TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7853 bytes)
2018-10-08 19:52:09 INFO  Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-10-08 19:52:09 INFO  Executor:54 - Fetching spark://slaver1:35012/jars/spark-examples_2.11-2.3.1.jar with timestamp 1538999524828
2018-10-08 19:52:10 INFO  TransportClientFactory:267 - Successfully created connection to slaver1/192.168.144.131:35012 after 141 ms (0 ms spent in bootstraps)
2018-10-08 19:52:10 INFO  Utils:54 - Fetching spark://slaver1:35012/jars/spark-examples_2.11-2.3.1.jar to /tmp/spark-c0f10a32-518d-466e-81e7-6a287f0a86ba/userFiles-5881d268-d800-4d3a-9b90-d8929ceb9aaa/fetchFileTemp909421448294962008.tmp
2018-10-08 19:52:10 INFO  Executor:54 - Adding file:/tmp/spark-c0f10a32-518d-466e-81e7-6a287f0a86ba/userFiles-5881d268-d800-4d3a-9b90-d8929ceb9aaa/spark-examples_2.11-2.3.1.jar to class loader
2018-10-08 19:52:10 INFO  Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
2018-10-08 19:52:10 INFO  TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7853 bytes)
2018-10-08 19:52:10 INFO  Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-10-08 19:52:10 INFO  TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 1143 ms on localhost (executor driver) (1/2)
2018-10-08 19:52:10 INFO  Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
2018-10-08 19:52:10 INFO  TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 84 ms on localhost (executor driver) (2/2)
2018-10-08 19:52:10 INFO  TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool 
2018-10-08 19:52:10 INFO  DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.866 s
2018-10-08 19:52:10 INFO  DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.180690 s
Pi is roughly 3.133915669578348
2018-10-08 19:52:11 INFO  AbstractConnector:318 - Stopped [email protected]{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-10-08 19:52:11 INFO  SparkUI:54 - Stopped Spark web UI at http://slaver1:4040
2018-10-08 19:52:11 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-10-08 19:52:11 INFO  MemoryStore:54 - MemoryStore cleared
2018-10-08 19:52:11 INFO  BlockManager:54 - BlockManager stopped
2018-10-08 19:52:11 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-10-08 19:52:11 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-10-08 19:52:11 INFO  SparkContext:54 - Successfully stopped SparkContext
2018-10-08 19:52:11 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-10-08 19:52:11 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-c0f10a32-518d-466e-81e7-6a287f0a86ba
2018-10-08 19:52:11 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-24283d1b-7be7-4383-b6a8-8f58f8abed50
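The only line of interest in the output above is "Pi is roughly ...", which is easy to miss among the INFO messages. A small filter can pull it out. This is a sketch, assuming bash; `extract_pi` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# extract_pi (hypothetical helper)
# Read spark-submit output on stdin and print only the estimate
# from the "Pi is roughly ..." line.
extract_pi() {
  sed -n 's/^Pi is roughly //p'
}

# Usage (run from the Spark home directory):
#   ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local \
#     examples/jars/spark-examples_2.11-2.3.1.jar 2>&1 | extract_pi
```

The same filter should work unchanged when the job is submitted to the standalone cluster instead of local mode, for example with --master spark://master:7077, the master URL that appears in the worker command earlier in this post.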