Installing Spark 1.2.0 on Docker
It has been a while since I last wrote a blog post; I have some spare time lately, so I figured I would write a bit.
1. What is Docker?
Docker is an open-source project that was born in early 2013 as an internal side project at dotCloud. It is implemented in Go, the programming language released by Google. The project later joined the Linux Foundation, adopted the Apache 2.0 license, and its code is maintained on GitHub.
Since going open source, Docker has attracted so much attention and discussion that dotCloud eventually renamed itself Docker Inc. Red Hat has integrated Docker support into RHEL 6.5, and Google makes extensive use of it in its PaaS products.
The goal of the Docker project is to provide a lightweight operating-system-level virtualization solution. Docker adds a further layer of encapsulation on top of LXC, so users no longer need to worry about container management and operations become much simpler. Working with a Docker container is as easy as working with a fast, lightweight virtual machine.
The figure below compares Docker with traditional virtualization: containers virtualize at the operating-system level, directly reusing the host's operating system, whereas traditional virtualization works at the hardware level.
2. Why use Docker?
As an emerging virtualization approach, Docker has many advantages over traditional virtualization.
First, Docker containers start in seconds, which is far faster than a traditional virtual machine. Second, Docker uses system resources very efficiently: a single host can run thousands of Docker containers at the same time.
Beyond the application it runs, a container consumes almost no extra system resources, so application performance stays high and system overhead stays low. Running 10 different applications the traditional way means booting 10 virtual machines; with Docker, you just start 10 isolated applications.
Concretely, Docker has significant advantages in the following areas.
Faster delivery and deployment
What developers and operations (DevOps) staff want most is to create or configure once and run correctly anywhere.
Developers can build a development container from a standard image; once development is finished, operations staff can deploy the code using that very container. Docker creates containers quickly and enables rapid iteration on applications, while keeping the whole process visible, so other team members can more easily understand how an application is built and how it works. Docker containers are light and fast: startup takes seconds, saving a great deal of development, testing, and deployment time.
More efficient virtualization
Docker containers run without an extra hypervisor; virtualization happens at the kernel level, which allows higher performance and efficiency.
Easier migration and scaling
Docker containers run on almost any platform: physical machines, virtual machines, public clouds, private clouds, personal computers, servers, and so on. This compatibility lets users migrate an application directly from one platform to another.
Simpler management
With Docker, a small change can replace what used to be a large amount of update work. All changes are distributed and applied incrementally, enabling automated and efficient management.
對比傳統虛擬機器總結
特性 |
容器 |
虛擬機器 |
啟動 |
秒級 |
分鐘級 |
硬碟使用 |
一般為 MB |
一般為 GB |
效能 |
接近原生 |
弱於 |
系統支援量 |
單機支援上千個容器 |
一般幾十個 |
3. Installing Docker on CentOS
On CentOS 7, Docker is already available in the CentOS-Extras repository and can be installed directly:
$ sudo yum install docker
After installation, start the Docker service and configure it to start automatically at boot:
$ sudo service docker start
$ sudo chkconfig docker on
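On CentOS 7, `service` and `chkconfig` are compatibility shims over systemd, so the commands above work but delegate to systemd under the hood. The native equivalents (a sketch, assuming a standard CentOS 7 systemd setup) are:

```shell
# Native systemd equivalents of the service/chkconfig commands above
sudo systemctl start docker    # start the Docker daemon now
sudo systemctl enable docker   # start it automatically at boot
```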
4. Installing Spark
In this article we want to help you get started with Docker by installing the latest Spark release, 1.2.0.
Docker and Spark are two of the most hyped technologies around right now, so we put them together; the code for the container can be found in our GitHub repository.
4.1 Pull the image from the Docker registry
# docker pull sequenceiq/spark:1.2.0
Pulling repository sequenceiq/spark
334aabfef5f1: Pulling dependent layers
89b52f216c6c: Download complete
... (dozens more image layers download) ...
Status: Downloaded newer image for sequenceiq/spark:1.2.0
This process takes quite a while; the image is roughly 2 GB. I plan to export the image to a local file and upload it to Baidu Pan to make it easier for everyone to download.
You can then use docker load to import the exported file back into the local image store, for example:
sudo docker load --input spark.tar
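The export step mentioned above is done with docker save; a minimal sketch (the file name spark.tar is chosen to match the load command above):

```shell
# Export the pulled image to a tarball (roughly 2 GB, so this takes a while)
docker save -o spark.tar sequenceiq/spark:1.2.0

# On another machine, import the tarball back into the local image store
docker load --input spark.tar
```

docker save preserves the image's layers and tags, so the loaded image can be run exactly as if it had been pulled from the registry.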
4.2 Run the image
Once the image has been pulled from the registry, you can run it:
# docker run -i -t -h sandbox sequenceiq/spark:1.2.0 /etc/bootstrap.sh -bash
Starting sshd: [ OK ]
Starting namenodes on [sandbox]
sandbox: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-sandbox.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-sandbox.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-sandbox.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-sandbox.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-sandbox.out
bash-4.1# jps
304 SecondaryNameNode
625 Jps
505 ResourceManager
188 DataNode
112 NameNode
588 NodeManager
4.3 Testing
Once everything is up, let's check whether the installation actually works:
bash-4.1# cd /usr/local/spark
bash-4.1# ./bin/spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/11 20:56:58 INFO spark.SecurityManager: Changing view acls to: root
15/02/11 20:56:58 INFO spark.SecurityManager: Changing modify acls to: root
15/02/11 20:56:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/02/11 20:56:59 INFO spark.HttpServer: Starting HTTP Server
15/02/11 20:56:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/11 20:56:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:45752
15/02/11 20:56:59 INFO util.Utils: Successfully started service 'HTTP class server' on port 45752.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.2.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/11 20:57:17 INFO spark.SecurityManager: Changing view acls to: root
15/02/11 20:57:17 INFO spark.SecurityManager: Changing modify acls to: root
15/02/11 20:57:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/02/11 20:57:18 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/11 20:57:19 INFO Remoting: Starting remoting
15/02/11 20:57:20 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@sandbox:58553]
15/02/11 20:57:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 58553.
15/02/11 20:57:20 INFO spark.SparkEnv: Registering MapOutputTracker
15/02/11 20:57:20 INFO spark.SparkEnv: Registering BlockManagerMaster
15/02/11 20:57:20 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150211205720-f7e6
15/02/11 20:57:20 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
15/02/11 20:57:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/11 20:57:24 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-d90ad2bb-e82f-4446-8ccd-e79ff4c6d076
15/02/11 20:57:24 INFO spark.HttpServer: Starting HTTP Server
15/02/11 20:57:24 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/11 20:57:24 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:52012
15/02/11 20:57:24 INFO util.Utils: Successfully started service 'HTTP file server' on port 52012.
15/02/11 20:57:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/11 20:57:25 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/02/11 20:57:25 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/02/11 20:57:25 INFO ui.SparkUI: Started SparkUI at http://sandbox:4040
15/02/11 20:57:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/02/11 20:57:27 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/02/11 20:57:27 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/02/11 20:57:27 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
15/02/11 20:57:27 INFO yarn.Client: Setting up container launch context for our AM
15/02/11 20:57:27 INFO yarn.Client: Preparing resources for our AM container
15/02/11 20:57:31 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/02/11 20:57:31 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/spark/spark-assembly-1.2.0-hadoop2.4.0.jar
15/02/11 20:57:31 INFO yarn.Client: Setting up the launch environment for our AM container
15/02/11 20:57:31 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/02/11 20:57:31 INFO spark.SecurityManager: Changing view acls to: root
15/02/11 20:57:31 INFO spark.SecurityManager: Changing modify acls to: root
15/02/11 20:57:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/02/11 20:57:31 INFO yarn.Client: Submitting application 1 to ResourceManager
15/02/11 20:57:32 INFO impl.YarnClientImpl: Submitted application application_1423706171480_0001
15/02/11 20:57:33 INFO yarn.Client: Application report for application_1423706171480_0001 (state: ACCEPTED)
15/02/11 20:57:33 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1423706251906
final status: UNDEFINED
tracking URL: http://sandbox:8088/proxy/application_1423706171480_0001/
user: root
15/02/11 20:57:34 INFO yarn.Client: Application report for application_1423706171480_0001 (state: ACCEPTED)
... (the same report repeats with state: ACCEPTED about once a second until the application starts) ...
15/02/11 20:58:19 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@sandbox:54672/user/YarnAM#-192648481]
15/02/11 20:58:19 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox, PROXY_URI_BASES -> http://sandbox:8088/proxy/application_1423706171480_0001), /proxy/application_1423706171480_0001
15/02/11 20:58:19 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/02/11 20:58:20 INFO yarn.Client: Application report for application_1423706171480_0001 (state: RUNNING)
15/02/11 20:58:20 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: sandbox
ApplicationMaster RPC port: 0
queue: default
start time: 1423706251906
final status: UNDEFINED
tracking URL: http://sandbox:8088/proxy/application_1423706171480_0001/
user: root
15/02/11 20:58:20 INFO cluster.YarnClientSchedulerBackend: Application application_1423706171480_0001 has started running.
15/02/11 20:58:20 INFO netty.NettyBlockTransferService: Server created on 60949
15/02/11 20:58:20 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/02/11 20:58:20 INFO storage.BlockManagerMasterActor: Registering block manager sandbox:60949 with 530.3 MB RAM, BlockManagerId(<driver>, sandbox, 60949)
15/02/11 20:58:20 INFO storage.BlockManagerMaster: Registered BlockManager
15/02/11 20:58:21 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
15/02/11 20:58:21 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
scala> 15/02/11 20:58:43 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sandbox:37188/user/Executor#375257054] with ID 1
15/02/11 20:58:43 INFO util.RackResolver: Resolved sandbox to /default-rack
15/02/11 20:58:43 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sandbox:52808/user/Executor#1782772186] with ID 2
15/02/11 20:58:45 INFO storage.BlockManagerMasterActor: Registering block manager sandbox:55768 with 530.3 MB RAM, BlockManagerId(1, sandbox, 55768)
15/02/11 20:58:45 INFO storage.BlockManagerMasterActor: Registering block manager sandbox:41242 with 530.3 MB RAM, BlockManagerId(2, sandbox, 41242)
scala> sc.parallelize(1 to 1000).count()
15/02/11 20:59:45 INFO spark.SparkContext: Starting job: count at <console>:13
15/02/11 20:59:45 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:13) with 2 output partitions (allowLocal=false)
15/02/11 20:59:45 INFO scheduler.DAGScheduler: Final stage: Stage 0(count at <console>:13)
15/02/11 20:59:45 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/11 20:59:45 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/11 20:59:45 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13), which has no missing parents
15/02/11 20:59:45 INFO storage.MemoryStore: ensureFreeSpace(1088) called with curMem=0, maxMem=556038881
15/02/11 20:59:45 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1088.0 B, free 530.3 MB)
15/02/11 20:59:45 INFO storage.MemoryStore: ensureFreeSpace(842) called with curMem=1088, maxMem=556038881
15/02/11 20:59:45 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 842.0 B, free 530.3 MB)
15/02/11 20:59:45 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox:60949 (size: 842.0 B, free: 530.3 MB)
15/02/11 20:59:45 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/11 20:59:45 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/02/11 20:59:45 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13)
15/02/11 20:59:45 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
15/02/11 20:59:45 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sandbox, PROCESS_LOCAL, 1260 bytes)
15/02/11 20:59:45 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sandbox, PROCESS_LOCAL, 1260 bytes)
15/02/11 20:59:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox:41242 (size: 842.0 B, free: 530.3 MB)
15/02/11 20:59:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox:55768 (size: 842.0 B, free: 530.3 MB)
15/02/11 20:59:52 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 6625 ms on sandbox (1/2)
15/02/11 20:59:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6691 ms on sandbox (2/2)
15/02/11 20:59:52 INFO scheduler.DAGScheduler: Stage 0 (count at <console>:13) finished in 6.695 s
15/02/11 20:59:52 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/11 20:59:52 INFO scheduler.DAGScheduler: Job 0 finished: count at <console>:13, took 7.036182 s
res0: Long = 1000
scala> exit
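The same sanity check can also be run non-interactively by piping the expression into the shell; a sketch, assuming the same running container session in /usr/local/spark:

```shell
# Pipe a one-line job into spark-shell instead of typing it at the scala> prompt
echo 'sc.parallelize(1 to 1000).count()' | \
  ./bin/spark-shell --master yarn-client --driver-memory 1g \
  --executor-memory 1g --executor-cores 1
```

This is handy for scripted smoke tests, since the shell exits when stdin is exhausted.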
4.4 Run a Spark example program
bash-4.1# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1 ./lib/spark-examples-1.2.0-hadoop2.4.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/11 21:09:37 INFO spark.SecurityManager: Changing view acls to: root
15/02/11 21:09:37 INFO spark.SecurityManager: Changing modify acls to: root
15/02/11 21:09:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/02/11 21:09:38 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/11 21:09:38 INFO Remoting: Starting remoting
15/02/11 21:09:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@sandbox:46836]
15/02/11 21:09:38 INFO util.Utils: Successfully started service 'sparkDriver' on port 46836.
15/02/11 21:09:38 INFO spark.SparkEnv: Registering MapOutputTracker
15/02/11 21:09:38 INFO spark.SparkEnv: Registering BlockManagerMaster
15/02/11 21:09:38 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150211210938-ba7a
15/02/11 21:09:38 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
15/02/11 21:09:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/11 21:09:39 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-590b1b3d-95b6-4d7c-bef4-36b0cafeafe9
15/02/11 21:09:39 INFO spark.HttpServer: Starting HTTP Server
15/02/11 21:09:39 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/11 21:09:39 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60161
15/02/11 21:09:39 INFO util.Utils: Successfully started service 'HTTP file server' on port 60161.
15/02/11 21:09:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/11 21:09:40 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/02/11 21:09:40 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/02/11 21:09:40 INFO ui.SparkUI: Started SparkUI at http://sandbox:4040
15/02/11 21:09:41 INFO spark.SparkContext: Added JAR file:/usr/local/spark-1.2.0-bin-hadoop2.4/./lib/spark-examples-1.2.0-hadoop2.4.0.jar at http://172.17.0.2:60161/jars/spark-examples-1.2.0-hadoop2.4.0.jar with timestamp 1423706981078
15/02/11 21:09:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/02/11 21:09:41 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/02/11 21:09:41 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/02/11 21:09:41 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
15/02/11 21:09:41 INFO yarn.Client: Setting up container launch context for our AM
15/02/11 21:09:41 INFO yarn.Client: Preparing resources for our AM container
15/02/11 21:09:42 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/02/11 21:09:42 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/spark/spark-assembly-1.2.0-hadoop2.4.0.jar
15/02/11 21:09:42 INFO yarn.Client: Setting up the launch environment for our AM container
15/02/11 21:09:42 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/02/11 21:09:42 INFO spark.SecurityManager: Changing view acls to: root
15/02/11 21:09:42 INFO spark.SecurityManager: Changing modify acls to: root
15/02/11 21:09:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/02/11 21:09:42 INFO yarn.Client: Submitting application 3 to ResourceManager
15/02/11 21:09:43 INFO impl.YarnClientImpl: Submitted application application_1423706171480_0003
15/02/11 21:09:44 INFO yarn.Client: Application report for application_1423706171480_0003 (state: ACCEPTED)
15/02/11 21:09:44 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1423706982964
final status: UNDEFINED
tracking URL: http://sandbox:8088/proxy/application_1423706171480_0003/
user: root
15/02/11 21:09:45 INFO yarn.Client: Application report for application_1423706171480_0003 (state: ACCEPTED)
15/02/11 21:09:46 INFO yarn.Client: Application report for application_1423706171480_0003 (state: ACCEPTED)
15/02/11 21:09:47 INFO yarn.Client: Application report for application_1423706171480_0003 (state: ACCEPTED)
15/02/11 21:09:48 INFO yarn.Client: Application report for application_1423706171480_0003 (state: ACCEPTED)
15/02/11 21:09:49 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@sandbox:36886/user/YarnAM#250082351]
15/02/11 21:09:49 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox, PROXY_URI_BASES -> http://sandbox:8088/proxy/application_1423706171480_0003), /proxy/application_1423706171480_0003
15/02/11 21:09:49 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/02/11 21:09:49 INFO yarn.Client: Application report for application_1423706171480_0003 (state: RUNNING)
15/02/11 21:09:49 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: sandbox
ApplicationMaster RPC port: 0
queue: default
start time: 1423706982964
final status: UNDEFINED
tracking URL: http://sandbox:8088/proxy/application_1423706171480_0003/
user: root
15/02/11 21:09:49 INFO cluster.YarnClientSchedulerBackend: Application application_1423706171480_0003 has started running.
15/02/11 21:09:49 INFO netty.NettyBlockTransferService: Server created on 56981
15/02/11 21:09:49 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/02/11 21:09:49 INFO storage.BlockManagerMasterActor: Registering block manager sandbox:56981 with 530.3 MB RAM, BlockManagerId(<driver>, sandbox, 56981)
15/02/11 21:09:49 INFO storage.BlockManagerMaster: Registered BlockManager
15/02/11 21:10:03 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sandbox:56995/user/Executor#-1663552722] with ID 2
15/02/11 21:10:04 INFO util.RackResolver: Resolved sandbox to /default-rack
15/02/11 21:10:04 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sandbox:35154/user/Executor#1336228035] with ID 1
15/02/11 21:10:04 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/02/11 21:10:04 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:35
15/02/11 21:10:04 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)
15/02/11 21:10:04 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
15/02/11 21:10:04 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/11 21:10:04 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/11 21:10:04 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/02/11 21:10:04 INFO storage.MemoryStore: ensureFreeSpace(1728) called with curMem=0, maxMem=556038881
15/02/11 21:10:05 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1728.0 B, free 530.3 MB)
15/02/11 21:10:05 INFO storage.MemoryStore: ensureFreeSpace(1235) called with curMem=1728, maxMem=556038881
15/02/11 21:10:05 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1235.0 B, free 530.3 MB)
15/02/11 21:10:05 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox:56981 (size: 1235.0 B, free: 530.3 MB)
15/02/11 21:10:05 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/11 21:10:05 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/02/11 21:10:05 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
15/02/11 21:10:05 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
15/02/11 21:10:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sandbox, PROCESS_LOCAL, 1335 bytes)
15/02/11 21:10:05 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sandbox, PROCESS_LOCAL, 1335 bytes)
15/02/11 21:10:06 INFO storage.BlockManagerMasterActor: Registering block manager sandbox:48023 with 530.3 MB RAM, BlockManagerId(2, sandbox, 48023)
15/02/11 21:10:06 INFO storage.BlockManagerMasterActor: Registering block manager sandbox:46354 with 530.3 MB RAM, BlockManagerId(1, sandbox, 46354)
15/02/11 21:10:23 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox:46354 (size: 1235.0 B, free: 530.3 MB)
15/02/11 21:10:23 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox:48023 (size: 1235.0 B, free: 530.3 MB)
15/02/11 21:10:24 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 18997 ms on sandbox (1/2)
15/02/11 21:10:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 19283 ms on sandbox (2/2)
15/02/11 21:10:24 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 19.324 s
15/02/11 21:10:24 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/11 21:10:24 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 20.226582 s
Pi is roughly 3.143
15/02/11 21:10:24 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
... (the remaining Spark UI context handlers are stopped) ...
15/02/11 21:10:24 INFO ui.SparkUI: Stopped Spark web UI at http://sandbox:4040
15/02/11 21:10:24 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/02/11 21:10:24 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/02/11 21:10:24 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/02/11 21:10:24 INFO cluster.YarnClientSchedulerBackend: Stopped
15/02/11 21:10:25 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/02/11 21:10:25 INFO storage.MemoryStore: MemoryStore cleared
15/02/11 21:10:25 INFO storage.BlockManager: BlockManager stopped
15/02/11 21:10:25 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/02/11 21:10:25 INFO spark.SparkContext: Successfully stopped SparkContext
15/02/11 21:10:25 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/02/11 21:10:25 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
bash-4.1#