Submitting Spark jobs automatically with a shell script on a Hadoop YARN cluster
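The submission script itself is not preserved in this excerpt, so below is a minimal sketch of the kind of wrapper the title describes. Only `--master yarn` and `--deploy-mode client` are implied by the log (it shows `YarnClientSchedulerBackend`); the jar path, main class, input path, and resource sizes are illustrative assumptions, not values from the original post.

```bash
#!/usr/bin/env bash
# submit-wordcount.sh -- minimal sketch of an automated spark-submit wrapper.
# All paths, class names, and resource sizes are hypothetical; only
# yarn + client deploy mode are implied by the log below.
set -euo pipefail

APP_JAR="/home/elon/jobs/wordcount.jar"         # hypothetical jar location
MAIN_CLASS="com.example.WordCount"              # hypothetical package/class
INPUT_PATH="hdfs:///user/elon/input/README.md"  # hypothetical input file

spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 512m \
  --executor-memory 512m \
  --num-executors 1 \
  --class "${MAIN_CLASS}" \
  "${APP_JAR}" "${INPUT_PATH}" \
  2>&1 | tee submit.log
```

Running a script like this against the cluster produces driver output along the lines of the log that follows.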
18/02/11 12:07:32 INFO yarn.Client: Submitting application application_1518316627470_0003 to ResourceManager
18/02/11 12:07:32 INFO impl.YarnClientImpl: Submitted application application_1518316627470_0003
18/02/11 12:07:32 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1518316627470_0003 and attemptId None
18/02/11 12:07:33 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:33 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1518322052230
final status: UNDEFINED
tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
user: elon
18/02/11 12:07:34 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:35 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:36 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:37 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:38 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:39 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:40 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:41 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:42 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:43 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
18/02/11 12:07:43 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop1, PROXY_URI_BASES -> http://hadoop1:8088/proxy/application_1518316627470_0003), /proxy/application_1518316627470_0003
18/02/11 12:07:43 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/02/11 12:07:43 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:44 INFO yarn.Client: Application report for application_1518316627470_0003 (state: RUNNING)
18/02/11 12:07:44 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.1.113
ApplicationMaster RPC port: 0
queue: default
start time: 1518322052230
final status: UNDEFINED
tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
user: elon
18/02/11 12:07:44 INFO cluster.YarnClientSchedulerBackend: Application application_1518316627470_0003 has started running.
18/02/11 12:07:44 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38368.
18/02/11 12:07:44 INFO netty.NettyBlockTransferService: Server created on 192.168.1.111:38368
18/02/11 12:07:44 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.111:38368 with 117.0 MB RAM, BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:45 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5db3d57c{/metrics/json,null,AVAILABLE,@Spark}
18/02/11 12:07:46 INFO scheduler.EventLoggingListener: Logging events to file:/tmp/spark-events/application_1518316627470_0003
18/02/11 12:07:53 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 290.1 KB, free 116.7 MB)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.7 KB, free 116.7 MB)
18/02/11 12:07:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.111:38368 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:07:54 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCount.scala:22
18/02/11 12:07:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.113:35724) with ID 1
18/02/11 12:07:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop3:33799 with 117.0 MB RAM, BlockManagerId(1, hadoop3, 33799, None)
18/02/11 12:07:55 INFO mapred.FileInputFormat: Total input paths to process : 1
18/02/11 12:07:58 INFO spark.SparkContext: Starting job: take at WordCount.scala:26
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount.scala:24)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Got job 0 (take at WordCount.scala:26) with 1 output partitions
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (take at WordCount.scala:26)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24), which has no missing parents
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 116.7 MB)
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 116.6 MB)
18/02/11 12:08:03 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.111:38368 (size: 2.8 KB, free: 116.9 MB)
18/02/11 12:08:03 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:04 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
18/02/11 12:08:04 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
18/02/11 12:08:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop3, executor 1, partition 0, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:07 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop3:33799 (size: 2.8 KB, free: 117.0 MB)
18/02/11 12:08:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop3:33799 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop3, executor 1, partition 1, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7027 ms on hadoop3 (executor 1) (1/2)
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 446 ms on hadoop3 (executor 1) (2/2)
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:24) finished in 7.435 s
18/02/11 12:08:12 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/02/11 12:08:12 INFO scheduler.DAGScheduler: running: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: failed: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24), which has no missing parents
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 116.6 MB)
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2013.0 B, free 116.6 MB)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.111:38368 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24) (first 15 tasks are for partitions Vector(0))
18/02/11 12:08:12 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, hadoop3, executor 1, partition 0, NODE_LOCAL, 4632 bytes)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop3:33799 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.1.113:35724
18/02/11 12:08:12 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 150 bytes
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 425 ms on hadoop3 (executor 1) (1/1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ResultStage 1 (take at WordCount.scala:26) finished in 0.427 s
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Job 0 finished: take at WordCount.scala:26, took 14.817908 s
(package,1)
(this,1)
(Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1)
(Because,1)
(Python,2)
(page](http://spark.apache.org/documentation.html).,1)
(cluster.,1)
(its,1)
([run,1)
(general,3)
(have,1)
(pre-built,1)
(YARN,,1)
(locally,2)
(changed,1)
(locally.,1)
(sc.parallelize(1,1)
(only,1)
(several,1)
(This,2)
18/02/11 12:08:12 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/02/11 12:08:13 INFO server.AbstractConnector: Stopped Spark@…{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/02/11 12:08:13 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.1.111:4040
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on hadoop3:33799 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.1.111:38368 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
18/02/11 12:08:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/02/11 12:08:13 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Stopped
18/02/11 12:08:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/11 12:08:13 INFO memory.MemoryStore: MemoryStore cleared
18/02/11 12:08:13 INFO storage.BlockManager: BlockManager stopped
18/02/11 12:08:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/02/11 12:08:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/11 12:08:13 INFO spark.SparkContext: Successfully stopped SparkContext
18/02/11 12:08:13 INFO util.ShutdownHookManager: Shutdown hook called
18/02/11 12:08:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3b26c620-946b-4efe-a60b-d101e32ec42a
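Since the point of the wrapper is automation, a natural follow-up step (an assumption here, not shown in this excerpt) is to pull the application ID out of the captured log and ask YARN for the job's final status. `yarn application -status` is a standard YARN CLI command, and the grep pattern matches the `Submitted application application_1518316627470_0003` line above.

```bash
#!/usr/bin/env bash
# Hedged sketch: recover the YARN application ID from the captured submit
# log and query YARN for the application's final status. Assumes the
# wrapper above wrote its output to submit.log.
set -euo pipefail

APP_ID=$(grep -oE 'application_[0-9]+_[0-9]+' submit.log | head -n1)
yarn application -status "${APP_ID}"
```

In client deploy mode the driver runs inside spark-submit itself, so the script's exit code usually already reflects success or failure; the explicit status query is most useful for cluster-mode submissions or post-hoc checks.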