
Automating Spark job submission on a Hadoop-YARN cluster with a shell script

```
Set()
18/02/11 12:07:32 INFO yarn.Client: Submitting application application_1518316627470_0003 to ResourceManager
18/02/11 12:07:32 INFO impl.YarnClientImpl: Submitted application application_1518316627470_0003
18/02/11 12:07:32 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1518316627470_0003 and attemptId None
18/02/11 12:07:33 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:33 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1518322052230
	 final status: UNDEFINED
	 tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
	 user: elon
18/02/11 12:07:34 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:35 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:36 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:37 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:38 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:39 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:40 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:41 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:42 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:43 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
18/02/11 12:07:43 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop1, PROXY_URI_BASES -> http://hadoop1:8088/proxy/application_1518316627470_0003), /proxy/application_1518316627470_0003
18/02/11 12:07:43 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/02/11 12:07:43 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:44 INFO yarn.Client: Application report for application_1518316627470_0003 (state: RUNNING)
18/02/11 12:07:44 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.1.113
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1518322052230
	 final status: UNDEFINED
	 tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
	 user: elon
18/02/11 12:07:44 INFO cluster.YarnClientSchedulerBackend: Application application_1518316627470_0003 has started running.
18/02/11 12:07:44 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38368.
18/02/11 12:07:44 INFO netty.NettyBlockTransferService: Server created on 192.168.1.111:38368
18/02/11 12:07:44 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.111:38368 with 117.0 MB RAM, BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:45 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5db3d57c{/metrics/json,null,AVAILABLE,@Spark}
18/02/11 12:07:46 INFO scheduler.EventLoggingListener: Logging events to file:/tmp/spark-events/application_1518316627470_0003
18/02/11 12:07:53 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 290.1 KB, free 116.7 MB)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.7 KB, free 116.7 MB)
18/02/11 12:07:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.111:38368 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:07:54 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCount.scala:22
18/02/11 12:07:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.113:35724) with ID 1
18/02/11 12:07:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop3:33799 with 117.0 MB RAM, BlockManagerId(1, hadoop3, 33799, None)
18/02/11 12:07:55 INFO mapred.FileInputFormat: Total input paths to process : 1
18/02/11 12:07:58 INFO spark.SparkContext: Starting job: take at WordCount.scala:26
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount.scala:24)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Got job 0 (take at WordCount.scala:26) with 1 output partitions
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (take at WordCount.scala:26)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24), which has no missing parents
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 116.7 MB)
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 116.6 MB)
18/02/11 12:08:03 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.111:38368 (size: 2.8 KB, free: 116.9 MB)
18/02/11 12:08:03 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:04 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
18/02/11 12:08:04 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
18/02/11 12:08:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop3, executor 1, partition 0, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:07 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop3:33799 (size: 2.8 KB, free: 117.0 MB)
18/02/11 12:08:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop3:33799 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop3, executor 1, partition 1, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7027 ms on hadoop3 (executor 1) (1/2)
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 446 ms on hadoop3 (executor 1) (2/2)
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:24) finished in 7.435 s
18/02/11 12:08:12 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/02/11 12:08:12 INFO scheduler.DAGScheduler: running: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: failed: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24), which has no missing parents
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 116.6 MB)
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2013.0 B, free 116.6 MB)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.111:38368 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24) (first 15 tasks are for partitions Vector(0))
18/02/11 12:08:12 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, hadoop3, executor 1, partition 0, NODE_LOCAL, 4632 bytes)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop3:33799 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.1.113:35724
18/02/11 12:08:12 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 150 bytes
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 425 ms on hadoop3 (executor 1) (1/1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ResultStage 1 (take at WordCount.scala:26) finished in 0.427 s
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Job 0 finished: take at WordCount.scala:26, took 14.817908 s
(package,1)
(this,1)
(Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1)
(Because,1)
(Python,2)
(page](http://spark.apache.org/documentation.html).,1)
(cluster.,1)
(its,1)
([run,1)
(general,3)
(have,1)
(pre-built,1)
(YARN,,1)
(locally,2)
(changed,1)
(locally.,1)
(sc.parallelize(1,1)
(only,1)
(several,1)
(This,2)
18/02/11 12:08:12 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/02/11 12:08:13 INFO server.AbstractConnector: Stopped Spark@…{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/02/11 12:08:13 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.1.111:4040
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on hadoop3:33799 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.1.111:38368 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
18/02/11 12:08:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/02/11 12:08:13 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Stopped
18/02/11 12:08:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/11 12:08:13 INFO memory.MemoryStore: MemoryStore cleared
18/02/11 12:08:13 INFO storage.BlockManager: BlockManager stopped
18/02/11 12:08:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/02/11 12:08:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/11 12:08:13 INFO spark.SparkContext: Successfully stopped SparkContext
18/02/11 12:08:13 INFO util.ShutdownHookManager: Shutdown hook called
18/02/11 12:08:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3b26c620-946b-4efe-a60b-d101e32ec42a
```
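A submission wrapper along the following lines can produce a run like the one logged above unattended. This is a minimal sketch: the jar path, main class, and resource sizes are placeholder assumptions and not taken from the log; only the application-ID format (`application_1518316627470_0003`) and the `--master yarn` client-mode setup are grounded in the output shown.

```shell
#!/usr/bin/env bash
# Hypothetical spark-submit wrapper -- APP_JAR and MAIN_CLASS are assumed
# values; point them at the real project artifacts before use.
set -euo pipefail

APP_JAR="${APP_JAR:-/opt/jobs/wordcount.jar}"     # assumed jar location
MAIN_CLASS="${MAIN_CLASS:-com.example.WordCount}" # assumed package/class
LOG_FILE="${LOG_FILE:-/tmp/spark-submit.$$.log}"

# Pull the YARN application ID (application_<clusterTimestamp>_<seq>) out of
# client output such as:
#   "... INFO impl.YarnClientImpl: Submitted application application_1518316627470_0003"
extract_app_id() {
  grep -o 'application_[0-9]*_[0-9]*' <<< "$1" | head -n 1
}

# Only attempt a real submission when a Spark installation is configured.
if [ -n "${SPARK_HOME:-}" ]; then
  "$SPARK_HOME/bin/spark-submit" \
    --master yarn \
    --deploy-mode client \
    --class "$MAIN_CLASS" \
    --executor-memory 512m \
    --num-executors 1 \
    "$APP_JAR" 2>&1 | tee "$LOG_FILE"

  app_id=$(extract_app_id "$(cat "$LOG_FILE")")
  echo "Submitted as: $app_id"
  # Ask the ResourceManager for this run's final status.
  yarn application -status "$app_id"
fi
```

Capturing the application ID this way lets the same script feed follow-up commands such as `yarn logs -applicationId <id>` after the job finishes, which is usually the point of wrapping spark-submit in a script rather than invoking it by hand.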