Submitting a stage

阿新 • Published: 2017-05-06


// Submit a stage: create a batch of tasks for it, one task per partition that still needs to be computed
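The rule in the comment above (one task per partition) can be illustrated with a minimal Spark-free sketch. The stage and task types are hypothetical stand-ins, not Spark's classes, and map outputs are modeled as plain lists of host names. Only partitions with no map output (shuffle-map stage) or with an unfinished result (result stage) get a task. The task count therefore equals the number of partitions still to compute, which can be smaller than the stage's total partition count.

```scala
// Hypothetical stand-ins for Spark's Stage/Task classes; not Spark's API.
object TaskCreationSketch {
  sealed trait Task { def partition: Int }
  final case class ShuffleMapTask(partition: Int) extends Task
  // outputId is the index within the job's partitions (ResultTask's last argument in the source).
  final case class ResultTask(partition: Int, outputId: Int) extends Task

  sealed trait Stage
  // outputLocs(i) is empty while partition i has no map output yet.
  final case class ShuffleMapStage(outputLocs: Vector[List[String]]) extends Stage
  // jobPartitions(i) is the RDD partition behind job output i; finished(i) marks completed outputs.
  final case class ResultStage(jobPartitions: Vector[Int], finished: Vector[Boolean]) extends Stage

  // Indexes still to compute: missing map outputs, or unfinished job outputs.
  def partitionsToCompute(stage: Stage): Seq[Int] = stage match {
    case ShuffleMapStage(locs)    => locs.indices.filter(i => locs(i).isEmpty)
    case ResultStage(_, finished) => finished.indices.filter(i => !finished(i))
  }

  // Exactly one task per index returned by partitionsToCompute.
  def createTasks(stage: Stage): Seq[Task] = stage match {
    case s: ShuffleMapStage =>
      partitionsToCompute(s).map(id => ShuffleMapTask(id))
    case r: ResultStage =>
      partitionsToCompute(r).map(id => ResultTask(r.jobPartitions(id), id))
  }
}
```

Take a shuffle-map stage whose partition 0 already has map output: it yields tasks only for partitions 1 and 2. In a result stage, each task records both the RDD partition and its index within the job, mirroring the `p` / `id` pair in the real code.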

private def submitMissingTasks(stage: Stage, jobId: Int) {
  logDebug("submitMissingTasks(" + stage + ")")
  // Get our pending tasks and remember them in our pendingTasks entry
  stage.pendingTasks.clear()

  // First figure out the indexes of partition ids to compute.
  // Determine how many tasks to create (one per partition to compute)
  val partitionsToCompute: Seq[Int] = {
    if (stage.isShuffleMap) {
      (0 until stage.numPartitions).filter(id => stage.outputLocs(id) == Nil)
    } else {
      val job = stage.resultOfJob.get
      (0 until job.numPartitions).filter(id => !job.finished(id))
    }
  }

  val properties = if (jobIdToActiveJob.contains(jobId)) {
    jobIdToActiveJob(stage.jobId).properties
  } else {
    // this stage will be assigned to "default" pool
    null
  }

  // Add the stage to runningStages
  runningStages += stage

  // SparkListenerStageSubmitted should be posted before testing whether tasks are
  // serializable. If tasks are not serializable, a SparkListenerStageCompleted event
  // will be posted, which should always come after a corresponding SparkListenerStageSubmitted
  // event.
  stage.latestInfo = StageInfo.fromStage(stage, Some(partitionsToCompute.size))
  outputCommitCoordinator.stageStart(stage.id)
  listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))

  // TODO: Maybe we can keep the taskBinary in Stage to avoid serializing it multiple times.
  // Broadcasted binary for the task, used to dispatch tasks to executors. Note that we broadcast
  // the serialized copy of the RDD and for each task we will deserialize it, which means each
  // task gets a different copy of the RDD. This provides stronger isolation between tasks that
  // might modify state of objects referenced in their closures. This is necessary in Hadoop
  // where the JobConf/Configuration object is not thread-safe.
  var taskBinary: Broadcast[Array[Byte]] = null
  try {
    // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
    // For ResultTask, serialize and broadcast (rdd, func).
    val taskBinaryBytes: Array[Byte] =
      if (stage.isShuffleMap) {
        closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array()
      } else {
        closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array()
      }
    taskBinary = sc.broadcast(taskBinaryBytes)
  } catch {
    // In the case of a failure during serialization, abort the stage.
    case e: NotSerializableException =>
      abortStage(stage, "Task not serializable: " + e.toString)
      runningStages -= stage
      return
    case NonFatal(e) =>
      abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}")
      runningStages -= stage
      return
  }

  // Create that many tasks for the stage
  val tasks: Seq[Task[_]] = if (stage.isShuffleMap) {
    partitionsToCompute.map { id =>
      // Create one task per partition
      // and compute the preferred locations of each task
      val locs = getPreferredLocs(stage.rdd, id)
      val part = stage.rdd.partitions(id)
      // isShuffleMap is true for every stage except the final stage,
      // so those stages create ShuffleMapTasks
      new ShuffleMapTask(stage.id, taskBinary, part, locs)
    }
  } else {
    // A stage that is not a shuffle-map stage is the final stage,
    // and the final stage creates ResultTasks
    val job = stage.resultOfJob.get
    partitionsToCompute.map { id =>
      val p: Int = job.partitions(id)
      val part = stage.rdd.partitions(p)
      // getPreferredLocs computes the task's preferred locations
      val locs = getPreferredLocs(stage.rdd, p)
      new ResultTask(stage.id, taskBinary, part, locs, id)
    }
  }

  if (tasks.size > 0) {
    logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")")
    stage.pendingTasks ++= tasks
    logDebug("New pending tasks: " + stage.pendingTasks)
    taskScheduler.submitTasks(
      new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
    stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
  } else {
    // Because we posted SparkListenerStageSubmitted earlier, we should post
    // SparkListenerStageCompleted here in case there are no tasks to run.
    outputCommitCoordinator.stageEnd(stage.id)
    listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
    logDebug("Stage " + stage + " is actually done; %b %d %d".format(
      stage.isAvailable, stage.numAvailableOutputs, stage.numPartitions))
    runningStages -= stage
  }
}
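The control flow of submitMissingTasks can be modeled as a short sequence: post the stage-submitted event first, serialize the task payload once, and abort on a serialization failure. If there are no tasks, the stage is closed immediately. This is a hypothetical sketch, not Spark code. Java serialization stands in for Spark's closureSerializer, and nothing is broadcast. The returned strings are illustrative event labels only, not Spark's listener classes (`StageAborted` and `TaskSetSubmitted` are invented here).

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical, Spark-free model of the tail of submitMissingTasks.
object SubmitFlowSketch {
  // Java serialization stands in for Spark's closureSerializer (an assumption).
  def serialize(payload: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(payload) finally out.close()
    buffer.toByteArray
  }

  // Returns the events "posted" for one stage, in order.
  def submit(stageId: Int, payload: AnyRef, numTasks: Int): Seq[String] = {
    val events = ArrayBuffer[String]()
    // Posted before serializing, so any failure event always follows a submitted event.
    events += s"StageSubmitted($stageId)"
    val bytes: Option[Array[Byte]] =
      try Some(serialize(payload))
      catch {
        case e: NotSerializableException =>
          events += s"StageAborted($stageId): Task not serializable: $e"
          None
        case NonFatal(e) =>
          events += s"StageAborted($stageId): Task serialization failed: $e"
          None
      }
    bytes.foreach { b =>
      if (numTasks > 0) events += s"TaskSetSubmitted($stageId, tasks=$numTasks, bytes=${b.length})"
      else events += s"StageCompleted($stageId)" // no tasks to run: close the stage at once
    }
    events.toSeq
  }
}
```

Serializing a tuple that holds a plain `new Object()` fails with `NotSerializableException`. This mirrors what happens when the `(rdd, func)` pair captures a non-serializable object. A serializable payload with zero tasks produces a submitted event directly followed by a completed event.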

def getPreferredLocs(rdd: RDD[_], partition: Int): Seq[TaskLocation] = {
  getPreferredLocsInternal(rdd, partition, new HashSet)
}

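// Preferred locations of the partition a task computes:
// starting from the stage's last RDD, look for an RDD that has been persisted (cached) or checkpointed;
// the task's preferred locations are then the locations of that cached or checkpointed partition,
// because the task can then run on that node without recomputing the RDDs before it.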

private def getPreferredLocsInternal(
    rdd: RDD[_],
    partition: Int,
    visited: HashSet[(RDD[_],Int)])
  : Seq[TaskLocation] =
{
  // If the partition has already been visited, no need to re-visit.
  // This avoids exponential path exploration.  SPARK-695
  if (!visited.add((rdd,partition))) {
    // Nil has already been returned for previously visited partitions.
    return Nil
  }

  // If the partition is cached, return the cache locations
  // Check whether the current RDD is cached
  val cached = getCacheLocs(rdd)(partition)
  if (!cached.isEmpty) {
    return cached
  }

  // If the RDD has some placement preferences (as is the case for input RDDs), get those
  // Check whether the current RDD is checkpointed (preferredLocations returns the
  // checkpoint's locations if so, otherwise the RDD's own placement preferences)
  val rddPrefs = rdd.preferredLocations(rdd.partitions(partition)).toList
  if (!rddPrefs.isEmpty) {
    return rddPrefs.map(TaskLocation(_))
  }

  // If the RDD has narrow dependencies, pick the first partition of the first narrow dep
  // that has any placement preferences. Ideally we would choose based on transfer sizes,
  // but this will do for now.
  // Recurse to see whether a parent RDD is cached or checkpointed
  rdd.dependencies.foreach {
    case n: NarrowDependency[_] =>
      for (inPart <- n.getParents(partition)) {
        val locs = getPreferredLocsInternal(n.rdd, inPart, visited)
        if (locs != Nil) {
          return locs
        }
      }
    case _ =>
  }

  // If no RDD from the first to the last is cached or checkpointed, the result is Nil
  // (no preferred location), and the TaskScheduler decides where the task runs
  Nil
}
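The search order in getPreferredLocsInternal can be sketched without Spark. It checks cached locations first, then the RDD's own (or checkpoint) preferences, then walks depth-first over narrow dependencies, keeping a visited set so no (rdd, partition) pair is explored twice. `Node`, `NarrowDep` and the host-name strings are hypothetical stand-ins for RDD, NarrowDependency and TaskLocation. Shuffle dependencies are simply not modeled, just as the original skips them. Instead of the original's non-local `return` inside `foreach`, this sketch uses a lazy iterator with `find`, which visits parents in the same order.

```scala
import scala.collection.mutable

// Hypothetical stand-ins: Node ~ RDD, NarrowDep ~ NarrowDependency, host strings ~ TaskLocation.
object PreferredLocsSketch {
  // parents(p) lists the parent partitions that child partition p reads.
  final case class NarrowDep(parent: Node, parents: Int => Seq[Int])

  // A plain class (identity equality), so the visited set never hashes a whole lineage.
  final class Node(
      val name: String,
      val cached: Int => Seq[String], // hosts holding a cached copy of partition p
      val prefs: Int => Seq[String],  // the RDD's own (or checkpoint) placement preference
      val narrowDeps: Seq[NarrowDep])

  def preferredLocs(rdd: Node, partition: Int): Seq[String] =
    search(rdd, partition, mutable.HashSet.empty[(Node, Int)])

  private def search(rdd: Node, partition: Int, visited: mutable.HashSet[(Node, Int)]): Seq[String] =
    if (!visited.add((rdd, partition))) Nil // already explored: avoid exponential revisits
    else {
      val cached = rdd.cached(partition)
      if (cached.nonEmpty) cached
      else {
        val prefs = rdd.prefs(partition)
        if (prefs.nonEmpty) prefs
        else
          // Depth-first over narrow parents; the iterator is lazy, so the search
          // stops at the first parent partition that yields any location.
          rdd.narrowDeps.iterator
            .flatMap(dep => dep.parents(partition).iterator.map(p => search(dep.parent, p, visited)))
            .find(_.nonEmpty)
            .getOrElse(Nil)
      }
    }
}
```

Consider a chain source, middle, last in which only partition 0 of the middle RDD is cached. Partition 0 of the last RDD resolves to the caching executor, while partition 1 falls back to the source RDD's own host preference. A node with no cache, no preference and no narrow parents yields Nil, which leaves placement to the scheduler.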
