Spark MapOutputTracker Source Code Analysis

阿新 • Published: 2018-12-14

Prerequisites

Hadoop version: Hadoop 2.6.0-cdh5.15.0
Spark version: Spark 1.6.0-cdh5.15.0

JDK 1.8.0_191
Scala 2.10.7

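Skill tags

When a ShuffleMapTask finishes, it sends its MapStatus data (BlockManagerId, [compressSize]) to the driver, where it is stored in MapOutputTrackerMaster.mapStatuses.
A ResultTask reads the ShuffleMapTask output by iterating over a ShuffleBlockFetcherIterator, which needs the MapStatus to locate the blocks.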

ShuffleMapTask

MapStatus

MapStatus data: (BlockManagerId, [compressSize])
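
As a rough mental model of the data involved, the sketch below uses simplified stand-ins. `SimpleBlockManagerId`, `SimpleMapStatus`, and `MapStatusModel` are hypothetical names, not Spark's real classes, and the sizes are kept as plain `Long` values rather than in a compressed form.

```scala
// Simplified stand-ins for illustration only; not Spark's actual classes.
final case class SimpleBlockManagerId(executorId: String, host: String, port: Int)

// One entry per map task: where the output lives, and the size of the
// block destined for each reduce partition (index = reduce partition id).
final case class SimpleMapStatus(location: SimpleBlockManagerId, sizes: Array[Long]) {
  def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
}

object MapStatusModel {
  // shuffleId -> one status per map partition, mirroring the shape of mapStatuses.
  val mapStatuses = scala.collection.mutable.HashMap.empty[Int, Array[SimpleMapStatus]]

  // Store a copy so later changes to the caller's array do not leak in.
  def register(shuffleId: Int, statuses: Array[SimpleMapStatus]): Unit =
    mapStatuses.put(shuffleId, statuses.clone())
}
```

A reducer for partition `r` would then look up `mapStatuses(shuffleId)` and read `getSizeForBlock(r)` from every entry.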

ShuffledRDD.compute()

Calls the BlockStoreShuffleReader.read() method through the reader returned by shuffleManager.getReader


  override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
    val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
    SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context)
      .read()
      .asInstanceOf[Iterator[(K, C)]]
  }

BlockStoreShuffleReader.read

Calls mapOutputTracker.getMapSizesByExecutorId


  override def read(): Iterator[Product2[K, C]] = {
    val streamWrapper: (BlockId, InputStream) => InputStream = { (blockId, in) =>
      blockManager.wrapForCompression(blockId,
        CryptoStreamUtils.wrapForEncryption(in, blockManager.conf))
    }

    val wrappedStreams = new ShuffleBlockFetcherIterator(
      context,
      blockManager.shuffleClient,
      blockManager,
      mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition),
      streamWrapper,
      // Note: we use getSizeAsMb when no suffix is provided for backwards compatibility
      SparkEnv.get.conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024,
      SparkEnv.get.conf.getBoolean("spark.shuffle.detectCorrupt", true))

    val ser = Serializer.getSerializer(dep.serializer)
    val serializerInstance = ser.newInstance()

    // Create a key/value iterator for each stream
    val recordIter = wrappedStreams.flatMap { case (blockId, wrappedStream) =>
      // Note: the asKeyValueIterator below wraps a key/value iterator inside of a
      // NextIterator. The NextIterator makes sure that close() is called on the
      // underlying InputStream when all records have been read.
      serializerInstance.deserializeStream(wrappedStream).asKeyValueIterator
    }

    // Update the context task metrics for each record read.
    val readMetrics = context.taskMetrics.createShuffleReadMetricsForDependency()
    val metricIter = CompletionIterator[(Any, Any), Iterator[(Any, Any)]](
      recordIter.map(record => {
        readMetrics.incRecordsRead(1)
        record
      }),
      context.taskMetrics().updateShuffleReadMetrics())

    // An interruptible iterator must be used here in order to support task cancellation
    val interruptibleIter = new InterruptibleIterator[(Any, Any)](context, metricIter)

    val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        // We are reading values that are already combined
        val combinedKeyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, C)]]
        dep.aggregator.get.combineCombinersByKey(combinedKeyValuesIterator, context)
      } else {
        // We don't know the value type, but also don't care -- the dependency *should*
        // have made sure its compatible w/ this aggregator, which will convert the value
        // type to the combined type C
        val keyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, Nothing)]]
        dep.aggregator.get.combineValuesByKey(keyValuesIterator, context)
      }
    } else {
      require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
      interruptibleIter.asInstanceOf[Iterator[Product2[K, C]]]
    }

    // Sort the output if there is a sort ordering defined.
    dep.keyOrdering match {
      case Some(keyOrd: Ordering[K]) =>
        // Create an ExternalSorter to sort the data. Note that if spark.shuffle.spill is disabled,
        // the ExternalSorter won't spill to disk.
        val sorter =
          new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = Some(ser))
        sorter.insertAll(aggregatedIter)
        context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
        context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
        context.internalMetricsToAccumulators(
          InternalAccumulator.PEAK_EXECUTION_MEMORY).add(sorter.peakMemoryUsedBytes)
        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
      case None =>
        aggregatedIter
    }
  }

MapOutputTracker.getMapSizesByExecutorId

Calls the MapOutputTracker.getStatuses() method


  /**
   * Called from executors to get the server URIs and output sizes for each shuffle block that
   * needs to be read from a given range of map output partitions (startPartition is included but
   * endPartition is excluded from the range).
   *
   * @return A sequence of 2-item tuples, where the first item in the tuple is a BlockManagerId,
   *         and the second item is a sequence of (shuffle block id, shuffle block size) tuples
   *         describing the shuffle blocks that are stored at that block manager.
   */
  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int)
      : Seq[(BlockManagerId, Seq[(BlockId, Long)])] = {
    logDebug(s"Fetching outputs for shuffle $shuffleId, partitions $startPartition-$endPartition")
    val statuses = getStatuses(shuffleId)
    // Synchronize on the returned array because, on the driver, it gets mutated in place
    statuses.synchronized {
      return MapOutputTracker.convertMapStatuses(shuffleId, startPartition, endPartition, statuses)
    }
  }
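
The conversion step at the end of getMapSizesByExecutorId can be pictured as grouping blocks by the location that holds them. The following is a minimal sketch under assumptions: `Location`, `BlockRef`, `Status`, and `groupByLocation` are hypothetical simplifications, not the real MapOutputTracker.convertMapStatuses.

```scala
import scala.collection.mutable

object ConvertSketch {
  // Hypothetical simplified types standing in for BlockManagerId, a shuffle block id and MapStatus.
  final case class Location(executorId: String, host: String)
  final case class BlockRef(shuffleId: Int, mapId: Int, reduceId: Int)
  final case class Status(location: Location, sizes: Array[Long])

  // Group the blocks a reducer needs by the location that stores them.
  // startPartition is inclusive, endPartition is exclusive.
  def groupByLocation(
      shuffleId: Int,
      startPartition: Int,
      endPartition: Int,
      statuses: Array[Status]): Seq[(Location, Seq[(BlockRef, Long)])] = {
    val byLocation =
      mutable.LinkedHashMap.empty[Location, mutable.ArrayBuffer[(BlockRef, Long)]]
    for ((status, mapId) <- statuses.zipWithIndex) {
      // A missing status means a map output was lost or never registered.
      require(status != null, s"Missing an output location for shuffle $shuffleId")
      val blocks = byLocation.getOrElseUpdate(status.location, mutable.ArrayBuffer.empty)
      for (part <- startPartition until endPartition) {
        blocks += ((BlockRef(shuffleId, mapId, part), status.sizes(part)))
      }
    }
    byLocation.toSeq.map { case (loc, blocks) => (loc, blocks.toSeq) }
  }
}
```

Grouping by location lets the fetcher issue one request per remote block manager instead of one per block.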

MapOutputTracker.getStatuses()

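Sends the message askTracker[Array[Byte]](GetMapOutputStatuses(shuffleId))
The message is sent and received through the Outbox and Inbox, and is finally handled by MapOutputTrackerMasterEndpoint.receiveAndReply
Message receiver: MapOutputTrackerMasterEndpoint.receiveAndReply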


  /**
   * Get or fetch the array of MapStatuses for a given shuffle ID. NOTE: clients MUST synchronize
   * on this array when reading it, because on the driver, we may be changing it in place.
   *
   * (It would be nice to remove this restriction in the future.)
   */
  private def getStatuses(shuffleId: Int): Array[MapStatus] = {
    val statuses = mapStatuses.get(shuffleId).orNull
    if (statuses == null) {
      logInfo("Don't have map outputs for shuffle " + shuffleId + ", fetching them")
      val startTime = System.currentTimeMillis
      var fetchedStatuses: Array[MapStatus] = null
      fetching.synchronized {
        // Someone else is fetching it; wait for them to be done
        while (fetching.contains(shuffleId)) {
          try {
            fetching.wait()
          } catch {
            case e: InterruptedException =>
          }
        }

        // Either while we waited the fetch happened successfully, or
        // someone fetched it in between the get and the fetching.synchronized.
        fetchedStatuses = mapStatuses.get(shuffleId).orNull
        if (fetchedStatuses == null) {
          // We have to do the fetch, get others to wait for us.
          fetching += shuffleId
        }
      }

      if (fetchedStatuses == null) {
        // We won the race to fetch the statuses; do so
        logInfo("Doing the fetch; tracker endpoint = " + trackerEndpoint)
        // This try-finally prevents hangs due to timeouts:
        try {
          val fetchedBytes = askTracker[Array[Byte]](GetMapOutputStatuses(shuffleId))
          fetchedStatuses = MapOutputTracker.deserializeMapStatuses(fetchedBytes)
          logInfo("Got the output locations")
          mapStatuses.put(shuffleId, fetchedStatuses)
        } finally {
          fetching.synchronized {
            fetching -= shuffleId
            fetching.notifyAll()
          }
        }
      }
      logDebug(s"Fetching map output statuses for shuffle $shuffleId took " +
        s"${System.currentTimeMillis - startTime} ms")

      if (fetchedStatuses != null) {
        return fetchedStatuses
      } else {
        logError("Missing all output locations for shuffle " + shuffleId)
        throw new MetadataFetchFailedException(
          shuffleId, -1, "Missing all output locations for shuffle " + shuffleId)
      }
    } else {
      return statuses
    }
  }
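
The core of getStatuses is a "fetch once, let everyone else wait" pattern built on a shared set plus wait/notifyAll. The sketch below isolates that pattern under assumptions: `FetchOnceCache` and its `remoteFetch` parameter are hypothetical, `remoteFetch` is assumed never to return null, and, unlike the listing above, an InterruptedException is allowed to propagate.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Hypothetical cache that lets only one thread fetch a given key while others wait.
class FetchOnceCache(remoteFetch: Int => Array[Long]) {
  private val cache = new ConcurrentHashMap[Int, Array[Long]]()
  private val fetching = mutable.HashSet.empty[Int]

  def get(shuffleId: Int): Array[Long] = {
    val cached = cache.get(shuffleId)
    if (cached != null) return cached

    var fetched: Array[Long] = null
    fetching.synchronized {
      // Another thread is already fetching this key; wait until it finishes.
      while (fetching.contains(shuffleId)) {
        fetching.wait()
      }
      // The other thread may have populated the cache while we waited.
      fetched = cache.get(shuffleId)
      if (fetched == null) {
        fetching += shuffleId // this thread performs the fetch
      }
    }

    if (fetched == null) {
      try {
        fetched = remoteFetch(shuffleId)
        cache.put(shuffleId, fetched)
      } finally {
        // Always wake the waiters, even if the fetch threw.
        fetching.synchronized {
          fetching -= shuffleId
          fetching.notifyAll()
        }
      }
    }
    fetched
  }
}
```

The try/finally matters: without it, a failed or timed-out fetch would leave the key in `fetching` and block every waiting thread forever.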

MapOutputTrackerMasterEndpoint.receiveAndReply

Calls tracker.post(new GetMapOutputMessage(shuffleId, context))



  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case GetMapOutputStatuses(shuffleId: Int) =>
      val hostPort = context.senderAddress.hostPort
      logInfo("Asked to send map output locations for shuffle " + shuffleId + " to " + hostPort)
      val mapOutputStatuses = tracker.post(new GetMapOutputMessage(shuffleId, context))

    case StopMapOutputTracker =>
      logInfo("MapOutputTrackerMasterEndpoint stopped!")
      context.reply(true)
      stop()
  }

MapOutputTrackerMaster.post


  // requests for map output statuses
  private val mapOutputRequests = new LinkedBlockingQueue[GetMapOutputMessage]
  
  def post(message: GetMapOutputMessage): Unit = {
    mapOutputRequests.offer(message)
  }

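MapOutputTrackerMaster.MessageLoop
Loops over the messages in the blocking queue mapOutputRequests
Calls MapOutputTrackerMaster.getSerializedMapOutputStatuses() to obtain the serialized statuses, then sends them back with context.reply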


  /** Message loop used for dispatching messages. */
  private class MessageLoop extends Runnable {
    override def run(): Unit = {
      try {
        while (true) {
          try {
            val data = mapOutputRequests.take()
            if (data == PoisonPill) {
              // Put PoisonPill back so that other MessageLoops can see it.
              mapOutputRequests.offer(PoisonPill)
              return
            }
            val context = data.context
            val shuffleId = data.shuffleId
            val hostPort = context.senderAddress.hostPort
            logDebug("Handling request to send map output locations for shuffle " + shuffleId +
              " to " + hostPort)
            val mapOutputStatuses = getSerializedMapOutputStatuses(shuffleId)
            context.reply(mapOutputStatuses)
          } catch {
            case NonFatal(e) => logError(e.getMessage, e)
          }
        }
      } catch {
        case ie: InterruptedException => // exit
      }
    }
  }
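
Taken together, post and MessageLoop form a small producer/consumer queue with a poison-pill shutdown. A minimal, self-contained sketch of that arrangement follows. `MessageLoopSketch`, `Request`, the two-thread pool, and the reply string are all assumptions made for illustration; the callback stands in for context.reply.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}
import scala.util.control.NonFatal

object MessageLoopSketch {
  // Hypothetical request type: an id plus a callback standing in for context.reply.
  final case class Request(shuffleId: Int, reply: String => Unit)

  // Sentinel that tells every worker loop to exit.
  val PoisonPill: Request = Request(-1, _ => ())

  private val requests = new LinkedBlockingQueue[Request]()
  private val pool = Executors.newFixedThreadPool(2)

  // The RPC handler only enqueues, so it never blocks on serialization work.
  def post(r: Request): Unit = { requests.offer(r); () }

  private class Loop extends Runnable {
    override def run(): Unit = {
      try {
        while (true) {
          try {
            val r = requests.take() // blocks until a request arrives
            if (r eq PoisonPill) {
              requests.offer(PoisonPill) // let the other loops see it too
              return
            }
            r.reply(s"statuses-for-${r.shuffleId}")
          } catch {
            case NonFatal(e) => System.err.println(e.getMessage)
          }
        }
      } catch {
        case _: InterruptedException => // exit
      }
    }
  }

  def start(): Unit = (1 to 2).foreach(_ => pool.execute(new Loop))

  def stop(): Unit = {
    post(PoisonPill)
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    ()
  }
}
```

Re-offering the pill before returning is what allows a single sentinel to stop any number of worker loops.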

MapOutputTrackerMaster.getSerializedMapOutputStatuses
Called by MessageLoop to serialize the statuses stored in mapStatuses
Next, trace backward to find where the variable mapStatuses is assigned


  def getSerializedMapOutputStatuses(shuffleId: Int): Array[Byte] = {
    var statuses: Array[MapStatus] = null
    var retBytes: Array[Byte] = null
    var epochGotten: Long = -1

    // Check to see if we have a cached version, returns true if it does
    // and has side effect of setting retBytes.  If not returns false
    // with side effect of setting statuses
    def checkCachedStatuses(): Boolean = {
      epochLock.synchronized {
        if (epoch > cacheEpoch) {
          cachedSerializedStatuses.clear()
          clearCachedBroadcast()
          cacheEpoch = epoch
        }
        cachedSerializedStatuses.get(shuffleId) match {
          case Some(bytes) =>
            retBytes = bytes
            true
          case None =>
            logDebug("cached status not found for : " + shuffleId)
            // At this point mapStatuses already holds a value: (shuffleId, [{BlockManagerId, [compressSize]}])
            statuses = mapStatuses.getOrElse(shuffleId, Array[MapStatus]())
            epochGotten = epoch
            false
        }
      }
    }

    if (checkCachedStatuses()) return retBytes
    var shuffleIdLock = shuffleIdLocks.get(shuffleId)
    if (null == shuffleIdLock) {
      val newLock = new Object()
      // in general, this condition should be false - but good to be paranoid
      val prevLock = shuffleIdLocks.putIfAbsent(shuffleId, newLock)
      shuffleIdLock = if (null != prevLock) prevLock else newLock
    }
    // synchronize so we only serialize/broadcast it once since multiple threads call
    // in parallel
    shuffleIdLock.synchronized {
      // double check to make sure someone else didn't serialize and cache the same
      // mapstatus while we were waiting on the synchronize
      if (checkCachedStatuses()) return retBytes

      // If we got here, we failed to find the serialized locations in the cache, so we pulled
      // out a snapshot of the locations as "statuses"; let's serialize and return that
      val (bytes, bcast) = MapOutputTracker.serializeMapStatuses(statuses, broadcastManager,
        isLocal, minSizeForBroadcast)
      logInfo("Size of output statuses for shuffle %d is %d bytes".format(shuffleId, bytes.length))
      // Add them into the table only if the epoch hasn't changed while we were working
      epochLock.synchronized {
        if (epoch == epochGotten) {
          cachedSerializedStatuses(shuffleId) = bytes
          if (null != bcast) cachedSerializedBroadcast(shuffleId) = bcast
        } else {
          logInfo("Epoch changed, not caching!")
          removeBroadcast(bcast)
        }
      }
      bytes
    }
  }
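
The caching logic above hinges on an epoch counter: a cached value is dropped once the epoch advances, and a freshly serialized value is cached only if the epoch did not change in the meantime. The sketch below shows just that idea under assumptions. `EpochCache` and its `serialize` parameter are hypothetical, and the per-shuffle lock and broadcast handling of the real method are left out.

```scala
import scala.collection.mutable

// Hypothetical sketch of an epoch-guarded cache of serialized values.
class EpochCache(serialize: Int => Array[Byte]) {
  private val epochLock = new Object
  private var epoch = 0L
  private var cacheEpoch = 0L
  private val cached = mutable.HashMap.empty[Int, Array[Byte]]

  // Called whenever the underlying data changes.
  def incrementEpoch(): Unit = epochLock.synchronized { epoch += 1 }

  def get(id: Int): Array[Byte] = {
    // Right = cache hit; Left = the epoch observed at the time of the miss.
    val lookup: Either[Long, Array[Byte]] = epochLock.synchronized {
      if (epoch > cacheEpoch) { // data changed since the cache was filled
        cached.clear()
        cacheEpoch = epoch
      }
      cached.get(id).toRight(epoch)
    }
    lookup match {
      case Right(bytes) => bytes
      case Left(epochGotten) =>
        val bytes = serialize(id) // potentially slow, so done outside the lock
        epochLock.synchronized {
          // Cache only if nothing changed while we were serializing.
          if (epoch == epochGotten) cached(id) = bytes
        }
        bytes
    }
  }
}
```

Without a per-key lock, two threads that miss at the same time may both serialize; the real method adds shuffleIdLock to avoid that duplicated work.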

Tracing mapStatuses backward

Where is the variable mapStatuses in MapOutputTrackerMaster assigned?
MapOutputTrackerMaster.registerMapOutputs
It is called by the DAGScheduler.handleTaskCompletion() method

protected val mapStatuses = new TimeStampedHashMap[Int, Array[MapStatus]]()

  /** Register multiple map output information for the given shuffle */
  def registerMapOutputs(shuffleId: Int, statuses: Array[MapStatus], changeEpoch: Boolean = false) {
    mapStatuses.put(shuffleId, Array[MapStatus]() ++ statuses)
    if (changeEpoch) {
      incrementEpoch()
    }
  }

DAGScheduler.handleTaskCompletion()

This case is matched when a ShuffleMapTask completes
shuffleStage.addOutputLoc(smt.partitionId, status) records the ShuffleMapTask's return value
val status = event.result.asInstanceOf[MapStatus]
On completion, a ShuffleMapTask returns a MapStatus: (BlockManagerId, [compressSize])
DAGScheduler.handleTaskCompletion() is called from DAGScheduler.doOnReceive() when the message matches: completion @ CompletionEvent
The CompletionEvent is posted by DAGScheduler.taskEnded
DAGScheduler.taskEnded is called by TaskSetManager.handleSuccessfulTask()
TaskSetManager.handleSuccessfulTask() is called by TaskSchedulerImpl.handleSuccessfulTask()
TaskSchedulerImpl.handleSuccessfulTask() is called by TaskResultGetter.enqueueSuccessfulTask
TaskResultGetter.enqueueSuccessfulTask is called by TaskSchedulerImpl.statusUpdate(), at which point the task state is TaskState.FINISHED
TaskSchedulerImpl.statusUpdate() is triggered when the executor, after a task finishes, sends a status update to DriverEndpoint

          case smt: ShuffleMapTask =>
            val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
            updateAccumulators(event)
            val status = event.result.asInstanceOf[MapStatus]
            val execId = status.location.executorId
            logDebug("ShuffleMapTask finished on " + execId)
            if (stageIdToStage(task.stageId).latestInfo.attemptId == task.stageAttemptId) {
              // This task was for the currently running attempt of the stage. Since the task
              // completed successfully from the perspective of the TaskSetManager, mark it as
              // no longer pending (the TaskSetManager may consider the task complete even
              // when the output needs to be ignored because the task's epoch is too small below.
              // In this case, when pending partitions is empty, there will still be missing
              // output locations, which will cause the DAGScheduler to resubmit the stage below.)
              shuffleStage.pendingPartitions -= task.partitionId
            }
            if (failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)) {
              logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")
            } else {
              // The epoch of the task is acceptable (i.e., the task was launched after the most
              // recent failure we're aware of for the executor), so mark the task's output as
              // available.
              shuffleStage.addOutputLoc(smt.partitionId, status)
              // Remove the task's partition from pending partitions. This may have already been
              // done above, but will not have been done yet in cases where the task attempt was
              // from an earlier attempt of the stage (i.e., not the attempt that's currently
              // running).  This allows the DAGScheduler to mark the stage as complete when one
              // copy of each task has finished successfully, even if the currently active stage
              // still has tasks running.
              shuffleStage.pendingPartitions -= task.partitionId
            }

            if (runningStages.contains(shuffleStage) && shuffleStage.pendingPartitions.isEmpty) {
              markStageAsFinished(shuffleStage)
              logInfo("looking for newly runnable stages")
              logInfo("running: " + runningStages)
              logInfo("waiting: " + waitingStages)
              logInfo("failed: " + failedStages)

              // We supply true to increment the epoch number here in case this is a
              // recomputation of the map outputs. In that case, some nodes may have cached
              // locations with holes (from when we detected the error) and will need the
              // epoch incremented to refetch them.
              // TODO: Only increment the epoch number if this is not the first time
              //       we registered these map outputs.
              // shuffleStage.outputLocInMapOutputTrackerFormat() yields the ShuffleMapTask return values
              // On completion, a ShuffleMapTask returns (BlockManagerId, [compressSize])
              mapOutputTracker.registerMapOutputs(
                shuffleStage.shuffleDep.shuffleId,
                shuffleStage.outputLocInMapOutputTrackerFormat(),
                changeEpoch = true)

              clearCacheLocs()

              if (!shuffleStage.isAvailable) {
                // Some tasks had failed; let's resubmit this shuffleStage.
                // TODO: Lower-level scheduler should also deal with this
                logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
                  ") because some of its tasks had failed: " +
                  shuffleStage.findMissingPartitions().mkString(", "))
                submitStage(shuffleStage)
              } else {
                // Mark any map-stage jobs waiting on this stage as finished
                if (shuffleStage.mapStageJobs.nonEmpty) {
                  val stats = mapOutputTracker.getStatistics(shuffleStage.shuffleDep)
                  for (job <- shuffleStage.mapStageJobs) {
                    markMapStageJobAsFinished(job, stats)
                  }
                }
              }

              // Note: newly runnable stages will be submitted below when we submit waiting stages
            }
        }
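
The failedEpoch check in the listing above decides whether a finished task's output can be trusted. It can be read as a small predicate, sketched here with a hypothetical helper name (`shouldAcceptOutput`) and an immutable `Map` in place of the scheduler's own state.

```scala
object FailedEpochSketch {
  // failedEpoch: executorId -> epoch at which that executor was last seen to fail.
  // A completion is ignored when the task was launched at or before that epoch,
  // because its output may have been lost together with the executor.
  def shouldAcceptOutput(
      failedEpoch: Map[String, Long],
      execId: String,
      taskEpoch: Long): Boolean =
    !(failedEpoch.contains(execId) && taskEpoch <= failedEpoch(execId))
}
```

For example, with `failedEpoch = Map("e1" -> 5L)`, a task from `e1` launched at epoch 5 is ignored, while one launched at epoch 6, or any task from an executor with no recorded failure, is accepted.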

end
