Spark Source Code Reading Notes: Broadcast (Part 3)
The torrent transport mechanism for Broadcast is implemented by TorrentBroadcastFactory and TorrentBroadcast.
Because the torrent mechanism delegates a broadcast's actual storage and transfer to the BlockManager, TorrentBroadcastFactory is quite simple: initialize and stop do nothing, newBroadcast creates a TorrentBroadcast, and unbroadcast calls TorrentBroadcast.unpersist, which has the BlockManager remove the copies of that broadcast stored on each executor.
TorrentBroadcastFactory
class TorrentBroadcastFactory extends BroadcastFactory {

  override def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager) { }

  override def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) = {
    new TorrentBroadcast[T](value_, id)
  }

  override def stop() { }

  /**
   * Remove all persisted state associated with the torrent broadcast with the given ID.
   * @param removeFromDriver Whether to remove state from the driver.
   * @param blocking Whether to block until unbroadcasted
   */
  override def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
    TorrentBroadcast.unpersist(id, removeFromDriver, blocking)
  }
}
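For orientation, here is a minimal driver-side usage sketch, assuming a local master (the app name is made up; the explicit spark.broadcast.factory setting just makes the factory choice visible, since the torrent factory is typically already the default in this Spark lineage):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("torrent-broadcast-demo")
  .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
val sc = new SparkContext(conf)

val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))   // ends up in newBroadcast above
val names = sc.parallelize(Seq(1, 2, 2)).map(x => lookup.value(x)).collect()
lookup.unpersist(blocking = false)                   // routes to TorrentBroadcast.unpersist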
Two points deserve attention when reading TorrentBroadcast: (1) its caching mechanism, and (2) its serialization/deserialization mechanism. First, the TorrentBroadcast class itself:
private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
  extends Broadcast[T](id) with Logging with Serializable {

  /**
   * Value of the broadcast object on executors. This is reconstructed by [[readBroadcastBlock]],
   * which builds this value by reading blocks from the driver and/or other executors.
   *
   * On the driver, if the value is required, it is read lazily from the block manager.
   */
  @transient private lazy val _value: T = readBroadcastBlock()

  /** The compression codec to use, or None if compression is disabled */
  @transient private var compressionCodec: Option[CompressionCodec] = _

  /** Size of each block. Default value is 4MB. This value is only read by the broadcaster. */
  @transient private var blockSize: Int = _

  private def setConf(conf: SparkConf) {
    compressionCodec = if (conf.getBoolean("spark.broadcast.compress", true)) {
      Some(CompressionCodec.createCodec(conf))
    } else {
      None
    }
    blockSize = conf.getInt("spark.broadcast.blockSize", 4096) * 1024
  }
  setConf(SparkEnv.get.conf)

  private val broadcastId = BroadcastBlockId(id)

  /** Total number of blocks this broadcast variable contains. */
  private val numBlocks: Int = writeBlocks(obj)

  override protected def getValue() = {
    _value
  }

  /**
   * Divide the object into multiple blocks and put those blocks in the block manager.
   * @param value the object to divide
   * @return number of blocks this broadcast variable is divided into
   */
  private def writeBlocks(value: T): Int = {
    // Store a copy of the broadcast variable in the driver so that tasks run on the driver
    // do not create a duplicate copy of the broadcast variable's value.
    SparkEnv.get.blockManager.putSingle(broadcastId, value, StorageLevel.MEMORY_AND_DISK,
      tellMaster = false)
    val blocks =
      TorrentBroadcast.blockifyObject(value, blockSize, SparkEnv.get.serializer, compressionCodec)
    blocks.zipWithIndex.foreach { case (block, i) =>
      SparkEnv.get.blockManager.putBytes(
        BroadcastBlockId(id, "piece" + i),
        block,
        StorageLevel.MEMORY_AND_DISK_SER,
        tellMaster = true)
    }
    blocks.length
  }

  /** Fetch torrent blocks from the driver and/or other executors. */
  private def readBlocks(): Array[ByteBuffer] = {
    // Fetch chunks of data. Note that all these chunks are stored in the BlockManager and reported
    // to the driver, so other executors can pull these chunks from this executor as well.
    val blocks = new Array[ByteBuffer](numBlocks)
    val bm = SparkEnv.get.blockManager

    for (pid <- Random.shuffle(Seq.range(0, numBlocks))) {
      val pieceId = BroadcastBlockId(id, "piece" + pid)
      logDebug(s"Reading piece $pieceId of $broadcastId")
      // First try getLocalBytes because there is a chance that previous attempts to fetch the
      // broadcast blocks have already fetched some of the blocks. In that case, some blocks
      // would be available locally (on this executor).
      def getLocal: Option[ByteBuffer] = bm.getLocalBytes(pieceId)
      def getRemote: Option[ByteBuffer] = bm.getRemoteBytes(pieceId).map { block =>
        // If we found the block from remote executors/driver's BlockManager, put the block
        // in this executor's BlockManager.
        SparkEnv.get.blockManager.putBytes(
          pieceId,
          block,
          StorageLevel.MEMORY_AND_DISK_SER,
          tellMaster = true)
        block
      }
      val block: ByteBuffer = getLocal.orElse(getRemote).getOrElse(
        throw new SparkException(s"Failed to get $pieceId of $broadcastId"))
      blocks(pid) = block
    }
    blocks
  }

  /**
   * Remove all persisted state associated with this Torrent broadcast on the executors.
   */
  override protected def doUnpersist(blocking: Boolean) {
    TorrentBroadcast.unpersist(id, removeFromDriver = false, blocking)
  }

  /**
   * Remove all persisted state associated with this Torrent broadcast on the executors
   * and driver.
   */
  override protected def doDestroy(blocking: Boolean) {
    TorrentBroadcast.unpersist(id, removeFromDriver = true, blocking)
  }

  /** Used by the JVM when serializing this object. */
  private def writeObject(out: ObjectOutputStream): Unit = Utils.tryOrIOException {
    assertValid()
    out.defaultWriteObject()
  }

  private def readBroadcastBlock(): T = Utils.tryOrIOException {
    TorrentBroadcast.synchronized {
      setConf(SparkEnv.get.conf)
      SparkEnv.get.blockManager.getLocal(broadcastId).map(_.data.next()) match {
        case Some(x) =>
          x.asInstanceOf[T]

        case None =>
          logInfo("Started reading broadcast variable " + id)
          val startTimeMs = System.currentTimeMillis()
          val blocks = readBlocks()
          logInfo("Reading broadcast variable " + id + " took" + Utils.getUsedTimeMs(startTimeMs))

          val obj = TorrentBroadcast.unBlockifyObject[T](
            blocks, SparkEnv.get.serializer, compressionCodec)
          // Store the merged copy in BlockManager so other tasks on this executor don't
          // need to re-fetch it.
          SparkEnv.get.blockManager.putSingle(
            broadcastId, obj, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
          obj
      }
    }
  }
}
When a TorrentBroadcast is constructed, it calls writeBlocks. That function first stores a full copy of the value in the local BlockManager via putSingle (so tasks running on the driver do not duplicate it), then calls TorrentBroadcast.blockifyObject to split the broadcast into pieces, and finally stores each piece in the BlockManager under the BlockId BroadcastBlockId(id, "piece" + i).
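As a concrete illustration of the resulting block IDs, assume a hypothetical broadcast with id 0 split into three pieces; BroadcastBlockId renders its name as broadcast_<id> plus an optional field suffix:

import org.apache.spark.storage.BroadcastBlockId

// Hypothetical broadcast with id = 0 and 3 pieces.
val pieceIds = (0 until 3).map(i => BroadcastBlockId(0L, "piece" + i).name)
// pieceIds == Vector("broadcast_0_piece0", "broadcast_0_piece1", "broadcast_0_piece2")
// The unsplit driver-side copy sits under BroadcastBlockId(0L).name == "broadcast_0".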
TorrentBroadcast.blockifyObject splits the broadcast into pieces according to the configured block size (spark.broadcast.blockSize, in KB, default 4096, i.e. 4MB) and returns an Array[ByteBuffer]:
def blockifyObject[T: ClassTag](
    obj: T,
    blockSize: Int,
    serializer: Serializer,
    compressionCodec: Option[CompressionCodec]): Array[ByteBuffer] = {
  val bos = new ByteArrayChunkOutputStream(blockSize)
  val out: OutputStream = compressionCodec.map(c => c.compressedOutputStream(bos)).getOrElse(bos)
  val ser = serializer.newInstance()
  val serOut = ser.serializeStream(out)
  serOut.writeObject[T](obj).close()
  bos.toArrays.map(ByteBuffer.wrap)
}
TorrentBroadcast's caching mechanism is the same as HttpBroadcast's, but its serialization mechanism differs considerably. When a TorrentBroadcast is serialized, the value to be transferred is not serialized; only the broadcast's id is written, and deserialization likewise restores only the id. The actual value is initialized lazily through the lazy variable _value: when a task on an executor first uses the broadcast's value, _value is initialized by readBroadcastBlock. That function first tries to read a cached copy of the broadcast from the local BlockManager; if there is none, it calls readBlocks to fetch all of the broadcast's pieces from other executors, then calls TorrentBroadcast.unBlockifyObject to reassemble the pieces into the final value.
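The trick is ordinary Java serialization semantics: @transient fields are skipped, and a Scala lazy val is recomputed on first access after deserialization. A minimal sketch outside Spark (LazyHandle is a hypothetical stand-in for TorrentBroadcast):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Only `id` crosses the wire; `value` is rebuilt lazily on the receiving side.
class LazyHandle(val id: Long) extends Serializable {
  // In TorrentBroadcast this body is readBroadcastBlock().
  @transient lazy val value: String = s"materialized on first use for $id"
}

val bos = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(bos)
oos.writeObject(new LazyHandle(42L)) // writes the id, not `value`
oos.close()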
The readBroadcastBlock code:
private def readBroadcastBlock(): T = Utils.tryOrIOException {
  TorrentBroadcast.synchronized {
    setConf(SparkEnv.get.conf)
    SparkEnv.get.blockManager.getLocal(broadcastId).map(_.data.next()) match {
      case Some(x) =>
        x.asInstanceOf[T]

      case None =>
        logInfo("Started reading broadcast variable " + id)
        val startTimeMs = System.currentTimeMillis()
        val blocks = readBlocks()
        logInfo("Reading broadcast variable " + id + " took" + Utils.getUsedTimeMs(startTimeMs))

        val obj = TorrentBroadcast.unBlockifyObject[T](
          blocks, SparkEnv.get.serializer, compressionCodec)
        // Store the merged copy in BlockManager so other tasks on this executor don't
        // need to re-fetch it.
        SparkEnv.get.blockManager.putSingle(
          broadcastId, obj, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
        obj
    }
  }
}
readBlocks fetches all of the broadcast's pieces from other executors in a randomly shuffled order, and each piece it fetches is stored into the local BlockManager so that other executors can in turn read it from this executor; this is what gives the transfer its BitTorrent-like character (see the sketch after the listing).
The readBlocks code:
/** Fetch torrent blocks from the driver and/or other executors. */
private def readBlocks(): Array[ByteBuffer] = {
  // Fetch chunks of data. Note that all these chunks are stored in the BlockManager and reported
  // to the driver, so other executors can pull these chunks from this executor as well.
  val blocks = new Array[ByteBuffer](numBlocks)
  val bm = SparkEnv.get.blockManager

  for (pid <- Random.shuffle(Seq.range(0, numBlocks))) {
    val pieceId = BroadcastBlockId(id, "piece" + pid)
    logDebug(s"Reading piece $pieceId of $broadcastId")
    // First try getLocalBytes because there is a chance that previous attempts to fetch the
    // broadcast blocks have already fetched some of the blocks. In that case, some blocks
    // would be available locally (on this executor).
    def getLocal: Option[ByteBuffer] = bm.getLocalBytes(pieceId)
    def getRemote: Option[ByteBuffer] = bm.getRemoteBytes(pieceId).map { block =>
      // If we found the block from remote executors/driver's BlockManager, put the block
      // in this executor's BlockManager.
      SparkEnv.get.blockManager.putBytes(
        pieceId,
        block,
        StorageLevel.MEMORY_AND_DISK_SER,
        tellMaster = true)
      block
    }
    val block: ByteBuffer = getLocal.orElse(getRemote).getOrElse(
      throw new SparkException(s"Failed to get $pieceId of $broadcastId"))
    blocks(pid) = block
  }
  blocks
}
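The Random.shuffle over piece indices is what prevents a thundering herd: if every executor fetched piece 0 first, the driver would field every request for the same piece at once. With a random permutation, executors fetch in different orders, quickly hold disjoint pieces, and can serve one another. A minimal sketch with a hypothetical five-piece broadcast:

import scala.util.Random

val numBlocks = 5
// Each executor computes its own permutation, e.g. Seq(3, 0, 4, 1, 2).
val fetchOrder = Random.shuffle(Seq.range(0, numBlocks))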
TorrentBroadcast.unBlockifyObject reassembles the fetched pieces of the broadcast into the final value:
def unBlockifyObject[T: ClassTag](
    blocks: Array[ByteBuffer],
    serializer: Serializer,
    compressionCodec: Option[CompressionCodec]): T = {
  require(blocks.nonEmpty, "Cannot unblockify an empty array of blocks")
  val is = new SequenceInputStream(
    asJavaEnumeration(blocks.iterator.map(block => new ByteBufferInputStream(block))))
  val in: InputStream = compressionCodec.map(c => c.compressedInputStream(is)).getOrElse(is)
  val ser = serializer.newInstance()
  val serIn = ser.deserializeStream(in)
  val obj = serIn.readObject[T]()
  serIn.close()
  obj
}
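A round-trip sketch of the two helpers (illustrative only: both are private[spark], so this would compile only inside the org.apache.spark namespace; compression is left off to keep it minimal):

import org.apache.spark.SparkConf
import org.apache.spark.serializer.JavaSerializer

val ser = new JavaSerializer(new SparkConf())
// Split a small value into 4MB pieces (here it fits in one), then reassemble it.
val blocks = TorrentBroadcast.blockifyObject(Seq(1, 2, 3), 4096 * 1024, ser, None)
val restored = TorrentBroadcast.unBlockifyObject[Seq[Int]](blocks, ser, None)
assert(restored == Seq(1, 2, 3))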
Conclusion: Spark's Broadcast module shares immutable values across executors in the form of broadcast variables, transferring them via either the Http or the Torrent mechanism. The module is also pluggable: by implementing the BroadcastFactory and Broadcast interfaces and setting the spark.broadcast.factory parameter, users can supply their own broadcast transport mechanism.
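A skeleton of such a customization, assuming the Spark 1.x BroadcastFactory trait shown earlier (MyBroadcastFactory and MyBroadcast are hypothetical names; the trivial getValue simply returns the captured value and exists only to make the skeleton compile, since it forgoes any distribution mechanism):

import scala.reflect.ClassTag

import org.apache.spark.{SecurityManager, SparkConf}
import org.apache.spark.broadcast.{Broadcast, BroadcastFactory}

// Hypothetical no-op broadcast: ships the value inside the serialized
// Broadcast object itself instead of through the BlockManager.
class MyBroadcast[T: ClassTag](obj: T, id: Long) extends Broadcast[T](id) {
  override protected def getValue(): T = obj
  override protected def doUnpersist(blocking: Boolean): Unit = { }
  override protected def doDestroy(blocking: Boolean): Unit = { }
}

// Selected at startup with spark.broadcast.factory=com.example.MyBroadcastFactory
class MyBroadcastFactory extends BroadcastFactory {
  override def initialize(isDriver: Boolean, conf: SparkConf,
      securityMgr: SecurityManager): Unit = { }
  override def newBroadcast[T: ClassTag](value: T, isLocal: Boolean, id: Long): Broadcast[T] =
    new MyBroadcast[T](value, id)
  override def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit = { }
  override def stop(): Unit = { }
}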