
How the Five RDD Properties Appear in the Spark Source Code


  /**
   * :: DeveloperApi ::
   * Implemented by subclasses to compute a given partition.
   */
def compute(split: Partition, context: TaskContext): Iterator[T]

This corresponds to RDD property 2: a function for computing each split (partition).
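To make property 2 concrete, here is a minimal, Spark-free Scala sketch. `ComputeSketch` and `SlicePartition` are hypothetical names, not Spark classes, and the `TaskContext` argument is left out so the sketch has no dependencies. The point is only that `compute` works on one partition at a time and returns a lazy `Iterator`, as in the signature above.

```scala
object ComputeSketch {
  // Hypothetical stand-in for org.apache.spark.Partition: an index plus a half-open range [start, end).
  final case class SlicePartition(index: Int, start: Int, end: Int)

  // Property 2: compute a single partition and return a lazy Iterator,
  // so elements are produced only as the caller consumes them.
  def compute(split: SlicePartition): Iterator[Int] =
    (split.start until split.end).iterator

  def main(args: Array[String]): Unit = {
    val p = SlicePartition(index = 1, start = 3, end = 6)
    println(compute(p).toList) // List(3, 4, 5)
  }
}
```

Returning an `Iterator` rather than a materialized collection lets a chain of transformations stream through one partition without holding every intermediate result in memory.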

  /**
   * Implemented by subclasses to return the set of partitions in this RDD. This method will only
   * be called once, so it is safe to implement a time-consuming computation in it.
   *
   * The partitions in this array must satisfy the following property:
   *   `rdd.partitions.zipWithIndex.forall { case (partition, index) => partition.index == index }`
   */
protected def getPartitions: Array[Partition]

This corresponds to RDD property 1: a list of partitions.
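The invariant in the doc comment (the partition stored at array position `i` must report `index == i`) is easy to satisfy when partitions are created by index. The following Scala sketch assumes a hypothetical even-split rule over the integer range `[0, n)`; `PartitionsSketch` is illustrative only and does not claim to reproduce how any particular Spark RDD slices its data.

```scala
object PartitionsSketch {
  // Hypothetical partition type: index plus a half-open range [start, end).
  final case class SlicePartition(index: Int, start: Int, end: Int)

  // Property 1: split [0, n) into numSlices contiguous, nearly equal ranges.
  // Long arithmetic avoids Int overflow in i * n for large inputs.
  def getPartitions(n: Int, numSlices: Int): Array[SlicePartition] = {
    require(numSlices > 0, "numSlices must be positive")
    Array.tabulate(numSlices) { i =>
      val start = (i.toLong * n / numSlices).toInt
      val end = ((i + 1).toLong * n / numSlices).toInt
      SlicePartition(i, start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    val parts = getPartitions(10, 3)
    // The property required by the doc comment: array position == partition.index
    assert(parts.zipWithIndex.forall { case (p, i) => p.index == i })
    parts.foreach(println)
  }
}
```

With `n = 10` and `numSlices = 3` this yields the ranges `[0, 3)`, `[3, 6)` and `[6, 10)`.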

  /**
   * Implemented by subclasses to return how this RDD depends on parent RDDs. This method will only
   * be called once, so it is safe to implement a time-consuming computation in it.
   */
protected def getDependencies: Seq[Dependency[_]] = deps

This corresponds to RDD property 3: a list of dependencies on other RDDs.
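Dependencies are what allow a lost partition to be rebuilt from its parents (the lineage). The Spark-free Scala sketch below models only a one-to-one (narrow) dependency, where child partition `i` reads parent partition `i`. `ToyRDD`, `ToyDep`, `OneToOne`, `Source` and `Mapped` are hypothetical names; Spark's real `Dependency` hierarchy (for example `OneToOneDependency` and `ShuffleDependency`) is richer than this.

```scala
object DepsSketch {
  // Hypothetical toy types, not Spark's Dependency/RDD classes.
  sealed trait ToyDep { def parent: ToyRDD }
  // Narrow, one-to-one: child partition i is computed from parent partition i only.
  final case class OneToOne(parent: ToyRDD) extends ToyDep

  trait ToyRDD {
    def numPartitions: Int
    def deps: Seq[ToyDep]              // property 3: links to parent datasets
    def compute(i: Int): Iterator[Int] // property 2: how to build partition i
  }

  // A root dataset with no parents; its partitions are given directly.
  final class Source(data: Vector[Vector[Int]]) extends ToyRDD {
    val numPartitions: Int = data.length
    val deps: Seq[ToyDep] = Nil
    def compute(i: Int): Iterator[Int] = data(i).iterator
  }

  // A derived dataset: it stores no data, only the recipe (parent + f).
  final class Mapped(parent: ToyRDD, f: Int => Int) extends ToyRDD {
    val numPartitions: Int = parent.numPartitions
    val deps: Seq[ToyDep] = Seq(OneToOne(parent))
    // Rebuilding partition i means recomputing the same partition of the parent.
    def compute(i: Int): Iterator[Int] = parent.compute(i).map(f)
  }

  // Number of dependency hops back to a root dataset.
  def lineageDepth(r: ToyRDD): Int =
    if (r.deps.isEmpty) 0 else 1 + r.deps.map(d => lineageDepth(d.parent)).max

  // Source -> map(_ * 2) -> map(_ + 1)
  def example: ToyRDD =
    new Mapped(new Mapped(new Source(Vector(Vector(1, 2), Vector(3))), _ * 2), _ + 1)

  def main(args: Array[String]): Unit = {
    println(example.compute(0).toList) // List(3, 5)
    println(lineageDepth(example))     // 2
  }
}
```

Because `Mapped` keeps only its parent and function, recomputing any partition simply walks the `deps` chain back to the source.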

 /**
   * Optionally overridden by subclasses to specify placement preferences.
   */
protected def getPreferredLocations(split: Partition): Seq[String] = Nil

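This corresponds to RDD property 5: optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file).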
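A minimal, Spark-free Scala sketch of how a scheduler might use this hint. The `blockHosts` map, the host names and the `pickHost` rule are all hypothetical; Spark's real task scheduler (locality levels, wait timeouts) is considerably more involved. As in the default implementation above, returning `Nil` means "no preference".

```scala
object LocalitySketch {
  // Hypothetical: hosts holding each partition's input block (think HDFS replicas).
  val blockHosts: Map[Int, Seq[String]] =
    Map(0 -> Seq("host-a", "host-b"), 1 -> Seq("host-c"))

  // Property 5 (optional): preferred hosts per partition.
  // Nil means "no preference", matching the default getPreferredLocations.
  def getPreferredLocations(partitionIndex: Int): Seq[String] =
    blockHosts.getOrElse(partitionIndex, Nil)

  // Toy placement rule: the first preferred host that is alive, else any alive host.
  def pickHost(partitionIndex: Int, alive: Seq[String]): String = {
    require(alive.nonEmpty, "need at least one live host")
    getPreferredLocations(partitionIndex)
      .find(h => alive.contains(h))
      .getOrElse(alive.head)
  }

  def main(args: Array[String]): Unit = {
    val alive = Seq("host-b", "host-c", "host-d")
    (0 to 2).foreach(i => println(s"partition $i -> ${pickHost(i, alive)}"))
  }
}
```

Partition 0 lands on `host-b` (its only live replica), partition 1 on `host-c`, and partition 2, which has no preference, falls back to the first live host.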

 /** Optionally overridden by subclasses to specify how they are partitioned. */
@transient val partitioner: Option[Partitioner] = None

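This corresponds to RDD property 4: optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned).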
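Why is the field an `Option`? Only key-value data can have a meaningful key-to-partition mapping, and most RDDs have none. The Spark-free Scala sketch below uses a hypothetical `ToyHashPartitioner` and `ToyPairData` (not Spark's `HashPartitioner` or `RDD`) to show the idea of hash partitioning: a key's hash code, taken modulo the partition count and kept non-negative, decides its bucket.

```scala
object PartitionerSketch {
  // Hypothetical stand-in for a hash partitioner (not Spark's HashPartitioner class).
  final case class ToyHashPartitioner(numPartitions: Int) {
    require(numPartitions > 0, "numPartitions must be positive")

    // Map a key to [0, numPartitions); fix up negative remainders from negative hash codes.
    def getPartition(key: Any): Int =
      if (key == null) 0
      else {
        val raw = key.hashCode % numPartitions
        if (raw < 0) raw + numPartitions else raw
      }
  }

  // Key-value data split into buckets. The partitioner is an Option (property 4):
  // None means no known key-to-partition mapping.
  final case class ToyPairData(
      partitions: Vector[Vector[(String, Int)]],
      partitioner: Option[ToyHashPartitioner])

  // Place every pair in the bucket its key hashes to, and record the partitioner used.
  def partitionBy(pairs: Seq[(String, Int)], p: ToyHashPartitioner): ToyPairData = {
    val buckets = Vector.tabulate(p.numPartitions) { i =>
      pairs.filter { case (k, _) => p.getPartition(k) == i }.toVector
    }
    ToyPairData(buckets, Some(p))
  }

  def main(args: Array[String]): Unit = {
    val data = partitionBy(Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4), ToyHashPartitioner(2))
    data.partitions.zipWithIndex.foreach { case (b, i) => println(s"partition $i: $b") }
  }
}
```

Both `"a"` pairs end up in the same bucket, so a later per-key aggregation could run partition by partition. Recording the partitioner is what lets an engine recognize that data is already grouped this way and avoid a repeat shuffle.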