Spark PruneDependency dependency relationship and RangePartitioner
- Represents a dependency between the PartitionPruningRDD and its parent. In this
case, the child RDD contains a subset of partitions of the parents’.
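To see where this dependency comes from, here is a minimal sketch of the core idea behind filterByRange on a range-partitioned RDD. It is a hypothetical, simplified version of what Spark does internally (the object and method names FilterByRangeSketch / filterByRangeSketch are made up here, and the real implementation differs between Spark versions): the filter bounds are mapped to partition indices through RangePartitioner.getPartition, and a PartitionPruningRDD is created whose PruneDependency points only at those partitions.

import org.apache.spark.RangePartitioner
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

object FilterByRangeSketch {

  // Hypothetical helper, not Spark's actual API: prune partitions of a
  // range-partitioned pair RDD before filtering its keys to [lower, upper].
  def filterByRangeSketch[K: Ordering, V](self: RDD[(K, V)], lower: K, upper: K): RDD[(K, V)] = {
    val ord = implicitly[Ordering[K]]
    def inRange(k: K): Boolean = ord.gteq(k, lower) && ord.lteq(k, upper)

    val pruned: RDD[(K, V)] = self.partitioner match {
      // Only a RangePartitioner can tell which partitions may hold keys in [lower, upper].
      case Some(rp: RangePartitioner[K @unchecked, V @unchecked]) =>
        val l = rp.getPartition(lower)
        val u = rp.getPartition(upper)
        val keep = (math.min(l, u) to math.max(l, u)).toSet
        // PartitionPruningRDD records a PruneDependency on the parent: the child RDD
        // depends only on the parent partitions whose index passes this predicate,
        // so the other partitions are never read or computed.
        PartitionPruningRDD.create(self, keep.contains)
      case _ =>
        self // no usable partitioner: nothing can be pruned, fall back to a full scan
    }
    // Surviving partitions can still contain out-of-range keys, so filter the rows as well.
    pruned.filter { case (k, _) => inRange(k) }
  }
}

Because PruneDependency is a narrow dependency, the skipped partitions are never computed or shuffled at all; only the rows inside the surviving partitions still go through the final per-record filter.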
More resources
- github: https://github.com/opensourceteams/spark-scala-maven
- csdn (video index, watch online): https://blog.csdn.net/thinktothings/article/details/84726769
Video demos
- https://youtu.be/YRQ6OaOXmPY (YouTube video)
- https://www.bilibili.com/video/av37442139/?p=4 (Bilibili video)
Input data
List(("a",2),("d",1),("b",8),("d",3)
Scala program
package com.opensource.bigdata.spark.local.rdd.operation.dependency.narrow.n_03_pruneDependency.n_02_filterByRange

import com.opensource.bigdata.spark.local.rdd.operation.base.BaseScalaSparkContext

object Run extends BaseScalaSparkContext {

  def main(args: Array[String]): Unit = {
    val sc = pre()
    val rdd1 = sc.parallelize(List(("a", 2), ("d", 1), ("b", 8), ("d", 3)), 2) // ParallelCollectionRDD
    val rdd1Sort = rdd1.sortByKey()                                            // ShuffledRDD
    val rdd2 = rdd1Sort.filterByRange("a", "b")                                // MapPartitionsRDD
    println("rdd \n" + rdd2.collect().mkString("\n"))
    sc.stop()
  }
}
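With the input above, filterByRange("a", "b") keeps only the pairs whose keys fall between "a" and "b" (inclusive), so the program should print something like:

rdd 
(a,2)
(b,8)

The lineage behind rdd2 is roughly ParallelCollectionRDD -> ShuffledRDD (range-partitioned by sortByKey) -> PartitionPruningRDD (attached to the ShuffledRDD through a PruneDependency) -> MapPartitionsRDD (the final per-record filter). Calling rdd2.toDebugString before collect() is a quick way to inspect this chain yourself.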