第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_reduceByKey

阿新 • • 發佈：2022-03-24

1. 定義

  /*
  * 1. 定義
  *    def reduceByKey(func: (V, V) => V): RDD[(K, V)]
  *    def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]
  *    def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]
  *
  * 2. 功能
  *    按照相同的 Key 對 Value 進行聚合
  *    傳輸的函式為聚合規則(相對於MapReduce中的reduce函式)
  *
  * 3. note
  *   1. 傳送到reduce端聚合時,會在Map端預聚合(自動觸發)
  *   2. 只能操作 key-value型別的Rdd 

  *   3. 不指定分割槽器時,使用預設分割槽器和分割槽個數和父Rdd相同
  *   4. scala語言中一般的聚合操作都是兩兩聚合，spark基於scala開發的，所以它的聚合也是兩兩聚合
  *   5. 當key 只要一個元素時,引數函式時不會參與運算的
  *
  * */

2. 示例

  object reduceByKeyTest extends App {

    val sparkconf: SparkConf = new SparkConf().setMaster("local").setAppName("distinctTest")

    val sc: SparkContext  
= new SparkContext(sparkconf)

    val rdd: RDD[(Int, String)] = sc.makeRDD(List((2, "x1"), (2, "x2"), (3, "x3"), (4, "x4"), (4, "x5"), (4, "x6"), (5, "x7")), 2)

    // 實現 collect_list
    //var list = Arrary[String](1)

    private val rdd1 = rdd.reduceByKey(
      _ + "-" + _
    )

    println(s"${rdd1.collect().mkString(",")}")

    sc.stop()
  }

第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_reduceByKey

1. 定義 /* * 1. 定義 *def reduceByKey(func: (V, V) => V): RDD[(K, V)] *def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_groupByKey

1. 定義 /* * 1. 定義 *def groupByKey(): RDD[(K, Iterable[V])] *def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]

第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_aggregateByKey

1. 定義 /* * 1. 定義 *def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner) *(seqOp: (U, V) => U,combOp: (U, U) => U): RDD[(K, U)]

第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_foldByKey

1. 定義 /* * 1. 定義 *def foldByKey(zeroValue: V)(func: (V, V) => V): RDD[(K, V)] *def foldByKey(zeroValue: V,partitioner: Partitioner)(func: (V, V) => V): RDD[(K, V)]

第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_combineByKey

1. 定義 /* * 1. 定義 *def combineByKey[C](createCombiner: V => C, *mergeValue: (C, V) => C, *mergeCombiners: (C, C) => C,

第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_join&leftOuterJoin&rightOuterJoin&fullOuterJoin

1. join /* * 1.定義 *def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))] *def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]

第五章_Spark核心程式設計_Rdd_轉換運算元_keyValue型_cogroup

1. 定義 /* * 1.定義 *def cogroup[W](other: RDD[(K, W)]): RDD[(K, (Iterable[V], Iterable[W]))] *def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)])