Spark的高階排序(二次排序)
阿新 • 發佈:2018-12-31
為了多維的排序,需要考慮多個條件,這要求我們自定義key
1 23
3 22
3 31
1 12
2 11
4 45
二、使用java實現
2.1、自定義key
使用scala.math.Ordered介面,實現Serializable介面
package com.chb.sparkDemo.secondarySort;
import java.io.Serializable;
import scala.math.Ordered;
/**
* Spark 二次排序自定義key
* 使用scala.math.Ordered介面
* @author 12285
*/
/**
 * Composite key for Spark secondary sort.
 *
 * Orders by firstKey first, then by secondKey when the first keys are
 * equal. Implements scala.math.Ordered so JavaPairRDD.sortByKey can
 * compare keys, and Serializable because keys cross the shuffle.
 */
public class MyKey implements Ordered<MyKey>, Serializable {

    private static final long serialVersionUID = 1L;

    private int firstKey;
    private int secondKey;

    public MyKey(int firstKey, int secondKey) {
        this.firstKey = firstKey;
        this.secondKey = secondKey;
    }

    public int getFirstKey() {
        return firstKey;
    }

    public int getSecondKey() {
        return secondKey;
    }

    public void setFirstKey(int firstKey) {
        this.firstKey = firstKey;
    }

    public void setSecondKey(int secondKey) {
        this.secondKey = secondKey;
    }

    // The $greater/$less family is what scala.math.Ordered's >, >=, <, <=
    // operators compile to. All of them delegate to compare so the
    // ordering logic lives in exactly one place.
    public boolean $greater(MyKey other) {
        return compare(other) > 0;
    }

    public boolean $greater$eq(MyKey other) {
        return compare(other) >= 0;
    }

    public boolean $less(MyKey other) {
        return compare(other) < 0;
    }

    public boolean $less$eq(MyKey other) {
        return compare(other) <= 0;
    }

    public int compare(MyKey other) {
        // Integer.compare avoids the overflow that the original
        // subtraction (firstKey - other.firstKey) produces when the
        // operands straddle Integer.MIN_VALUE/MAX_VALUE, which would
        // silently invert the sort order for such keys.
        int byFirst = Integer.compare(this.firstKey, other.firstKey);
        return byFirst != 0 ? byFirst : Integer.compare(this.secondKey, other.secondKey);
    }

    public int compareTo(MyKey other) {
        return compare(other);
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + firstKey;
        result = prime * result + secondKey;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        MyKey other = (MyKey) obj;
        return firstKey == other.firstKey && secondKey == other.secondKey;
    }
}
2.2、具體實現步驟
第一步: 自定義key 實現scala.math.Ordered介面,和Serializable介面
第二步:將要進行二次排序的資料載入,按照<key,value>格式的RDD
第三步:使用sortByKey 基於自定義的key進行二次排序
第四步:去掉排序的key,只保留排序的結果
2.2.1、 第一步: 自定義key 實現scala.math.Ordered介面,和Serializable介面
// Step 2: wrap each input line into a (MyKey, line) pair so that
// sortByKey can order records by the composite key.
JavaPairRDD<MyKey, String> mykeyPairs = lines.mapToPair(new PairFunction<String, MyKey, String>() {
private static final long serialVersionUID = 1L;
public Tuple2<MyKey, String> call(String line) throws Exception {
// First space-separated token is the primary key, second the secondary key.
int firstKey = Integer.valueOf(line.split(" ")[0]);
int secondKey = Integer.valueOf(line.split(" ")[1]);
MyKey mykey = new MyKey(firstKey, secondKey);
return new Tuple2<MyKey, String>(mykey, line);
}
});
2.2.2、第三步:使用sortByKey 基於自定義的key進行二次排序
// Step 3: sortByKey delegates the ordering to MyKey's Ordered implementation.
JavaPairRDD<MyKey, String> sortPairs = mykeyPairs.sortByKey();
2.2.3、第四步:去掉排序的key,只保留排序的結果
// Step 4: discard the key, keeping only the original line text.
JavaRDD<String> result = sortPairs.map(new Function<Tuple2<MyKey,String>, String>() {
private static final long serialVersionUID = 1L;
public String call(Tuple2<MyKey, String> tuple) throws Exception {
return tuple._2;//line
}
});
// Print the sorted result.
result.foreach(new VoidFunction<String>() {
private static final long serialVersionUID = 1L;
public void call(String line) throws Exception {
System.out.println(line);
}
});
三、完整程式碼
package com.chb.sparkDemo.secondarySort;
import io.netty.handler.codec.http.HttpContentEncoder.Result;
import java.awt.image.RescaleOp;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;
/**
* Spark二次排序的具體實現步驟:
* 第一步: 自定義key 實現scala.math.Ordered介面,和Serializable介面
* 第二步:將要進行二次排序的資料載入,按照<key,value>格式的RDD
* 第三步:使用sortByKey 基於自定義的key進行二次排序
* 第四步:去掉排序的key,只保留排序的結果
* @author 12285
*
*/
/**
 * Spark secondary-sort demo (Java implementation).
 *
 * Pipeline: read "first second" text lines, attach a composite MyKey to
 * each line, sort by that key, then strip the key and print the lines.
 */
public class SecordSortTest {

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("WordCount");
        // JavaSparkContext wraps the underlying SparkContext.
        JavaSparkContext context = new JavaSparkContext(sparkConf);

        // Read the raw input; each record is one line of text.
        JavaRDD<String> lines = context.textFile("C:\\Users\\12285\\Desktop\\test");

        // Step 2: pair every line with a composite sort key.
        JavaPairRDD<MyKey, String> keyedLines = lines.mapToPair(new PairFunction<String, MyKey, String>() {
            private static final long serialVersionUID = 1L;

            public Tuple2<MyKey, String> call(String line) throws Exception {
                String[] fields = line.split(" ");
                int firstKey = Integer.valueOf(fields[0]);
                int secondKey = Integer.valueOf(fields[1]);
                return new Tuple2<MyKey, String>(new MyKey(firstKey, secondKey), line);
            }
        });

        // Step 3: secondary sort driven by MyKey's Ordered implementation.
        JavaPairRDD<MyKey, String> orderedLines = keyedLines.sortByKey();

        // Step 4: drop the key; only the original line survives.
        JavaRDD<String> sortedLines = orderedLines.map(new Function<Tuple2<MyKey, String>, String>() {
            private static final long serialVersionUID = 1L;

            public String call(Tuple2<MyKey, String> pair) throws Exception {
                return pair._2;
            }
        });

        // Emit each sorted line.
        sortedLines.foreach(new VoidFunction<String>() {
            private static final long serialVersionUID = 1L;

            public void call(String line) throws Exception {
                System.out.println(line);
            }
        });
    }
}
結果:
1 12
1 23
2 11
3 22
3 31
4 45
四、使用scala實現
4.1、自定義key
/**
 * Composite key for Spark secondary sort (Scala version).
 *
 * Orders by `firstKey`, falling back to `secondKey` on ties.
 * Serializable because keys travel across the shuffle.
 */
class SecordSortKey(val firstKey: Int, val secondKey: Int) extends Ordered[SecordSortKey] with Serializable {
  override def compare(that: SecordSortKey): Int = {
    // Integer.compare avoids the overflow that the original subtraction
    // (this.firstKey - that.firstKey) produces near Int.MinValue/MaxValue,
    // which would silently invert the sort order for such keys.
    val byFirst = Integer.compare(this.firstKey, that.firstKey)
    if (byFirst != 0) byFirst
    else Integer.compare(this.secondKey, that.secondKey)
  }
}
4.2、具體實現
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
/**
 * Spark secondary-sort demo (Scala implementation).
 *
 * Mirrors the Java version: key each "first second" line with a
 * SecordSortKey, sort ascending, drop the key, print the lines.
 */
object SecordSortTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SecordSort")
    val sc = new SparkContext(conf)

    // Step 1: read the raw "first second" text lines.
    val lines = sc.textFile("C:\\Users\\12285\\Desktop\\test")

    // Step 2: pair each line with a composite sort key (split once).
    val pairSortKey = lines.map { line =>
      val fields = line.split(" ")
      (new SecordSortKey(fields(0).toInt, fields(1).toInt), line)
    }

    // Step 3: ascending secondary sort. The original passed `false`
    // (descending), contradicting the Java version and the documented
    // ascending output.
    val sortPair = pairSortKey.sortByKey(true)

    // Step 4: drop the key, keep the original line.
    val sortResult = sortPair.map(_._2)

    // println (not print) so every record lands on its own line.
    sortResult.collect().foreach(println)

    sc.stop()
  }
}