Hadoop Basics (22): The Shuffle Mechanism (Part 3)

阿新 • Published: 2020-07-19

7 Combiner Merging

(6) Steps for implementing a custom Combiner

(a) Define a custom Combiner that extends Reducer and override the reduce method

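import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordcountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

        // 1 Sum the counts for this key
        int count = 0;
        for (IntWritable v : values) {
            count += v.get();
        }

        // 2 Write out the result
        context.write(key, new IntWritable(count));
    }
}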

(b) Set it in the Job driver class:

job.setCombinerClass(WordcountCombiner.class);
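
As a rough illustration of what step (a) achieves, the following plain-Java sketch sums the (word, 1) pairs produced by a single map task before anything is sent on. It has no Hadoop dependency; the class name CombinerSketch and the sample words are made up for illustration, and it only simulates the idea rather than showing how the framework invokes a Combiner.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a summing Combiner; not part of the Hadoop API.
public class CombinerSketch {

    // Locally sums the counts of one map task's output, key by key.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Four records go in, two come out.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hadoop", 1), Map.entry("spark", 1),
                Map.entry("hadoop", 1), Map.entry("hadoop", 1));
        System.out.println(combine(mapOutput)); // {hadoop=3, spark=1}
    }
}
```

Fewer records leave the map side, which is the reduction in network traffic that the Combiner is meant to provide.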

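8 Combiner Merging: Hands-On Example

1. Requirement

During the counting job, aggregate the output of each MapTask locally to reduce the volume of data sent over the network; that is, use the Combiner feature.

(1) Input data

(2) Expected output data

Expected: the Combiner takes in many records and, after merging, emits fewer.

2. Requirements analysis

Figure 4-15 Combiner merging example

3. Hands-on: Option 1

1) Add a WordcountCombiner class that extends Reducer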

package com.atguigu.mr.combiner;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordcountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused output value, so no new object is allocated per key
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

        // 1 Sum the counts
        int sum = 0;

        for (IntWritable value : values) {
            sum += value.get();
        }

        v.set(sum);

        // 2 Write out
        context.write(key, v);
    }
}


2) Specify the Combiner in the WordcountDriver driver class

// Enable the combiner and specify which class implements its logic
job.setCombinerClass(WordcountCombiner.class);

4. Hands-on: Option 2

1) Specify WordcountReducer as the Combiner in the WordcountDriver driver class

// Enable the Combiner and specify which class implements its logic
job.setCombinerClass(WordcountReducer.class);

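Run the program; the results are shown in Figures 4-16 and 4-17.

Figure 4-16 Before using the Combiner

Figure 4-17 After using the Combiner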
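
Option 2 presumably works because WordcountReducer (not shown in this section, and assumed here to sum the counts just like WordcountCombiner) has the same input and output types, and because summing partial sums gives the same total as summing everything at once. A minimal plain-Java check of that property follows; the class name SumAsCombinerSketch and the sample values are hypothetical.

```java
import java.util.List;

// Hypothetical check that a summing reducer can also serve as a combiner.
public class SumAsCombinerSketch {

    // The same summing logic, used both as "combiner" and as "reducer".
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Counts for one word, produced by two different map tasks.
        List<Integer> split1 = List.of(1, 1, 1);
        List<Integer> split2 = List.of(1, 1);

        // Without a combiner: the reducer sees all five values.
        int direct = sum(List.of(1, 1, 1, 1, 1));

        // With a combiner: each split is summed locally, then the reducer sums the partial sums.
        int combined = sum(List.of(sum(split1), sum(split2)));

        System.out.println(direct + " " + combined); // 5 5
    }
}
```

An operation without this property, such as averaging, would in general not give the same result when reused this way.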

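9 GroupingComparator Grouping (Secondary Sort)

Groups the data in the Reduce phase by one or more fields.

Steps for grouping:

(1) Define a custom class that extends WritableComparator

(2) Override the compare() method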

@Override
public int compare(WritableComparable a, WritableComparable b) {
        // Business logic for the comparison
        return result;
}

(3) Create a constructor that passes the class of the objects being compared to the parent class

protected OrderGroupingComparator() {
        super(OrderBean.class, true);
}
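
To make the effect of the three steps more concrete, here is a plain-Java sketch of grouping: adjacent keys in an already sorted list fall into the same group whenever the grouping comparator returns 0 for them. The class GroupingSketch, the method groupAdjacent and the "id:price" key strings are all made up for illustration; this is a simulation of the idea, not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of reduce-side grouping; not part of the Hadoop API.
public class GroupingSketch {

    // Splits an already sorted list into runs of adjacent elements that the
    // grouping comparator treats as equal (compare(...) == 0).
    static <T> List<List<T>> groupAdjacent(List<T> sorted, Comparator<T> grouping) {
        List<List<T>> groups = new ArrayList<>();
        for (T item : sorted) {
            boolean startNewGroup = true;
            if (!groups.isEmpty()) {
                List<T> last = groups.get(groups.size() - 1);
                startNewGroup = grouping.compare(last.get(last.size() - 1), item) != 0;
            }
            if (startNewGroup) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Keys are "id:price" strings; the grouping comparator looks only at the id part.
        Comparator<String> byId = Comparator.comparing((String s) -> s.split(":")[0]);
        List<String> sorted = List.of("1:222.8", "1:33.8", "2:722.4", "2:522.8", "2:122.4");
        System.out.println(groupAdjacent(sorted, byId));
        // [[1:222.8, 1:33.8], [2:722.4, 2:522.8, 2:122.4]]
    }
}
```

The keys inside one group still differ (here by price); the grouping comparator only decides which of them are handed to the same reduce() call.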

10 GroupingComparator Grouping: Hands-On Example

1. Requirement

Given the following order data:

Table 4-2 Order data

Order id	Product id	Amount
0000001	Pdt_01	222.8
0000001	Pdt_02	33.8
0000002	Pdt_03	522.8
0000002	Pdt_04	122.4
0000002	Pdt_05	722.4
0000003	Pdt_06	232.8
0000003	Pdt_02	33.8

The goal is to find the most expensive product in each order.

(1) Input data

(2) Expected output data

1 222.8

2 722.4

3 232.8

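2. Requirements analysis

(1) Use "order id and amount" together as the key. All order records read in the Map phase are then sorted by id in ascending order and, where ids are equal, by amount in descending order, before being sent to Reduce.

(2) On the Reduce side, use a GroupingComparator to gather the key-value pairs with the same order id into one group. The first record in each group is the most expensive product of that order, as shown in Figure 4-18.

Figure 4-18 Process analysis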
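
The two steps of the analysis can be traced on the data of Table 4-2 with a small plain-Java sketch. It runs without Hadoop; the class MaxPricePerOrderSketch and its nested Order type are hypothetical stand-ins for the job and for OrderBean, and the order ids are written as plain integers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of "sort, then group, then take the first"; not Hadoop code.
public class MaxPricePerOrderSketch {

    static final class Order {
        final int id;
        final double price;

        Order(int id, double price) {
            this.id = id;
            this.price = price;
        }
    }

    static List<Order> mostExpensivePerOrder(List<Order> orders) {
        List<Order> sorted = new ArrayList<>(orders);
        // Step (1): id ascending, then price descending.
        sorted.sort(Comparator.comparingInt((Order o) -> o.id)
                .thenComparing(Comparator.comparingDouble((Order o) -> o.price).reversed()));
        // Step (2): keep only the first record of each run of equal ids.
        List<Order> firsts = new ArrayList<>();
        for (Order o : sorted) {
            if (firsts.isEmpty() || firsts.get(firsts.size() - 1).id != o.id) {
                firsts.add(o);
            }
        }
        return firsts;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(1, 222.8), new Order(1, 33.8),
                new Order(2, 522.8), new Order(2, 122.4), new Order(2, 722.4),
                new Order(3, 232.8), new Order(3, 33.8));
        // Prints one tab-separated line per order: 1 222.8, 2 722.4, 3 232.8
        for (Order o : mostExpensivePerOrder(orders)) {
            System.out.println(o.id + "\t" + o.price);
        }
    }
}
```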

3. Code implementation

(1) Define the OrderBean class that holds the order information

package com.atguigu.mapreduce.order;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class OrderBean implements WritableComparable<OrderBean> {

    private int order_id; // order id
    private double price; // price

    public OrderBean() {
        super();
    }

    public OrderBean(int order_id, double price) {
        super();
        this.order_id = order_id;
        this.price = price;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(order_id);
        out.writeDouble(price);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        order_id = in.readInt();
        price = in.readDouble();
    }

    @Override
    public String toString() {
        return order_id + "\t" + price;
    }

    public int getOrder_id() {
        return order_id;
    }

    public void setOrder_id(int order_id) {
        this.order_id = order_id;
    }

    public double getPrice() {
        return price;
    }

    public void setPrice(double price) {
        this.price = price;
    }

    // Secondary sort: order id ascending, then price descending
    @Override
    public int compareTo(OrderBean o) {

        int result = Integer.compare(order_id, o.getOrder_id());

        if (result == 0) {
            // Same order: higher price first; equal prices compare as 0
            result = Double.compare(o.getPrice(), price);
        }

        return result;
    }
}
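
One detail of OrderBean worth checking is that readFields() reads the fields in exactly the order in which write() wrote them. The sketch below reproduces that round trip with the JDK's DataOutputStream and DataInputStream only; the class RoundTripSketch and its two helper methods are hypothetical and merely mirror the bean's write/readFields pair.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical round trip mirroring OrderBean.write() and OrderBean.readFields().
public class RoundTripSketch {

    // Same order as OrderBean.write(): id first, then price.
    static byte[] write(int orderId, double price) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(orderId);
            out.writeDouble(price);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Same order as OrderBean.readFields(): fields come back in the order they were written.
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int orderId = in.readInt();
            double price = in.readDouble();
            return orderId + "\t" + price;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(2, 722.4);
        System.out.println(data.length); // 12 (4-byte int + 8-byte double)
        System.out.println(read(data));  // id 2 and price 722.4, tab-separated
    }
}
```

Swapping the two read calls would still consume 12 bytes but would yield meaningless values, which is why the order matters.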


(2) Write the OrderMapper class

package com.atguigu.mapreduce.order;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class OrderMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {

    OrderBean k = new OrderBean();
    
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        
        // 1 Read one line
        String line = value.toString();

        // 2 Split it on tabs
        String[] fields = line.split("\t");

        // 3 Populate the key object
        k.setOrder_id(Integer.parseInt(fields[0]));
        k.setPrice(Double.parseDouble(fields[2]));

        // 4 Write out
        context.write(k, NullWritable.get());
    }
}
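
The parsing in steps 2 and 3 can be tried on a single line outside Hadoop. The sketch below assumes the input file is tab-separated, as the split("\t") call implies; the class name ParseLineSketch is made up for illustration.

```java
// Hypothetical stand-alone version of the mapper's parsing steps.
public class ParseLineSketch {

    // Extracts order id and price from one tab-separated input line.
    static String parse(String line) {
        String[] fields = line.split("\t");
        int orderId = Integer.parseInt(fields[0]);    // leading zeros are dropped: "0000001" -> 1
        double price = Double.parseDouble(fields[2]); // fields[1], the product id, is not used
        return orderId + "\t" + price;
    }

    public static void main(String[] args) {
        // Prints the id 1 and the price 222.8, separated by a tab.
        System.out.println(parse("0000001\tPdt_01\t222.8"));
    }
}
```

Because the id is parsed as an int, the expected output lists the orders as 1, 2 and 3 rather than 0000001, 0000002 and 0000003.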


(3) Write the OrderGroupingComparator class

package com.atguigu.mapreduce.order;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class OrderGroupingComparator extends WritableComparator {

    protected OrderGroupingComparator() {
        super(OrderBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {

        OrderBean aBean = (OrderBean) a;
        OrderBean bBean = (OrderBean) b;

        int result;
        if (aBean.getOrder_id() > bBean.getOrder_id()) {
            result = 1;
        } else if (aBean.getOrder_id() < bBean.getOrder_id()) {
            result = -1;
        } else {
            result = 0;
        }

        return result;
    }
}

(4) Write the OrderReducer class

package com.atguigu.mapreduce.order;
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {

    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {

        // Each group is one order; its first key is the highest-priced record
        context.write(key, NullWritable.get());
    }
}


(5) Write the OrderDriver class

package com.atguigu.mapreduce.order;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OrderDriver {

    public static void main(String[] args) throws Exception {

        // Input and output paths: adjust these to the actual paths on your own machine
        args = new String[]{"e:/input/inputorder", "e:/output1"};

        // 1 Get the configuration and create the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2 Set the jar by the driver class
        job.setJarByClass(OrderDriver.class);

        // 3 Set the Mapper and Reducer classes
        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(OrderReducer.class);

        // 4 Set the key and value types of the map output
        job.setMapOutputKeyClass(OrderBean.class);
        job.setMapOutputValueClass(NullWritable.class);

        // 5 Set the key and value types of the final output
        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);

        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7 Set the grouping comparator used on the reduce side
        job.setGroupingComparatorClass(OrderGroupingComparator.class);

        // 8 Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
