MapReduce Optimization Example (Custom Partitioner and Combiner)

阿新 • Published: 2019-02-04

MapReduce Optimization Example

1. Case overview

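Using a simple score dataset, we find the highest score among male students and among female students in each of three age ranges: 0 to 20, 20 to 50, and 50 to 100. The code in section 4 resolves the shared boundaries as age <= 20, 20 < age <= 50, and age > 50.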
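Because the ranges as stated share their end points, it helps to pin down which range an age of exactly 20 or 50 belongs to. The following is a minimal sketch of that rule in plain Java, with no Hadoop dependency; the class name `AgeBuckets` and method `bucketOf` are hypothetical names chosen for illustration and do not appear in the original listing.

```java
// Hypothetical helper (not part of the original listing): maps an age to
// one of three range indexes using inclusive upper bounds of 20 and 50.
final class AgeBuckets {
    private AgeBuckets() {}

    static int bucketOf(int age) {
        if (age <= 20) {
            return 0;   // ages up to and including 20
        } else if (age <= 50) {
            return 1;   // ages 21 through 50
        }
        return 2;       // ages 51 and above
    }

    public static void main(String[] args) {
        int[] samples = {20, 21, 50, 51};
        for (int age : samples) {
            System.out.println(age + " -> range " + bucketOf(age));
        }
    }
}
```

With this convention an age of 20 falls into the first range and an age of 50 into the second, which matches the comparisons used by the partitioner in section 4.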

2. Dataset

Name	Age	Gender	Score
Alice	23	female	45
Bob	34	male	89
Chris	67	male	97
Kristine	38	female	53
Connor	25	male	27
Daniel	78	male	95
James	34	male	79
Alex	52	male	69
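The job in section 4 splits every line on tabs and later parses the age and score with `Integer.parseInt`, so a header row or a malformed line in the input file would raise a `NumberFormatException`. Whether the real input file contains the header row is not stated, so the following is only a sketch of a defensive parser under that assumption; `StudentLine` and `parse` are hypothetical names, not part of the original code.

```java
import java.util.Optional;

// Hypothetical value class for one tab-separated record: name, age, gender, score.
final class StudentLine {
    final String name;
    final int age;
    final String gender;
    final int score;

    private StudentLine(String name, int age, String gender, int score) {
        this.name = name;
        this.age = age;
        this.gender = gender;
        this.score = score;
    }

    // Returns an empty Optional for lines that do not have exactly four
    // tab-separated fields or whose age/score fields are not integers.
    static Optional<StudentLine> parse(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length != 4) {
            return Optional.empty();
        }
        try {
            int age = Integer.parseInt(tokens[1].trim());
            int score = Integer.parseInt(tokens[3].trim());
            return Optional.of(new StudentLine(tokens[0], age, tokens[2], score));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Alice\t23\tfemale\t45").isPresent());
        System.out.println(parse("Name\tAge\tGender\tScore").isPresent());
    }
}
```

A mapper could apply the same check and skip any line for which the result is empty, rather than letting the task fail.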

3. Analysis

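Based on the requirement, we proceed in the following steps:
1. Write a Mapper class that parses each record into key=gender, value=name+age+score and emits it.
2. Write a Partitioner class that routes each record to a Reduce task according to its age range.
3. Write a Reducer class that finds the highest score for male and for female students.
4. Write a run method that executes the MapReduce job.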
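The steps above can be traced on the sample dataset without a cluster. The sketch below is a plain-Java simulation, not the Hadoop job itself: it assigns each row to an age range (the partition step), groups by gender within that range (the map key), and keeps the highest-scoring row (the reduce step). The class `BestScoreSketch` and its members are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the map / partition / reduce flow.
final class BestScoreSketch {
    // Sample rows from section 2: name, age, gender, score.
    static final String[][] ROWS = {
        {"Alice", "23", "female", "45"},
        {"Bob", "34", "male", "89"},
        {"Chris", "67", "male", "97"},
        {"Kristine", "38", "female", "53"},
        {"Connor", "25", "male", "27"},
        {"Daniel", "78", "male", "95"},
        {"James", "34", "male", "79"},
        {"Alex", "52", "male", "69"},
    };

    // Partition step: same thresholds as the partitioner in section 4.
    static int bucketOf(int age) {
        return age <= 20 ? 0 : (age <= 50 ? 1 : 2);
    }

    // For each of the three ranges, maps gender to the best-scoring row.
    static List<Map<String, String[]>> bestPerBucket(String[][] rows) {
        List<Map<String, String[]>> buckets = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            buckets.add(new TreeMap<>());
        }
        for (String[] row : rows) {
            Map<String, String[]> best = buckets.get(bucketOf(Integer.parseInt(row[1])));
            String[] current = best.get(row[2]);
            // Reduce step: keep the row with the strictly higher score.
            if (current == null || Integer.parseInt(row[3]) > Integer.parseInt(current[3])) {
                best.put(row[2], row);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Map<String, String[]>> buckets = bestPerBucket(ROWS);
        for (int i = 0; i < buckets.size(); i++) {
            for (Map.Entry<String, String[]> e : buckets.get(i).entrySet()) {
                String[] r = e.getValue();
                System.out.println("range " + i + "\t" + e.getKey() + "\t" + r[0] + "\t" + r[3]);
            }
        }
    }
}
```

On the sample data the first range is empty, the second range yields Kristine (53) for female and Bob (89) for male, and the third range yields Chris (97) for male with no female entry.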

4. Code implementation

package org.bigdata.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @ProjectName BestScoreCount
 * @PackageName org.bigdata.hadoop
 * @ClassName GenderMR
 * @Description Finds the highest score for male and female students within each age range
 * @Author Administrator
 * @Date 2017-07-31 21:49:50
 */

public class GenderMR extends Configured implements Tool {
    private static final String TAB_SEPARATOR = "\t";

    public static class GenderMapper extends Mapper<LongWritable, Text, Text, Text> {
        /**
         * map is called once per input line. The line is passed in the value
         * argument and is split on the tab separator into name, age, gender
         * and score.
         */
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            /*
             * Name  Age Gender Score
             * Alice 23  female 45
             * Fields are separated by a tab character.
             */

            // Split the line on tabs
            String[] tokens = value.toString().split(TAB_SEPARATOR);
            // Gender
            String gender = tokens[2];
            // Name, age and score
            String nameAgeScore = tokens[0] + TAB_SEPARATOR + tokens[1] + TAB_SEPARATOR + tokens[3];
            // Emit key=gender, value=name+age+score
            context.write(new Text(gender), new Text(nameAgeScore));
        }
    }


    /**
     * Combiner: pre-aggregates the Mapper output, keeping only the
     * highest-scoring record for each gender key.
     */
    public static class GenderCombiner extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            int maxScore = Integer.MIN_VALUE;
            int score = 0;
            String name = " ";
            String age = " ";
            for (Text val : values) {
                String[] valTokens = val.toString().split(TAB_SEPARATOR);
                score = Integer.parseInt(valTokens[2]);
                if (score > maxScore) {
                    name = valTokens[0];
                    age = valTokens[1];
                    maxScore = score;
                }
            }
            // Emit in the same name+age+score layout that the Mapper produces
            context.write(key, new Text(name + TAB_SEPARATOR + age + TAB_SEPARATOR + maxScore));
        }
    }

    /**
     * Routes each map output record to a reduce task according to the
     * student's age range.
     */
    public static class GenderPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numReduceTasks) {
            String[] nameAgeScore = value.toString().split(TAB_SEPARATOR);
            // Student age
            int age = Integer.parseInt(nameAgeScore[1]);
            // Defensive default: fall back to partition 0
            if (numReduceTasks == 0) {
                return 0;
            }
            // Age 20 or below goes to partition 0
            if (age <= 20) {
                return 0;
            } else if (age <= 50) {
                // Age above 20 and up to 50 goes to partition 1
                return 1 % numReduceTasks;
            } else {
                // All remaining ages go to partition 2
                return 2 % numReduceTasks;
            }
        }
    }
    /**
     * Finds the highest score for each gender.
     */
    public static class GenderReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            int maxScore = Integer.MIN_VALUE;
            int score = 0;
            String name = " ";
            String age = " ";
            String gender = " ";
            // Iterate over the values for this key and keep the highest score
            for (Text val : values) {
                String[] valTokens = val.toString().split(TAB_SEPARATOR);
                score = Integer.parseInt(valTokens[2]);
                if (score > maxScore) {
                    name = valTokens[0];
                    age = valTokens[1];
                    gender = key.toString();
                    maxScore = score;
                }
            }
            context.write(new Text(name), new Text("age:" + age + TAB_SEPARATOR + "gender:" + gender + TAB_SEPARATOR + "score:" + maxScore));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration that ToolRunner passed to this Tool
        Configuration conf = getConf();

        // Create a new job
        Job job = Job.getInstance(conf, this.getClass().getSimpleName());
        // Main class
        job.setJarByClass(GenderMR.class);
        // Input path
        Path inPath = new Path(args[0]);
        FileInputFormat.addInputPath(job, inPath);
        // Mapper
        job.setMapperClass(GenderMapper.class);
        // Reducer
        job.setReducerClass(GenderReducer.class);
        // Map output key type
        job.setMapOutputKeyClass(Text.class);
        // Map output value type
        job.setMapOutputValueClass(Text.class);
        // Reduce output key type
        job.setOutputKeyClass(Text.class);
        // Reduce output value type
        job.setOutputValueClass(Text.class);
        // Set the Combiner class
        job.setCombinerClass(GenderCombiner.class);
        // Set the Partitioner class
        job.setPartitionerClass(GenderPartitioner.class);
        // Use 3 reduce tasks, one per age range
        job.setNumReduceTasks(3);

        // Output path: delete it first if it already exists
        Path outPath = new Path(args[1]);
        FileSystem fs = outPath.getFileSystem(conf);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);
        // Submit the job and wait for it to finish
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Input and output paths are hard-coded here and override any command-line arguments
        args = new String[] {
            "hdfs://com.learn.bigdata:8020/input/gender.txt",
            "hdfs://com.learn.bigdata:8020/output"
        };
        int status = ToolRunner.run(conf, new GenderMR(), args);
        System.exit(status);
    }
}
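The listing reuses the reducer's "keep the maximum" logic as a combiner. This is safe because taking a maximum gives the same answer whether it is applied once to all values or first to subsets and then to the partial results, and Hadoop is free to run a combiner zero, one, or several times on map output that has already been assigned to a partition. The sketch below illustrates only that property in plain Java; `CombinerSketch` and `maxOf` are hypothetical names, and the split of scores across map tasks is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: max of partial maxima equals the overall max.
final class CombinerSketch {
    private CombinerSketch() {}

    static int maxOf(List<Integer> scores) {
        int max = Integer.MIN_VALUE;
        for (int s : scores) {
            if (s > max) {
                max = s;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Scores of the male students aged 21 to 50 in the sample dataset,
        // assumed here to be spread over two map tasks.
        List<Integer> mapTask1 = Arrays.asList(89, 27);
        List<Integer> mapTask2 = Arrays.asList(79);

        // With a combiner: each map task forwards only its local maximum.
        int combined = maxOf(Arrays.asList(maxOf(mapTask1), maxOf(mapTask2)));
        // Without a combiner: the reducer sees every score.
        int direct = maxOf(Arrays.asList(89, 27, 79));

        System.out.println(combined == direct);
    }
}
```

An aggregate without this property, such as an average computed from raw values, could not reuse its reducer logic as a combiner in the same way.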
