MapReduce簡單例項解析map、reduce、combiner、partition一條龍
阿新 • • 發佈:2019-02-12
需求:通過MapReduce對紅樓夢TXT檔案統計笑、喜、哭、怒在全書的數量,使用combiner減少IO,通過partition輸出到兩個檔案中。
通過MapReduce外掛建立MapReduce project,這樣需要的包都會自動匯入
主函式:
package com.zhiyou100;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat ;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MyApp {

    /**
     * Driver: configures and submits the character-count job.
     * Input is the novel on HDFS; output goes to two files (one per reducer,
     * split by {@code MyPartition}), with key and value separated by ':'.
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Use ':' between key and value in the text output.
        conf.set("mapreduce.output.textoutputformat.separator", ":");

        Path inputPath = new Path("hdfs://master:9000/mark/hlm-utf8.txt");
        Path outputPath = new Path("hdfs://master:9000/result/hml02");

        // Delete a stale output directory up front — the job fails if it exists.
        FileSystem fs = FileSystem.newInstance(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        fs.close();

        Job job = Job.getInstance(conf, "HLM");
        job.setJarByClass(MyApp.class);

        // Input: plain text, one line per record (TextInputFormat is the default).
        FileInputFormat.addInputPath(job, inputPath);
        job.setInputFormatClass(TextInputFormat.class);

        // Map phase and its output <K,V> types.
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Two reduce tasks, fed through the custom partitioner.
        job.setPartitionerClass(MyPartition.class);
        job.setNumReduceTasks(2);

        // Reduce phase, with a map-side combiner to shrink shuffle I/O.
        job.setReducerClass(MyReducer.class);
        job.setCombinerClass(MyCombiner.class);

        // Final output <K,V> types (must be set since generics are erased).
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Output: plain text records.
        FileOutputFormat.setOutputPath(job, outputPath);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Submit to the cluster and block, polling until completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
mapper:
package com.zhiyou100;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
static {
System.out.println("my_-mapper");
}
private IntWritable num = new IntWritable();
private Text word = new Text();
private int no = 0;
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer st = new StringTokenizer(value.toString(), "《 》 、 ! , 。 ? :; “ ” ‘ ’ ");
while (st.hasMoreElements()) {
String text = st.nextElement().toString().trim();
no += 1;
context.getCounter("ZY", "statement").increment(1);
if (text.contains("笑")) {
word.set("笑");
num.set(no);
context.write(word, num);
}
if (text.contains("喜")) {
word.set("喜");
num.set(no);
context.write(word, num);
}
if (text.contains("哭")) {
word.set("哭");
num.set(no);
context.write(word, num);
}
if (text.contains("怒")) {
word.set("怒");
num.set(no);
context.write(word, num);
}
}
}
}
reduce:
package com.zhiyou100;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    static {
        System.out.println("my_-reducer");
    }

    // Reused output writable to avoid per-key allocation.
    private final IntWritable result = new IntWritable();

    /**
     * Sums all partial values received for one character key and writes the
     * final (character, total) pair.
     */
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int total = 0;
        java.util.Iterator<IntWritable> it = values.iterator();
        while (it.hasNext()) {
            total += it.next().get();
        }
        result.set(total);
        context.write(key, result);
    }
}
combiner:
package com.zhiyou100;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MyCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    static {
        System.out.println("my_-combiner");
    }

    // Reused output writable to avoid per-key allocation.
    private final IntWritable result = new IntWritable();

    /**
     * Map-side partial aggregation: sums the values for a key before shuffle.
     *
     * BUG FIX: the original used {@code sum += 1}, counting records instead of
     * summing values. The framework may apply a combiner zero, one, or many
     * times — including to its own output — so it must compute the same
     * associative, commutative function as the reducer. Counting is neither:
     * re-combining already-combined counts would collapse them back to the
     * number of partial records. Summing {@code val.get()} is idempotent under
     * repeated combining and matches {@code MyReducer}.
     */
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
partition:
package com.zhiyou100;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
public class MyPartition extends Partitioner<Text, IntWritable> {

    static {
        System.out.println("my_-partition");
    }

    /**
     * Routes keys containing "笑" or "喜" to reducer 0 and all other keys
     * (here "哭" and "怒") to reducer 1, so the job writes two output files.
     */
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String k = key.toString();
        boolean firstGroup = k.contains("笑") || k.contains("喜");
        return firstGroup ? 0 : 1;
    }
}
輸出結果:
通過每個類中定義的靜態程式碼塊(static 初始化塊)列印的日誌,也可以看出job在呼叫Mapper、Reducer及combiner、partition時類被載入的先後順序
my_-mapper -> my_-combiner -> my_-partition -> my_-reducer