Adding a combiner to MapReduce

阿新 · Published: 2018-07-24


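A combiner works like a local reduce. Its job is to lighten the load on the network: each map task's output is first aggregated locally, and that partial result is then sent to the reducer for a second round of processing.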
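To make the network saving concrete, here is a minimal Hadoop-free sketch in plain Java (it does not use the Hadoop API). It assumes each input string stands for the data of one map task, and it counts how many (word, count) records would be shuffled to the reducer with and without a local combine step. The class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of the combiner idea on a word count.
// Assumption: each string in `splits` is the input of one map task.
public class CombinerEffectDemo {

    // Map phase: emit one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Combine phase: a local reduce over the output of ONE map task only.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            sums.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // Number of records that would be sent over the network to the reducer.
    static int shuffledRecords(List<String> splits, boolean useCombiner) {
        int total = 0;
        for (String split : splits) {
            List<Map.Entry<String, Integer>> out = map(split);
            total += useCombiner ? combine(out).size() : out.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("a b a a", "b b c");
        System.out.println("records shuffled without combiner: " + shuffledRecords(splits, false));
        System.out.println("records shuffled with combiner:    " + shuffledRecords(splits, true));
    }
}
```

With this sample input the combine step cuts the shuffled records from 7 to 4. The reducer still has to merge the partial counts for b, because b appears in both map tasks.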

The processing flow now looks like this:

[Figure: MapReduce processing flow with a combiner stage]

Some points to keep in mind about combiners:

[Figure: key points about the combiner]

Mapper code:

package com.nenu.mprd.test;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MyMap extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words
        String line = value.toString();
        String[] words = line.split(" ");
        // Get the name of the file this input split comes from
        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        String fileName = fileSplit.getPath().getName().trim(); // optionally shorten with .substring(0, 5)

        for (String word : words) {
            String w = word.trim();
            if (w.isEmpty()) {
                continue; // skip empty tokens produced by consecutive spaces
            }
            // Output key: word + ":" + fileName, output value: "1"
            String outKey = w + ":" + fileName;
            System.out.println("map:" + outKey);
            context.write(new Text(outKey), new Text("1"));
        }
    }
}

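The key format the mapper produces can be checked without a cluster. The following plain-Java sketch (hypothetical class MapKeySketch, not part of the original project) mirrors the loop in MyMap.map(), with the file name passed in directly instead of being read from the FileSplit:

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch of MyMap's output: one ("word:fileName", "1") pair per word.
public class MapKeySketch {

    // Mirrors the loop in MyMap.map(); fileName is a parameter here (assumption for testing).
    static List<String[]> mapLine(String line, String fileName) {
        List<String[]> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            String w = word.trim();
            if (w.isEmpty()) {
                continue; // skip empty tokens produced by consecutive spaces
            }
            out.add(new String[] {w + ":" + fileName, "1"});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : mapLine("hello world hello", "a.txt")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

For the line "hello world hello" in a.txt it prints the keys hello:a.txt, world:a.txt and hello:a.txt, each with the value 1. Grouping the two identical hello:a.txt records is exactly what the combiner is meant to do locally.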

Combiner code:

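package com.nenu.mprd.test;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Note: Hadoop treats the combiner as an optional optimization and may run it zero,
// one or several times. This combiner rewrites the key "word:fileName" into "word",
// so the job's output relies on it running exactly once on every map output record
// (and on a single reducer, because partitioning was done on the original key).
public class MyCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Aggregates the (k, v) pairs of ONE map task by key; merging pairs
        // from different map tasks has to be done in the reducer.
        String[] parts = key.toString().split(":");
        Text outKey = new Text(parts[0].trim()); // the word becomes the new key
        System.out.println("MyCombiner KEY:" + outKey);

        int count = 0;
        for (Text v : values) {
            count += Integer.parseInt(v.toString());
        }
        Text outValue = new Text("(" + parts[1].trim() + " " + count + ")"); // "(fileName count)"
        System.out.println("MyCombiner value:" + outValue);
        context.write(outKey, outValue);
    }
}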

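The transformation MyCombiner applies to one map task's output can likewise be sketched in plain Java. This is an illustration under stated assumptions, not the Hadoop implementation: the class CombinerSketch is hypothetical, a TreeMap stands in for Hadoop's sort-and-group step, and the file name is assumed to contain no ':' (so the key is split at the last ':', which also keeps a word containing ':' intact).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of what MyCombiner produces for the output of ONE map task.
public class CombinerSketch {

    // mapKeys: the "word:fileName" keys emitted by the mapper (each with value "1").
    // Returns "word<TAB>(fileName count)" lines in sorted key order.
    static List<String> combine(List<String> mapKeys) {
        // TreeMap mimics Hadoop sorting and grouping identical keys before reduce()
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : mapKeys) {
            counts.merge(key, 1, Integer::sum); // every value is "1", so summing = counting
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // assumption: the file name contains no ':', so split at the last one
            int sep = e.getKey().lastIndexOf(':');
            String word = e.getKey().substring(0, sep);
            String fileName = e.getKey().substring(sep + 1);
            out.add(word + "\t(" + fileName + " " + e.getValue() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mapKeys = Arrays.asList("hello:a.txt", "world:a.txt", "hello:a.txt");
        for (String line : combine(mapKeys)) {
            System.out.println(line);
        }
    }
}
```

Three map records shrink to two combined records, hello with "(a.txt 2)" and world with "(a.txt 1)", and the key no longer carries the file name.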

Reducer code:

package com.nenu.mprd.test;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        System.out.println("reduce key:" + key);
        // Concatenate every "(fileName count)" value for this word, space-separated
        StringBuilder out = new StringBuilder();
        for (Text value : values) {
            out.append(value.toString()).append(' ');
        }
        System.out.println("reduce value:" + out);
        context.write(key, new Text(out.toString()));
    }
}

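Putting the mapper, combiner and reducer together, the following Hadoop-free simulation (hypothetical class InvertedIndexSketch) shows the kind of output the job aims for: each word followed by its "(fileName count)" entries. It is a sketch under several assumptions: one map task per input file, a single reducer, and values reaching the reducer in file order, which Hadoop does not actually guarantee. Map and combine are folded into one counting step per file, which is equivalent here because every key within a file shares the same file name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free simulation of the whole job: map + combine (per file) -> shuffle -> reduce.
public class InvertedIndexSketch {

    // files: fileName -> contents; assumption: one map task per file, one reducer.
    static Map<String, String> run(Map<String, String> files) {
        // Shuffle: reducer key (word) -> combiner outputs collected from all map tasks
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // Map + combine for one file: count each word
            Map<String, Integer> counts = new TreeMap<>();
            for (String word : file.getValue().split(" ")) {
                String w = word.trim();
                if (!w.isEmpty()) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                shuffled.computeIfAbsent(c.getKey(), k -> new ArrayList<>())
                        .add("(" + file.getKey() + " " + c.getValue() + ")");
            }
        }
        // Reduce: concatenate the values for each word, space-separated like MyReduce
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            StringBuilder sb = new StringBuilder();
            for (String v : e.getValue()) {
                sb.append(v).append(' ');
            }
            result.put(e.getKey(), sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "hello world hello");
        files.put("b.txt", "hello hadoop");
        run(files).forEach((word, postings) -> System.out.println(word + "\t" + postings));
    }
}
```

With the two sample files, the line for hello reads "(a.txt 2) (b.txt 1) ", with the trailing space that MyReduce also produces.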

Job (driver) code:

package com.nenu.mprd.test;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new MyJob(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration prepared by ToolRunner
        Configuration conf = getConf();
        conf.set("fs.defaultFS", "hdfs://192.168.64.141:9000");

        // Automatically delete the output directory on HDFS if it already exists.
        // If the job is packaged as a jar, these hard-coded values should be passed in as arguments.
        FileSystem filesystem = FileSystem.get(new URI("hdfs://192.168.64.141:9000"), conf, "root");
        Path outputPath = new Path("/hadoop/wordcount/city/out");
        if (filesystem.exists(outputPath)) {
            filesystem.delete(outputPath, true); // true = delete recursively
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(MyJob.class);
        job.setMapperClass(MyMap.class);

        // Register the combiner. If the combining logic were identical to the reducer's,
        // no separate class would be needed: the reducer class could be passed here instead.
        job.setCombinerClass(MyCombiner.class);

        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/hadoop/wordcount/city"));
        FileOutputFormat.setOutputPath(job, outputPath);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

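On the comment about reusing the reducer as the combiner: Hadoop treats the combiner as an optional optimization and may run it zero, one or several times, so a reducer can only double as the combiner when applying it to partial results leaves the final answer unchanged. A plain sum, as in word count, has that property. The plain-Java sketch below (hypothetical class CombinerSafetySketch) checks this for a single key; MyCombiner above does not qualify, since it changes both the key and the value format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: the final result must not depend on whether (or how often) the combiner ran.
public class CombinerSafetySketch {

    // Reducer logic for a word count: add up all counts for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Apply the same sum as a combiner to each chunk of values, then reduce the partial sums.
    static int sumWithCombiner(List<List<Integer>> chunks) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            partials.add(sum(chunk));
        }
        return sum(partials);
    }

    public static void main(String[] args) {
        List<Integer> ones = Arrays.asList(1, 1, 1, 1, 1);
        List<List<Integer>> chunks = Arrays.asList(Arrays.asList(1, 1), Arrays.asList(1, 1, 1));
        System.out.println(sum(ones));               // without a combiner: 5
        System.out.println(sumWithCombiner(chunks)); // with a combiner:    5
    }
}
```

Both paths print 5, which is why word count can register its summing reducer as the combiner.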
