
Hadoop in Action: max, min, and avg Statistics with MapReduce (Part 6)


1. Data preparation:

Mike,35
Steven,40
Ken,28
Cindy,32

2. Expected results

Max 40
Min 28
Avg 33
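
As a quick sanity check in plain Java (a standalone snippet, not part of the job), the four ages give exactly these numbers. Note that the true mean is 135 / 4 = 33.75; the 33 above comes from the same integer division the reducer uses:

int[] ages = {35, 40, 28, 32};
int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE, sum = 0;
for (int age : ages) {
    min = Math.min(min, age);
    max = Math.max(max, age);
    sum += age;
}
System.out.println("Max " + max);                 // Max 40
System.out.println("Min " + min);                 // Min 28
System.out.println("Avg " + (sum / ages.length)); // Avg 33 (integer division)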

3. The MapReduce code is as follows:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class AgeMapReduce {

    public static class WordCountMapper extends Mapper<Object, Text, Text, Text> {
        private Text nameKey = new Text();
        private Text ageValue = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String content = itr.nextToken();
                String[] nameAndAge = content.split(",");
                String age = nameAndAge[1]; // nameAndAge[0] (the name) is not needed
                // Emit one fixed key so every age is grouped into a single reduce call.
                nameKey.set("only you");
                ageValue.set(age);
                context.write(nameKey, ageValue);
            }
        }
    }

    public static class WordCountReduce extends Reducer<Text, Text, Text, Text> {
        // Only one key is ever emitted, so reduce() runs exactly once and
        // these fields are effectively local state for that single call.
        private int min = Integer.MAX_VALUE;
        private int max = 0;
        private int sum = 0;
        private int count = 0;

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text tmpAge : values) {
                int age = Integer.valueOf(tmpAge.toString());
                if (age < min) {
                    min = age;
                }
                if (age > max) {
                    max = age;
                }
                sum += age;
                count++;
            }
            context.write(new Text("Max"), new Text(String.valueOf(max)));
            context.write(new Text("Min"), new Text(String.valueOf(min)));
            context.write(new Text("Avg"), new Text(String.valueOf(sum / count)));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: AgeMapReduce <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "age min max avg");
        job.setJarByClass(AgeMapReduce.class);
        job.setMapperClass(WordCountMapper.class);
        // No combiner is set; see the note after this listing.
        job.setReducerClass(WordCountReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // e.g. user/joe/wordcount/input
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // e.g. user/joe/wordcount/output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
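
One design note on the combiner: reusing the reducer as a combiner would be safe for min and max (both are associative), but not for the average, because an average of per-mapper averages is not the overall average. A correct combiner has to forward partial sums and counts instead. Below is a minimal sketch of that idea; the class name PartialStatCombiner and the "min,max,sum,count" value encoding are illustrative assumptions, and the reducer would need to be adapted to parse that encoding.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical combiner sketch: pre-aggregates each mapper's ages into one
// "min,max,sum,count" record instead of forwarding every raw age.
public class PartialStatCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        long sum = 0;
        long count = 0;
        for (Text v : values) {
            int age = Integer.parseInt(v.toString());
            min = Math.min(min, age);
            max = Math.max(max, age);
            sum += age;
            count++;
        }
        // Partial sums and counts stay additive across combiners,
        // so the reducer can still compute the exact global average.
        context.write(key, new Text(min + "," + max + "," + sum + "," + count));
    }
}

The key property is that min, max, sum, and count are all mergeable across partial results, so combining them early reduces shuffle traffic without changing the final answer.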


4. Notes

Because the final result does not depend on any particular key, the map stage simply emits one fixed key for every record, so that all ages are grouped together.
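
After the shuffle, the single reduce call therefore sees all the ages grouped under that one key, conceptually ("only you", [35, 40, 28, 32]). The trade-off is that one reducer processes the entire dataset, which is fine for a small file like this but becomes a bottleneck at scale.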
