
MapReduce Word Count: Running Locally

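1. The three-step MapReduce implementation described earlier (Mapper, Reducer, Driver) assumes that both the input and the output are on HDFS. The driver connects to HDFS through two settings:

(1) the user name, set through HADOOP_USER_NAME
(2) the file system address, hdfs://192.168.126.101:8020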

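// WordCountApp.java (excerpt)
// Set the user name used to access HDFS (permissions)
System.setProperty("HADOOP_USER_NAME", "hadoop");

Configuration configuration = new Configuration();
// Set properties on the configuration, here the default file system (HDFS):
configuration.set("fs.defaultFS", "hdfs://192.168.126.101:8020");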

2. Counting word frequencies locally, without connecting to HDFS

(1) Under hadoop-train-v2, create a new directory: input

(2) Inside input, create a new file: WordCount.Input

(3) Copy the contents of h.txt into WordCount.Input

(4) In the package com.imooc.bigdata.hadoop.mapreduce.wordcount, copy WordCountApp.java and name the copy WordCountLocalApp.java
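Steps (1) to (3) can also be scripted. Below is a minimal sketch using only java.nio.file, assuming (hypothetically) that h.txt sits in the directory the program is run from, i.e. the hadoop-train-v2 project root. The class name PrepareLocalInput is illustrative and not part of the course code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: scripts steps (1) to (3) above with java.nio.file only.
final class PrepareLocalInput {

    /** Creates inputDir if missing and copies source into inputDir/WordCount.Input. */
    static Path prepare(Path source, Path inputDir) {
        try {
            Files.createDirectories(inputDir);                 // step (1): the input directory
            Path target = inputDir.resolve("WordCount.Input"); // step (2): the input file
            // step (3): copy the contents of the source file, overwriting an older copy
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: run from the hadoop-train-v2 project root, with h.txt in that directory.
        Path created = prepare(Paths.get("h.txt"), Paths.get("input"));
        System.out.println("Created " + created);
    }
}
```

Running it twice is harmless: createDirectories accepts an existing directory and REPLACE_EXISTING overwrites the previous copy.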

3. WordCountLocalApp.java

package com.imooc.bigdata.hadoop.mapreduce.wordcount;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/*
 * Driver class: configures the Mapper and Reducer and ties them together.
 * WordCountApp.java does this to count word frequencies in files on HDFS;
 * this copy reads local files and writes the results to a local path.
 */
public class WordCountLocalApp {

    public static void main(String[] args) throws Exception {
        // No fs.defaultFS override here: unless a core-site.xml on the classpath
        // sets it, paths resolve against the local file system
        Configuration configuration = new Configuration();

        // Create a Job from the configuration
        Job job = Job.getInstance(configuration);

        // Job setting: the main class
        job.setJarByClass(WordCountLocalApp.class);

        // Job settings: the custom Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Job settings: the Mapper's output key and value types
        // (the Mapper's input types need no configuration)
        // Mapper<LongWritable, Text, Text, IntWritable>
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Job settings: the Reducer's output key and value types
        // (the Reducer's input types need no configuration)
        // Reducer<Text, IntWritable, Text, IntWritable>
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Job settings: the job's input and output paths, relative to the working directory
        // (the job fails if the "output" directory already exists)
        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        // Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : -1);
    }
}

4. Results
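After the job finishes, the counts are in the local output directory. With the default TextOutputFormat, each reducer writes a file named part-r-00000, part-r-00001, and so on, containing one "word<TAB>count" line per key, and a _SUCCESS marker file is written alongside them. As a minimal sketch, the part files can be read back into a sorted map for inspection; the class ReadWordCountOutput is hypothetical and uses only the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reads back the word counts the local job wrote to "output".
final class ReadWordCountOutput {

    /** Parses every part-r-* file in outputDir, where each line is "word<TAB>count". */
    static Map<String, Integer> read(Path outputDir) {
        Map<String, Integer> counts = new TreeMap<>();
        // The glob skips _SUCCESS and the hidden .part-r-*.crc checksum files
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-r-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    int tab = line.lastIndexOf('\t');
                    if (tab < 0) {
                        continue; // not a key<TAB>value line, e.g. an empty line
                    }
                    String word = line.substring(0, tab);
                    int count = Integer.parseInt(line.substring(tab + 1).trim());
                    counts.merge(word, count, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: run from the same working directory as WordCountLocalApp.
        read(Paths.get("output")).forEach((word, n) -> System.out.println(word + " -> " + n));
    }
}
```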

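Problem: the word counts are case-sensitive, so the same word written with different capitalization is counted as separate words.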
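A common fix is to normalize each word to lower case in the map step before emitting it, so every capitalization of a word ends up under the same key. The Mapper source is not shown in this post, so the sketch below illustrates the idea in plain Java (no Hadoop); splitting on whitespace is an assumption, and the class CaseInsensitiveCount is hypothetical. Inside WordCountMapper, the analogous change would presumably be to lower-case each word before wrapping it in the Text key.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of case-insensitive word counting (plain Java, no Hadoop).
final class CaseInsensitiveCount {

    /** Counts words split on whitespace; lower-cases them first when ignoreCase is true. */
    static Map<String, Integer> count(String text, boolean ignoreCase) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the job's output
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the text is blank
            }
            // Locale.ROOT avoids locale-specific casing rules (e.g. Turkish dotted/dotless i)
            String key = ignoreCase ? word.toLowerCase(Locale.ROOT) : word;
            counts.merge(key, 1, Integer::sum); // "map" emits (key, 1), "reduce" sums them
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Hello hello HELLO world";
        System.out.println(count(text, false)); // {HELLO=1, Hello=1, hello=1, world=1}
        System.out.println(count(text, true));  // {hello=3, world=1}
    }
}
```

Lower-casing in the Mapper rather than the Reducer matters: the shuffle groups by the exact key, so the keys must already be normalized when they are emitted.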