MapReduce Word Count: Running Locally
阿新 • Published: 2021-07-12
1. The three-step MapReduce implementation above (Mapper, Reducer, Driver) assumes that both input and output live on HDFS
(1) Input: HADOOP_USER_NAME
(2) Output: hdfs://192.168.126.101:8020
//WordCountApp.java
//Set permissions: the HDFS user the job runs as
System.setProperty("HADOOP_USER_NAME", "hadoop");
Configuration configuration = new Configuration();
//Point the configuration at HDFS
configuration.set("fs.defaultFS", "hdfs://192.168.126.101:8020");
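2. Running the word count locally, without connecting to HDFS
(1) Under hadoop-train-v2, create a new directory: input
(2) In input, create a new text file: WordCount.Input
(3) Copy the contents of h.txt into WordCount.Input
(4) Under com.imooc.bigdata.hadoop.mapreduce.wordcount, copy WordCountApp.java to WordCountLocalApp.java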
3. WordCountLocalApp.java
package com.imooc.bigdata.hadoop.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/*
 * Driver class: configures the Mapper and Reducer and ties them together.
 * The original WordCountApp.java counts word frequencies for files on HDFS.
 *
 * This version reads local files and writes the result to a local path.
 */
public class WordCountLocalApp {

    public static void main(String[] args) throws Exception {

        //fs.defaultFS is not set here, so paths resolve against the local file system
        //(the Hadoop default, assuming no core-site.xml on the classpath overrides it)
        Configuration configuration = new Configuration();

        //Create a Job from the configuration
        Job job = Job.getInstance(configuration);

        //Job parameter: main class
        job.setJarByClass(WordCountLocalApp.class);

        //Job parameter: custom Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        //Job parameter: Mapper output key and value types
        //The Mapper input types do not need to be declared
        //Mapper<LongWritable, Text, Text, IntWritable>
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        //Job parameter: Reducer output key and value types
        //The Reducer input types do not need to be declared
        //Reducer<Text, IntWritable, Text, IntWritable>
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        //Job parameter: input and output paths (relative local paths)
        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        //Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : -1);
    }
}
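One practical point when rerunning the local job: Hadoop's FileOutputFormat refuses to start if the output directory already exists, so the local output folder usually has to be removed between runs. The following is a minimal sketch in plain Java (no Hadoop dependency) of one way to do that. The class name LocalOutputCleaner and the method deleteIfExists are hypothetical and not part of the original project; only the directory name output is taken from the driver.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not from the original project): removes a local
// output directory so that the word count job can be run again.
class LocalOutputCleaner {

    // Deletes dir and everything below it.
    // Returns false if dir did not exist, true if it was removed.
    static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // "output" is the local path the driver passes to FileOutputFormat.setOutputPath.
        boolean removed = deleteIfExists(Paths.get("output"));
        System.out.println(removed ? "Removed old output directory" : "No output directory to remove");
    }
}
```

Running this before WordCountLocalApp, or calling an equivalent step at the start of its main method, is one option; deleting the folder by hand in the IDE works just as well.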
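4. Result output
Problem: the counts are case-sensitive, so the same word in different letter cases is counted as separate words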
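One common way to address the case-sensitivity problem is to normalize every word to lower case before it is counted. The sketch below illustrates the idea in plain Java, without Hadoop, by mimicking the map and reduce steps in a single method. The class name CaseInsensitiveWordCount is hypothetical, and splitting on whitespace is an assumption, since the delimiter used by WordCountMapper is not shown in this section.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; this is not the WordCountMapper
// from the original project.
class CaseInsensitiveWordCount {

    // Counts words across all lines, ignoring letter case.
    // Assumption: words are separated by whitespace.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line
                }
                // Lower-casing the key is the only change compared with a
                // case-sensitive count.
                String key = word.toLowerCase(Locale.ROOT);
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello world", "hello Hadoop", "HELLO WORLD");
        System.out.println(count(lines)); // prints {hadoop=1, hello=3, world=2}
    }
}
```

In the Hadoop job, the equivalent change would presumably be to lower-case each word inside the Mapper before it is written to the context, leaving the Reducer and the driver untouched.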