MapReduce: Computing an Average
阿新 • Published: 2018-12-18
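An algorithm for computing the average of the values in a given input file.
<1> Reading the file contents on the Map side
As the file is read, each line is first split into a key and a value so that the values can be aggregated per key later on.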
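import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into space-separated fields
        String[] str = value.toString().split(" ");
        // The first field is the key
        String name = str[0];
        // The last field is the numeric value
        long a = Long.parseLong(str[str.length - 1]);
        // Emit the (key, value) pair for the shuffle
        context.write(new Text(name), new LongWritable(a));
    }
}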
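When defining the Map class, extend Mapper and supply its four generic type parameters: input key, input value, output key, output value. With the default text input format the input key is LongWritable (the byte offset of the line) and the input value is Text (the line itself); the output types can be chosen to suit the job.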
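The key/value split performed in map() can be tried out without a cluster. The following is a minimal plain-Java sketch, not the Hadoop code itself: the class name LineParseSketch, the helper parse, and the sample line "alice 90" are made up for illustration, and splitting on any run of whitespace (rather than a single space) is an assumption about the input format.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class LineParseSketch {
    // Splits one input line into (key, value): the first field is the key,
    // the last field is the numeric value. Fields are separated by whitespace.
    static Map.Entry<String, Long> parse(String line) {
        String[] fields = line.trim().split("\\s+");
        String name = fields[0];
        long value = Long.parseLong(fields[fields.length - 1]);
        return new SimpleEntry<>(name, value);
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> kv = parse("alice 90");
        System.out.println(kv.getKey() + " -> " + kv.getValue()); // prints "alice -> 90"
    }
}
```

Taking the last field as the value means a line with extra columns in the middle, such as "alice math 85", still yields the pair (alice, 85).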
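<2> The Reduce side receives the key/value pairs emitted by the Map side and processes them
PS: Values that share the same key are aggregated together; values under different keys are never combined.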
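import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;   // long, so large totals do not overflow an int
        long count = 0; // number of values received for this key
        for (LongWritable v : values) {
            sum += v.get();
            count++;
        }
        // Average = sum / count; integer division drops the fractional part
        context.write(key, new LongWritable(sum / count));
    }
}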
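The per-key grouping rule can be illustrated with a small plain-Java stand-in for the shuffle and reduce steps. This is only a sketch under stated assumptions: GroupAverageSketch, groupByKey, and average are hypothetical names, the sample lines are invented, and dividing by the number of values seen for each key is this sketch's choice for how the average is formed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupAverageSketch {
    // Stand-in for the shuffle phase: collect every value under its key.
    static Map<String, List<Long>> groupByKey(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            long value = Long.parseLong(fields[fields.length - 1]);
            grouped.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(value);
        }
        return grouped;
    }

    // Stand-in for reduce(): sum the values of one key, divide by their count.
    static long average(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.size(); // integer division, fraction is dropped
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "alice 90", "bob 70", "alice 80", "bob 75", "alice 70");
        // Prints "alice 80" and "bob 72" (tab-separated), one key per line
        groupByKey(lines).forEach((k, v) -> System.out.println(k + "\t" + average(v)));
    }
}
```

The values for alice and bob never mix: alice averages 240 / 3 = 80, while bob averages 145 / 2, which integer division truncates to 72.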
<3> The driver
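import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumDriver {
    public static void main(String[] args) throws Exception {
        // Load the configuration
        Configuration conf = new Configuration();
        // Create the MapReduce job
        Job job = Job.getInstance(conf, "mt");
        // Set the main class so Hadoop can locate the job jar
        job.setJarByClass(SumDriver.class);
        // job.setJar("mt.jar");
        // Set the mapper
        job.setMapperClass(MyMapper.class);
        // Set the reducer
        job.setReducerClass(MyReduce.class);
        // Set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Set the input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Set the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}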
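For a more precise result, the output value type can be changed to DoubleWritable.
The simplest change is to make DoubleWritable the output value type parameter of the Reducer, declare the running total in the Reducer as a double, and set job.setOutputValueClass(DoubleWritable.class) in the driver. Because the Mapper still emits LongWritable values, the driver must then also call job.setMapOutputValueClass(LongWritable.class); otherwise the job assumes the map output value type is the same as the final output value type and fails with a type mismatch.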
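Why the switch to a double matters can be seen in plain Java, independent of Hadoop. This is an illustrative sketch only: PreciseAverageSketch and its two methods are hypothetical names, and the sample values are invented.

```java
public class PreciseAverageSketch {
    // Integer arithmetic throughout: the fractional part of the average is lost.
    static long truncatedAverage(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Accumulate in a double so the division keeps the fractional part.
    static double exactAverage(long[] values) {
        double sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        long[] values = {70, 75};
        System.out.println(truncatedAverage(values)); // prints 72
        System.out.println(exactAverage(values));     // prints 72.5
    }
}
```

In the Reducer the same idea applies: the double result would be wrapped in a DoubleWritable before being written out.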