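Computing an Average with MapReduce

Given an input file, this job computes the average of the values it contains, grouped by key.

<1> The Map side reads the file contents

As each line of the file is read, it is first split into a key and a value so that the later calculation has something to group on.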

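import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>{

	@Override
	protected void map(LongWritable key, Text value,Context context)
			throws IOException, InterruptedException {
		// Split the input line on single spaces
		String[] str = value.toString().split(" ");
		// The first field is the key
		String name = str[0];
		// The last field is the numeric value
		long a =Long.parseLong(str[str.length-1]);
		// Emit the (key, value) pair to the framework
		context.write(new Text(name),new LongWritable(a));
	}

}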

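The Map class extends Mapper and supplies its four generic type parameters. With the default TextInputFormat, the first one (the input key) is LongWritable, the byte offset of the line, and the second (the input value) is Text, the line itself. The remaining two, the output key and value types, can be chosen to suit the job.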
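The parsing step inside map() can be tried without a Hadoop installation. The following is a minimal plain-Java sketch of that step only; the class name LineParseSketch and the sample line are made up for illustration, and it assumes space-separated fields with the name first and the number last, as in MyMapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java sketch of the parsing done inside map(); no Hadoop classes needed.
// LineParseSketch is a hypothetical name used only for this illustration.
class LineParseSketch {

    // Splits one input line into (name, value): the first field is the key,
    // the last field is the numeric value, mirroring MyMapper.
    static Map.Entry<String, Long> parse(String line) {
        String[] fields = line.split(" ");
        String name = fields[0];
        long value = Long.parseLong(fields[fields.length - 1]);
        return new SimpleEntry<>(name, value);
    }

    public static void main(String[] args) {
        // "alice math 90" is made-up sample data.
        Map.Entry<String, Long> pair = parse("alice math 90");
        System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
}
```

Note that a line whose last field is not a whole number makes Long.parseLong throw NumberFormatException; in a real job that would fail the map task unless such lines are filtered out first.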

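<2> The Reduce side receives the key/value pairs emitted by the Map side and processes them

PS: Values that share a key are combined in a single reduce call; values under different keys are never combined with each other.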

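import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, LongWritable, Text, LongWritable>{

	@Override
	protected void reduce(Text key, Iterable<LongWritable> values, Context context)
			throws IOException, InterruptedException {
		long sum = 0;   // long, so large totals do not overflow an int
		long count = 0; // number of values received for this key
		for (LongWritable v : values) {
			sum += v.get();
			count++;
		}
		// Divide by the actual count rather than a hard-coded 3.
		// Integer division: the fractional part of the average is discarded.
		context.write(key, new LongWritable(sum / count));
	}

}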
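To make the grouping behaviour concrete, here is a minimal plain-Java sketch that imitates what the framework does between map and reduce: records are grouped by key, and each group is then averaged independently. It is an illustration rather than Hadoop code; the class name GroupAverageSketch and the sample records are hypothetical, and each group's sum is divided by the number of values actually seen for that key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of shuffle + reduce for the averaging job.
// GroupAverageSketch is a hypothetical name used only for this illustration.
class GroupAverageSketch {

    // Groups "name ... value" records by name, then averages each group
    // with integer division (the same truncation a long result would have).
    static Map<String, Long> averageByKey(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(" ");
            grouped.computeIfAbsent(fields[0], k -> new ArrayList<>())
                   .add(Long.parseLong(fields[fields.length - 1]));
        }
        Map<String, Long> averages = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) {
                sum += v;
            }
            averages.put(e.getKey(), sum / e.getValue().size());
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample records: three for alice, two for bob.
        List<String> lines = Arrays.asList(
                "alice 90", "bob 70", "alice 80", "bob 75", "alice 100");
        System.out.println(averageByKey(lines));
    }
}
```

With this sample data, alice averages to 90 and bob to 72 (145 / 2 truncated), which shows both that keys never mix and that a fixed divisor would give bob the wrong answer.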

<3> The driver

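import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumDriver {
	public static void main(String[] args) throws Exception {
		// Load the Hadoop configuration
		Configuration conf = new Configuration();
		// Create the MapReduce job
		Job job = Job.getInstance(conf, "mt");
		// Tell Hadoop which jar holds the job classes
		// (alternative to the original job.setJar("mt.jar"))
		job.setJarByClass(SumDriver.class);
		// Set the Mapper
		job.setMapperClass(MyMapper.class);
		// Set the Reducer
		job.setReducerClass(MyReduce.class);
		// Set the output key and value types
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);
		// Set the input path
		FileInputFormat.addInputPath(job, new Path(args[0]));
		// Set the output path (it must not exist yet)
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}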

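For a more precise result, the output value type can be changed to DoubleWritable.

The simplest change is to set the Reducer's output value generic type to DoubleWritable, compute the average as a double, and call job.setOutputValueClass(DoubleWritable.class); in the driver. Because the Mapper still emits LongWritable values, the driver must then also call job.setMapOutputValueClass(LongWritable.class); otherwise the job fails with a type mismatch on the map output.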
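The difference between the two result types comes down to integer versus floating-point division. The sketch below shows this in plain Java, since DoubleWritable simply wraps a Java double; the class name AveragePrecisionSketch and the sample values are hypothetical.

```java
// Plain-Java comparison of a long average and a double average.
// AveragePrecisionSketch is a hypothetical name used only for this illustration.
class AveragePrecisionSketch {

    // Integer division: the fractional part is discarded.
    static long longAverage(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Casting the sum to double before dividing keeps the fraction.
    static double doubleAverage(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        long[] values = {90, 85, 82}; // made-up sample values, sum = 257
        System.out.println("long average:   " + longAverage(values));
        System.out.println("double average: " + doubleAverage(values));
    }
}
```

For these values the long average is 85, while the double average is roughly 85.67. In the Reducer the same cast applies: divide (double) sum by the count and wrap the result in a DoubleWritable.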