Setting up the map side and the reduce side for WordCount in MapReduce
Mapper setup:
package wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // The input key is the byte offset of the line and must be typed
    // LongWritable to match the class's KEYIN parameter; declaring it as
    // Object would create an overload the framework never calls.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);  // emit (word, 1) for every token
        }
    }
}
The map side does little more than tokenize each incoming line and emit every word as a (word, 1) pair.
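To illustrate that (word, 1) output, here is a minimal standalone sketch (no Hadoop runtime; the sample line is made up) that mimics the mapper's tokenize-and-emit loop:

import java.util.StringTokenizer;

public class MapSketch {
    public static void main(String[] args) {
        String line = "hello world hello hadoop";  // hypothetical input line
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            // In the real mapper this is context.write(word, one)
            System.out.println("(" + itr.nextToken() + ", 1)");
        }
        // Prints: (hello, 1) (world, 1) (hello, 1) (hadoop, 1)
    }
}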
Reducer setup:
package wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // The values Iterable can only be traversed once in Hadoop (the
        // framework reuses the underlying objects), so the debug output
        // happens in the same pass as the summation.
        for (IntWritable val : values) {
            System.out.println(key + " -> " + val);  // debug output
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
In a MapReduce WordCount job, the reduce side is where the counting actually happens: after the shuffle, all values belonging to the same key are grouped together, and the reducer walks those values with an iterator and adds them up.
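To make that group-then-sum behavior concrete, here is a minimal plain-Java sketch (no Hadoop involved; the class name and sample data are made up) that imitates what the shuffle phase and the reducer do together:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReduceSketch {
    public static void main(String[] args) {
        // (word, 1) pairs as emitted by the mapper (hypothetical data).
        List<String> mapOutput = Arrays.asList("hello", "world", "hello", "hadoop");

        // Shuffle: the framework groups values by key, so each reduce()
        // call receives one key plus an iterable of all its values.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String word : mapOutput) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }

        // Reduce: iterate the grouped values once and sum them,
        // mirroring the loop in MyReducer.reduce().
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            System.out.println(e.getKey() + "\t" + sum);
        }
        // Output: hello 2, world 1, hadoop 1
    }
}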