Hadoop MapReduce: A WordCount Implementation
阿新 • Published: 2017-06-11
1. Create a WCMapper class that extends Mapper
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Receive the input value V1 (one line of text)
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // For each occurrence, emit the word with a count of one
        for (String w : words) {
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
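The map step's splitting logic can be checked in plain Java, independent of the Hadoop runtime. This is a minimal sketch under the assumption that the `Text`/`LongWritable` wrappers are replaced with plain strings and longs; the class and method names here are illustrative, not part of the original code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

public class MapSketch {
    // Mirrors WCMapper.map: split a line on spaces and emit one (word, 1) pair
    // per occurrence, without aggregating anything yet.
    static List<Entry<String, Long>> map(String line) {
        List<Entry<String, Long>> out = new ArrayList<>();
        for (String w : line.split(" ")) {
            out.add(new SimpleEntry<>(w, 1L));
        }
        return out;
    }

    public static void main(String[] args) {
        // "hello" appears twice, so two separate (hello, 1) pairs are emitted.
        System.out.println(map("hello world hello"));
    }
}
```

Note that the mapper does no summing at all; duplicate words simply produce duplicate pairs, and the framework groups them by key before the reducer runs.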
2. Create a WCReducer class that extends Reducer
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> v2s, Context context)
            throws IOException, InterruptedException {
        // Define a counter
        long counter = 0;
        // Loop over v2s and sum every count that arrived for this key
        for (LongWritable i : v2s) {
            counter += i.get();
        }
        // Emit the word with its total count
        context.write(key, new LongWritable(counter));
    }
}
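The reduce step is just a sum over the grouped counts. The same logic can be sketched in plain Java (again with the `LongWritable` boxes replaced by plain longs; the class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    // Mirrors WCReducer.reduce: for one key, add up all the counts
    // the shuffle delivered for it.
    static long reduce(List<Long> counts) {
        long counter = 0;
        for (long c : counts) {
            counter += c;
        }
        return counter;
    }

    public static void main(String[] args) {
        // Three (word, 1) pairs for the same key collapse to a total of 3.
        System.out.println(reduce(Arrays.asList(1L, 1L, 1L)));
    }
}
```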
3. Implement the main method in a WordCount class
/*
 * 1. Analyze the concrete business logic and decide on the input/output data formats.
 * 2. Define a class that extends org.apache.hadoop.mapreduce.Mapper and
 *    overrides map() to implement the business logic and emit new key/value pairs.
 * 3. Define a class that extends org.apache.hadoop.mapreduce.Reducer and
 *    overrides reduce() to implement the business logic.
 * 4. Wire the custom mapper and reducer together through a Job object.
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Build the Job object
        Job job = Job.getInstance(new Configuration());
        // Note: the class containing the main method
        job.setJarByClass(WordCount.class);
        // Configure the Mapper
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/words.txt"));
        // Configure the Reducer
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/wcount619"));
        // Submit the job, wait for it to finish, and exit with its status
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
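Before submitting the jar, the whole pipeline can be simulated in memory to sanity-check the expected output: map each line to (word, 1) pairs, let a sorted map play the role of the shuffle (grouping by key), and accumulate the reduce-side sum as pairs arrive. This is a sketch of the end-to-end semantics, not of the Hadoop execution model; the class name is illustrative:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Simulates map + shuffle + reduce for the word-count job:
    // every word occurrence contributes 1, the TreeMap groups by key
    // (sorted, as the shuffle delivers keys to reducers), and merge()
    // performs the reduce-side summation incrementally.
    static Map<String, Long> run(String[] lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String w : line.split(" ")) {
                counts.merge(w, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(new String[] {"hello world", "hello hadoop"}));
    }
}
```

The output keys come back in sorted order, matching what the real job writes to its part files, since reducers receive keys sorted by the shuffle.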
4. Package the project as wc.jar, upload it to the Linux host, and run it under Hadoop:
hadoop jar /root/wc.jar
(If the jar's manifest does not declare a main class, append the driver class name to the command.)