A MapReduce Program Example: The Devil Is in the Details (Part 1)
阿新 · Posted: 2018-12-16
I've been looking at MapReduce recently and realized I had always written jobs by copying existing code and modifying it. So I decided to try writing an extremely simple MR program from scratch.
The devil really is in the details: you don't notice them until you try for yourself.
Below is the program I dashed off quickly. Note: this version has a bug!
```java
package wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;

public class MyWordCountJob extends Configured implements Tool {
    Logger log = Logger.getLogger(MyWordCountJob.class);

    public class MyWordCountMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        Logger log = Logger.getLogger(MyWordCountJob.class);
        LongWritable mapKey = new LongWritable();
        Text mapValue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            mapKey.set(key.get());
            mapValue.set(value.toString());
            log.info("Mapper: mapKey--" + mapKey.get() + "mapValue --" + mapValue.toString());
            context.write(mapKey, mapValue);
        }
    }

    public class MyWordCountReducer
            extends Reducer<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values)
                context.write(key, value);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        log.info("begin to run");
        Job job = Job.getInstance(getConf(), "MyWordCountJob");
        job.setJarByClass(MyWordCountJob.class);

        Path inPath = new Path("demos/pigdemo.txt");
        Path outPath = new Path("demos/pigdemoOut.txt");
        outPath.getFileSystem(getConf()).delete(outPath, true);
        TextInputFormat.setInputPaths(job, inPath);
        TextOutputFormat.setOutputPath(job, outPath);

        job.setMapperClass(MyWordCountJob.MyWordCountMapper.class);
        job.setReducerClass(MyWordCountJob.MyWordCountReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) {
        int result = 0;
        try {
            result = ToolRunner.run(new Configuration(), new MyWordCountJob(), args);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(result);
    }
}
```
After writing it, I compiled it, packaged the jar, and ran the job.
```
16/05/10 22:43:46 INFO mapreduce.Job: Running job: job_1462517728035_0033
16/05/10 22:43:54 INFO mapreduce.Job: Job job_1462517728035_0033 running in uber mode : false
16/05/10 22:43:54 INFO mapreduce.Job:  map 0% reduce 0%
16/05/10 22:43:58 INFO mapreduce.Job: Task Id : attempt_1462517728035_0033_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.NoSuchMethodException: wordcount.MyWordCountJob$MyWordCountMapper.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.NoSuchMethodException: wordcount.MyWordCountJob$MyWordCountMapper.<init>()
        at java.lang.Class.getConstructor0(Class.java:2706)
        at java.lang.Class.getDeclaredConstructor(Class.java:1985)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
        ... 7 more

[the identical error repeats for attempts _1 and _2]

16/05/10 22:44:14 INFO mapreduce.Job:  map 100% reduce 100%
16/05/10 22:44:14 INFO mapreduce.Job: Job job_1462517728035_0033 failed with state FAILED due to: Task failed task_1462517728035_0033_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/05/10 22:44:15 INFO mapreduce.Job: Counters: 6
        Job Counters
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=3
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=99584
                Total time spent by all reduces in occupied slots (ms)=0
```
I puzzled over the error above and could not figure out what was wrong.
Then I compared my code against the code I had previously copied, and finally found the problem!
When you write the Mapper and Reducer as inner classes, you must declare them static!!!!
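The root cause can be reproduced without Hadoop at all. A non-static inner class has no true no-argument constructor: the compiler makes its constructor implicitly take a reference to the enclosing instance. When Hadoop's ReflectionUtils looks up a no-arg constructor to instantiate the Mapper, it finds none and throws exactly the NoSuchMethodException seen in the log. A minimal pure-Java sketch (class names here are illustrative):

```java
public class InnerClassDemo {
    // Non-static inner class: its compiled constructor implicitly takes the
    // enclosing InnerClassDemo instance, so there is no no-arg constructor.
    public class NonStaticMapper { }

    // Static nested class: has a genuine no-arg constructor that a framework
    // can invoke via reflection, the way Hadoop's ReflectionUtils does.
    public static class StaticMapper { }

    public static void main(String[] args) throws Exception {
        try {
            NonStaticMapper.class.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            // Same failure Hadoop reports: NoSuchMethodException: ...<init>()
            System.out.println("non-static: " + e.getClass().getSimpleName());
        }
        Object m = StaticMapper.class.getDeclaredConstructor().newInstance();
        System.out.println("static: instantiated " + m.getClass().getSimpleName());
    }
}
```

Adding the `static` keyword to both `MyWordCountMapper` and `MyWordCountReducer` is therefore the entire fix.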
A small exercise: inspect the generated output file. What do you notice?
When TextInputFormat is used, what does the LongWritable key passed into the map function represent?
Experiment confirms that the key is the byte offset, within the whole file, of the first character of that line.
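That behavior can be simulated in plain Java: walk a file's contents line by line and track how many bytes precede each line. The sample text below is made up for illustration; TextInputFormat would hand each line to map() with exactly these offsets as keys.

```java
public class OffsetDemo {
    public static void main(String[] args) {
        // Hypothetical file contents; each '\n' costs one byte.
        String file = "hello world\npig demo\nend\n";
        long offset = 0;
        for (String line : file.split("\n")) {
            // Mimic the (key, value) pair TextInputFormat produces.
            System.out.println(offset + "\t" + line);
            offset += line.getBytes().length + 1; // +1 for the newline
        }
        // prints:
        // 0    hello world
        // 12   pig demo
        // 21   end
    }
}
```

This also explains the exercise above: since the sketched Mapper and Reducer pass the key straight through, each line of the output file starts with that line's byte offset in the input.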
The next post covers how to inspect the execution logs, continuing to learn Hadoop by iteratively improving a single MapReduce job.