
Problems Encountered in a First MapReduce WordCount Example

I. The WordCount example

1. It is easy to import the wrong class in the Driver

// 6. Specify the input and output paths
	FileInputFormat.setInputPaths(job, new Path(args[0]));
	FileOutputFormat.setOutputPath(job, new Path(args[1]));
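JobConf belongs to the old API: the old org.apache.hadoop.mapred.FileInputFormat expects a JobConf, while this Driver passes a new-API Job. Since we need the new API, use import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; in place of the original import org.apache.hadoop.mapred.FileInputFormat;. The same applies to FileOutputFormat, whose new-API class is org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.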

2. When testing on Windows, the output folder is created but the log files are not

Unable to initialize MapOutputCollector org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  java.lang.ClassCastException: interface javax.xml.soap.Text
at java.lang.Class.asSubclass(Class.java:3404)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2018-05-07 20:09:54,107 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
  2018-05-07 20:09:54,112 WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local804371642_0001
  java.lang.Exception: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :interface javax.xml.soap.Text
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :interface javax.xml.soap.Text
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:415)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: interface javax.xml.soap.Text
at java.lang.Class.asSubclass(Class.java:3404)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
... 10 more


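Initialization fails, and the errors all originate in MapTask, which points to a wrong output type.
Checking the Job setup in the Driver showed that the call to setMapOutputKeyClass sets the map output key to the Text type:
 job.setMapOutputKeyClass(Text.class);
However, the Text in the import list was not org.apache.hadoop.io.Text but a same-named class from an unrelated package (javax.xml.soap.Text, as the exception message shows). After changing the import to Hadoop's Text class, the problem was solved.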
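The Hadoop 2.6.0 API documentation for WritableComparable does not specifically point out this requirement (that the map output key class must implement WritableComparable). I do not know whether that is because the requirement was already common knowledge in earlier versions or because the API authors overlooked it. For someone just starting to learn MapReduce programming this is quite confusing, because the error in the log gives almost no hint about the cause.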
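
To see why the log message is so terse, the failing step can be reproduced without Hadoop. The trace shows the exception coming from Class.asSubclass, called from JobConf.getOutputKeyComparator; the sketch below makes the same kind of call on stand-in types. KeyClassCheckDemo, SortableKey, GoodText, WrongText and check are hypothetical names introduced only for illustration and are not part of Hadoop.

```java
/**
 * Minimal sketch of the failing step, using stand-in types only.
 * None of the types below are Hadoop classes.
 */
final class KeyClassCheckDemo {

    /** Hypothetical stand-in for the interface a map output key must implement. */
    interface SortableKey extends Comparable<SortableKey> {}

    /** Stand-in for the correct key type (plays the role of org.apache.hadoop.io.Text). */
    static final class GoodText implements SortableKey {
        @Override
        public int compareTo(SortableKey other) {
            return 0; // ordering is irrelevant for this demo
        }
    }

    /** Stand-in for an unrelated type that only shares the simple name "Text". */
    interface WrongText {}

    /**
     * Calls Class.asSubclass, which throws ClassCastException when keyClass
     * is not a subtype of the required interface.
     * Returns null on success, or the exception message on failure.
     */
    static String check(Class<?> keyClass) {
        try {
            keyClass.asSubclass(SortableKey.class);
            return null;
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("GoodText  -> " + check(GoodText.class));
        System.out.println("WrongText -> " + check(WrongText.class));
    }
}
```

Running main prints null for GoodText and, for WrongText, a message that begins with "interface" followed by the class name. That is the same shape as the "interface javax.xml.soap.Text" message in the log: the exception only names the offending class, so the practical check is to compare that name against the imports in the Driver.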