Submitting MapReduce jobs to Linux remotely from an IDE fails with ClassNotFoundException: Mapper
Situation
I have Linux running in VMware with a pseudo-distributed Hadoop 2 deployment. I write MapReduce programs in IDEA or Eclipse on the Windows host; after submitting a job, it fails at run time with:
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$MyMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.lance.common.entity.LineSplitMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
    ... 8 more
Environment
JDK 1.7
Hadoop 2.6.0
CentOS 6.7
Windows 10
IDEA 2016
Spring Tool Suite Version: 3.7.3.RELEASE
hadoop-eclipse-plugin-2.7.1.jar
Sample code
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String inputPath = "input/protocols";
        String outputPath = "output";
        // Point the client at the remote pseudo-distributed cluster
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.147.128:9000");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "192.168.147.128");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        Job job = Job.getInstance(conf, "Word Count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Delete the output directory if it already exists
        Path path = new Path(outputPath);
        path.getFileSystem(conf).delete(path, true);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        // Submit the job and wait for completion
        if (job.waitForCompletion(true)) {
            System.out.println("-----------------MR Finished-------------------");
        }
        System.out.println("Finished");
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(" ");
            for (String word : words) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
Notes
I've searched Baidu, Bing, Google, Stack Overflow, and so on; the proposed fixes boil down to the following:
(1) call setJarByClass (doesn't help);
(2) use the Hadoop Eclipse plugin (doesn't help);
(3) Run As Hadoop (doesn't help).
In short, I have already tried everything in the related links below.
Attempts
(1) Build the jar by hand and upload it to Linux (runs successfully);
(2) Build the jar by hand and invoke "hadoop jar ..." on Windows (runs successfully);
(3) Submit the job from IDEA or Eclipse (both fail).
Analysis
MapReduce job submission works roughly like this (see Hadoop: The Definitive Guide, 3rd edition, p. 207):
(1) The client copies the job jar, configuration, and input-split information to HDFS (10 replicas by default; see the snippet after this list) and submits the job to the master;
(2) when a slave node is assigned a task, it fetches those resources from HDFS and runs the task;
(3) when the job finishes, the jar, configuration XML, and so on are deleted.
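As an aside, the "10 replicas" figure is not arbitrary: it is the default of the submit-file replication setting in Hadoop 2. Purely to ground that number, a client could make it explicit like this (not required for the fix below):

// The job jar, job.xml, and split metadata are written to HDFS with their
// own replication factor so that many nodes can fetch them quickly.
// 10 is the Hadoop 2 default; this line only makes that default explicit.
conf.setInt("mapreduce.client.submit.file.replication", 10);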
The error is raised on the ApplicationMaster side, which means it has nothing to do with the client code: the job has already been submitted to the master and is about to run, but a class is missing (you can see this in the stack trace above; note YarnChild).
When the jar is run directly, the HDFS browser shows the 10 replicas on the left; on the right, job.jar is the jar used at run time and job.xml holds the contents of the Configuration.
When run from the IDE, there is no job.jar, so the slave nodes naturally cannot obtain the Mapper/Reducer classes; but because job.xml is present, they do get strings naming those classes.
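This matches the exception exactly: job.xml carries only a class name as a string, and the task JVM resolves it reflectively. Here is a minimal sketch of what effectively happens on the YarnChild side; the property key and the Configuration calls are real Hadoop 2 API, but the fragment is my illustration (not the actual YarnChild source), assumes the job object from the sample above, and needs a surrounding method that declares ClassNotFoundException:

// job.xml records the mapper under the key "mapreduce.job.map.class"
Configuration conf = job.getConfiguration();
String mapperName = conf.get("mapreduce.job.map.class"); // e.g. "WordCount$MyMapper"
// The name is resolved reflectively on the node running the task; without
// job.jar on that node, this is where ClassNotFoundException is thrown:
Class<?> mapperClass = conf.getClassByName(mapperName);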
Solutions
(1) If you never hit this problem, I can't say why; I haven't tried other machines (some people in my chat group can indeed run it normally);
(2) package the program as a jar, then run the jar (the sketch after this list shows the client-side equivalent).
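For completeness, the jar-based fix can also be expressed directly in client code, by telling the submitter which local jar to ship instead of relying on setJarByClass. This is a sketch of the idea, not something from my original runs; the path out/hadoop.jar is hypothetical and must point at a jar you have actually built:

// Option A: name the jar explicitly on the Job (before submission).
job.setJar("out/hadoop.jar"); // hypothetical path to the jar built from this project

// Option B: equivalently, set the property the submitter reads
// (set it before Job.getInstance so the job picks it up).
// "mapreduce.job.jar" is the Hadoop 2 key; "mapred.jar" is its deprecated
// alias, which is the key used in the config files further down.
conf.set("mapreduce.job.jar", "out/hadoop.jar");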
Simplified approach
I'm fairly lazy, but after spending more than a week without cracking this, I decided to go with option (2), packaging a jar, and streamlined it as follows.
The Configuration in the code above has to change to:
Configuration conf = new Configuration();
conf.addResource("core-site.xml");
conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");
(1) Eclipse: move all the configuration into core-site.xml and the other XML files, and build the jar before every run. This works for every job class, and the code no longer needs that block. When you actually release the job, remove the mapred.jar setting shown below.
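The screenshot of that mapred.jar entry is not reproduced here, but a client-side mapred-site.xml entry of this shape is what is meant; the jar path is hypothetical and must match wherever your IDE writes the built jar:

<!-- "mapred.jar" is the deprecated alias of "mapreduce.job.jar"; either key
     works in Hadoop 2. Remove this entry when deploying with "hadoop jar". -->
<property>
  <name>mapred.jar</name>
  <value>out/hadoop.jar</value>
</property>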
(2) IDEA: the linked article on packaging a Java project into a jar with IntelliJ IDEA covers the general method; below is my simplified version.
At run time, first do Build → Make Project, which produces an out folder containing hadoop.jar (the name is arbitrary; it doesn't have to be hadoop.jar), then run the Java program as usual and the error is gone. Every time you change the code, just click Build → Make Project one extra time. Simple and convenient.