Remote MapReduce job submission from an IDE to Linux fails with ClassNotFoundException: Mapper

Situation

Hadoop 2 is deployed in pseudo-distributed mode on a Linux VM under VMware. I write MapReduce programs with IDEA or Eclipse on the Windows host; after submitting a job, it fails at runtime with:

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$MyMapper not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
	at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.lance.common.entity.LineSplitMapper not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
	... 8 more

Environment

JDK 1.7

Hadoop 2.6.0

CentOS 6.7

Windows 10

IDEA 2016

Spring Tool Suite Version: 3.7.3.RELEASE

hadoop-eclipse-plugin-2.7.1.jar

Sample code

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		String inputPath = "input/protocols";
		String outputPath = "output";

		// Configure and create the job
		Configuration conf = new Configuration();
		conf.set("fs.default.name", "hdfs://192.168.147.128:9000");
		conf.set("mapreduce.framework.name", "yarn");
		conf.set("yarn.resourcemanager.hostname", "192.168.147.128");
		conf.set("mapreduce.app-submission.cross-platform", "true"); 
		Job job = Job.getInstance(conf, "Word Count");

		job.setJarByClass(WordCount.class); 

		job.setMapperClass(MyMapper.class);
		job.setCombinerClass(MyReducer.class);
		job.setReducerClass(MyReducer.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		Path path = new Path(outputPath);
		path.getFileSystem(conf).delete(path, true);

		FileInputFormat.addInputPath(job, new Path(inputPath));
		FileOutputFormat.setOutputPath(job, new Path(outputPath));

		// Submit the job and wait for completion
		if (job.waitForCompletion(true)) {
			System.out.println("-----------------MR Finished-------------------");
		}
		System.out.println("Finished");
	}

	public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
		@Override
		public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			String line = value.toString();
			String[] words = line.split(" ");
			for (String word : words) {
				context.write(new Text(word), new IntWritable(1));
			}
		}
	}

	public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		@Override
		protected void reduce(Text key, Iterable<IntWritable> values, Context context)
				throws IOException, InterruptedException {

			int sum = 0;
			for (IntWritable value : values) {
				sum += value.get();
			}

			context.write(key, new IntWritable(sum));
		}
	}
}
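Stripped of the Hadoop types, the mapper and reducer above are just a word count: the mapper emits (word, 1) per space-separated token, and the reducer (also used as combiner) sums the counts. A plain-Java sketch of that core logic, with no Hadoop dependency, for illustration only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountCore {
	// Equivalent of MyMapper + MyReducer: split each line on spaces,
	// then sum the occurrences of every word.
	static Map<String, Integer> count(String[] lines) {
		Map<String, Integer> counts = new LinkedHashMap<>();
		for (String line : lines) {
			for (String word : line.split(" ")) {
				counts.merge(word, 1, Integer::sum);
			}
		}
		return counts;
	}

	public static void main(String[] args) {
		Map<String, Integer> c = count(new String[] {"a b a", "b a"});
		System.out.println(c); // {a=3, b=2}
	}
}
```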

Notes

I searched Baidu, Bing, Google, Stack Overflow, and so on; the proposed solutions all boil down to the following:

(1) call setJarByClass (did not work);

(2) use the hadoop-eclipse plugin (did not work);

(3) use Run As > Run on Hadoop (did not work).

In short, I have already tried everything in the related links at the end of this post.

Attempts

(1) Package the jar manually and upload it to Linux (ran successfully);

(2) Package the jar manually and run "hadoop jar .." on Windows (ran successfully);

(3) Submit the job directly from IDEA or Eclipse (both failed).

Analysis

Job submission works roughly as follows (see Hadoop: The Definitive Guide, 3rd edition, p. 207):

(1) The client copies the job jar, configuration, and input split information to HDFS (10 replicas by default) and submits the job to the master;

(2) When a slave node is assigned a task, it fetches this data from HDFS and runs it;

(3) After the job finishes, the jar, configuration XML, and so on are deleted.

The error is raised on the cluster side, under the ApplicationMaster; note YarnChild in the stack trace above. In other words, the client code is not at fault: the job has already been submitted to the master and is about to run, but the class itself is missing.
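The mechanism can be reproduced without Hadoop: job.xml gives the task only the class *name* as a string, and loading it by name fails when no jar on the task's classpath contains the bytes. A minimal stdlib-only illustration (the class name is taken from the stack trace above and is deliberately absent here):

```java
public class MissingClassDemo {
	public static void main(String[] args) {
		// The name survives in job.xml, but without job.jar the bytes are absent.
		String mapperName = "com.lance.common.entity.LineSplitMapper";
		try {
			Class.forName(mapperName);
			System.out.println("loaded");
		} catch (ClassNotFoundException e) {
			// The same failure YarnChild reports on the slave node.
			System.out.println("ClassNotFoundException: " + mapperName);
		}
	}
}
```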

When the jar is run directly, HDFS shows the 10 replicas on the left; on the right, job.jar is the jar being executed and job.xml holds the Configuration contents:


When the job is submitted from the IDE, however, there is no job.jar, so the slave nodes naturally cannot obtain the Mapper/Reducer classes. Because job.xml is still uploaded, they do get the Mapper/Reducer class names as strings:


Solutions

(1) If you never hit this problem, I cannot say why; I have not tried other machines (some group members can indeed run it directly from the IDE);

(2) Package the program as a jar, then run the jar.

Simplified workflow

I am fairly lazy, but after more than a week without solving this I settled on the packaged-jar approach:

The Configuration in the code above must be changed to:

Configuration conf = new Configuration();
conf.addResource("core-site.xml");
conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");
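For addResource to work, these files must be on the runtime classpath (for example under src/main/resources) and carry the values previously hard-coded in the Java code. A sketch of a matching core-site.xml, with the host and port taken from the sample code above:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.147.128:9000</value>
  </property>
</configuration>
```

mapred-site.xml and yarn-site.xml would analogously carry the mapreduce.framework.name and yarn.resourcemanager.hostname settings.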

(1) Eclipse: move the connection settings into core-site.xml and the other config files, then regenerate the jar before each run. This works for every class, and the code no longer needs the hard-coded settings. When actually releasing, delete the mapred.jar shown below.


(2) IDEA: the link "Intellij Idea 將java專案打包成jar" describes how to package a Java project as a jar in IDEA; below is my simplified setup:

At run time, first do Build > Make Project, which produces an out folder containing hadoop.jar (the name is arbitrary; it does not have to be hadoop.jar), then run the Java program as usual and the error no longer appears. For each program, the only extra step is one click on Build > Make Project: simple and convenient.
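A related option, not from the original post: Hadoop 2 also lets the submission point at a pre-built jar explicitly via the mapreduce.job.jar property (the property behind Job.setJar), so that a real job.jar is uploaded regardless of how the client was launched. A sketch of such an entry in a resource file loaded via addResource; the path is a placeholder for wherever your build writes the jar:

```xml
<!-- Sketch only: points job submission at the pre-built jar. -->
<property>
  <name>mapreduce.job.jar</name>
  <value>out/hadoop.jar</value>
</property>
```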


Related links