Submitting MapReduce Jobs Remotely from Eclipse to a Hadoop Cluster

阿新 • Published: 2019-01-03

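I. Introduction

Previously, after writing a MapReduce job I always packaged it, uploaded it to the Hadoop cluster, started it with a shell command, and then checked the log files on each node. To speed up development, I wanted a way to submit MapReduce jobs to the Hadoop cluster directly from Eclipse. This section walks through the whole process, from unpacking the Eclipse archive to submitting a job from Eclipse to the cluster.

II. Details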

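1. Install Eclipse and the Hadoop plugin

(1) Download the Eclipse archive, along with the Eclipse plugin for Hadoop 2.7.1 and the other files needed to set up the environment. Then extract Eclipse and place it on the D: drive.

(2) Copy Hadoop-eclipse-plugin.jar from the downloaded resources into Eclipse's plugin directory, D:\eclipse\plugins\, and then start Eclipse.

(3) Extract a copy of Hadoop-2.7.1 to the D: drive, set the HADOOP_HOME environment variable to point to it, and add %HADOOP_HOME%\bin to the Path environment variable.

(4) Then configure the Hadoop plugin in Eclipse: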

A. Window ----> Show View ----> Other, then select MapReduce Tools

B. Window ----> Perspective ----> Open Perspective ----> Other

C. Window ----> Preferences ----> Hadoop Map/Reduce, then select the Hadoop directory that was just extracted

D. Configure the HDFS connection: create a new MapReduce connection in the MapReduce view

Once this is done, DFS appears in the Package Explorer, and the files on HDFS can be browsed from there.

2. MapReduce development

(1) Extract hadoopbin.zip from the hadoop-eclipse folder, copy the extracted files into the %HADOOP_HOME%\bin directory, and then copy hadoop.dll into C:\Windows\System32.
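
Before going further, it can help to confirm that the Windows-side layout is in place. The following is a minimal sketch, not part of Hadoop or the plugin: `HadoopHomeCheck` is a hypothetical helper, it assumes hadoop.dll also stays in the bin directory, and checking for winutils.exe is an assumption about what the archive contains.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports what is missing from a Windows-side Hadoop client setup.
final class HadoopHomeCheck {

    // Returns human-readable problems; an empty list means the layout looks usable.
    static List<String> findProblems(String hadoopHome) {
        List<String> problems = new ArrayList<>();
        if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
            problems.add("HADOOP_HOME is not set");
            return problems;
        }
        File bin = new File(hadoopHome, "bin");
        if (!bin.isDirectory()) {
            problems.add("missing directory: " + bin.getPath());
            return problems;
        }
        // hadoop.dll is named in the text; winutils.exe is an assumption about the archive.
        for (String name : new String[] {"hadoop.dll", "winutils.exe"}) {
            File file = new File(bin, name);
            if (!file.isFile()) {
                problems.add("missing file: " + file.getPath());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(System.getenv("HADOOP_HOME"));
        if (problems.isEmpty()) {
            System.out.println("Hadoop client layout looks complete");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Running it once from Eclipse after setting the environment variables shows whether a restart of Eclipse is still needed for the new HADOOP_HOME value to be picked up.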

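(2) Download these five files from the cluster: log4j.properties, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. Then write a WordCount example and place the five files in the project's src folder.

(3) Modify mapred-site.xml and yarn-site.xml

A. Add the following key-value properties to mapred-site.xml: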

<property>
<name>mapred.remote.os</name>
<value>Linux</value>
</property>

<property>
<name>mapreduce.app-submission.cross-platform</name>
<value>true</value>
</property>

<property>
<name>mapreduce.application.classpath</name>
<value>/home/hadoop/hadoop/hadoop-2.7.1/etc/hadoop,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/lib/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/lib/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/lib/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/lib/*</value>
</property>

B. Add the following property to yarn-site.xml:

<property>
<name>yarn.application.classpath</name>
<value>/home/hadoop/hadoop/hadoop-2.7.1/etc/hadoop,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/lib/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/lib/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/lib/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/*,
        /home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/lib/*</value>
</property>

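Some explanation is needed here. Before Hadoop 2.6, the source code was written for Linux, where environment variables are referenced with the $ symbol. When that code is used to submit a job from Windows, the variable syntax of the two systems differs (Windows uses the %VAR% form), which leads to the following error: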

org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control

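Before Hadoop 2.6, the fix was to modify the source code, rebuild the jar, and replace the old jar file; that procedure is described in a separate blog post.

Here we instead set mapreduce.application.classpath and yarn.application.classpath to absolute paths, which avoids the error above.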
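
Both property values list the same nine entries under one installation root, so they can be generated rather than typed by hand. The sketch below assumes the cluster uses the standard directory layout shown in the values above; `ClasspathValue` is a hypothetical helper, and the root path must be replaced with the one on your own cluster nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the comma-separated absolute classpath value.
final class ClasspathValue {

    // Sub-directories of share/hadoop that appear in the property values.
    private static final String[] MODULES = {"common", "hdfs", "mapreduce", "yarn"};

    static String build(String hadoopRoot) {
        // Drop a trailing slash so the entries never contain "//".
        String root = hadoopRoot.endsWith("/")
                ? hadoopRoot.substring(0, hadoopRoot.length() - 1)
                : hadoopRoot;
        List<String> entries = new ArrayList<>();
        entries.add(root + "/etc/hadoop");
        for (String module : MODULES) {
            entries.add(root + "/share/hadoop/" + module + "/*");
            entries.add(root + "/share/hadoop/" + module + "/lib/*");
        }
        return String.join(",", entries);
    }

    public static void main(String[] args) {
        // The root below is the one used in this post; it is cluster-specific.
        System.out.println(build("/home/hadoop/hadoop/hadoop-2.7.1"));
    }
}
```

The printed string can be pasted into the `<value>` element of both properties.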

(3) Write the WordCount program:

package wc;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WCMapReduce {
	
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException
	{
		// Loads the *-site.xml files found on the classpath (the five files placed under src)
		Configuration conf=new Configuration();
		Job job=Job.getInstance(conf);
		job.setJobName("word count");
		job.setJarByClass(WCMapReduce.class);
		// Local path of the pre-built job jar that is sent to the cluster
		job.setJar("E:\\Ecplise\\WC.jar");
		// Mapper and reducer classes
		job.setMapperClass(WCMap.class);
		job.setReducerClass(WCReduce.class);
		// Output key and value types
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		// Input and output file formats
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);
		// Input and output paths on HDFS
		FileInputFormat.addInputPath(job,new Path("hdfs://192.98.12.234:9000/Test/"));
		FileOutputFormat.setOutputPath(job, new Path("hdfs://192.98.12.234:9000/result"));
		// Submit the job, wait for it to finish, and report the outcome through the exit code
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
	
	public static class WCMap extends Mapper<LongWritable, Text, Text, IntWritable>
	{
		// Reused for every record to avoid allocating new objects per token
		private final Text outKey=new Text();
		private final IntWritable outValue=new IntWritable(1);
		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			String line=value.toString();
			// Default delimiters are whitespace. A second argument is a set of delimiter
			// characters, not a regex, so "\\s" would split on '\' and 's' instead.
			StringTokenizer tokenizer=new StringTokenizer(line);
			while(tokenizer.hasMoreTokens())
			{
				String word=tokenizer.nextToken();
				outKey.set(word);
				// Emit (word, 1) for every token
				context.write(outKey, outValue);
			}
		}
	}
	
	public static class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable>
	{
		// Reused for every key to avoid allocating a new object per call
		private final IntWritable outValue=new IntWritable();
		@Override
		protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
				throws IOException, InterruptedException {
			// Add up all the 1s emitted by the mappers for this word
			int sum=0;
			for(IntWritable count:counts)
			{
				sum+=count.get();
			}
			outValue.set(sum);
			context.write(word,outValue);
		}
	}

}
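
The tokenize-and-sum logic can be checked on the local machine before anything is sent to the cluster. The following is a minimal sketch in plain Java with no Hadoop dependency; `LocalWordCount` is a hypothetical class that mimics the map step (split each line on whitespace) and the reduce step (sum the counts per word), and it is not how Hadoop itself executes the job.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the map and reduce steps of WordCount.
final class LocalWordCount {

    static Map<String, Integer> count(Iterable<String> lines) {
        // TreeMap keeps the words sorted, similar to the sorted keys a reducer sees.
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            // With no second argument, StringTokenizer splits on space, tab,
            // newline, carriage return, and form feed.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // "Map" emits (word, 1); "reduce" is folded in here as a running sum.
                totals.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello hadoop", "hello\teclipse")));
    }
}
```

Comparing its output on a small sample file with the contents of the job's result directory is a quick sanity check of the cluster run.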

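Note that because the job is submitted remotely, its jar must be sent to the cluster. Eclipse has no built-in mechanism for this, so the jar has to be built and saved to a known location first, and the program then points to it with the following line: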

job.setJar("E:\\Ecplise\\WC.jar");
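
The jar can be exported by hand from Eclipse, or built with a few lines of standard-library Java. The sketch below is one possible way to do it and is not part of Hadoop or the plugin: `JobJarBuilder` is a hypothetical helper, the `bin` default assumes Eclipse's usual output folder, and the jar must be written outside the classes directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: packs compiled classes into a jar for job.setJar(...).
final class JobJarBuilder {

    // Packs every regular file under classesDir into jarFile.
    static void build(Path classesDir, Path jarFile) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (Stream<Path> walk = Files.walk(classesDir)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            try (OutputStream out = Files.newOutputStream(jarFile);
                 JarOutputStream jar = new JarOutputStream(out, manifest)) {
                for (Path file : files) {
                    // Jar entry names always use '/', even when built on Windows.
                    String name = classesDir.relativize(file).toString().replace('\\', '/');
                    if (name.equals(JarFile.MANIFEST_NAME)) {
                        continue; // the manifest was already written by the constructor
                    }
                    jar.putNextEntry(new JarEntry(name));
                    Files.copy(file, jar);
                    jar.closeEntry();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists the entry names of a jar, to confirm the job classes are inside it.
    static List<String> entryNames(Path jarFile) {
        try (JarFile jar = new JarFile(jarFile.toFile())) {
            return jar.stream().map(JarEntry::getName).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults are placeholders: Eclipse's output folder and the jar path used in this post.
        Path classes = Paths.get(args.length > 0 ? args[0] : "bin");
        Path jar = Paths.get(args.length > 1 ? args[1] : "E:\\Ecplise\\WC.jar");
        build(classes, jar);
        entryNames(jar).forEach(System.out::println);
    }
}
```

The printed entry list should include the mapper and reducer classes, for example wc/WCMapReduce$WCMap.class for the program above.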

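(4) Configure the environment variable for the submitting user:

If the Windows user name differs from the name of the Linux user that started the cluster, an environment variable must be added so that the job can be submitted.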
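
With Hadoop's default (non-Kerberos) authentication, the client takes the submitting user from `HADOOP_USER_NAME`, read first from the environment and then from the JVM system properties. As an alternative to a Windows environment variable, the value can be set in code before any Hadoop class is used. A minimal sketch follows; `SubmitUser` is a hypothetical helper, and the user name `hadoop` is only a guess based on the /home/hadoop paths above.

```java
// Hypothetical helper: makes sure a submitting user is defined for the Hadoop client.
final class SubmitUser {

    static final String USER_KEY = "HADOOP_USER_NAME";

    // Returns the user that will be in effect. An existing environment variable or
    // system property wins; otherwise the system property is set to clusterUser.
    static String ensureUser(String clusterUser) {
        String fromEnv = System.getenv(USER_KEY);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        String fromProperty = System.getProperty(USER_KEY);
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return fromProperty;
        }
        System.setProperty(USER_KEY, clusterUser);
        return clusterUser;
    }

    public static void main(String[] args) {
        // "hadoop" is an assumed user name; use the Linux account that runs the cluster.
        System.out.println("Submitting as: " + ensureUser("hadoop"));
    }
}
```

In the WordCount program this call would go at the very top of main, before the Configuration and Job objects are created.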

(5) Results

16/03/30 21:09:14 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.98.12.234:8032
16/03/30 21:09:14 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/30 21:09:14 INFO input.FileInputFormat: Total input paths to process : 1
16/03/30 21:09:14 INFO mapreduce.JobSubmitter: number of splits:1
16/03/30 21:09:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459331173846_0031
16/03/30 21:09:15 INFO impl.YarnClientImpl: Submitted application application_1459331173846_0031
16/03/30 21:09:15 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1459331173846_0031/
16/03/30 21:09:15 INFO mapreduce.Job: Running job: job_1459331173846_0031
16/03/30 21:09:19 INFO mapreduce.Job: Job job_1459331173846_0031 running in uber mode : false
16/03/30 21:09:19 INFO mapreduce.Job:  map 0% reduce 0%
16/03/30 21:09:24 INFO mapreduce.Job:  map 100% reduce 0%
16/03/30 21:09:28 INFO mapreduce.Job:  map 100% reduce 100%
16/03/30 21:09:29 INFO mapreduce.Job: Job job_1459331173846_0031 completed successfully
16/03/30 21:09:29 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=19942
		FILE: Number of bytes written=274843
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=15533
		HDFS: Number of bytes written=15671
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=9860
		Total time spent by all reduces in occupied slots (ms)=2053
		Total time spent by all map tasks (ms)=2465
		Total time spent by all reduce tasks (ms)=2053
		Total vcore-seconds taken by all map tasks=2465
		Total vcore-seconds taken by all reduce tasks=2053
		Total megabyte-seconds taken by all map tasks=10096640
		Total megabyte-seconds taken by all reduce tasks=2102272
	Map-Reduce Framework
		Map input records=289
		Map output records=766
		Map output bytes=18404
		Map output materialized bytes=19942
		Input split bytes=104
		Combine input records=0
		Combine output records=0
		Reduce input groups=645
		Reduce shuffle bytes=19942
		Reduce input records=766
		Reduce output records=645
		Spilled Records=1532
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=33
		CPU time spent (ms)=1070
		Physical memory (bytes) snapshot=457682944
		Virtual memory (bytes) snapshot=8013651968
		Total committed heap usage (bytes)=368050176
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=15429
	File Output Format Counters 
		Bytes Written=15671

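If the five configuration files were not placed under src, the job would be started locally instead. A job that runs locally has "local" in its job ID, and the job ID above (job_1459331173846_0031) does not, so the job was successfully submitted to the Linux cluster.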
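
The same check can be expressed in code. Jobs run by Hadoop's local runner get IDs that begin with `job_local`, so a prefix test is enough. A minimal sketch, with `JobIds` as a hypothetical helper and a made-up local ID used only for illustration:

```java
// Hypothetical helper: tells local-runner job IDs apart from cluster job IDs.
final class JobIds {

    static boolean isLocal(String jobId) {
        return jobId != null && jobId.startsWith("job_local");
    }

    public static void main(String[] args) {
        // The first ID comes from the log above; the second is an invented example.
        System.out.println(isLocal("job_1459331173846_0031"));
        System.out.println(isLocal("job_local1553403857_0001"));
    }
}
```

In the WordCount program, the ID to test is available from the Job object after submission.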
