
Setting Up a Hadoop Development Environment on Windows and Submitting Jobs Remotely

This article walks through setting up a Hadoop development environment on Windows and shows in detail how to develop a Hadoop MapReduce program in IntelliJ IDEA and submit it to a remote cluster.
Prerequisites:

  • Download Hadoop on your local machine. No installation or configuration changes are needed, but set HADOOP_HOME, JAVA_HOME, and so on (see the sketch after this list for a code-only alternative).
  • Download winutils and extract it into $HADOOP_HOME/bin.
  • If the cluster configuration refers to machines by host name, add those host names to your local hosts file (not strictly required, just more convenient).
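If you would rather not touch the system environment variables, the Hadoop home directory can also be set from code. This is only a sketch: hadoop.home.dir is the system property that Hadoop's Shell class checks before falling back to HADOOP_HOME, the path is just an example, and the property must be set early in main(), before any Hadoop class is loaded.

// Equivalent to setting the HADOOP_HOME environment variable (example path)
System.setProperty("hadoop.home.dir", "E:\\hadoop-2.8.0");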

Method 1: Maven project

1. Creating a Maven project in IntelliJ IDEA is not covered in detail here; start by creating one.
2. Edit pom.xml. Once the dependencies below are filled in, IDEA downloads the jars and adds them to the project automatically (the Hadoop artifact versions should match your cluster; 2.8.0 is used here).

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.8.0</version>
    </dependency>
</dependencies>

Method 2: Plain Java project

1. Create a Java project in IntelliJ IDEA.

2. Add the Hadoop jars as dependencies.

(screenshots)

After the import succeeds:

(screenshot)

3. Write the code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);   // emit <word, 1> for every token
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);     // emit <word, total count>
        }
    }

    // Deletes the given directory if it exists (useful for clearing the output path).
    private static void deleteDir(Configuration conf, String dirPath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path targetPath = new Path(dirPath);
        if (fs.exists(targetPath)) {
            boolean delResult = fs.delete(targetPath, true);
            if (delResult) {
                System.out.println(targetPath + " has been deleted successfully.");
            } else {
                System.out.println(targetPath + " deletion failed.");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        /* String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        // delete the output directory first
        deleteDir(conf, otherArgs[otherArgs.length - 1]); */
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The program counts the occurrences of every word in all files under the directory given by the first argument (a local path when testing, or an HDFS path such as hdfs://192.168.89.135:9000/input when running against the cluster).
The output directory given by the second argument is created automatically; make sure it does not exist before running (or uncomment the deleteDir block at the top of main() to remove it first).

4. Edit the run configuration and set the program arguments.
(screenshot)

5. Run it; the job completes successfully.

(screenshot)

Remote configuration

Create a resources directory and mark it as the project's Resources root.

(screenshot)

Add a core-site.xml file to the resources directory:

(screenshot)

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.89.135:9000</value>
    </property>
</configuration>

You can copy this directly from the Hadoop configuration files on the cluster.
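To verify that the resources directory is being picked up and the remote HDFS is reachable, you can list the file system root from the IDE. This is just a sketch; it assumes the core-site.xml above is on the classpath, that the cluster is running, and that your user may read the root directory (FileStatus comes from org.apache.hadoop.fs, alongside the imports already used above).

// Sketch: list the HDFS root using the core-site.xml from the resources directory
Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
FileSystem fs = FileSystem.get(conf);       // connects to hdfs://192.168.89.135:9000
for (FileStatus status : fs.listStatus(new Path("/"))) {
    System.out.println(status.getPath());
}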

Edit the run configuration again and change the input and output paths to remote HDFS paths (for example hdfs://192.168.89.135:9000/input and hdfs://192.168.89.135:9000/output).
(screenshot)

Local submission

If your Hadoop installation and IDEA are on the same machine, you can submit in local mode.
1. Copy core-site.xml and log4j.properties into the project's source root, so that after compilation both files can be found in the class output directory. Why? When you submit the job directly from IDEA, the configuration files under the class directory are what get loaded: without log4j.properties, log4j reports that it is not initialized and no job information is printed; likewise, without core-site.xml you run into HDFS permission errors and the like.
2. Run the class's main method directly in IDEA and the job is submitted to the local pseudo-distributed Hadoop installation, where the code can be debugged.
3. Note: even though mapred-site.xml in the Hadoop configuration specifies YARN scheduling, debugging shows that the job is actually submitted through the local runner (LocalJobRunner), not YARN (a quick way to check is shown in the sketch after this list). There are two reasons:
[Reason 1] mapred-site.xml and yarn-site.xml also need to be placed in the resources directory.
[Reason 2] the program has to be packaged into a jar before the job can be submitted remotely; see the next section, Remote submission.
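A minimal way to confirm which runner the client will actually use is to print the effective framework setting before building the job; "local" is Hadoop's default when no mapred-site.xml is found on the classpath.

// Prints "yarn" when the YARN settings were picked up, "local" otherwise
System.out.println("mapreduce.framework.name = " + conf.get("mapreduce.framework.name", "local"));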

Remote submission

If Hadoop runs on a cluster or on a different server from IDEA, you can submit remotely; with hadoop-2.8.0 the jobs are scheduled through YARN.
1. Put core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, log4j.properties, and so on into the resources directory. If you do not add these files, the corresponding settings must be made in code:

conf.set("mapreduce.job.jar", "E:\\hadoop\\myhadoop\\out\\artifacts\\wordcount\\wordcount.jar");//指定Jar包,也可以在job中設定
conf.set("mapreduce.framework.name", "yarn");//以yarn形式提交
conf.set("yarn.resourcemanager.hostname", "master");
conf.set("mapreduce.app-submission.cross-platform", "true");//跨平臺提交

If the cluster restricts HDFS access, for example only a specific user xxx is allowed, you can set that user in the program:

System.setProperty("HADOOP_USER_NAME", "xxx");

2. Package the project first, using Maven or IDEA's artifact build.

  • Maven
mvn package
  • IDEA artifact build

    Because the cluster already has the Hadoop environment, there is no need to bundle the dependencies into the jar; choose Empty, which keeps builds fast while debugging.

    Project Structure => Artifacts => click + in the top-left corner => Empty => Output Layout + => Module Output => select the project folder => click the jar and set the Main Class.

3. Set the jar explicitly in the program code with job.setJar:

job.setJar("E:\\hadoop\\myhadoop\\out\\artifacts\\wordcount\\wordcount.jar");

4. In the program, port 10020 is the Hadoop job history service, which must be started on the server:

mr-jobhistory-daemon.sh start historyserver & # start the history server
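If the mapred-site.xml in the resources directory does not already carry the history server address, the client can also be pointed at it in code; the host name master is an assumption carried over from the earlier examples, and 10020 is the default history server port.

conf.set("mapreduce.jobhistory.address", "master:10020"); // job history server (default port 10020)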

5. Run the program in IDEA to submit the job; with this submission method you can also debug the source code from within IDEA.

6. Automatically upload the jar to the cluster (optional)
Tools -> Deployment -> Configuration, click + in the top-left corner, choose SFTP as the Type, then configure the server IP, deployment path, user name, password, and so on, and enable automatic deployment. Every change is then deployed to the server automatically; you can also right-click and choose Deployment => Upload to ….

Common problems:

Problem 1:

Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: Could not locate Hadoop executable: E:\hadoop-2.8.0\bin\winutils.exe -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:716)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:250)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:267)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:771)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:515)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:555)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:533)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:313)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
    at WordCount.main(WordCount.java:92)
Caused by: java.io.FileNotFoundException: Could not locate Hadoop executable: E:\hadoop-2.8.0\bin\winutils.exe -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getQualifiedBinInner(Shell.java:598)
    at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:572)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:669)
    at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:441)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:487)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
    at WordCount.main(WordCount.java:71)
Process finished with exit code 1

Solution: put winutils.exe in $HADOOP_HOME/bin.

Problem 2:

2017-08-04 12:31:00,668 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-08-04 12:31:01,230 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1181)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-08-04 12:31:01,230 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(79)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-08-04 12:31:01,495 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-08-04 12:31:01,542 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(289)) - Total input files to process : 1
2017-08-04 12:31:01,870 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(200)) - number of splits:1
2017-08-04 12:31:02,104 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1047774324_0001
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:606)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:958)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:203)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:314)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:377)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:171)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:758)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:242)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
    at java.security.AccessController.doPrivileged(Native Method)
2017-08-04 12:31:02,167 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(251)) - Cleaning up the staging area file:/tmp/hadoop/mapred/staging/alex1047774324/.staging/job_local1047774324_0001
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
    at WordCount.main(WordCount.java:92)

Solution: hadoop.dll is missing; put hadoop.dll in $HADOOP_HOME/bin.

Problem 3:

2017-08-04 12:47:49,125 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(897)) - Retrying connect to server: master/192.168.89.135:9000. Already tried 0 time(s); maxRetries=45

Solution: Hadoop is not running on the remote host. If it is running, check that firewalld.service and iptables.service have been stopped (e.g. systemctl stop firewalld).

Problem 4:

2017-11-29 21:10:22,214 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(123)) - Connecting to ResourceManager at master/192.168.89.136:8032
2017-11-29 21:10:23,259 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(289)) - Total input files to process : 1
2017-11-29 21:10:24,216 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(200)) - number of splits:1
2017-11-29 21:10:24,769 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_1511957984981_0007
2017-11-29 21:10:24,984 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(296)) - Submitted application application_1511957984981_0007
2017-11-29 21:10:25,024 INFO  [main] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://master:8088/proxy/application_1511957984981_0007/
2017-11-29 21:10:25,024 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1390)) - Running job: job_1511957984981_0007
2017-11-29 21:10:28,088 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1411)) - Job job_1511957984981_0007 running in uber mode : false
2017-11-29 21:10:28,090 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1418)) -  map 0% reduce 0%
2017-11-29 21:10:28,164 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1431)) - Job job_1511957984981_0007 failed with state FAILED due to: Application application_1511957984981_0007 failed 2 times due to AM Container for appattempt_1511957984981_0007_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1511957984981_0007_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control

Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 1
For more detailed output, check the application tracking page: http://master:8088/cluster/app/application_1511957984981_0007 Then click on links to logs of each attempt.
. Failing the application.
2017-11-29 21:10:28,199 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1436)) - Counters: 0

Process finished with exit code 1

This is caused by cross-platform submission from Windows to the remote Linux cluster.

Solution: add the following to the code:

conf.set("mapreduce.app-submission.cross-platform", "true");