
hadoop[11] - local run mode

Packaging and uploading to the server for every debugging round is very inefficient, so you can simulate the run locally instead. Taking the code from section 9 as an example, set the input text and the output directory to local paths:

// Set the path of the input text data
FileInputFormat.setInputPaths(wordCountJob, "d:/wordcount/srcdata");
// Set the path where the final result is written
FileOutputFormat.setOutputPath(wordCountJob, new Path("d:/wordcount/output"));

The complete code is as follows:

package com.wange;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJobSubmitter {
    public static void main(String[] args) throws Exception {
        //System.setProperty("hadoop.home.dir", "E:/soft/hadoop-2.4.1");
        Configuration config = new Configuration();
        // Whether the job runs locally comes down to the two parameters below.
        // If they are not set, the job runs in local simulation mode;
        // if they are set, it is submitted to YARN.
        //config.set("mapreduce.framework.name", "yarn");
        //config.set("yarn.resourcemanager.hostname", "hadoop-server-00:9000");

        // Run on the remote YARN cluster
        Job wordCountJob = Job.getInstance(config);
        // Specify the jar that contains this job
        wordCountJob.setJarByClass(WordCountJobSubmitter.class);
        // Set the mapper and reducer classes
        wordCountJob.setMapperClass(WordCountMapper.class);
        wordCountJob.setReducerClass(WordCountReducer.class);
        // Set the key/value output types of the map and reduce phases
        wordCountJob.setMapOutputKeyClass(Text.class);
        wordCountJob.setMapOutputValueClass(IntWritable.class);
        wordCountJob.setOutputKeyClass(Text.class);
        wordCountJob.setOutputValueClass(IntWritable.class);
        // Set the path of the input text data
        //FileInputFormat.setInputPaths(wordCountJob, "hdfs://hadoop-server-00:9000/wordcount/srcdata/");
        FileInputFormat.setInputPaths(wordCountJob, "d:/wordcount/srcdata");
        // Set the path where the final result is written
        //FileOutputFormat.setOutputPath(wordCountJob, new Path("hdfs://hadoop-server-00:9000/wordcount/output/"));
        FileOutputFormat.setOutputPath(wordCountJob, new Path("d:/wordcount/output"));
        // Submit to the Hadoop cluster; true means print progress information
        wordCountJob.waitForCompletion(true);
    }
}
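One practical note when re-running the job: MapReduce refuses to start if the output directory already exists, so in local mode you need to delete d:/wordcount/output between runs. A minimal sketch of such a cleanup with plain Java NIO (the path is illustrative, taken from the example above):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutputDir {
    // Recursively delete the local output directory so the job can be re-run.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Sort deepest-first so files are removed before their parent directories
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
        }
    }

    public static void main(String[] args) throws IOException {
        deleteRecursively(Paths.get("d:/wordcount/output"));
    }
}

Call deleteRecursively before FileOutputFormat.setOutputPath and the job can be run repeatedly without manual cleanup.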

Then run the main method. You will hit a few small pitfalls that stop it from running.

Pitfall 1: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the co…

Solution: add the missing jar dependency:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.4.1</version>
</dependency>

Pitfall 2: Exception in thread "main" java.lang.NullPointerException

Solution: set System.setProperty("hadoop.home.dir", "E:/soft/hadoop-2.4.1"); and download the native library files required to run on Windows: https://pan.baidu.com/s/17lkdxPTcKeWN-puLEqqXKw (extraction code: ds5k). Unzip the downloaded files into the bin directory of your local Hadoop installation, which here is E:\soft\hadoop-2.4.1\bin.
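The NullPointerException comes from Hadoop's Windows shell utilities failing to find winutils.exe under the bin directory of hadoop.home.dir. A small sanity check before submitting the job can turn that cryptic error into a clear message (a sketch; the install path is the one assumed in this post):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WinutilsCheck {
    // Returns true if winutils.exe is present under <homeDir>/bin,
    // which is where Hadoop looks for it on Windows.
    static boolean winutilsPresent(String homeDir) {
        if (homeDir == null || homeDir.isEmpty()) {
            return false;
        }
        Path winutils = Paths.get(homeDir, "bin", "winutils.exe");
        return Files.isRegularFile(winutils);
    }

    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "E:/soft/hadoop-2.4.1");
        if (!winutilsPresent(System.getProperty("hadoop.home.dir"))) {
            System.err.println("winutils.exe not found under hadoop.home.dir/bin"
                    + " - expect a NullPointerException from Hadoop");
        }
    }
}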

With that in place it runs perfectly. Even when simulating locally, you can also read from and write to HDFS paths, e.g.:

FileInputFormat.setInputPaths(wordCountJob, "hdfs://hadoop-server-00:9000/wordcount/srcdata/");
FileOutputFormat.setOutputPath(wordCountJob, new Path("hdfs://hadoop-server-00:9000/wordcount/output/"));

At runtime you may then hit a permission problem. Log in to the HDFS server and grant access to the directory; the command is: hadoop fs -chmod 777 /wordcount
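If you would rather not open the directory to everyone with chmod 777, a commonly used alternative (an addition of mine, not from the original post) is to make the client identify itself as the owner of the HDFS directory by setting HADOOP_USER_NAME before any Configuration or FileSystem object is created:

public class HdfsUserExample {
    public static void main(String[] args) {
        // "root" is a hypothetical user name - replace it with the actual
        // owner of /wordcount on HDFS. This must run before the first
        // Hadoop Configuration/FileSystem object is created.
        System.setProperty("HADOOP_USER_NAME", "root");
        System.out.println(System.getProperty("HADOOP_USER_NAME"));
    }
}

Note this only works on clusters without Kerberos security enabled, where Hadoop trusts the client-supplied user name.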