
Pitfalls of running word count on Hadoop on a Mac



Word count embodies the classic idea of MapReduce and is the "hello world" of distributed computing. Nevertheless, I was lucky enough to run into a Mac-specific problem, "Mkdirs failed to create", which this post records.

1. Code

  1. WCMapper.java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;
import java.io.IOException;

/**
 * Of the four generic types, the first two are the mapper's input types:
 * KEYIN is the type of the input key, VALUEIN the type of the input value.
 * Both map and reduce take their input and produce their output as key-value pairs.
 * By default, the framework passes the mapper input in which the key is the
 * byte offset of the current line within the file and the value is the line's content.
 *
 * Long -> LongWritable: Hadoop's own serialization interface, more compact
 * and faster over the wire. Likewise String -> Text.
 */
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework calls this method once for each line of input
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The actual business logic lives here; the key-value pair to process
        // has already been passed in.
        // Convert this line's content to a String
        String line = value.toString();
        // Split the line into words
        String[] words = StringUtils.split(line, ' ');
        // Emit the results through the context
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
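As a quick illustration (a sketch of my own, not part of the original post; the class name MapSketch is made up), this is what the map step emits for a single input line, using the same Hadoop StringUtils.split call:

import org.apache.hadoop.util.StringUtils;

// Illustrative sketch only: simulates the map logic on one line of text
public class MapSketch {
    public static void main(String[] args) {
        String line = "hello world hello";            // pretend this is one input line
        String[] words = StringUtils.split(line, ' ');
        for (String word : words) {
            // WCMapper would call context.write(new Text(word), new LongWritable(1))
            System.out.println(word + "\t1");
        }
        // prints: hello 1, world 1, hello 1 -- one k-v pair per word occurrence
    }
}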
  2. WCReducer.java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable>{
    // After the map phase finishes, the framework caches all the k-v pairs,
    // groups them by key, and then calls the reduce method once per group,
    // passing in <key, values{}>
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        long count = 0;
        // Iterate over values and sum them up
        for (LongWritable value: values){
            count += value.get();
        }
        // Emit the count for this word
        context.write(key, new LongWritable(count));
    }
}
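To make the grouping concrete, here is a small local simulation (my own sketch, not from the original post; ReduceSketch is a made-up name) of what the framework hands this reducer after shuffling the mapper output from the earlier example:

import java.util.*;

// Illustrative sketch only: mimics the grouped input the reducer sees
public class ReduceSketch {
    public static void main(String[] args) {
        // For the sample line "hello world hello", the shuffle groups the map output as:
        Map<String, List<Long>> grouped = new LinkedHashMap<>();
        grouped.put("hello", Arrays.asList(1L, 1L));
        grouped.put("world", Collections.singletonList(1L));
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            long count = 0;
            for (long v : e.getValue()) count += v;   // same summation as WCReducer
            System.out.println(e.getKey() + "\t" + count);  // hello 2, world 1
        }
    }
}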
  3. WCRunner.java (the driver)
package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;


/**
 * Describes a specific job:
 * for example, which class the job uses as its map logic and which as its reduce,
 * the path of the data the job needs as input,
 * and the path where the job's results should be written.
 */
public class WCRunner {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Set the jar that this whole job needs;
        // WCRunner is used to locate its dependencies, WCMapper and WCReducer
        job.setJarByClass(WCRunner.class);

        // The mapper and reducer classes this job uses
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // Specify the reducer's output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Specify the mapper's output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Where the raw input data lives
        FileInputFormat.setInputPaths(job, new Path("/wc/input/"));
        // Where the processed results should be written
        FileOutputFormat.setOutputPath(job, new Path("/wc/output/"));

        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}
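One related pitfall worth guarding against (my addition, not in the original code): MapReduce refuses to start if the output path already exists, so re-running this job fails until /wc/output/ has been removed. A minimal sketch of handling that in the driver, to be placed before job.waitForCompletion(true):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: delete a pre-existing output directory before submitting the job
Path output = new Path("/wc/output/");
FileSystem fs = FileSystem.get(conf);
if (fs.exists(output)) {
    fs.delete(output, true);  // true = recursive delete
}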

2. Reproducing the problem

After writing the code, package it into a jar (I did this directly through IDEA's GUI) and submit it to Hadoop to run:

hadoop jar hadoopStudy.jar wordcount.WCRunner

Instead of producing output as the official site and many other tutorials describe, it reported an error:

Exception in thread "main" java.io.IOException: Mkdirs failed to create /var/folders/vf/rplr8k812fj018q5lxcb5k940000gn/T/hadoop-unjar1598612687383099338/META-INF/license
    at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:146)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:119)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:94)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:227)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

After a long struggle it turned out to be a Mac-specific problem, and an explanation turned up on Stack Overflow:

The issue is that a /tmp/hadoop-xxx/xxx/LICENSE file and a
/tmp/hadoop-xxx/xxx/license directory are being created on a
case-insensitive file system when unjarring the mahout jobs.
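If you want to check whether a given jar is affected before running it, a small scan for entries whose names differ only in case works as a rough test (a sketch of my own, not from the Stack Overflow answer; the class name CaseCollisionCheck is made up):

import java.util.*;
import java.util.jar.*;

// Sketch: report jar entries whose names differ only in case, which would
// clash when the jar is unpacked on a case-insensitive file system
public class CaseCollisionCheck {
    public static void main(String[] args) throws Exception {
        Map<String, String> seen = new HashMap<>();
        try (JarFile jar = new JarFile(args[0])) {
            for (Enumeration<JarEntry> entries = jar.entries(); entries.hasMoreElements(); ) {
                // Strip a trailing '/' so the directory META-INF/license/ and
                // the file META-INF/LICENSE map to the same key
                String name = entries.nextElement().getName().replaceAll("/+$", "");
                String previous = seen.putIfAbsent(name.toLowerCase(Locale.ROOT), name);
                if (previous != null && !previous.equals(name)) {
                    System.out.println("collision: " + previous + " <-> " + name);
                }
            }
        }
    }
}

Run against the jar (java CaseCollisionCheck hadoopStudy.jar), it should flag the LICENSE/license pair if the jar is affected.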

Delete META-INF/LICENSE from the existing jar, verify that it is gone, and the problem is solved ~

zip -d hadoopStudy.jar META-INF/LICENSE
jar tvf hadoopStudy.jar | grep LICENSE
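Note that zip -d removes the entry from the jar in place, so no separate repackaging step is needed; the jar tvf ... | grep LICENSE line is only there to confirm the deletion, and it should no longer list META-INF/LICENSE afterwards.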


Then upload the new jar to Hadoop and run it again:

hadoop jar hadoopStudy.jar wordcount.WCRunner


bingo!

3. Results

Let's also take a look at the results in the browser.

  • Input file /wc/input/input.txt


  • Output file /wc/output/part-r-00000

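For reference (an illustration of mine, not the actual screenshot contents), the reducer writes one tab-separated "word count" line per key, so for an input consisting solely of the line "hello world hello", part-r-00000 would read:

hello	2
world	1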

The result is clearly correct. I won't be so quick to proclaim how great Macs are anymore...
