
Hadoop MapReduce Template

A reusable WordCount-style MapReduce skeleton in three classes: a Mapper, a Reducer, and a Driver that wires them into a Job.



Mapper

package com.scb.jason.mapper;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by Administrator on 2017/7/23.
 */
// Step 1: Map Class
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text mapOutputKey = new Text();
    private final static IntWritable mapOutputValue = new IntWritable(1);

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line into whitespace-delimited tokens and emit (word, 1) for each.
        String lineValue = value.toString();
        StringTokenizer stringTokenizer = new StringTokenizer(lineValue);
        while (stringTokenizer.hasMoreTokens()) {
            String wordValue = stringTokenizer.nextToken();
            mapOutputKey.set(wordValue);
            context.write(mapOutputKey, mapOutputValue);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
    }

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        super.run(context);
    }
}
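
For a concrete sense of what this mapper emits, here is a minimal standalone sketch (a hypothetical demo class, plain Java, no Hadoop cluster needed) that mimics the tokenizing loop above on one sample line:

import java.util.StringTokenizer;

// Hypothetical demo class, not part of the template itself.
public class MapperDemo {
    public static void main(String[] args) {
        String lineValue = "hello world hello hadoop";
        StringTokenizer stringTokenizer = new StringTokenizer(lineValue);
        while (stringTokenizer.hasMoreTokens()) {
            // The real mapper would call context.write(word, 1) here.
            System.out.println("(" + stringTokenizer.nextToken() + ", 1)");
        }
        // Prints: (hello, 1) (world, 1) (hello, 1) (hadoop, 1), one pair per line.
    }
}

The framework then shuffles these pairs so that all values for the same word arrive together at a single reduce call.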

Reducer

package com.scb.jason.reducer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Created by Administrator on 2017/7/23.
 */
// Step 2: Reduce Class
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outputValue = new IntWritable();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all counts emitted for this word, then write the total once.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        outputValue.set(sum);
        context.write(key, outputValue);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
    }

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        super.run(context);
    }
}
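
The shuffle stage groups the mapper's (word, 1) pairs by key before reduce is called. A minimal plain-Java sketch of that group-then-sum behavior (hypothetical demo, no Hadoop runtime):

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo: simulate shuffle + reduce over the mapper's output keys.
public class ShuffleReduceDemo {
    public static void main(String[] args) {
        List<String> mapOutputKeys = Arrays.asList("hello", "world", "hello", "hadoop");
        // Shuffle groups by key; reduce sums the 1s (merge does both at once here).
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : mapOutputKeys) {
            counts.merge(word, 1, Integer::sum);
        }
        System.out.println(counts);  // {hadoop=1, hello=2, world=1}
    }
}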

Driver

package com.scb.jason.driver;

import com.scb.jason.mapper.WordCountMapper;
import com.scb.jason.reducer.WordCountReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;

/**
 * Created by Administrator on 2017/7/17.
 */
public class WordCount extends Configured implements Tool {

    // Step 3: Driver
    public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Use the configuration injected by ToolRunner so -D options are honored.
        Configuration configuration = getConf();
        FileSystem fs = FileSystem.get(configuration);

        Job job = Job.getInstance(configuration, this.getClass().getSimpleName());
        job.setJarByClass(this.getClass());

        Path input = new Path(args[0]);
        FileInputFormat.addInputPath(job, input);

        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Remove a pre-existing output directory so the job does not fail on startup.
        Path outPath = new Path(args[1]);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        boolean isSuccess = job.waitForCompletion(true);
        // Exit-code convention: 0 on success, non-zero on failure.
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }

}
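
Because the reducer's input and output types are identical (Text/IntWritable), the same class can also be plugged in as a combiner to pre-aggregate counts on the map side. A hedged one-line addition to the run() method above:

// Optional: reuse WordCountReducer as a combiner. This is valid here because
// a combiner's input and output types must both match the map output types.
job.setCombinerClass(WordCountReducer.class);

After packaging the three classes into a jar, the job launches with the standard command, e.g. hadoop jar wordcount.jar com.scb.jason.driver.WordCount <input-path> <output-path> (jar name and paths are placeholders).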
