Simple MapReduce Examples
阿新 • Published: 2020-07-27
Case 1: File merging and deduplication
Given two input files, A and B, write a MapReduce program that merges the two files and removes the duplicate lines, producing a new output file C. Sample input and output files are shown below for reference.
Sample of input file A:
Data |
---|
20150101 x |
20150103 x |
20150104 y |
20150102 y |
20150105 z |
20150106 x |
Sample of input file B:
Data |
---|
20150101 y |
20150102 y |
20150103 x |
20150104 z |
20150105 y |
Sample of output file C, obtained by merging input files A and B:
Data |
---|
20150101 x |
20150101 y |
20150102 y |
20150103 x |
20150104 y |
20150104 z |
20150105 y |
20150105 z |
20150106 x |
Code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class hebing {
    public static class Mymapper extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Emit the whole line as the key; the shuffle phase groups identical lines together
            context.write(value, new Text(""));
        }
    }
    public static class Myreducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Each distinct key is written exactly once, which removes the duplicates
            context.write(key, new Text(""));
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "hebing");
        job.setJarByClass(hebing.class);
        job.setMapperClass(hebing.Mymapper.class);
        job.setCombinerClass(hebing.Myreducer.class); // safe here: the reducer is idempotent
        job.setReducerClass(hebing.Myreducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
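The deduplication above rides entirely on the shuffle phase: emitting each line as a key and letting the framework group identical keys is equivalent to collecting the lines into a sorted set. A minimal local sketch (no Hadoop required, not part of the original job) of that idea:

```java
import java.util.*;

public class DedupSketch {
    public static List<String> merge(List<String> fileA, List<String> fileB) {
        // The TreeSet plays the role of the shuffle: it both deduplicates
        // keys and sorts them, just as MapReduce does before calling reduce()
        TreeSet<String> keys = new TreeSet<>();
        keys.addAll(fileA);
        keys.addAll(fileB);
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("20150101 x", "20150103 x", "20150104 y",
                "20150102 y", "20150105 z", "20150106 x");
        List<String> b = Arrays.asList("20150101 y", "20150102 y", "20150103 x",
                "20150104 z", "20150105 y");
        for (String line : merge(a, b)) {
            System.out.println(line);
        }
    }
}
```

Running it on the sample data reproduces file C from the table above.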
Case 2: Sorting the contents of input files
There are several input files, each line of which contains a single integer. Read the integers from all files, sort them in ascending order, and write them to a single output file. Each output line contains two integers: the first is the rank of the second integer in the sorted order, and the second is the original integer. Sample input and output files are shown below for reference.
Sample of input file 1:
Data |
---|
33 |
37 |
12 |
40 |
Sample of input file 2:
Data |
---|
4 |
16 |
39 |
5 |
Sample of input file 3:
Data |
---|
1 |
45 |
25 |
Output file obtained from input files 1, 2, and 3:
Rank | Data |
---|---|
1 | 1 |
2 | 4 |
3 | 5 |
4 | 12 |
5 | 16 |
6 | 25 |
7 | 33 |
8 | 37 |
9 | 39 |
10 | 40 |
11 | 45 |
Code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Sort {
public static class Mymapper extends Mapper<Object, Text, IntWritable, IntWritable>{
private static IntWritable v = new IntWritable();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
// Parse each line as an integer; the shuffle sorts IntWritable keys in ascending order
v.set(Integer.parseInt(value.toString().trim()));
context.write(v, new IntWritable(1));
}
}
}
public static class Myreducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable>{
private static IntWritable line_num = new IntWritable(1);
public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
// Keys arrive in sorted order; emit one (rank, value) pair per occurrence, so
// duplicate numbers each get their own rank (assumes a single reducer)
for(IntWritable num : values) {
context.write(line_num, key);
line_num = new IntWritable(line_num.get() + 1);
}
}
}
}
public static void main(String[] args) throws Exception{
/**Designed by 王立同**/
Configuration conf = new Configuration();
Job job = Job.getInstance(conf,"Sort");
job.setJarByClass(Sort.class);
job.setMapperClass(Sort.Mymapper.class);
job.setReducerClass(Sort.Myreducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/input"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
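The job above does no explicit sorting of its own: the shuffle orders the `IntWritable` keys, and the single reducer assigns consecutive ranks. A minimal local sketch (no Hadoop required, not part of the original job) of that logic:

```java
import java.util.*;

public class SortSketch {
    // Returns (rank, value) pairs, mirroring the reducer's output
    public static List<int[]> rank(List<Integer> numbers) {
        List<Integer> sorted = new ArrayList<>(numbers);
        Collections.sort(sorted);  // stands in for the shuffle's key ordering
        List<int[]> out = new ArrayList<>();
        int lineNum = 1;           // mirrors line_num in the reducer
        for (int n : sorted) {
            out.add(new int[]{lineNum++, n});
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> all = new ArrayList<>();
        all.addAll(Arrays.asList(33, 37, 12, 40)); // file 1
        all.addAll(Arrays.asList(4, 16, 39, 5));   // file 2
        all.addAll(Arrays.asList(1, 45, 25));      // file 3
        for (int[] pair : rank(all)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

Running it on the three sample files reproduces the output table above.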
Case 3: Mining information from a given table
Given the child-parent table below, mine the parent-child relationships in it and produce a table of grandchild-grandparent relationships. The input file is as follows:
child | parent |
---|---|
Steven | Lucy |
Steven | Jack |
Jone | Lucy |
Jone | Jack |
Lucy | Mary |
Lucy | Frank |
Jack | Alice |
Jack | Jesse |
David | Alice |
David | Jesse |
Philip | David |
Philip | Alma |
Mark | David |
Mark | Alma |
The output file is as follows:
grandchild | grandparent |
---|---|
Steven | Alice |
Steven | Jesse |
Jone | Alice |
Jone | Jesse |
Steven | Mary |
Steven | Frank |
Jone | Mary |
Jone | Frank |
Philip | Alice |
Philip | Jesse |
Mark | Alice |
Mark | Jesse |
Code:
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Child2Parent {
public static class Mymapper extends Mapper<Object, Text, Text, Text>{
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String[] cap = value.toString().split("\\s+"); // split the line on whitespace
if (!"child".equals(cap[0])) { // skip the header row
String cName = cap[0];
String pName = cap[1];
context.write(new Text(pName), new Text("r#" + cName)); // tag: pName has child cName
context.write(new Text(cName), new Text("l#" + pName)); // tag: cName has parent pName
}
}
}
public static class Myreduce extends Reducer<Text, Text, Text, Text>{
public static int runtime = 0;
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
// Write the header row once, before the first key (assumes a single reducer)
if (runtime == 0) {
context.write(new Text("grandchild"), new Text("grandparent"));
runtime++;
}
// For the person in `key`, "l#" values name that person's parents and
// "r#" values name that person's children
List<String> children = new ArrayList<>();
List<String> parents = new ArrayList<>();
for (Text text : values) {
String[] relation = text.toString().split("#");
if ("l".equals(relation[0])) {
parents.add(relation[1]);
} else {
children.add(relation[1]);
}
}
// Each (child, parent) pair joined through `key` is a (grandchild, grandparent) pair
for (String child : children) {
for (String grand : parents) {
context.write(new Text(child), new Text(grand));
}
}
}
}
public static void main(String[] args) throws Exception{
Configuration conf = new Configuration();
Job job = Job.getInstance(conf,"TableJoin");
job.setJarByClass(Child2Parent.class);
job.setMapperClass(Child2Parent.Mymapper.class);
job.setReducerClass(Child2Parent.Myreduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/input"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
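The job above is a reduce-side self-join: for each person the reducer collects their children (the "r#" values) and their parents (the "l#" values), and every cross-product pair is a grandchild-grandparent pair. A minimal local sketch (no Hadoop required, not part of the original job) of that join:

```java
import java.util.*;

public class JoinSketch {
    // rows are {child, parent}; returns {grandchild, grandparent} pairs
    public static List<String[]> grandRelations(List<String[]> childParent) {
        // Group both directions by the join key: each person maps to
        // the rows where they appear as parent and as child
        Map<String, List<String>> children = new HashMap<>();
        Map<String, List<String>> parents = new HashMap<>();
        for (String[] row : childParent) {
            String child = row[0], parent = row[1];
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
            parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
        }
        List<String[]> result = new ArrayList<>();
        for (String person : children.keySet()) {
            if (!parents.containsKey(person)) continue; // needs both sides of the join
            for (String gc : children.get(person)) {
                for (String gp : parents.get(person)) {
                    result.add(new String[]{gc, gp});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"Steven", "Lucy"}, new String[]{"Lucy", "Mary"});
        for (String[] r : grandRelations(rows)) {
            System.out.println(r[0] + " " + r[1]); // prints: Steven Mary
        }
    }
}
```

Here Lucy is the join key: she appears as Steven's parent and as Mary's child, so (Steven, Mary) comes out as a grandchild-grandparent pair.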