
MapReduce Single-Table Join

Data:
Goal: for each child, find all of their grandparents (paternal and maternal).

child parent
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Marry
Lucy Jesse
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma

Result:

Jone    Alice
Tom    Alice
Jone    Jesse
Tom    Jesse
Jone    Marry
Tom    Marry
Jone    Jesse
Tom    Jesse
Mark    Alice
Philip    Alice
Mark    Jesse
Philip    Jesse

Mapper:

One pitfall: a new Text has to be created for every context.write() call here; reusing the original Text object via its set() method did not work.

package _SingleTable;


import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @Author:Dapeng
 * @Discription:
 * @Date:Created in 10:11 AM 2018/11/8 0008
 */
public class SingleTableMap extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] wordArr = line.split("\\s+");
        // skip the header line "child parent"
        if (!"child".equals(wordArr[0])) {
            // tag "1:" = the value is a parent of the key person
            context.write(new Text(wordArr[0]), new Text("1:" + wordArr[1]));
            // tag "2:" = the value is a child of the key person
            context.write(new Text(wordArr[1]), new Text("2:" + wordArr[0]));
        }
    }
}

Reducer:

package _SingleTable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * @Author:Dapeng
 * @Discription:
 * @Date:Created in 10:11 AM 2018/11/8 0008
 */
public class SingleTableReduce extends Reducer<Text,Text,Text,Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {


        // "1:" values are this person's parents, "2:" values are this person's children
        List<String> parents = new ArrayList<String>();
        List<String> childs = new ArrayList<String>();
        Text t1 = new Text();
        Text t2 = new Text();

        for(Text t : values){
            String str = t.toString();
            String[] s = str.split(":");

            if ("1".equals(s[0])) {
                parents.add(s[1]);
            } else if("2".equals(s[0])) {
                childs.add(s[1]);
            }

        }

        // cross product: pair every child of this person with every parent of this person,
        // i.e. emit (grandchild, grandparent)
        for(String p : parents){
            for(String c : childs){
                t1.set(p);
                t2.set(c);
                context.write(t2, t1);
            }
        }
    }
}
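
For illustration, take the key Lucy: the grouped values are "1:Marry", "1:Jesse", "2:Tom", "2:Jone" (in no guaranteed order), so parents = [Marry, Jesse] and childs = [Tom, Jone]. The cross product writes the (grandchild, grandparent) pairs

Jone    Marry
Tom    Marry
Jone    Jesse
Tom    Jesse

which is the Marry/Jesse portion of the result shown above.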
Main:

package _SingleTable;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * @Author:Dapeng
 * @Discription:
 * @Date:Created in 10:11 AM 2018/11/8 0008
 */
public class SingleTableMain {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //0. create a job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf,"single_table");
        job.setJarByClass(SingleTableMain.class);
        //1. input file
        //TextInputFormat is used by default
        FileInputFormat.addInputPath(job,new Path("file:/D:/hadoopFile/singleTable/data.txt"));
        //2. mapper
        job.setMapperClass(SingleTableMap.class);
        //set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        //3. shuffle

        //4. reduce
        job.setReducerClass(SingleTableReduce.class);
        //set the job output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        //5. output
        FileOutputFormat.setOutputPath(job,new Path("file:/D:/hadoopFile/singleTable/out"));

        //6. run the job

        boolean result = job.waitForCompletion(true);
        System.out.println(result);
    }
}
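
One practical note: FileOutputFormat makes the job fail if the output directory already exists, so re-running SingleTableMain requires deleting D:/hadoopFile/singleTable/out first. A minimal sketch (not part of the original code; it needs an extra import of org.apache.hadoop.fs.FileSystem) that could replace step 5 above to clean up the old output automatically:

        //5. output: remove the previous output directory, if any, then set the output path
        Path outPath = new Path("file:/D:/hadoopFile/singleTable/out");
        FileSystem fs = outPath.getFileSystem(conf);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);   //true = delete recursively
        }
        FileOutputFormat.setOutputPath(job, outPath);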