Importing Data from HDFS into HBase with MapReduce
阿新 · Published: 2019-01-03
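The job below reads tab-separated WLAN log records from HDFS. The mapper parses the first field as a millisecond timestamp, formats it as yyyyMMddHHmmss, and joins it to the second field (the msisdn) to build the HBase row key; the reducer then turns each record into a Put and writes it into the wlan_log table through TableOutputFormat, so no HDFS output path is needed. For illustration only, an input line is assumed to look roughly like this (the values are made up, not taken from the original post):

1363157985066	13600217502	...remaining fields...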
package hbase;

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class BatchImport {

    static class BatchImportMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        SimpleDateFormat dateformat1 = new SimpleDateFormat("yyyyMMddHHmmss");
        Text v2 = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            final String[] splited = value.toString().split("\t");
            try {
                final Date date = new Date(Long.parseLong(splited[0].trim()));
                final String dateFormat = dateformat1.format(date);
                // Row key = msisdn + ":" + formatted timestamp
                String rowKey = splited[1] + ":" + dateFormat;
                v2.set(rowKey + "\t" + value.toString());
                context.write(key, v2);
            } catch (NumberFormatException e) {
                // Count malformed records instead of failing the whole job
                final Counter counter = context.getCounter("BatchImport", "ErrorFormat");
                counter.increment(1L);
                System.out.println("Bad record: " + splited[0] + " " + e.getMessage());
            }
        }
    }

    static class BatchImportReducer extends TableReducer<LongWritable, Text, NullWritable> {
        @Override
        protected void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text text : values) {
                // splited[0] is the row key built in the mapper; the rest is the original record
                final String[] splited = text.toString().split("\t");
                final Put put = new Put(Bytes.toBytes(splited[0]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("date"), Bytes.toBytes(splited[1]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("msisdn"), Bytes.toBytes(splited[2]));
                // The remaining fields are omitted here; add them with further put.add(...) calls
                context.write(NullWritable.get(), put);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final Configuration configuration = new Configuration();
        // Point the HBase client at ZooKeeper
        configuration.set("hbase.zookeeper.quorum", "hadoop0");
        // Name of the target HBase table
        configuration.set(TableOutputFormat.OUTPUT_TABLE, "wlan_log");
        // Raise this value to keep HBase from timing out during the bulk write
        configuration.set("dfs.socket.timeout", "180000");

        final Job job = new Job(configuration, "HBaseBatchImport");

        job.setMapperClass(BatchImportMapper.class);
        job.setReducerClass(BatchImportReducer.class);

        // Set the map output types; the reduce output types are handled by TableOutputFormat
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        // No output path is set; only the output format type
        job.setOutputFormatClass(TableOutputFormat.class);

        FileInputFormat.setInputPaths(job, "hdfs://hadoop0:9000/input");

        job.waitForCompletion(true);
    }
}
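TableOutputFormat writes into an existing table, so wlan_log with column family cf has to be created before the job runs. Below is a minimal sketch of doing that with the same-era HBase client API the job uses; the quorum address hadoop0 comes from the job configuration above, while the class name and the rest are assumptions for illustration, not part of the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateWlanLogTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Same ZooKeeper quorum as the import job
        conf.set("hbase.zookeeper.quorum", "hadoop0");
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists("wlan_log")) {
            // Table and column family names match what BatchImport writes to
            HTableDescriptor desc = new HTableDescriptor("wlan_log");
            desc.addFamily(new HColumnDescriptor("cf"));
            admin.createTable(desc);
        }
        admin.close();
    }
}

After the job completes, the imported rows can be checked with scan 'wlan_log' in the HBase shell. Package BatchImport into a jar and submit it with hadoop jar; depending on the cluster setup, the HBase client jars may also need to be added to the job classpath (for example via TableMapReduceUtil.addDependencyJars).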