
Inserting data from HDFS into HBase
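The mapper below assumes the source file on HDFS (human.txt in this example) is plain text with one comma-separated record per line, in the order row key, name, age. Hypothetical sample lines:

1001,zhangsan,20
1002,lisi,25

Each such line becomes one Put whose row key is the first field and whose name and age values go into the info column family.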

package mr.hdfstoHbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Bulk-load data from HDFS into HBase:
 * the map phase reads the source records from HDFS and writes them out in HFile format,
 * then the HFiles are loaded into the HBase table.
 */
public class HdfsToHbaseBulk {
    public static class HdfsToHbaseBulkMapper extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        private ImmutableBytesWritable mapKey = new ImmutableBytesWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Each input line: rowkey,name,age (comma-separated)
            String[] split = value.toString().split(",");
            mapKey.set(Bytes.toBytes(split[0]));
            Put put = new Put(Bytes.toBytes(split[0]));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(split[1]));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes(split[2]));
            context.write(mapKey, put);
        }
    }


    public static void main(String[] args) throws Exception {
        // During development the job can be run locally on Windows
        System.setProperty("hadoop.home.dir", "E:\\software\\bigdate\\hadoop-2.6.0-cdh5.15.0\\hadoop-2.6.0-cdh5.15.0");

        // HDFS connection settings
        Configuration conf = new Configuration();

        // HDFS entry point (NameNode address)
        conf.set("fs.defaultFS", "hdfs://wang:9000");
        // HBase connection (via ZooKeeper)
        conf.set("zookeeper.znode.parent", "/hbase");
        conf.set("hbase.zookeeper.quorum", "wang");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        Job job = Job.getInstance(conf);
        job.setJobName("HdfsToHbaseBulkJob");
        job.setJarByClass(HdfsToHbaseBulk.class);

        // Input path on HDFS
        Path inputPath = new Path("/user/wang/hbase_data/human.txt");
        FileInputFormat.addInputPath(job, inputPath);

        // Map tasks run on the cluster, following the "move computation, not data" principle
        // Mapper class
        job.setMapperClass(HdfsToHbaseBulkMapper.class);
        // Map output key class
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        // Map output value class
        job.setMapOutputValueClass(Put.class);

        // Output path for the generated HFiles
        Path outputPath = new Path("/user/wang/hbase_data/BlukOut5");

        // Connect to HBase and look up the target table and its region layout
        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = connection.getTable(TableName.valueOf(Bytes.toBytes("hadoop:human")));
        RegionLocator regionLocator = connection.getRegionLocator(TableName.valueOf(Bytes.toBytes("hadoop:human")));
        // configureIncrementalLoad sets the reducer, partitioner and output handling
        // so that the generated HFiles align with the table's region boundaries
        HFileOutputFormat2.configureIncrementalLoad(job, table, regionLocator);
        job.setOutputFormatClass(HFileOutputFormat2.class);
        FileOutputFormat.setOutputPath(job, outputPath);
        // Run the job and wait for it to finish
        boolean flag = job.waitForCompletion(true);
        // After the MR job succeeds, the HFiles are available in the output directory
        if (flag) {
            // HBase imports HFile-format files with LoadIncrementalHFiles
            LoadIncrementalHFiles loadIncrementalHFiles = new LoadIncrementalHFiles(conf);
            // Read the HFiles and load them into the HBase table
            loadIncrementalHFiles.doBulkLoad(outputPath, connection.getAdmin(), table, regionLocator);
        }
        // Release the HBase connection
        connection.close();
    }
}
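
The bulk load assumes the target table hadoop:human already exists with an info column family; doBulkLoad does not create it. Below is a minimal sketch of creating the namespace and table with the HBase 1.x client API (the helper class CreateHumanTable and its existence checks are illustrative additions, not part of the original job):

package mr.hdfstoHbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

/**
 * Illustrative helper (not part of the original post): creates the "hadoop"
 * namespace and the "hadoop:human" table with an "info" column family,
 * which HdfsToHbaseBulk expects to exist before the bulk load runs.
 */
public class CreateHumanTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "wang");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Create the namespace if it does not exist yet
            boolean namespaceExists = false;
            for (NamespaceDescriptor ns : admin.listNamespaceDescriptors()) {
                if (ns.getName().equals("hadoop")) {
                    namespaceExists = true;
                }
            }
            if (!namespaceExists) {
                admin.createNamespace(NamespaceDescriptor.create("hadoop").build());
            }

            // Create the table with a single "info" column family
            TableName tableName = TableName.valueOf("hadoop:human");
            if (!admin.tableExists(tableName)) {
                HTableDescriptor desc = new HTableDescriptor(tableName);
                desc.addFamily(new HColumnDescriptor("info"));
                admin.createTable(desc);
            }
        }
    }
}

Once the table exists and the job above has finished its doBulkLoad call, the imported rows can be verified with a simple scan of hadoop:human.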