An Example of Integrating HBase with MapReduce

阿新 • Published: 2019-01-21

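[Requirement] Copy the name column of the info column family into another table.

Create the source table:
create 'test:stu_info','info','degree','work'
Insert data: 6 row keys across 3 column families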
put 'test:stu_info','20170222_10001','degree:xueli','benke'
put 'test:stu_info','20170222_10001','info:age','18'
put 'test:stu_info','20170222_10001','info:sex','male'
put 'test:stu_info','20170222_10001','info:name','tom'
put 'test:stu_info','20170222_10001','work:job','bigdata'
put 'test:stu_info','20170222_10002','degree:xueli','gaozhong'
put 'test:stu_info','20170222_10002','info:age','22'
put 'test:stu_info','20170222_10002','info:sex','female'
put 'test:stu_info','20170222_10002','info:name','jack'
put 'test:stu_info','20170222_10003','info:age','22'
put 'test:stu_info','20170222_10003','info:name','leo'
put 'test:stu_info','20170222_10004','info:age','18'
put 'test:stu_info','20170222_10004','info:name','peter'
put 'test:stu_info','20170222_10005','info:age','19'
put 'test:stu_info','20170222_10005','info:name','jim'
put 'test:stu_info','20170222_10006','info:age','20'
put 'test:stu_info','20170222_10006','info:name','zhangsan'

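Create the target table. The driver below writes to test:info_name, so that table must exist with an info column family:
create 'test:info_name' , {NAME=>'info'}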

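Each region of the input table is processed by one map task.
Add the HBase jars to the classpath in Hadoop's hadoop-env.sh so that MapReduce jobs can resolve the HBase dependencies:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/moduels/hbase-0.98.6-hadoop2/lib/*

The Java code is as follows (two source files: the driver TestDriver2 and the mapper TestHbaseMap):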

package com.bigdata.hadoop.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TestDriver2 extends Configured implements Tool{

	public int run(String[] args) throws Exception {
		Configuration conf = this.getConf();
		Job job=Job.getInstance(conf,"mr-hbase2");
		job.setJarByClass(TestDriver2.class);     // class that contains mapper and reducer
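		Scan scan = new Scan();
		scan.setCaching(500);        // rows fetched per scanner RPC
		scan.setCacheBlocks(false);  // keep a full-table scan from filling the block cache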
		TableMapReduceUtil.initTableMapperJob(
		  "test:stu_info",        // input table
		  scan,               // Scan instance to control CF and attribute selection
		  TestHbaseMap.class,     // mapper class
		  ImmutableBytesWritable.class,         // mapper output key
		  Put.class,  // mapper output value
		  job);
		TableMapReduceUtil.initTableReducerJob(
		  "test:info_name",        // output table
		  null,                    // no custom reducer class; the mapper's Puts are written as-is
		  job);
		job.setNumReduceTasks(1);
		return job.waitForCompletion(true)? 0:1;
	}

	
	public static void main(String[] args) {
		Configuration conf=HBaseConfiguration.create();
		try {
			int status=ToolRunner.run(conf, new TestDriver2(), args);
			System.exit(status);
		} catch (Exception e) {
			e.printStackTrace();
			System.exit(1);   // report failure to the caller instead of exiting with 0
		}
	}
}

package com.bigdata.hadoop.mapreduce;

import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class TestHbaseMap extends TableMapper<ImmutableBytesWritable, Put>{
	
	@Override
	protected void map(ImmutableBytesWritable key, Result value,Context context)
			throws IOException, InterruptedException {
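		Put put=new Put(key.get());
		for(Cell cell:value.rawCells()){
			if("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))){
				// match cells in the info column family
				if("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
					// match cells in the name column
					put.add(cell);
				}
			}
		}
		// skip rows that have no info:name cell, so that no empty Put is written
		if(!put.isEmpty()){
			context.write(key, put);
		}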
	}
}
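
The filtering done by the mapper can be tried out without a cluster. The following is a minimal sketch in plain Java, not HBase code: rows are modelled as maps from "family:qualifier" to value, the class name NameColumnFilterSketch and its methods are hypothetical, and the row 20170222_10099 is invented here only to show what happens to a row that has no info:name cell.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the mapper logic; it does not use the HBase API.
public class NameColumnFilterSketch {

    /** Keeps only the info:name value of each row; rows without that cell are dropped. */
    public static Map<String, String> extractNames(Map<String, Map<String, String>> rows) {
        Map<String, String> names = new TreeMap<>();   // sorted by row key, like an HBase scan
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String name = row.getValue().get("info:name");
            if (name != null) {
                names.put(row.getKey(), name);
            }
        }
        return names;
    }

    /** Two rows taken from the put statements above, plus one invented row without a name. */
    public static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();

        Map<String, String> r1 = new LinkedHashMap<>();
        r1.put("degree:xueli", "benke");
        r1.put("info:age", "18");
        r1.put("info:sex", "male");
        r1.put("info:name", "tom");
        r1.put("work:job", "bigdata");
        rows.put("20170222_10001", r1);

        Map<String, String> r2 = new LinkedHashMap<>();
        r2.put("degree:xueli", "gaozhong");
        r2.put("info:age", "22");
        r2.put("info:sex", "female");
        r2.put("info:name", "jack");
        rows.put("20170222_10002", r2);

        // Assumption for illustration only: this row is not part of the original data.
        Map<String, String> r3 = new LinkedHashMap<>();
        r3.put("info:age", "30");
        rows.put("20170222_10099", r3);

        return rows;
    }

    public static void main(String[] args) {
        extractNames(sampleRows()).forEach(
            (rowKey, name) -> System.out.println(rowKey + " column=info:name, value=" + name));
    }
}
```

The invented row is dropped rather than written with no columns, which is the same decision the mapper has to make for a row that lacks info:name.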

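Package the code as mr-hbase.jar and upload it to the Linux host.

Run the following command from the HBase directory (it names no main class, so the jar's manifest must declare one, here TestDriver2):
/opt/moduels/hadoop-2.5.0/bin/yarn jar /opt/datas/mr-hbase.jar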

After the job finishes, a scan of the output table shows only the info:name column:
20170222_10001 column=info:name, timestamp=1497059738675, value=tom
20170222_10002 column=info:name, timestamp=1497059738956, value=jack
20170222_10003 column=info:name, timestamp=1497059739013, value=leo
20170222_10004 column=info:name, timestamp=1497059739121, value=peter
20170222_10005 column=info:name, timestamp=1497059739254, value=jim
20170222_10006 column=info:name, timestamp=1497059740585, value=zhangsan

Importing delimited files with importtsv
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
-> Option: -D sets a parameter, given as key=value

-> Upload the file to HDFS, then run:

/opt/moduels/hadoop-2.5.0/bin/yarn jar lib/hbase-server-0.98.6-hadoop2.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex stu_info /test.tsv

-> If the field separator is not the default \t, specify it in the command:
/opt/moduels/hadoop-2.5.0/bin/yarn jar lib/hbase-server-0.98.6-hadoop2.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex stu_info /test2.csv
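
To make the relationship between -Dimporttsv.columns, the separator and one input line concrete, here is a minimal sketch in plain Java. It is not the importtsv implementation: the class ImportTsvLineSketch and its parseLine method are hypothetical, the sample line is invented, and the exception thrown for a line with the wrong number of fields is a simplification that does not model how the real tool treats malformed lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of how one input line lines up with the column spec.
public class ImportTsvLineSketch {

    /**
     * Pairs the i-th field of the line with the i-th entry of the column spec.
     * The entry HBASE_ROW_KEY marks the field that becomes the row key.
     */
    public static Map<String, String> parseLine(String columnsSpec, String separator, String line) {
        String[] columns = columnsSpec.split(",");
        // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                "expected " + columns.length + " fields but found " + fields.length);
        }
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            cells.put(columns[i], fields[i]);
        }
        return cells;
    }

    public static void main(String[] args) {
        String spec = "HBASE_ROW_KEY,info:name,info:age,info:sex";
        // Invented sample lines: one tab-separated, one comma-separated.
        System.out.println(parseLine(spec, "\t", "20170222_10007\tlucy\t20\tfemale"));
        System.out.println(parseLine(spec, ",", "20170222_10008,mike,21,male"));
    }
}
```

Under these assumptions, an input file for the commands above has one record per line, with the row key first and then the name, age and sex values in the order listed in -Dimporttsv.columns.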

Step 1: convert the input into HFiles (which are in fact StoreFiles):
/opt/moduels/hadoop-2.5.0/bin/yarn jar lib/hbase-server-0.98.6-hadoop2.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex -Dimporttsv.bulk.output=/testHfile stu_info /test3.tsv
Step 2: load the files into HBase. This step is not a MapReduce job; it moves the StoreFiles into the directory of the corresponding HBase table.
Official example: /opt/moduels/hadoop-2.5.0/bin/yarn jar lib/hbase-server-0.98.6-hadoop2.jar completebulkload
usage: completebulkload /path/to/hfileoutputformat-output tablename

/opt/moduels/hadoop-2.5.0/bin/yarn jar lib/hbase-server-0.98.6-hadoop2.jar completebulkload /testHfile stu_info

Note: Sqoop can be used to import data from a relational database into HBase.
