Integrating Hive with HBase (Part 4)
阿新 • Published: 2019-01-29
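Hive and HBase integrate by communicating with each other through their public APIs; this communication relies mainly on the utility classes in hive-hbase-handler.jar.
I. Copy the relevant HBase jars into /home/centosm/hive/lib. If a different version of a jar already exists there, delete Hive's copy and then copy the HBase one in.
The steps are as follows:
1. Back up Hive's lib directory: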
zip -r lib.zip lib
2. Copy the HBase jars into hive/lib
$ ls hbase-c*
hbase-client-1.2.4.jar hbase-common-1.2.4.jar hbase-common-1.2.4-tests.jar
$ cp hbase-c* ../../hive/lib
$ cp hbase-server-1.2.4.jar ../../hive/lib
$ cp hbase-protocol-1.2.4.jar ../../hive/lib
Note: if a jar is present in more than one version, remove the mismatched version from hive/lib.
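Spotting duplicate versions by eye is error-prone when the lib directory holds many jars. The following is a minimal sketch, not part of the original procedure: find_duplicate_jars is a hypothetical helper name, and it assumes jar files are named name-version.jar (optionally with a -tests suffix), as the HBase jars above are.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every jar base name that occurs more than once
# in a lib directory, i.e. jars that are present in several versions.
# Assumption: files are named <name>-<version>.jar or <name>-<version>-tests.jar.
find_duplicate_jars() {
  local lib_dir="$1"
  find "$lib_dir" -maxdepth 1 -name '*.jar' -exec basename {} \; \
    | sed -E 's/-[0-9][0-9A-Za-z.]*(-tests)?\.jar$/\1/' \
    | sort \
    | uniq -d
}
```

Running find_duplicate_jars /home/centosm/hive/lib after the copy lists the base names that still need cleaning up; an empty result suggests no duplicate versions remain.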
3. Edit hive-site.xml
<property>
<name>hive.aux.jars.path</name>
<value>file:///home/centosm/hive/lib/hive-hbase-handler-2.1.0.jar,file:///home/centosm/hive/lib/hbase-client-1.2.4.jar,file:///home/centosm/hive/lib/hbase-common-1.2.4.jar,file:///home/centosm/hive/lib/hbase-server-1.2.4.jar,file:///home/centosm/hive/lib/hbase-protocol-1.2.4.jar,file:///home/centosm/hive/lib/zookeeper-3.4.6.jar</value>
</property>
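Because the value is one long comma-separated list, a stray space or a truncated extension is easy to introduce when typing it by hand. As a rough sketch, the list could be generated instead; build_aux_jars_path is a hypothetical helper, and it assumes the lib directory is given as an absolute path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a comma-separated list of file:// URIs
# suitable for the hive.aux.jars.path value.
# Usage: build_aux_jars_path <absolute-lib-dir> <jar> [<jar> ...]
build_aux_jars_path() {
  local lib_dir="$1"
  shift
  local result="" jar
  for jar in "$@"; do
    # Add a comma only between entries, never before the first one.
    result="${result:+$result,}file://${lib_dir}/${jar}"
  done
  printf '%s\n' "$result"
}
```

For example, build_aux_jars_path /home/centosm/hive/lib hive-hbase-handler-2.1.0.jar hbase-client-1.2.4.jar prints the first two entries of the list, ready to paste between the value tags.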
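4. Copy hbase-site.xml from hbase/conf into hadoop/conf on every Hadoop node (including the master).
Configuration is complete. To test it, create a table in the HBase shell, insert a few rows, and then map the table in Hive: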
hbase(main):030:0> create 'user1',{NAME => 'info',VERSIONS => 1}
0 row(s) in 11.3110 seconds
=> Hbase::Table - user1
hbase(main):031:0> desc 'user1'
Table user1 is ENABLED
user1
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1620 seconds
hbase(main):032:0> put 'user1','1','info:name','zhangsan'
0 row(s) in 2.2950 seconds
hbase(main):033:0> put 'user1','1','info:age','25'
0 row(s) in 0.0420 seconds
hbase(main):034:0> put 'user1','2','info:name','lisi'
0 row(s) in 0.0400 seconds
hbase(main):035:0> put 'user1','2','info:age','22'
0 row(s) in 0.0820 seconds
hbase(main):036:0> put 'user1','3','info:name','wangswu'
0 row(s) in 0.0120 seconds
hbase(main):037:0> put 'user1','3','info:age','21'
0 row(s) in 0.0400 seconds
hbase(main):038:0> scan 'user1'
ROW COLUMN+CELL
1 column=info:age, timestamp=1497973586021, value=25
1 column=info:name, timestamp=1497973585291, value=zhangsan
2 column=info:age, timestamp=1497973586131, value=22
2 column=info:name, timestamp=1497973586080, value=lisi
3 column=info:age, timestamp=1497973589317, value=21
3 column=info:name, timestamp=1497973586233, value=wangswu
3 row(s) in 0.0640 seconds
hbase(main):039:0>
hive>
> CREATE EXTERNAL TABLE user1 (
> rowkey string,
> info map<STRING,STRING>
> ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> TBLPROPERTIES ("hbase.table.name" = "user1");
OK
Time taken: 50.083 seconds
hive> show tables;
OK
test
user1
Time taken: 0.28 seconds, Fetched: 2 row(s)
hive> select * from user1;
OK
1 {"age":"25","name":"zhangsan"}
2 {"age":"22","name":"lisi"}
3 {"age":"21","name":"wangswu"}
Time taken: 6.746 seconds, Fetched: 3 row(s)
hive>
Querying an existing HBase table through Hive
1. Inspect the table's structure in HBase
hbase(main):035:0> desc 'student'
Table student is ENABLED
student
COLUMN FAMILIES DESCRIPTION
{NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'grade', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 5.9550 seconds
hbase(main):036:0> scan 'student'
ROW COLUMN+CELL
hehe column=course:English, timestamp=1498057899732, value=89
rowKey0 column=grade:sid, timestamp=1498057899855, value=1108040
rowKey1 column=grade:sid, timestamp=1498057899855, value=1108041
rowKey2 column=grade:sid, timestamp=1498057899855, value=1108042
rowKey3 column=grade:sid, timestamp=1498057899855, value=1108043
rowKey4 column=grade:sid, timestamp=1498057899855, value=1108044
rowKey5 column=grade:sid, timestamp=1498057899855, value=1108045
rowKey6 column=grade:sid, timestamp=1498057899855, value=1108046
rowKey7 column=grade:sid, timestamp=1498057899855, value=1108047
rowKey8 column=grade:sid, timestamp=1498057899855, value=1108048
rowKey9 column=grade:sid, timestamp=1498057899855, value=1108049
ycb column=course:English, timestamp=1498057899602, value=88
12 row(s) in 4.2640 seconds
Given the HBase student table shown above, the Hive table definition and its query results can be written as follows:
hive> CREATE EXTERNAL TABLE student(key string, English string,sid string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,course:English,grade:sid")
> TBLPROPERTIES("hbase.table.name" = "student");
OK
Time taken: 3.733 seconds
==================================================
hive> desc student;
OK
key string
english string
sid string
Time taken: 0.086 seconds, Fetched: 3 row(s)
===================================================
hive> select * from student;
OK
hehe 89 NULL
rowKey0 NULL 1108040
rowKey1 NULL 1108041
rowKey2 NULL 1108042
rowKey3 NULL 1108043
rowKey4 NULL 1108044
rowKey5 NULL 1108045
rowKey6 NULL 1108046
rowKey7 NULL 1108047
rowKey8 NULL 1108048
rowKey9 NULL 1108049
ycb 88 NULL
Time taken: 0.893 seconds, Fetched: 12 row(s)
hive>
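The hbase.columns.mapping string is positional: its entries line up one-to-one with the Hive columns, with :key standing for the row key. Rows that have no cell for a mapped column come back as NULL, as in the output above. The sketch below illustrates that pairing by generating the DDL from column=family:qualifier pairs. hbase_mapping_ddl is a hypothetical helper, and for simplicity it assumes every column is a string and the first Hive column is named key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a CREATE EXTERNAL TABLE statement for an
# existing HBase table.
# Usage: hbase_mapping_ddl <hive_table> <hbase_table> <col>=<family:qualifier> ...
# Assumptions: all columns are strings; the first Hive column is "key".
hbase_mapping_ddl() {
  local hive_table="$1" hbase_table="$2"
  shift 2
  local cols="key string" mapping=":key" spec
  for spec in "$@"; do
    # Left of '=' is the Hive column name, right of '=' is the HBase column.
    cols="$cols, ${spec%%=*} string"
    mapping="$mapping,${spec#*=}"
  done
  cat <<EOF
CREATE EXTERNAL TABLE $hive_table ($cols)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "$mapping")
TBLPROPERTIES ("hbase.table.name" = "$hbase_table");
EOF
}
```

Calling hbase_mapping_ddl student student English=course:English sid=grade:sid reproduces a statement equivalent to the one used for the student table; the output could then be passed to the Hive CLI.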