
Hive and HBase Integration (Part 4)

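Hive and HBase are integrated through the public APIs that each of them exposes. The communication between the two relies mainly on the hive-hbase-handler.jar utility library.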

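I. Copy the relevant HBase jars into /home/centosm/hive/lib. If a different version of the same jar already exists there, delete Hive's copy first and then copy the HBase one over.
The steps are as follows: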

1. Back up Hive's lib directory:
zip -r lib.zip lib

2. Copy the HBase jars into hive/lib:

$ more hbase-c
hbase-client-1.2.4.jar        hbase-common-1.2.4.jar        hbase-common-1.2.4-tests.jar
$ cp hbase-c* ../../hive/lib
$ cp hbase-server-1.2.4.jar ../../hive/lib
$ cp hbase-protocol-1.2.4.jar ../../hive/lib

Note: if several versions of the same jar are present, remove the mismatched versions from hive/lib.
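Spotting duplicate versions by eye is error-prone, so the check can be scripted. The following is a minimal sketch, not part of the original setup: the function name list_multi_version_jars is made up for illustration, and it assumes jar files follow the common <artifact>-<version>[-classifier].jar naming pattern.

```shell
# Hypothetical helper: print every artifact name that occurs with more than
# one version in the given lib directory.
# Assumption: jars are named <artifact>-<version>[-classifier].jar.
list_multi_version_jars() {
  local lib_dir="$1" jar
  for jar in "$lib_dir"/*.jar; do
    # Skip the unexpanded glob when the directory contains no jars.
    [ -e "$jar" ] || continue
    # Drop the version but keep an optional classifier such as "-tests",
    # so hbase-common and hbase-common-tests are counted separately.
    basename "$jar" | sed -E 's/-[0-9][0-9A-Za-z.]*(-[A-Za-z]+)?\.jar$/\1/'
  done | sort | uniq -d
}
```

For example, list_multi_version_jars /home/centosm/hive/lib prints one artifact name per line for every jar that exists in more than one version, and prints nothing when the directory is clean.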

3. Edit hive-site.xml:

<property>
     <name>hive.aux.jars.path</name>
     <value>file:///home/centosm/hive/lib/hive-hbase-handler-2.1.0.jar,file:///home/centosm/hive/lib/hbase-client-1.2.4.jar,file:///home/centosm/hive/lib/hbase-common-1.2.4.jar,file:///home/centosm/hive/lib/hbase-server-1.2.4.jar,file:///home/centosm/hive/lib/hbase-protocol-1.2.4.jar,file:///home/centosm/hive/lib/zookeeper-3.4.6.jar</value>
</property>
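A single mistyped file name in hive.aux.jars.path is easy to miss in such a long value. As a rough sketch, the comma-separated value can be checked entry by entry before Hive is restarted. The function name check_aux_jars is hypothetical, and the sketch assumes every entry is a local file:// URI.

```shell
# Hypothetical helper: verify that every entry of a comma-separated
# hive.aux.jars.path value points to an existing local file.
# Assumption: entries are local file:// URIs (no hdfs:// paths).
check_aux_jars() {
  local missing
  missing=$(printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
    # Strip the URI scheme, leaving an absolute filesystem path.
    path=${entry#file://}
    [ -f "$path" ] || printf 'missing: %s\n' "$path"
  done)
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
}
```

Passing the content of the <value> element as the only argument prints a "missing: ..." line for each jar that cannot be found and returns a non-zero status in that case.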

4. Copy hbase-site.xml from hbase/conf into hadoop/conf on every Hadoop node (including the master).
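With more than a couple of nodes, this copy step is usually scripted. The sketch below only prints one scp command per node so they can be reviewed first. The function name print_copy_commands, the host names and the directory paths in the usage note are placeholders, not values from this setup, and the sketch assumes paths and host names contain no spaces.

```shell
# Hypothetical helper: print one scp command per host for distributing a
# config file. Nothing is copied; the commands are only written to stdout.
# Arguments: <source file> <destination directory> <host>...
print_copy_commands() {
  local src="$1" dest_dir="$2" host
  shift 2
  for host in "$@"; do
    echo "scp $src $host:$dest_dir/"
  done
}
```

A call such as print_copy_commands hbase/conf/hbase-site.xml hadoop/conf master slave1 lists the commands. Once they look right, the output can be piped to sh to run them.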

The configuration is now complete. Test it as follows:

hbase(main):030:0> create 'user1',{NAME => 'info',VERSIONS => 1}
0 row(s) in 11.3110 seconds

=> Hbase::Table - user1
hbase(main):031:0> desc 'user1'
Table user1 is ENABLED                                                                                                                                                                 
user1                                                                                                                                                                                  
COLUMN FAMILIES DESCRIPTION                                                                                                                                                            
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1620 seconds

hbase(main):032:0> put 'user1','1','info:name','zhangsan'
0 row(s) in 2.2950 seconds

hbase(main):033:0> put 'user1','1','info:age','25'
0 row(s) in 0.0420 seconds

hbase(main):034:0> put 'user1','2','info:name','lisi'
0 row(s) in 0.0400 seconds

hbase(main):035:0> put 'user1','2','info:age','22'
0 row(s) in 0.0820 seconds

hbase(main):036:0> put 'user1','3','info:name','wangswu'
0 row(s) in 0.0120 seconds

hbase(main):037:0> put 'user1','3','info:age','21'
0 row(s) in 0.0400 seconds

hbase(main):038:0> scan 'user1'
ROW                                            COLUMN+CELL                                                                                                                             
 1                                             column=info:age, timestamp=1497973586021, value=25                                                                                      
 1                                             column=info:name, timestamp=1497973585291, value=zhangsan                                                                               
 2                                             column=info:age, timestamp=1497973586131, value=22                                                                                      
 2                                             column=info:name, timestamp=1497973586080, value=lisi                                                                                   
 3                                             column=info:age, timestamp=1497973589317, value=21                                                                                      
 3                                             column=info:name, timestamp=1497973586233, value=wangswu                                                                                
3 row(s) in 0.0640 seconds

Next, in the Hive CLI, create an external table that maps to the HBase table user1, then query it:

hive> 
    > CREATE EXTERNAL TABLE user1 (
    > rowkey string,
    > info map<STRING,STRING>
    > ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
    > TBLPROPERTIES ("hbase.table.name" = "user1");

OK
Time taken: 50.083 seconds
hive> show tables;
OK
test
user1
Time taken: 0.28 seconds, Fetched: 2 row(s)
hive> select * from user1;
OK
1   {"age":"25","name":"zhangsan"}
2   {"age":"22","name":"lisi"}
3   {"age":"21","name":"wangswu"}
Time taken: 6.746 seconds, Fetched: 3 row(s)
hive> 
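Because the whole info column family is mapped to a single map<STRING,STRING> column, individual values are read with Hive's map index syntax, for example info['name']. The sketch below builds such a query from the shell. The function name build_map_query is made up for illustration, and it assumes the row key column is called rowkey, as in the table definition above.

```shell
# Hypothetical helper: print a HiveQL query that selects the row key plus
# selected keys of a map column.
# Arguments: <table> <map column> <key>...
# Assumption: the row key column is named "rowkey".
build_map_query() {
  local table="$1" map_col="$2" key select_list="rowkey"
  shift 2
  for key in "$@"; do
    # map_col['key'] is Hive's syntax for reading one entry of a map.
    select_list="$select_list, ${map_col}['${key}'] AS ${key}"
  done
  printf 'SELECT %s FROM %s;\n' "$select_list" "$table"
}
```

The generated statement can be passed to the Hive CLI with hive -e "$(build_map_query user1 info name age)", which should return name and age as separate columns instead of one map.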

Querying an existing HBase table through Hive
1. Inspect the structure of the table in HBase:

hbase(main):035:0> desc 'student'
Table student is ENABLED                                                                                                                                                                                         
student                                                                                                                                                                                                          
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                      
{NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'grade', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 5.9550 seconds

hbase(main):036:0>  scan 'student'
ROW                                                   COLUMN+CELL                                                                                                                                                
 hehe                                                 column=course:English, timestamp=1498057899732, value=89                                                                                                   
 rowKey0                                              column=grade:sid, timestamp=1498057899855, value=1108040                                                                                                   
 rowKey1                                              column=grade:sid, timestamp=1498057899855, value=1108041                                                                                                   
 rowKey2                                              column=grade:sid, timestamp=1498057899855, value=1108042                                                                                                   
 rowKey3                                              column=grade:sid, timestamp=1498057899855, value=1108043                                                                                                   
 rowKey4                                              column=grade:sid, timestamp=1498057899855, value=1108044                                                                                                   
 rowKey5                                              column=grade:sid, timestamp=1498057899855, value=1108045                                                                                                   
 rowKey6                                              column=grade:sid, timestamp=1498057899855, value=1108046                                                                                                   
 rowKey7                                              column=grade:sid, timestamp=1498057899855, value=1108047                                                                                                   
 rowKey8                                              column=grade:sid, timestamp=1498057899855, value=1108048                                                                                                   
 rowKey9                                              column=grade:sid, timestamp=1498057899855, value=1108049                                                                                                   
 ycb                                                  column=course:English, timestamp=1498057899602, value=88                                                                                                   
12 row(s) in 4.2640 seconds

Given the structure and contents of the HBase student table shown above, the matching Hive table definition and its query results are as follows:


hive> CREATE EXTERNAL TABLE student(key string, English string,sid string)     
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'     
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,course:English,grade:sid")     
    > TBLPROPERTIES("hbase.table.name" = "student"); 
OK
Time taken: 3.733 seconds
==================================================
hive> desc student;
OK
key                     string                                      
english                 string                                      
sid                     string                                      
Time taken: 0.086 seconds, Fetched: 3 row(s)
===================================================
hive> select * from student;
OK
hehe    89      NULL
rowKey0 NULL    1108040
rowKey1 NULL    1108041
rowKey2 NULL    1108042
rowKey3 NULL    1108043
rowKey4 NULL    1108044
rowKey5 NULL    1108045
rowKey6 NULL    1108046
rowKey7 NULL    1108047
rowKey8 NULL    1108048
rowKey9 NULL    1108049
ycb     88      NULL
Time taken: 0.893 seconds, Fetched: 12 row(s)
hive>
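In the definition above, hbase.columns.mapping has exactly one entry per Hive column, in the same order (key to :key, English to course:English, sid to grade:sid). A mismatch in the number of entries makes the CREATE statement fail, so a quick sanity check before running the DDL can save a round trip. This is a minimal sketch with made-up function names (count_entries, check_mapping). It only compares the counts and does not validate column family names.

```shell
# Hypothetical helper: count the comma-separated entries in a string.
count_entries() {
  printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only if the Hive column list and the
# hbase.columns.mapping value contain the same number of entries.
# Arguments: <comma-separated Hive columns> <hbase.columns.mapping value>
check_mapping() {
  local hive_cols="$1" mapping="$2" n_cols n_map
  n_cols=$(count_entries "$hive_cols")
  n_map=$(count_entries "$mapping")
  if [ "$n_cols" -ne "$n_map" ]; then
    echo "mismatch: $n_cols Hive columns vs $n_map mapping entries"
    return 1
  fi
  echo "ok: $n_cols columns"
}
```

For the student table, check_mapping "key,English,sid" ":key,course:English,grade:sid" reports three columns on both sides.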