
Connecting Hive to HBase to Work with Data


 

Copyright notice: this is an original post by the author and may not be reproduced without permission. When reproducing, please credit the source: https://blog.csdn.net/lr131425/article/details/72722932

How Hive integrates with HBase

Hive is a data-warehouse tool built on Hadoop. It maps structured data files onto database tables and provides full SQL query support by translating SQL statements into MapReduce jobs. Its main advantage is the low learning curve: simple MapReduce statistics can be produced quickly through SQL-like statements, with no need to develop dedicated MapReduce applications, which makes it well suited to statistical analysis over a data warehouse.

Hive and HBase integrate by communicating with each other through their public APIs. The bridge is mainly the lib/hive-hbase-handler.jar that ships with Hive, the utility class responsible for the communication between Hive and HBase.
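On recent Hive releases the handler jar is already on the classpath. On older ones you can pass it explicitly when starting the CLI; a minimal sketch, with the jar path and the quorum as placeholders for your own installation:

hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-<version>.jar \
     --hiveconf hbase.zookeeper.quorum=master,node1,node2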

 

Prerequisites: Hadoop, HBase, and Hive are all installed across the cluster and have been started normally.

 

Connect to HBase in command-line mode:

[root@master bin]# ./hbase shell

Run list to see the tables:

 

 
hbase(main):006:0* list
TABLE
test
user
2 row(s) in 0.4750 seconds

Now look at the table structure:

 

 
hbase(main):007:0> describe 'user'
Table user is ENABLED
user
COLUMN FAMILIES DESCRIPTION
{NAME => 'account', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'userid', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.7020 seconds

 

Then scan the user table to see the data. (The \x.. escapes in the output are the UTF-8 bytes of Chinese values such as 廣東 and 深圳.)

 

 
hbase(main):004:0> scan 'user'
ROW COLUMN+CELL
lisi column=account:name, timestamp=1495708477345, value=lisi
lisi column=account:passport, timestamp=1495708477353, value=96857123123231
lisi column=account:password, timestamp=1495708477349, value=654321
lisi column=address:city, timestamp=1495708477381, value=\xE6\xB7\xB1\xE5\x9C\xB3
lisi column=address:province, timestamp=1495708477377, value=\xE5\xB9\xBF\xE4\xB8\x9C
lisi column=info:age, timestamp=1495708477358, value=38
lisi column=info:sex, timestamp=1495708477363, value=\xE5\xA5\xB3
lisi column=userid:id, timestamp=1495708477330, value=002
zhangsan column=account:name, timestamp=1495708405658, value=zhangsan
zhangsan column=account:passport, timestamp=1495708405699, value=968574321
zhangsan column=account:password, timestamp=1495708405669, value=123456
zhangsan column=address:city, timestamp=1495708405773, value=\xE6\xB7\xB1\xE5\x9C\xB3
zhangsan column=address:province, timestamp=1495708405764, value=\xE5\xB9\xBF\xE4\xB8\x9C
zhangsan column=info:age, timestamp=1495708405712, value=26
zhangsan column=info:sex, timestamp=1495708405755, value=\xE7\x94\xB7
zhangsan column=userid:id, timestamp=1495708405444, value=001
2 row(s) in 0.2020 seconds

 

Start the Hive CLI from hive/bin:

[root@master bin]# ./hive

Run the DDL that creates an hbase_user table mapped onto the HBase user table. It fails with an error: return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations

 

 
hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,
    > province string,city string,age string,sex string ,id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES("hbase.table.name" = "user");
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)
    at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
    at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:303)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:313)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:205)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:742)

"Can't get the locations" and the stack trace below it suggest Hive cannot reach HBase, so check what is actually running with jps:

 

 
[root@master conf]# jps
3945 HMaster
18681 RunJar
2699 NameNode
3330 NodeManager
2951 SecondaryNameNode
3226 ResourceManager
3874 HQuorumPeer
18901 Jps

Everything looks normal. So check the Hive log next; it turns out to be full of failed ZooKeeper connections, all of the form

Opening socket connection to server localhost/127.0.0.1:2181

 

 
2017-05-25T03:06:12,259 INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:12,260 WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[the identical INFO/WARN/ConnectException cycle repeats at 03:06:13,362, 03:06:13,465, and 03:06:14,568 as the client keeps retrying]
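(In case you need to find that log: hive.log.dir defaults to ${java.io.tmpdir}/${user.name}, so for a root install something like the line below usually works; the exact path depends on your log4j configuration.)

tail -f /tmp/root/hive.log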

The Hive client is clearly falling back to localhost:2181, so open hive-site.xml and configure the ZooKeeper quorum and client port:

 

 
<property>
  <name>hive.zookeeper.quorum</name>
  <value>master,node1,node2</value>
  <description>
    List of ZooKeeper servers to talk to. This is needed for:
    1. Read/write locks - when hive.lock.manager is set to
    org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
    2. When HiveServer2 supports service discovery via Zookeeper.
    3. For delegation token storage if zookeeper store is used, if
    hive.cluster.delegation.token.store.zookeeper.connectString is not set
    4. LLAP daemon registry service
  </description>
</property>

<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
  <description>
    The port of ZooKeeper servers to talk to.
    If the list of Zookeeper servers specified in hive.zookeeper.quorum
    does not contain port numbers, this value is used.
  </description>
</property>


The ZooKeeper used by this HBase cluster, however, is on port 2222. To make the ports consistent, either change the port in hbase-site.xml to 2181 (remember to restart the services), or simply copy hbase-site.xml into Hive's conf directory, in which case Hive reads HBase's zookeeper quorum and client port directly. Either approach solves the problem; both are sketched below.
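A minimal sketch of both fixes. The property name is HBase's standard ZooKeeper client-port setting, and the copy assumes HBASE_HOME and HIVE_HOME point at your installations:

<!-- Option 1: in hbase-site.xml, pin the ZooKeeper client port -->
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

# Option 2: let Hive read HBase's ZooKeeper settings directly
cp $HBASE_HOME/conf/hbase-site.xml $HIVE_HOME/conf/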


Now re-run the CREATE TABLE statement in Hive:

 

 
hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,
    > province string,city string,age string,sex string ,id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES("hbase.table.name" = "user");
OK
Time taken: 20.323 seconds
hive> show tables;
OK
apachelog
hbase_user
Time taken: 2.75 seconds, Fetched: 2 row(s)
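As a quick sanity check (a standard Hive command; output elided here), show create table prints back the storage handler and column mapping that Hive recorded:

hive> show create table hbase_user;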


With the table in place, query the data through HiveQL:

 

 
hive> select * from hbase_user;
OK
lisi NULL 96857123123231 NULL lisi 654321 廣東 深圳 38 女 002
zhangsan NULL 968574321 NULL zhangsan 123456 廣東 深圳 26 男 001
Time taken: 5.798 seconds, Fetched: 2 row(s)
hive> describe hbase_user;
OK
key string
idcard string
passport string
country string
name string
password string
province string
city string
age string
sex string
id string
Time taken: 3.785 seconds, Fetched: 11 row(s)
hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;
OK
lisi NULL 654321 NULL lisi 96857123123231 廣東 深圳 38 女 002
zhangsan NULL 123456 china zhangsan 968574321 廣東 深圳 26 男 001
Time taken: 2.341 seconds, Fetched: 2 row(s)

The NULLs appear because those HBase rows never had values put into the account:idcard and account:country columns; a missing HBase cell simply surfaces as NULL in Hive. Set country and idcard on the HBase user table and look again:

./hbase shell

 

 
hbase(main):003:0> put 'user','zhangsan','account:idcard','420923156366998855';
hbase(main):004:0* put 'user','lisi','account:idcard','520369856366998855';
hbase(main):005:0* put 'user','lisi','account:country','china';

 

 
hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;
OK
lisi 520369856366998855 654321 china lisi 96857123123231 廣東 深圳 38 女 002
zhangsan 420923156366998855 123456 china zhangsan 968574321 廣東 深圳 26 男 001
Time taken: 2.388 seconds, Fetched: 2 row(s)
hive> select * from hbase_user where name='zhangsan';
OK
zhangsan 420923156366998855 968574321 china zhangsan 123456 廣東 深圳 26 男 001
Time taken: 2.651 seconds, Fetched: 1 row(s)
hive> select count(key) from hbase_user;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20170525040249_f808c765-79f6-43c0-aa94-ebfed7751091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1495621107567_0001, Tracking URL = http://master:8088/proxy/application_1495621107567_0001/
Kill Command = /usr/tools/hadoop/bin/hadoop job -kill job_1495621107567_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PermGen space

 


So the count query launches a MapReduce job and dies with PermGen space... go figure. This whole setup, a three-node Linux cluster running Hadoop, HBase, and Hive, lives in VMs on a single Windows 7 machine with 8 GB of RAM, where memory usage sits pinned flat at the ceiling. In fact the count does sometimes succeed; presumably Windows was just running too many other processes. One possible fix is sketched below.
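If you hit the same error, the usual remedy on JDK 7-era clusters is to give whichever JVM ran out a larger permanent generation. A hedged sketch with illustrative, untuned values; return code -101 often indicates the error was thrown inside the Hive client process itself, which reads HADOOP_CLIENT_OPTS:

# For the Hive client JVM (set before starting hive)
export HADOOP_CLIENT_OPTS="-XX:MaxPermSize=256m $HADOOP_CLIENT_OPTS"

# For the MapReduce task JVMs (from inside the Hive session)
hive> SET mapreduce.map.java.opts=-Xmx1024m -XX:MaxPermSize=256m;
hive> SET mapreduce.reduce.java.opts=-Xmx1024m -XX:MaxPermSize=256m;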