
HBase (Part 1)

Why HBase?

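As data volumes keep growing, traditional relational databases can no longer meet storage needs. Hive can handle the storage volume, but it does not offer efficient storage and querying of unstructured or semi-structured data.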

What is HBase?

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables – billions of rows X millions of columns – atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

In short, HBase is an open-source, distributed, multi-versioned (data can be kept in several versions), scalable non-relational database. It is the open-source Java counterpart of Bigtable: built on top of HDFS, it provides a highly reliable, high-performance, column-oriented, scalable NoSQL database with real-time reads and writes.

RDBMS: MySQL, SQL Server, Oracle, DB2, Access, etc.

NoSQL: HBase, MongoDB, Redis, Memcached, etc.

Use cases: storing massive amounts of unstructured data that needs random, near-real-time reads and writes.

HBase and Hadoop: HBase is built on Hadoop, and its storage depends on HDFS.

HBase architecture

client, zookeeper, hmaster, hregionserver, hlog, hregion, store, memstore, storefile, hfile

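client: the HBase client. It contains the interfaces for accessing HBase (the Linux shell and the Java API), and it maintains caches, such as region location information, to speed up access to HBase.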

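zookeeper:
Monitors the state of the masters and guarantees that exactly one master is active, which provides high availability.
Stores the addressing entry point for all regions, that is, which server hosts the catalog table (the -ROOT- table in older releases; since HBase 0.96 this is the location of hbase:meta).
Monitors the state of the hregionservers in real time and notifies the master whenever a regionserver comes online or goes offline.
Stores HBase table information (the HBase schema), including table names and column families.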

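hmaster (the "boss" of HBase):
Assigns regions to regionservers (for example when a new table is created).
Balances load across the regionservers.
Reassigns hregions (when a regionserver fails, or after an hregion has grown and been split in two).
Cleans up garbage files on HDFS.
Handles schema update requests.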

hregionserver (the "worker" of HBase):
Maintains (manages) the regions that the master assigns to it.
Handles client I/O requests against those regions and interacts with HDFS.
Splits regions that grow too large at runtime.

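hlog: records operations on HBase using a write-ahead log (WAL). Each write goes to the hlog first and then to the memstore, so that edits lost from memory can be recovered by replaying the log.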

hregion: the smallest unit of distributed storage and load balancing in HBase; it holds a whole table or a slice of a table.

store: corresponds to one column family.

memstore: an in-memory buffer whose contents are flushed to HDFS in batches.

hstorefile: data in HBase is stored on HDFS in the form of hfiles.
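The write path described above (log first, then memstore, then a batched flush to a file) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `RegionServerSketch`, the `flush_threshold` parameter, and the rule of clearing the log after a flush are all invented for illustration and are not HBase's actual implementation.

```python
class RegionServerSketch:
    """Toy model of the write path: append to the log first, then update the memstore."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # stands in for the hlog (append-only list of edits)
        self.memstore = {}     # in-memory buffer: rowkey -> value
        self.hfiles = []       # each flush produces one immutable, sorted "file"
        self.flush_threshold = flush_threshold  # hypothetical knob for this sketch

    def put(self, key, value):
        self.wal.append((key, value))   # 1. record the edit in the log first
        self.memstore[key] = value      # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered edits out in sorted key order, then clear the buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []   # simplification: flushed edits no longer need replaying

    def crash_and_recover(self):
        # A crash loses the in-memory buffer; replaying the log rebuilds it.
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value


server = RegionServerSketch(flush_threshold=3)
server.put("rk0002", "b")
server.put("rk0001", "a")
server.crash_and_recover()          # the memstore is rebuilt from the log
recovered = dict(server.memstore)
server.put("rk0003", "c")           # the third buffered edit triggers a flush
```

The point of the ordering is that an edit only reaches memory after it is already in the log, so a lost memstore can always be rebuilt.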

Quantity relationships between the components:

hmaster : hregionserver = 1 : n
hregionserver : hlog = 1 : 1
hregionserver : hregion = 1 : n
hregion : store = 1 : n
store : memstore = 1 : 1
store : storefile = 1 : n
storefile : hfile = 1 : 1

HBase characteristics:

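Schema: schemaless
Data types: a single type; only byte[] is supported
Multi-version: each value can be stored in multiple versions
Column-oriented storage: the data of each column family is stored together in its own files
Sparse storage: a cell whose value is null takes up no storage space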

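HBase key terms:
rowkey: the row key (comparable to a primary key in MySQL; unique and ordered)
column family: a set of columns
column: a column
timestamp: the time stamp (by default, the time of the write)
version: the version number
cell: a single cell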
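To make the terms above concrete, here is a minimal Python sketch of a cell store keyed by rowkey and column, with a timestamp per version. It is an illustration only: `VersionedCells` and `max_versions` are hypothetical names, and the real storage layout in HBase is more involved.

```python
from collections import defaultdict


class VersionedCells:
    """Toy model: (rowkey, 'family:qualifier') -> versions, newest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = defaultdict(list)   # values are lists of (timestamp, value)

    def put(self, rowkey, column, value, timestamp):
        versions = self.cells[(rowkey, column)]
        # A write with an existing timestamp replaces that version.
        versions[:] = [tv for tv in versions if tv[0] != timestamp]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # keep only the newest N versions

    def get(self, rowkey, column, versions=1):
        # Absent cells simply have no entry (sparse storage).
        return self.cells.get((rowkey, column), [])[:versions]


table = VersionedCells(max_versions=2)
table.put("rk0001", "f1:name", "zhangsan", timestamp=1)
table.put("rk0001", "f1:name", "zs1", timestamp=2)
table.put("rk0001", "f1:name", "zs2", timestamp=3)
latest = table.get("rk0001", "f1:name")
history = table.get("rk0001", "f1:name", versions=5)
missing = table.get("rk0001", "f1:age")
```

Here `latest` holds only the newest version, `history` is capped at two versions because of `max_versions=2`, and `missing` is empty because a column that was never written occupies no space.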

Ordering:
1. Rows are sorted by rowkey in ascending lexicographic order.
2. Column families are sorted lexicographically.
3. Columns are sorted lexicographically.
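Lexicographic ordering compares keys byte by byte, which is not the same as numeric ordering. The short Python sketch below shows the effect on a few sample rowkeys; the zero-padding width of 4 is an arbitrary choice for the example.

```python
# Rowkeys are byte arrays, and Python compares bytes lexicographically too.
rowkeys = ["rk0002", "rk0001111", "rk0003", "rk0001"]
sorted_rowkeys = sorted(rk.encode("utf-8") for rk in rowkeys)

# Numeric keys stored as plain strings do not sort numerically...
unpadded = sorted(str(n).encode("utf-8") for n in [9, 10, 100])

# ...unless they are zero-padded to a fixed width.
padded = sorted(("%04d" % n).encode("utf-8") for n in [9, 10, 100])

print(sorted_rowkeys)
print(unpadded)
print(padded)
```

A key that is a prefix of another (`rk0001` and `rk0001111`) sorts first, and both sort before `rk0002`. This is worth keeping in mind when designing rowkeys, since scans return rows in exactly this order.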

Installing HBase

1. Standalone HBase

(1) Unpack the archive and configure the environment variables

tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local

vi /etc/profile

export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin

source /etc/profile

(2) Configure the HBase parameters

cd $HBASE_HOME/conf
vi hbase-env.sh
# in hbase-env.sh, set:
export JAVA_HOME=/usr/local/jdk1.8.0_181


vi hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hbasedata</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>

Test: hbase version

Start: bin/start-hbase.sh

Connect with the client: hbase shell

2. Pseudo-Distributed (details omitted). Set the following in the configuration file:

<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

3. Advanced - Fully Distributed

(1) Unpack the archive and configure the environment variables

tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local
# optional: remove the bundled docs
rm -rf /usr/local/hbase-1.2.1/docs
vi /etc/profile

export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin

source /etc/profile

(2) Configure the HBase parameters

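cd $HBASE_HOME/conf
vi hbase-env.sh
# in hbase-env.sh, set:
export JAVA_HOME=/usr/local/jdk1.8.0_181
# use the external ZooKeeper ensemble instead of the one HBase manages itself
export HBASE_MANAGES_ZK=false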

Note: if the JDK is version 8 or later, comment out the two export lines below.

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

vi regionservers    # regionserver hosts, one per line: hadoop01, hadoop02, hadoop03
vi backup-masters   # backup master hosts, one per line: hadoop02, hadoop03
vi hbase-site.xml

<!-- Root directory of HBase on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop01:9000/hbase</value>
</property>

<!-- Enable HBase's distributed cluster mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
	
<!-- Addresses of the ZooKeeper ensemble -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
    
<!-- Data directory of the ZooKeeper ensemble -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>	
	
<!-- Port of the HBase master web UI -->
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
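The property fragments above have to sit inside a single `<configuration>` root element in hbase-site.xml. As a rough sketch, the following Python helpers render a set of properties into that shape and read them back, which can be handy for checking a file before copying it to the other nodes. The helper names `render_site_xml` and `parse_site_xml` are made up for this example.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Wrap name/value pairs in the <configuration> root that the file expects."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def parse_site_xml(xml_text):
    """Read the properties back into a dict, e.g. to sanity-check a config."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


props = {
    "hbase.rootdir": "hdfs://hadoop01:9000/hbase",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "hadoop01:2181,hadoop02:2181,hadoop03:2181",
}
site_xml = render_site_xml(props)
round_trip = parse_site_xml(site_xml)
print(site_xml)
```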

(3) Note: if HDFS is a high-availability cluster, copy hdfs-site.xml and core-site.xml into HBase's conf directory; otherwise skip this step.

[[email protected] hadoop]# cp hdfs-site.xml core-site.xml $HBASE_HOME/conf

(4) Distribute HBase to the other two machines

scp -r hbase-1.2.1 [email protected]:$PWD
scp -r hbase-1.2.1 [email protected]:$PWD

(5) Start the HBase cluster. HBase depends on ZooKeeper and HDFS, so start ZooKeeper first, then HDFS, and HBase last.

zkServer.sh start
zkServer.sh status
start-dfs.sh
start-hbase.sh

Check the processes:

[[email protected] conf]# jps
56903 HRegionServer
55960 QuorumPeerMain
62328 Jps
56760 HMaster
56186 NameNode
56333 DataNode
59309 Main

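Web UI port: 60010, as set by hbase.master.info.port above.

Default ports in HBase 1.x: hmaster web UI 16010, hregionserver web UI 16030, internal communication (RPC) port 16020.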

Note: keep the clocks of all nodes synchronized (for example with NTP).

HBase shell commands

Connect with the client: hbase shell

You can learn how to use the HBase shell through help:

help
help 'COMMAND'
help 'COMMAND_GROUP'

hbase(main):004:0> help
HBase Shell, version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures
  Commands: abort_procedure, list_procedures

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html

List tables and namespaces:

list
list_namespace

namespace: a grouping concept, comparable to a database in an RDBMS (HBase itself has no notion of a database).

hbase(main):002:0> list_namespace
NAMESPACE                                                                                                      
default                                                                                                        
hbase                                                                                                          
2 row(s) in 0.1020 seconds

HBase has two default namespaces: default and hbase.

View usage: help 'namespace'

Command: create_namespace
Create namespace; pass namespace name,
and optionally a dictionary of namespace configuration.
Examples:

  hbase> create_namespace 'ns1'
  hbase> create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}

Command: describe_namespace
Describe the named namespace. For example:
  hbase> describe_namespace 'ns1'

Command: drop_namespace
Drop the named namespace. The namespace must be empty.

Command: list_namespace
List all namespaces in hbase. Optional regular expression parameter could
be used to filter the output. Examples:

  hbase> list_namespace
  hbase> list_namespace 'abc.*'

Command: list_namespace_tables
List all tables that are members of the namespace.
Examples:

  hbase> list_namespace_tables 'ns1'

Try it out:

hbase(main):008:0> create_namespace 'ns1'
0 row(s) in 0.0810 seconds

hbase(main):011:0> list_namespace
NAMESPACE                                                                                                      
default                                                                                                        
hbase                                                                                                          
ns1                                                                                                            
3 row(s) in 0.0390 seconds

hbase(main):012:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuanyuan'}
0 row(s) in 0.0900 seconds

hbase(main):013:0> describe_namespace 'ns1'
DESCRIPTION                                                                                                    
{NAME => 'ns1', NAME => 'gaoyuanyuan'}                                                                         
1 row(s) in 0.0040 seconds

hbase(main):014:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuan'}
0 row(s) in 0.0340 seconds

hbase(main):015:0> describe_namespace 'ns1'
DESCRIPTION                                                                                                    
{NAME => 'ns1', NAME => 'gaoyuan'}                                                                             
1 row(s) in 0.0030 seconds

hbase(main):016:0> alter_namespace 'ns1',{METHOD => 'unset',NAME =>'NAME'}
0 row(s) in 0.0310 seconds

hbase(main):017:0> describe_namespace 'ns1'
DESCRIPTION                                                                                                    
{NAME => 'ns1'}                                                                                                
1 row(s) in 0.0110 seconds

hbase(main):018:0> drop_namespace 'ns1'
0 row(s) in 0.0540 seconds

hbase(main):019:0> list_namespace
NAMESPACE                                                                                                      
default                                                                                                        
hbase                                                                                                          
2 row(s) in 0.0310 seconds

create_namespace 'ns1'    # create a namespace
list_namespace    # list namespaces
list_namespace_tables 'ns1'    # list the tables in a namespace
alter_namespace 'ns1', {METHOD => 'set', 'NAME' => 'GAOYUANYUAN'}    # add or modify a property
alter_namespace 'ns1', {METHOD => 'unset', NAME => 'NAME'}    # remove a property
drop_namespace 'ns1'    # cannot be forced; the namespace must be empty

DDL

Group name: ddl Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

Create tables:

create 'ns1:t1', {NAME => 'f1', VERSIONS => 5},{NAME => 'f2', VERSIONS => 3}
create 'ns1:t2', 'f1', SPLITS => ['10', '20', '30', '40']

hbase(main):024:0> create_namespace 'ns1'
0 row(s) in 0.0530 seconds

hbase(main):025:0> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5},{NAME => 'f2', VERSIONS => 3}
0 row(s) in 4.3940 seconds

=> Hbase::Table - ns1:t1
hbase(main):026:0> list_namespace_tables 'ns1'
TABLE                                                                                                          
t1                                                                                                             
1 row(s) in 0.0200 seconds

hbase(main):027:0> create 'ns1:t2','f1',SPLITS => ['10','20','30','40']   # pre-split into 5 regions
0 row(s) in 2.2900 seconds

=> Hbase::Table - ns1:t2
hbase(main):028:0> list_namespace_tables 'ns1'
TABLE                                                                                                          
t1                                                                                                             
t2                                                                                                             
2 row(s) in 0.0280 seconds
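Four split keys produce five regions, each covering a half-open range of rowkeys (start key inclusive, end key exclusive). The Python sketch below shows one way to work out which region a given rowkey would land in; `region_index` and `region_range` are hypothetical helpers for illustration, not part of any HBase API.

```python
from bisect import bisect_right

# The split keys from the create statement; N split keys give N + 1 regions.
split_keys = [b"10", b"20", b"30", b"40"]


def region_index(rowkey, splits=split_keys):
    """Index of the region whose [start, end) range contains the rowkey."""
    return bisect_right(splits, rowkey)


def region_range(index, splits=split_keys):
    """(start, end) of a region; b'' stands for an open-ended boundary."""
    start = splits[index - 1] if index > 0 else b""
    end = splits[index] if index < len(splits) else b""
    return start, end


for key in [b"05", b"10", b"25", b"40", b"rk0001"]:
    idx = region_index(key)
    print(key, "-> region", idx, region_range(idx))
```

Note that a key such as `rk0001` sorts after `40`, so with these split points every `rk...` row would end up in the last region. Split keys only spread the load if they match the shape of the real rowkeys.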

The tables can also be viewed in the web UI.

Alter a table (an existing column family is updated; a missing one is added):

alter 'ns1:t1','f1',{NAME => 'f2', VERSIONS => 3,BLOOMFILTER => 'ROWCOL',IN_MEMORY => 'true'},{NAME => 'f3', VERSIONS => 6,BLOOMFILTER => 'ROWCOL',TTL => 24*60*60}

hbase(main):029:0> alter 'ns1:t1','f1',{NAME => 'f2', VERSIONS => 3,BLOOMFILTER => 'ROWCOL',IN_MEMORY => 'true'},{NAME => 'f3', VERSIONS => 6,BLOOMFILTER => 'ROWCOL',TTL => 24*60*60}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 7.0220 seconds
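The TTL in the alter statement is given in seconds, while cell timestamps are in milliseconds, so `24*60*60` means one day. A small Python sketch of the arithmetic follows; the `is_expired` helper and its exact boundary behaviour are assumptions made for illustration.

```python
TTL_SECONDS = 24 * 60 * 60   # one day, as in the alter statement


def is_expired(cell_timestamp_ms, now_ms, ttl_seconds=TTL_SECONDS):
    """True once the cell is older than the TTL (timestamps in milliseconds)."""
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000


written_at = 1539173350832                     # a sample timestamp in milliseconds
one_hour_later = written_at + 60 * 60 * 1000
two_days_later = written_at + 2 * TTL_SECONDS * 1000

print(TTL_SECONDS)
print(is_expired(written_at, one_hour_later))
print(is_expired(written_at, two_days_later))
```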

Delete a column family:

alter 'ns1:t1', NAME => 'f1', METHOD => 'delete'

View the table definition:

describe 'ns1:t1'

Delete a table (it must be disabled first):

disable 'ns1:t1'
drop 'ns1:t1'

DML

Group name: dml Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

Insert data (a single put writes only one column):

# arguments: table name, rowkey, column name, column value
put 'ns1:t1','rk0001','f1:name','zhangsan'
put 'ns1:t1','rk0001','f1:age','18'
put 'ns1:t1','rk0001','f1:sex','1'

put 'ns1:t1','rk0002','f1:name','gaoyuanyuan'
put 'ns1:t1','rk0002','f1:age','18'
put 'ns1:t1','rk0002','f1:sex','2'

put 'ns1:t1','rk0003','f1:name','jiajingwen'
put 'ns1:t1','rk0003','f1:age','18'
put 'ns1:t1','rk0003','f1:sex','2'

put 'ns1:t1','rk0001111','f1:name','canglaoshi'
put 'ns1:t1','rk0001111','f1:age','18'
put 'ns1:t1','rk0001111','f1:sex','1'

put 'ns1:t1','rk0001','f2:addr','beijing'
put 'ns1:t1','rk0001','f1:size','123'

Update data (a put to an existing cell writes a new version):

put 'ns1:t1','rk0001','f1:name','zs1'

Scan data:

scan 'ns1:t1'
scan 'ns1:t1',{COLUMNS => 'f1:name'}
scan 'ns1:t1',{COLUMNS => ['f1:name','f2:addr']}
scan 'ns1:t1', {RAW => true, VERSIONS => 10}

scan 'ns1:t1', {COLUMNS => 'f1:name', TIMERANGE => [1539173350832,1539173421219],VERSIONS => 3}   # start timestamp inclusive, end timestamp exclusive
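The TIMERANGE in the last scan is a half-open interval: the start timestamp is included and the end timestamp is not. A minimal Python sketch of that filter is shown below; the sample versions and their values are made up for the example.

```python
def in_timerange(timestamp, min_ts, max_ts):
    """Half-open check: min_ts <= timestamp < max_ts."""
    return min_ts <= timestamp < max_ts


# Hypothetical versions of one cell: (timestamp in ms, value)
versions = [
    (1539173350832, "zhangsan"),   # equal to the start: included
    (1539173400000, "zs1"),        # strictly inside: included
    (1539173421219, "zs2"),        # equal to the end: excluded
]

selected = [
    value
    for ts, value in versions
    if in_timerange(ts, 1539173350832, 1539173421219)
]
print(selected)
```

To include a version written exactly at the upper bound, the end of the range would have to be raised by at least one millisecond.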

Query data: get

get 'ns1:t1','rk0001'

Delete data: delete and deleteall

	delete 'ns1:t1','rk0001','f1:age'
	deleteall 'ns1:t1','rk0001'

Note: incr can only increment columns that hold a long value.
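The reason is that a counter is stored as an 8-byte big-endian long, whereas a value written with put '...','18' is just the two characters of the string. The Python sketch below mimics that check; `encode_long` and `incr` are hypothetical stand-ins for illustration, not the HBase implementation.

```python
import struct


def encode_long(n):
    """Encode an integer as an 8-byte big-endian signed long."""
    return struct.pack(">q", n)


def incr(stored, amount=1):
    """Increment a stored counter; refuse anything that is not 8 bytes wide."""
    if len(stored) != 8:
        raise ValueError("not a long: value is %d bytes wide" % len(stored))
    (current,) = struct.unpack(">q", stored)
    return encode_long(current + amount)


counter = encode_long(18)
bumped = incr(counter)

try:
    incr(b"18")               # the string '18' is only 2 bytes
    string_rejected = False
except ValueError:
    string_rejected = True
```

In other words, the `f1:age` cells written with put above hold strings, so a counter column should be created with incr from the start rather than with put.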