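HBase (Part 1)
Why HBase?
As data volumes keep growing, traditional relational databases can no longer meet storage needs. Hive can store the data, but it does not handle storage and efficient querying of unstructured or semi-structured data well.
What is HBase?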
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables – billions of rows X millions of columns – atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
In short, HBase is an open-source, distributed, multi-version (several versions of a value can be kept), scalable, non-relational database. It is the open-source Java implementation of Bigtable. Built on top of HDFS, it is a NoSQL database offering high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes.
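RDBMS: MySQL, SQL Server, Oracle, DB2, Access, etc. NoSQL: HBase, MongoDB, Redis, Memcached, etc.
Use cases: storing massive amounts of unstructured data that must be read and written randomly in near real time.
HBase and Hadoop: HBase is built on Hadoop, and its storage relies on HDFS.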
HBase architecture
client, zookeeper, hmaster, hregionserver, hlog, hregion, store, memstore, storefile, hfile
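client: The HBase client. It provides the interfaces for accessing HBase (the Linux shell, the Java API) and maintains caches, such as region locations, to speed up access to HBase.
zookeeper: Monitors the state of the masters and guarantees that exactly one master is active, which provides high availability. Stores the addressing entry point for all regions, that is, which server hosts the root table (-ROOT- in old releases; from 0.96 on, ZooKeeper stores the location of hbase:meta instead). Monitors the state of the HRegionServers in real time and notifies the master when a RegionServer goes online or offline. Stores the information about all HBase tables (the HBase schema), including table names and column families.
hmaster (the boss of HBase): Assigns regions to RegionServers (for example when a new table is created). Balances load across RegionServers. Reassigns regions (when a RegionServer fails, or when a region has grown and been split in two). Cleans up garbage files on HDFS. Handles schema update requests.
hregionserver (the workers of HBase): Manages the regions that the master has assigned to it. Handles client I/O requests against those regions and talks to HDFS. Splits regions that grow too large at runtime.
hlog: Records the operations performed on HBase. Writes use a write-ahead log (WAL): an edit is written to the hlog first and then to the memstore, so that after a failure the logged edits can be replayed and no data is lost.
hregion: The smallest unit of distributed storage and load balancing in HBase. A region holds a whole table or a contiguous part of one.
store: Corresponds to one column family.
memstore: An in-memory write buffer whose contents are flushed to HDFS in batches.
hstorefile: HBase data is stored on HDFS in the form of HFiles.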
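The write path described above (hlog first, then memstore, then a batch flush to files) can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how HBase is implemented: the class `ToyStore`, its `flush_threshold`, and the `crash`/`recover` methods are names made up for this illustration, and the real WAL and flush logic are far more involved.

```python
class ToyStore:
    """Toy model of one store: write-ahead log -> memstore -> flushed files."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only edit log (stands in for the hlog)
        self.memstore = {}   # in-memory buffer: (row, column) -> value
        self.hfiles = []     # flushed, immutable, sorted files (stand in for HFiles)
        self.flush_threshold = flush_threshold

    def put(self, row, column, value):
        self.wal.append((row, column, value))   # 1. log the edit first
        self.memstore[(row, column)] = value    # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered cells out in sorted key order, then clear the buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []  # the edits are now in a file, so this toy log can be emptied

    def crash(self):
        self.memstore = {}  # everything held only in memory is lost

    def recover(self):
        # Replay the logged edits to rebuild the memstore.
        for row, column, value in self.wal:
            self.memstore[(row, column)] = value


store = ToyStore(flush_threshold=3)
store.put("rk0001", "f1:name", "zhangsan")
store.put("rk0001", "f1:age", "18")
store.crash()
store.recover()
print(store.memstore)
```

After the simulated crash, both edits are back in the memstore because they were logged before they were buffered.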
Cardinality between the components:
hmaster : hregionserver = 1 : n
hregionserver : hlog = 1 : 1
hregionserver : hregion = 1 : n
hregion : store = 1 : n
store : memstore = 1 : 1
store : storefile = 1 : n
storefile : hfile = 1 : 1
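HBase features:
Schema: schemaless
Data type: a single type; only byte[] is supported
Multi-version: each value can keep several versions
Column-oriented storage: the data of each column family is stored together in its own files
Sparse storage: a cell whose value is null takes up no storage space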
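To make the "only byte[]" and "sparse storage" points concrete, here is a small sketch of turning one logical row into stored cells. The helper `to_cells` and the sample record are hypothetical and exist only for this illustration.

```python
def to_cells(rowkey, record):
    """Turn one logical row into stored cells, skipping null columns."""
    cells = {}
    for column, value in record.items():
        if value is None:
            continue  # sparse: a missing value produces no cell at all
        # Everything is stored as raw bytes, whatever the logical type was.
        cells[(rowkey.encode(), column.encode())] = str(value).encode()
    return cells


cells = to_cells("rk0001", {"f1:name": "zhangsan", "f1:age": 18, "f2:addr": None})
print(cells)
```

Only two cells come out: the `f2:addr` column with no value is simply absent, and the integer age is stored as the bytes `b"18"`.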
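HBase key terms:
rowkey: row key (comparable to a primary key in MySQL: no duplicates, kept in order)
column family: a group of columns
column: a column
timestamp: the cell's timestamp (defaults to the current time when the cell is written)
version: version number
cell: a single cell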
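Ordering:
1. Rows are sorted by rowkey, in ascending lexicographic order
2. Column families are sorted in lexicographic order
3. Columns are sorted in lexicographic order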
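Lexicographic order is a byte-by-byte comparison, which is not the same as numeric order. Python's `bytes` compare the same way, so a quick sketch shows the effect. The rowkeys are the ones used in the DML examples of these notes; the zero-padding at the end is a common workaround and not something the notes prescribe.

```python
rowkeys = [b"rk0002", b"rk0001111", b"rk0003", b"rk0001"]
ordered = sorted(rowkeys)
print(ordered)   # [b'rk0001', b'rk0001111', b'rk0002', b'rk0003']

# Numbers stored as text sort by their bytes, not by their value.
as_text = sorted([b"9", b"10", b"2"])
print(as_text)   # [b'10', b'2', b'9']

# Zero-padding to a fixed width restores numeric order.
padded = sorted(str(n).zfill(4).encode() for n in (9, 10, 2))
print(padded)    # [b'0002', b'0009', b'0010']
```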
Installing HBase
1. Standalone HBase
(1) Unpack the archive and set the environment variables
tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local
vi /etc/profile
export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
(2) Configure HBase
cd conf
vi hbase-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
vi hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hbasedata</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>
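In the actual hbase-site.xml file the `<property>` entries sit inside a single `<configuration>` root element. As a rough sketch, the file content for the standalone setup could be generated like this; `build_hbase_site` is a hypothetical helper, and the two values are the ones shown above.

```python
import xml.etree.ElementTree as ET


def build_hbase_site(properties):
    """Wrap name/value pairs in the <configuration> root that hbase-site.xml uses."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_hbase_site({
    "hbase.rootdir": "file:///usr/local/hbasedata",
    "hbase.zookeeper.property.dataDir": "/usr/local/zkdata",
})
print(xml_text)
```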
Test: hbase version
Start: $HBASE_HOME/bin/start-hbase.sh
Connect with the client: hbase shell
2. Pseudo-Distributed (details omitted). Set the following in the configuration file:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
3、Advanced - Fully Distributed
(1) Unpack the archive and set the environment variables
tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local
rm -rf /usr/local/hbase-1.2.1/docs
vi /etc/profile
export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
(2) Configure HBase
cd ./conf
vi hbase-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
export HBASE_MANAGES_ZK=false
Note: with JDK 8 or later, comment out the following two export lines.
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
vi regionservers    # RegionServer hosts, one per line: hadoop01, hadoop02, hadoop03
vi backup-masters    # backup master hosts, one per line: hadoop02, hadoop03
vi hbase-site.xml
<!-- HBase root directory on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop01:9000/hbase</value>
</property>
<!-- Enable distributed cluster mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- Addresses of the ZooKeeper ensemble -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<!-- Data directory of the ZooKeeper ensemble -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>
<!-- Port of the HBase master web UI -->
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
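Before starting the cluster it can help to read the settings back and check the obvious things. The sketch below does that for three of the properties above; `load_properties` and `check_distributed` are hypothetical helpers, the checks are only the ones these notes imply, and the sample text assumes the properties are wrapped in the usual `<configuration>` root.

```python
import xml.etree.ElementTree as ET

# Sample content using the values from the configuration above.
HBASE_SITE = """
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://hadoop01:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Return the name -> value pairs of an hbase-site.xml document."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def check_distributed(props):
    """List the problems found for a fully distributed setup (empty list = none)."""
    problems = []
    if props.get("hbase.cluster.distributed") != "true":
        problems.append("hbase.cluster.distributed is not true")
    if not props.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("hbase.rootdir does not point at HDFS")
    if not props.get("hbase.zookeeper.quorum"):
        problems.append("hbase.zookeeper.quorum is empty")
    return problems


props = load_properties(HBASE_SITE)
quorum_hosts = [entry.split(":")[0] for entry in props["hbase.zookeeper.quorum"].split(",")]
print(check_distributed(props), quorum_hosts)
```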
(3) Note: if HDFS is a high-availability cluster, copy hdfs-site.xml and core-site.xml into HBase's conf directory; otherwise skip this step.
cp hdfs-site.xml core-site.xml $HBASE_HOME/conf    # run from the Hadoop configuration directory
(4) Distribute HBase to the other two machines
scp -r hbase-1.2.1 root@hadoop02:$PWD
scp -r hbase-1.2.1 root@hadoop03:$PWD
(5) Start the HBase cluster. HBase depends on ZooKeeper and HDFS: start ZooKeeper first, then HDFS, and HBase last.
zkServer.sh start
zkServer.sh status
start-dfs.sh
start-hbase.sh
Check the processes:
jps
56903 HRegionServer
55960 QuorumPeerMain
62328 Jps
56760 HMaster
56186 NameNode
56333 DataNode
59309 Main
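A listing like the one above can also be checked mechanically. The following sketch parses `jps` output and reports which of the expected daemons are missing. The set `EXPECTED` and the function `missing_daemons` are assumptions made for this illustration; a node that runs only some of the roles would need a different set.

```python
JPS_OUTPUT = """56903 HRegionServer
55960 QuorumPeerMain
62328 Jps
56760 HMaster
56186 NameNode
56333 DataNode
59309 Main"""

# Daemons expected on a node that runs every role (an assumption for this sketch).
EXPECTED = {"HMaster", "HRegionServer", "QuorumPeerMain", "NameNode", "DataNode"}


def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected process names that do not appear in the jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:          # "<pid> <main class>"
            running.add(parts[1].strip())
    return sorted(expected - running)


print(missing_daemons(JPS_OUTPUT))   # [] means every expected daemon is up
```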
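Web UI ports:
HMaster: 60010 as set by hbase.master.info.port above (the HBase 1.x default is 16010)
HRegionServer: 16030
Internal communication (RPC) port: 16020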
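A quick way to see whether these ports are reachable is a plain TCP connection attempt. This is a minimal sketch: `port_open` is a hypothetical helper, and a successful connection only shows that something is listening, not that it is HBase.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example calls against the hosts used in these notes (not run here):
#   port_open("hadoop01", 60010)   # master web UI as configured above
#   port_open("hadoop01", 16030)   # RegionServer web UI
```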
Note: keep the clocks of all nodes synchronized (for example with NTP).
HBase shell commands
Connect with the client: hbase shell
The built-in help is a good way to learn the HBase shell:
help
help 'COMMAND'
help 'COMMAND_GROUP'
hbase(main):004:0> help
HBase Shell, version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quotas, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: abort_procedure, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
List:
list
list_namespace
namespace: a logical grouping of tables, comparable to a database in an RDBMS (although HBase has no database concept as such)
hbase(main):002:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.1020 seconds
HBase ships with two namespaces by default: default and hbase
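Table names are written as `namespace:table`, and a table named without a namespace prefix belongs to `default`. A small sketch of that naming rule follows; `split_table_name` is a hypothetical helper, not part of any HBase API.

```python
def split_table_name(name):
    """Split 'ns:table' into (namespace, table); no prefix means 'default'."""
    namespace, sep, table = name.partition(":")
    if not sep:
        return ("default", name)
    return (namespace, table)


print(split_table_name("ns1:t1"))   # ('ns1', 't1')
print(split_table_name("t1"))       # ('default', 't1')
```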
Show usage: help 'namespace'
Command: create_namespace
Create namespace; pass namespace name,
and optionally a dictionary of namespace configuration.
Examples:
hbase> create_namespace 'ns1'
hbase> create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}
Command: describe_namespace
Describe the named namespace. For example:
hbase> describe_namespace 'ns1'
Command: drop_namespace
Drop the named namespace. The namespace must be empty.
Command: list_namespace
List all namespaces in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list_namespace
hbase> list_namespace 'abc.*'
Command: list_namespace_tables
List all tables that are members of the namespace.
Examples:
hbase> list_namespace_tables 'ns1'
Try it out:
hbase(main):008:0> create_namespace 'ns1'
0 row(s) in 0.0810 seconds
hbase(main):011:0> list_namespace
NAMESPACE
default
hbase
ns1
3 row(s) in 0.0390 seconds
hbase(main):012:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuanyuan'}
0 row(s) in 0.0900 seconds
hbase(main):013:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1', NAME => 'gaoyuanyuan'}
1 row(s) in 0.0040 seconds
hbase(main):014:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuan'}
0 row(s) in 0.0340 seconds
hbase(main):015:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1', NAME => 'gaoyuan'}
1 row(s) in 0.0030 seconds
hbase(main):016:0> alter_namespace 'ns1',{METHOD => 'unset',NAME =>'NAME'}
0 row(s) in 0.0310 seconds
hbase(main):017:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1'}
1 row(s) in 0.0110 seconds
hbase(main):018:0> drop_namespace 'ns1'
0 row(s) in 0.0540 seconds
hbase(main):019:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.0310 seconds
Summary:
create_namespace 'ns1'    # create
list_namespace    # list
list_namespace_tables 'ns1'    # list the tables in a namespace
alter_namespace 'ns1', {METHOD => 'set', 'NAME' => 'GAOYUANYUAN'}    # add or modify a property
alter_namespace 'ns1', {METHOD => 'unset', NAME => 'NAME'}    # remove a property
drop_namespace 'ns1'    # the namespace must be empty; there is no forced drop
DDL
Group name: ddl Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Create tables:
create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}, {NAME => 'f2', VERSIONS => 3}
create 'ns1:t2', 'f1', SPLITS => ['10', '20', '30', '40']
hbase(main):024:0> create_namespace 'ns1'
0 row(s) in 0.0530 seconds
hbase(main):025:0> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5},{NAME => 'f2', VERSIONS => 3}
0 row(s) in 4.3940 seconds
=> Hbase::Table - ns1:t1
hbase(main):026:0> list_namespace_tables 'ns1'
TABLE
t1
1 row(s) in 0.0200 seconds
hbase(main):027:0> create 'ns1:t2','f1',SPLITS => ['10','20','30','40']    # pre-split into 5 regions
0 row(s) in 2.2900 seconds
=> Hbase::Table - ns1:t2
hbase(main):028:0> list_namespace_tables 'ns1'
TABLE
t1
t2
2 row(s) in 0.0280 seconds
The tables can also be seen in the web UI.
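Four split points give five regions, and a rowkey falls into the region whose key range contains it, with the start key inclusive. The sketch below mimics that lookup with `bisect`; `region_index` is a made-up helper, and the real lookup goes through HBase's meta table rather than a local list.

```python
import bisect

SPLITS = [b"10", b"20", b"30", b"40"]   # the split points used for ns1:t2


def region_index(rowkey, splits=SPLITS):
    """Index (0..len(splits)) of the region whose range [start, end) holds rowkey."""
    return bisect.bisect_right(splits, rowkey)


for key in (b"05", b"10", b"25", b"40", b"9"):
    print(key, "-> region", region_index(key))
```

Note that `b"9"` lands in the last region: rowkeys are compared byte by byte, so `"9"` sorts after `"40"`.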
Alter a table (existing column families are updated, missing ones are added):
alter 'ns1:t1', 'f1', {NAME => 'f2', VERSIONS => 3, BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'f3', VERSIONS => 6, BLOOMFILTER => 'ROWCOL', TTL => 24*60*60}
hbase(main):029:0> alter 'ns1:t1','f1',{NAME => 'f2', VERSIONS => 3,BLOOMFILTER => 'ROWCOL',IN_MEMORY => 'true'},{NAME => 'f3', VERSIONS => 6,BLOOMFILTER => 'ROWCOL',TTL => 24*60*60}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 7.0220 seconds
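The TTL set on f3 above is given in seconds (24*60*60 = 86400, one day), while cell timestamps are in milliseconds. A rough sketch of the expiry rule, with `is_expired` as a hypothetical helper; in HBase itself, expired cells are filtered out on read and physically removed later during compaction.

```python
TTL_SECONDS = 24 * 60 * 60   # 86400 seconds = one day, as set on f3


def is_expired(cell_timestamp_ms, now_ms, ttl_seconds=TTL_SECONDS):
    """A cell is expired once it is older than the TTL (timestamps in milliseconds)."""
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000


print(is_expired(0, 86_400_000))   # False: exactly one day old
print(is_expired(0, 86_400_001))   # True: one millisecond past the TTL
```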
Delete a column family:
alter 'ns1:t1', NAME => 'f1', METHOD => 'delete'
View the table definition:
describe 'ns1:t1'
Delete a table:
disable 'ns1:t1'
drop 'ns1:t1'
DML
Group name: dml Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Insert data (a shell put writes a single column per call):
Arguments: table name, rowkey, column name, column value
put 'ns1:t1','rk0001','f1:name','zhangsan'
put 'ns1:t1','rk0001','f1:age','18'
put 'ns1:t1','rk0001','f1:sex','1'
put 'ns1:t1','rk0002','f1:name','gaoyuanyuan'
put 'ns1:t1','rk0002','f1:age','18'
put 'ns1:t1','rk0002','f1:sex','2'
put 'ns1:t1','rk0003','f1:name','jiajingwen'
put 'ns1:t1','rk0003','f1:age','18'
put 'ns1:t1','rk0003','f1:sex','2'
put 'ns1:t1','rk0001111','f1:name','canglaoshi'
put 'ns1:t1','rk0001111','f1:age','18'
put 'ns1:t1','rk0001111','f1:sex','1'
put 'ns1:t1','rk0001','f2:addr','beijing'
put 'ns1:t1','rk0001','f1:size','123'
Update data:
put 'ns1:t1','rk0001','f1:name','zs1'
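An update is just another put on the same cell: it adds a newer version rather than overwriting in place, and a plain get returns the newest one. A toy model of that behaviour follows, assuming the VERSIONS => 5 used for f1 when the table was created. `VersionedCell` is a made-up class, the timestamps are invented, and real HBase drops surplus versions at compaction time rather than immediately.

```python
class VersionedCell:
    """Toy cell that keeps up to max_versions (timestamp, value) pairs, newest first."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = []

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]   # keep only the newest versions

    def get(self, versions=1):
        return self.versions[:versions]


name = VersionedCell(max_versions=5)
name.put(1, "zhangsan")   # the original insert
name.put(2, "zs1")        # the "update"
print(name.get())             # [(2, 'zs1')]
print(name.get(versions=5))   # [(2, 'zs1'), (1, 'zhangsan')]
```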
Scan data:
scan 'ns1:t1'
scan 'ns1:t1',{COLUMNS => 'f1:name'}
scan 'ns1:t1',{COLUMNS => ['f1:name','f2:addr']}
scan 'ns1:t1', {RAW => true, VERSIONS => 10}
scan 'ns1:t1', {COLUMNS => 'f1:name', TIMERANGE => [1539173350832,1539173421219], VERSIONS => 3}    # TIMERANGE includes the start and excludes the end
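The half-open time range is easy to get wrong, so here is the rule spelled out. The two bounds are the ones from the scan above; the cell timestamps and values are invented for the example, and `in_timerange` is a hypothetical helper.

```python
START, END = 1539173350832, 1539173421219   # the TIMERANGE bounds from the scan


def in_timerange(timestamp, start=START, end=END):
    """Start is inclusive, end is exclusive."""
    return start <= timestamp < end


# Hypothetical versions of one cell: (timestamp, value)
versions = [(1539173350832, "zhangsan"), (1539173400000, "zs1"), (1539173421219, "zs2")]
selected = [value for ts, value in versions if in_timerange(ts)]
print(selected)   # ['zhangsan', 'zs1']
```

The version written exactly at the end timestamp is left out.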
Query data: GET
get 'ns1:t1','rk0001'
Delete data: DELETE
delete 'ns1:t1','rk0001','f1:age'
deleteall 'ns1:t1','rk0001'
Note: incr only works on columns that hold a long value.
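The reason is the byte-level encoding: a counter is stored as an 8-byte big-endian long, whereas a value written with put '...','18' is stored as the two text bytes "18". The sketch below mirrors that check; `encode_long` and `incr` are stand-ins written for this illustration, not HBase functions, and the error message is invented.

```python
import struct


def encode_long(n):
    """Encode an integer as an 8-byte big-endian signed long."""
    return struct.pack(">q", n)


def incr(stored, amount=1):
    """Increment a stored counter value; refuse anything that is not 8 bytes wide."""
    if len(stored) != 8:
        raise ValueError("cell does not hold an 8-byte long")
    (current,) = struct.unpack(">q", stored)
    return struct.pack(">q", current + amount)


counter = incr(encode_long(41))
print(struct.unpack(">q", counter)[0])   # 42

try:
    incr(b"18")   # the bytes a text put of '18' would store
except ValueError as err:
    print("cannot increment:", err)
```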