好程式設計師 Big Data Series: HBase in Depth

阿新 • Published: 2019-05-27

This post covers why HBase exists, what HBase is, and how it is architected, then walks through installation and the common shell commands.

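I. Why HBase?

Data volumes keep growing, and traditional relational databases can no longer meet the storage and query requirements. Hive can handle the storage side, but it does not meet the need to store and query unstructured and semi-structured data.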

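II. What is HBase?

HBase is an open-source, distributed, multi-version, scalable non-relational database. It is the open-source Java implementation of Bigtable, built on top of HDFS, and provides a NoSQL database system that is highly reliable, high-performance, column-oriented, scalable, and capable of real-time reads and writes. Typical use cases:

Storing massive amounts of unstructured data.

Reading and writing data randomly and in near real time.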

III. HBase architecture

The main components are: client, ZooKeeper, HMaster, HRegionServer, HLog, HRegion, Store, MemStore, StoreFile, and HFile.

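client: the HBase client, which provides the interfaces for accessing HBase (the Linux shell and the Java API).

The client maintains caches, such as region location information, to speed up access to HBase.

ZooKeeper: monitors the state of the HMasters and guarantees that exactly one HMaster is active at any time, which provides high availability. It stores the addressing entry point for all regions, that is, which server hosts the -ROOT- table (HBase 0.96 and later dropped -ROOT-, and ZooKeeper records the location of hbase:meta instead). It monitors HRegionServer state in real time and notifies the HMaster whenever a region server comes online or goes offline. It also stores information about all HBase tables (HBase metadata).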

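HMaster: (the "boss" of HBase) assigns regions to region servers (for example when a table is created), balances load across region servers, reassigns regions (when an HRegionServer fails or an HRegion splits), reclaims garbage files on HDFS, and handles schema update requests.

HRegionServer: (the "worker" of HBase) maintains the regions the master has assigned to it (that is, the regions on its own machine), handles client I/O requests against those regions, and interacts with HDFS.

The region server is also responsible for splitting regions that grow too large at runtime.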

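HLog: records operations on HBase. Data is written using a WAL (write-ahead log): each edit goes to the log first and then to the MemStore, so that lost data can be recovered by replaying the log.

HRegion: the smallest unit of distributed storage and load balancing in HBase; a region is a table or part of a table.

Store: corresponds to one column family.

MemStore: a 128 MB in-memory buffer used to flush data to HDFS in batches.

HStoreFile (HFile): data in HBase is stored on HDFS in the form of HFiles.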
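The write path described above (log first, then MemStore, then a batch flush to files) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HBase code: the class `RegionServerSketch`, its method names, and the byte-counting rule are all hypothetical, and clearing the log after a flush is a simplification.

```python
class RegionServerSketch:
    """Toy model of the write path: WAL first, then MemStore, then flush."""

    def __init__(self, flush_threshold_bytes=128 * 1024 * 1024):
        self.wal = []             # append-only list of edits (stands in for the HLog)
        self.memstore = {}        # in-memory buffer: (row, column) -> value
        self.memstore_bytes = 0   # rough size estimate of the buffered edits
        self.hfiles = []          # each flush adds one sorted, immutable "file"
        self.flush_threshold_bytes = flush_threshold_bytes

    def put(self, row, column, value):
        self.wal.append((row, column, value))   # 1) record the edit in the log
        self._apply(row, column, value)         # 2) then buffer it in memory
        if self.memstore_bytes >= self.flush_threshold_bytes:
            self.flush()

    def _apply(self, row, column, value):
        self.memstore[(row, column)] = value
        self.memstore_bytes += len(row) + len(column) + len(value)

    def flush(self):
        """Write the buffered edits out as one sorted batch and reset the buffer."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.memstore_bytes = 0
        self.wal.clear()          # simplification: flushed edits no longer need the log

    def recover(self, surviving_wal):
        """Rebuild the MemStore after a crash by replaying logged edits in order."""
        for row, column, value in surviving_wal:
            self._apply(row, column, value)


server = RegionServerSketch(flush_threshold_bytes=32)
server.put("rk00001", "base_info:name", "zhangsan")  # 29 bytes buffered, below the threshold
wal_copy = list(server.wal)                           # pretend this log survived a crash

restarted = RegionServerSketch(flush_threshold_bytes=32)
restarted.recover(wal_copy)                           # replay restores the lost MemStore

server.put("rk00002", "base_info:name", "lisi")      # crosses the threshold, triggers a flush
```

The tiny 32-byte threshold is only there to make the flush visible; the default argument mirrors the 128 MB figure given for the MemStore.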

Cardinality between the components:

hmaster:hregionserver=1:n

hregionserver:hregion=1:n

hregionserver:hlog=1:1

hregion:hstore=1:n

store:memstore=1:1

store:storefile=1:n

storefile:hfile=1:1

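Key HBase terms:

rowkey: the row key. Like a primary key in MySQL, it must be unique, and rows are kept sorted by it.

columnfamily: a column family (a set of columns).

column: a column.

timestamp: the timestamp; by default the value with the latest timestamp is shown.

version: the version number.

cell: a cell.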
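To see how these terms fit together, it can help to picture a table as a nested, sorted map: rowkey, then column (family:qualifier), then timestamp, then value. The Python sketch below illustrates that logical model only; `TableSketch` and its methods are hypothetical names, rowkeys are plain strings here (HBase compares bytes), and the timestamps are made-up small integers.

```python
from collections import defaultdict


class TableSketch:
    """Logical view of a table: rowkey -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, rowkey, column, value, timestamp):
        versions = self.rows[rowkey][column]
        versions[timestamp] = value
        # Keep only the newest max_versions timestamps for this cell.
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]

    def get(self, rowkey, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        stored = self.rows.get(rowkey, {}).get(column, {})
        return sorted(stored.items(), reverse=True)[:versions]

    def rowkeys(self):
        """Rowkeys come back in sorted order, regardless of insertion order."""
        return sorted(self.rows)


table = TableSketch(max_versions=3)
table.put("rk00002", "base_info:name", "zhaoming", timestamp=100)
table.put("rk00001", "base_info:name", "gaoyuanyuan", timestamp=100)
table.put("rk00001", "base_info:name", "zhouzhiruo", timestamp=200)

latest = table.get("rk00001", "base_info:name")               # newest version only
history = table.get("rk00001", "base_info:name", versions=2)  # two newest versions
missing = table.get("rk00002", "base_info:age")               # never written: nothing stored
```

A column that was never written for a row simply has no entry, which is the sense in which storage is sparse.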

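IV. HBase and Hadoop

HBase is built on Hadoop: its storage depends on HDFS. HBase's characteristics in more detail:

Schema: schemaless.

Data type: a single type, byte[].

Multi-version: every value can have multiple versions.

Column-oriented storage: each column family is stored in its own directory.

Sparse storage: a key-value that is null takes up no storage space.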

Next, installing HBase:

1. Standalone mode

1) Unpack the archive and configure the environment variables

tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local

cd /usr/local

vi /etc/profile

source /etc/profile

2) Verify the HBase installation

hbase version

Edit the HBase configuration files

vi conf/hbase-env.sh

JAVA_HOME

Note: the following two settings are only needed on JDK 7; on JDK 8+ they can be removed or commented out.

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+

export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

vi hbase-site.xml

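<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hbasedata</value>
</property>

<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeperdata</value>
</property>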

Start the HBase service:

bin/start-hbase.sh

Start the client:

bin/hbase shell

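2. Pseudo-distributed mode

3. Fully distributed mode

Unpack the archive and configure the environment variables

Edit the HBase configuration files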

vi conf/hbase-env.sh

export HBASE_MANAGES_ZK=false

vi regionservers

vi backup-masters

vi hbase-site.xml

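<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

<property>
<name>hbase.rootdir</name>
<value>hdfs://qianfeng/hbase</value>
</property>

<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeperdata</value>
</property>

<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop05:2181,hadoop06:2181,hadoop07:2181</value>
</property>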
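In hbase-site.xml, each setting is a property element with a name and a value inside a top-level configuration element. As a rough illustration, the Python sketch below renders the fully distributed settings listed here into that layout and reads them back. The helper names `render_hbase_site` and `parse_hbase_site` are hypothetical; HBase ships no such tool, and editing the file by hand works just as well.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_hbase_site(properties):
    """Render a {name: value} dict in the hbase-site.xml property layout."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


def parse_hbase_site(xml_text):
    """Read the property elements back into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


# The values below are the ones used in this walkthrough.
cluster_settings = {
    "hbase.cluster.distributed": "true",
    "hbase.rootdir": "hdfs://qianfeng/hbase",
    "hbase.zookeeper.property.dataDir": "/usr/local/zookeeperdata",
    "hbase.zookeeper.quorum": "hadoop05:2181,hadoop06:2181,hadoop07:2181",
}
site_xml = render_hbase_site(cluster_settings)
round_trip = parse_hbase_site(site_xml)
```

Parsing the rendered text back is a cheap way to confirm the file is well-formed before distributing it to the other nodes.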

Note:

If HDFS is configured for high availability, copy Hadoop's core-site.xml and hdfs-site.xml into the hbase/conf directory.

Distribute to the other nodes:

scp -r hbase-1.2.1 root@hadoop06:$PWD

scp -r hbase-1.2.1 root@hadoop07:$PWD

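Startup order:

1) Start ZooKeeper

2) Start HDFS

3) Start HBase

The clocks across the HBase cluster must be synchronized.

HMaster web UI port: 16010

HRegionServer web UI port: 16030

HBase shell operations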

help

help "COMMAND"

help "COMMAND_GROUP"

List all user tables:

list

Create a table:

create 'test','f1', 'f2'

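namespace:

HBase has no concept of a database, but it does have namespaces (groups); a namespace plays the role of a database.

HBase has two namespaces by default:

default: where tables created without a namespace prefix go.

hbase: holds HBase's system tables.

List all namespaces, then inspect, create, alter, and drop one: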

list_namespace

list_namespace_tables 'hbase'

create_namespace 'ns1'

describe_namespace 'ns1'

alter_namespace 'ns1', {METHOD => 'set', 'NAME' => 'gjz1'}

alter_namespace 'ns1', {METHOD => 'unset', NAME => 'NAME'}

drop_namespace 'ns1'   # only an empty namespace can be dropped
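Table names in the shell take the form namespace:table, and a name with no prefix, such as 'test', belongs to the default namespace. A small Python sketch of that naming rule follows; `split_table_name` and `group_by_namespace` are hypothetical helpers for illustration, not part of HBase.

```python
def split_table_name(qualified_name):
    """Split 'ns:table' into (namespace, table); no prefix means 'default'."""
    namespace, separator, table = qualified_name.partition(":")
    if not separator:
        return "default", qualified_name
    return namespace, table


def group_by_namespace(table_names):
    """Group qualified table names by the namespace they belong to."""
    grouped = {}
    for name in table_names:
        namespace, table = split_table_name(name)
        grouped.setdefault(namespace, []).append(table)
    return grouped


tables = ["test", "ns1:t_userinfo", "ns1:t1", "hbase:meta"]
by_namespace = group_by_namespace(tables)
```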

DDL:

Group name: ddl

Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

Create tables:

create 'test','f1', 'f2'

create 'ns1:t_userinfo',{NAME=>'base_info',BLOOMFILTER => 'ROWCOL',VERSIONS => '3'}

create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']   # pre-split: fixes in advance the rowkey range each region manages
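With four split keys the table starts out with five regions, each owning a half-open rowkey range (start inclusive, end exclusive). The Python sketch below shows how a rowkey maps to one of those ranges. The function names are hypothetical, an empty string stands for an open-ended boundary, and plain string comparison stands in for HBase's byte-wise comparison.

```python
from bisect import bisect_right


def region_ranges(split_keys):
    """Return the [start, end) range of every region; '' means open-ended."""
    bounds = [""] + list(split_keys) + [""]
    return list(zip(bounds[:-1], bounds[1:]))


def region_index(split_keys, rowkey):
    """Index of the region whose range contains rowkey (split_keys must be sorted)."""
    return bisect_right(split_keys, rowkey)


splits = ["10", "20", "30", "40"]
ranges = region_ranges(splits)

first = region_index(splits, "05")      # sorts before '10', so the first region
boundary = region_index(splits, "10")   # a split key starts the next region
three_digits = region_index(splits, "100")
single_digit = region_index(splits, "9")
```

Because comparison is lexicographic rather than numeric, '100' sorts between '10' and '20' and '9' sorts after '40', which is why rowkeys are usually zero-padded to a fixed width.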

Alter a table (the column family is updated if it exists and added if it does not):

alter 'ns1:t_userinfo',{NAME=>'extra_info',BLOOMFILTER => 'ROW',VERSIONS => '2'}

alter 'ns1:t_userinfo',{NAME=>'extra_info',BLOOMFILTER => 'ROWCOL',VERSIONS => '5'}

Delete a column family:

alter 'ns1:t_userinfo', NAME => 'extra_info', METHOD => 'delete'

alter 'ns1:t_userinfo', 'delete' => 'base_info'

Drop a table (it must be disabled first):

disable 'ns1:t1'

drop 'ns1:t1'

DML:

Group name: dml

Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

Insert data (a single put cannot write multiple columns):

put 'ns1:test','u00001','cf1:name','zhangsan'

put 'ns1:t_userinfo','rk00001','base_info:name','gaoyuanyuan'

put 'ns1:t_userinfo','rk00001','extra_info:pic','picture'

Update data:

put 'ns1:t_userinfo','rk00001','base_info:name','zhouzhiruo'

put 'ns1:t_userinfo','rk00002','base_info:name','zhaoming'

Table scan (scan):

scan 'ns1:t_userinfo'

scan 'ns1:t_userinfo',{COLUMNS => ['base_info:name','base_info:age']}

Scan with conditions (the start row is inclusive, the end row is exclusive):

scan 'ns1:t_userinfo',{COLUMNS => ['base_info:name','base_info:age'],STARTROW=>'rk000012',LIMIT=>2}

scan 'ns1:t_userinfo',{COLUMNS => ['base_info:name','base_info:age'],STARTROW=>'rk000012',ENDROW=>'rk00002',LIMIT=>2}
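The inclusive-start, exclusive-end behaviour is easiest to see on a sorted list of rowkeys. The sketch below is a plain Python illustration, not the HBase client: the `scan` function and the sample rowkeys are made up for the example, and string comparison stands in for byte-wise comparison.

```python
from bisect import bisect_left


def scan(sorted_rowkeys, startrow="", stoprow=None, limit=None):
    """Return rowkeys in [startrow, stoprow), at most `limit` of them."""
    begin = bisect_left(sorted_rowkeys, startrow)
    end = len(sorted_rowkeys) if stoprow is None else bisect_left(sorted_rowkeys, stoprow)
    selected = sorted_rowkeys[begin:end]
    return selected if limit is None else selected[:limit]


# Hypothetical rowkeys; sorting is lexicographic, so 'rk000012' comes before 'rk00002'.
rowkeys = sorted(["rk00003", "rk00001", "rk00002", "rk000015", "rk000012"])

bounded = scan(rowkeys, startrow="rk000012", stoprow="rk00002")
unbounded = scan(rowkeys, startrow="rk000012", limit=3)
```

In the bounded scan the row 'rk00002' itself is left out because the end row is exclusive, while the scan without an end row runs on until the limit is reached.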

Query data (get):

get 'ns1:t_userinfo','rk00001'

get 'ns1:t_userinfo','rk00001',{TIMERANGE=>[1534136591897,1534136667747]}

get 'ns1:t_userinfo','rk00001',{COLUMN=>['base_info:name','base_info:age'],VERSIONS =>4}

get 'ns1:t_userinfo','rk00001',{TIMESTAMP=>1534136580800}
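The get options above select among the stored versions of a cell. The Python sketch below mimics that selection for a single cell, on the assumption that a time range is half-open (minimum inclusive, maximum exclusive) and that one version is returned unless more are requested. `select_versions` and the sample values are hypothetical; only the timestamps are borrowed from the commands above.

```python
def select_versions(cell_versions, versions=1, timerange=None, timestamp=None):
    """Pick (timestamp, value) pairs from one cell, newest first.

    cell_versions: dict mapping timestamp -> value for a single cell.
    timerange:     (min_ts, max_ts), treated as min_ts <= ts < max_ts.
    timestamp:     keep only the version written at exactly this timestamp.
    """
    candidates = sorted(cell_versions.items(), reverse=True)
    if timestamp is not None:
        candidates = [(ts, v) for ts, v in candidates if ts == timestamp]
    if timerange is not None:
        min_ts, max_ts = timerange
        candidates = [(ts, v) for ts, v in candidates if min_ts <= ts < max_ts]
    return candidates[:versions]


# Hypothetical history of base_info:name for one row.
name_history = {
    1534136580800: "gaoyuanyuan",
    1534136591897: "zhouzhiruo",
    1534136667747: "zhaoming",
}

newest = select_versions(name_history)
in_range = select_versions(name_history, versions=4,
                           timerange=(1534136591897, 1534136667747))
exact = select_versions(name_history, timestamp=1534136580800)
```

Note that the version written exactly at the upper bound of the range is not returned.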

Delete data (delete):

delete 'ns1:t_userinfo','rk00002','base_info:age'

'ns1:t_userinfo','rk00001',{TIMERANGE=>[1534138686498,1534138738862]}

Delete a specific version (the delete works upward from the given version):

delete 'ns1:t_userinfo','rk00001','base_info:name',1534138686498

Table checks (existence, disable, enable, describe):

exists 'ns1:t_userinfo'

disable 'ns1:t_userinfo'

enable 'ns1:t_userinfo'

desc 'ns1:t_userinfo'

Count rows (counting is inefficient and not recommended):

count 'ns1:t_userinfo'

Truncate a table:

truncate 'ns1:test'
