Common HBase Shell Filters

Creating the table

create 'test1', 'lf', 'sf'

lf: column family for LONG values (stored as binary); sf: column family for STRING values.

Loading the data

put 'test1', 'user1|ts1', 'sf:c1', 'sku1'
put 'test1', 'user1|ts2', 'sf:c1', 'sku188'
put 'test1', 'user1|ts3', 'sf:s1', 'sku123'
put 'test1', 'user2|ts4', 'sf:c1', 'sku2'
put 'test1', 'user2|ts5', 'sf:c2', 'sku288'
put 'test1', 'user2|ts6', 'sf:s1', 'sku222'

The rowkey combines a user (userX) with the time of the event (tsX).

The column name records the action the user took, and the value records the product (skuXXX) it was applied to. For example, c1: click from homepage; c2: click from ad; s1: search from homepage; b1: buy.
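The rowkey and column layout described above can be modeled in a few lines of plain Ruby. This is only an illustrative sketch: the helper names (`make_rowkey`, `parse_rowkey`, `describe`) are hypothetical and not part of HBase, and the action table simply restates the codes listed above.

```ruby
# Action codes used as column qualifiers in the sf family.
ACTIONS = {
  'c1' => 'click from homepage',
  'c2' => 'click from ad',
  's1' => 'search from homepage',
  'b1' => 'buy'
}.freeze

# Hypothetical helper: build a rowkey of the form "user|timestamp".
def make_rowkey(user, ts)
  "#{user}|#{ts}"
end

# Hypothetical helper: split a rowkey back into its two parts.
def parse_rowkey(rowkey)
  user, ts = rowkey.split('|', 2)
  { user: user, ts: ts }
end

# Turn one cell (rowkey, qualifier, value) into a readable sentence.
def describe(rowkey, qualifier, value)
  key = parse_rowkey(rowkey)
  action = ACTIONS.fetch(qualifier, 'unknown action')
  "#{key[:user]} at #{key[:ts]}: #{action} on #{value}"
end

puts describe(make_rowkey('user1', 'ts2'), 'c1', 'sku188')
# prints: user1 at ts2: click from homepage on sku188
```

Putting the user first in the rowkey keeps all events of one user adjacent in the sorted table, which is what makes the prefix and range scans later in this post possible.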

Query examples

Cells whose value equals sku188

scan 'test1', FILTER=>"ValueFilter(=,'binary:sku188')"

ROW           COLUMN+CELL
user1|ts2     column=sf:c1, timestamp=1409122354918, value=sku188

Cells whose value contains 88

scan 'test1', FILTER=>"ValueFilter(=,'substring:88')"

ROW           COLUMN+CELL
user1|ts2     column=sf:c1, timestamp=1409122354918, value=sku188
user2|ts5     column=sf:c2, timestamp=1409122355030, value=sku288
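To make the difference between the `binary:` and `substring:` comparators concrete, here is a plain-Ruby model of the two queries above. It is a sketch over an in-memory copy of the sample data, not HBase code; `CELLS`, `binary_equal`, and `substring_match` are hypothetical names. The lowercasing reflects the documented case-insensitive behavior of the substring comparator.

```ruby
# In-memory stand-in for the test1 table: [rowkey, column, value].
CELLS = [
  ['user1|ts1', 'sf:c1', 'sku1'],
  ['user1|ts2', 'sf:c1', 'sku188'],
  ['user1|ts3', 'sf:s1', 'sku123'],
  ['user2|ts4', 'sf:c1', 'sku2'],
  ['user2|ts5', 'sf:c2', 'sku288'],
  ['user2|ts6', 'sf:s1', 'sku222']
].freeze

# Models ValueFilter(=,'binary:...'): the whole value must be equal.
def binary_equal(cells, expected)
  cells.select { |_row, _col, value| value == expected }
end

# Models ValueFilter(=,'substring:...'): the value must contain the text.
# Both sides are lowercased to mimic a case-insensitive comparison.
def substring_match(cells, needle)
  cells.select { |_row, _col, value| value.downcase.include?(needle.downcase) }
end

p binary_equal(CELLS, 'sku188').map(&:first)   # ["user1|ts2"]
p substring_match(CELLS, '88').map(&:first)    # ["user1|ts2", "user2|ts5"]
```

Note that `binary_equal(CELLS, 'sku1')` returns only the row whose value is exactly `sku1`, while a substring match on `sku1` would also pick up `sku188` and `sku123`.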

Users who arrived through an ad click (column c2) and whose value contains 88

scan 'test1', FILTER=>"ColumnPrefixFilter('c2') AND ValueFilter(=,'substring:88')"

ROW           COLUMN+CELL
user2|ts5     column=sf:c2, timestamp=1409122355030, value=sku288

Users who arrived through search (column starts with s) and whose value contains 123 or 222

scan 'test1', FILTER=>"ColumnPrefixFilter('s') AND ( ValueFilter(=,'substring:123') OR ValueFilter(=,'substring:222') )"

ROW           COLUMN+CELL
user1|ts3     column=sf:s1, timestamp=1409122354954, value=sku123
user2|ts6     column=sf:s1, timestamp=1409122355970, value=sku222
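The way AND and OR combine individual filters can be pictured as composing predicates. The following plain-Ruby sketch models each filter as a lambda over one cell; it is not how HBase evaluates filters internally, and every name in it (`SAMPLE`, `column_prefix`, `value_substring`, `all_of`, `any_of`) is hypothetical.

```ruby
# In-memory copy of the sample cells: [rowkey, qualifier, value].
SAMPLE = [
  ['user1|ts1', 'c1', 'sku1'],
  ['user1|ts2', 'c1', 'sku188'],
  ['user1|ts3', 's1', 'sku123'],
  ['user2|ts4', 'c1', 'sku2'],
  ['user2|ts5', 'c2', 'sku288'],
  ['user2|ts6', 's1', 'sku222']
].freeze

# Each "filter" is a lambda that accepts or rejects one cell.
def column_prefix(prefix)
  ->(cell) { cell[1].start_with?(prefix) }
end

def value_substring(text)
  ->(cell) { cell[2].include?(text) }
end

# AND: every filter must accept the cell.
def all_of(*filters)
  ->(cell) { filters.all? { |f| f.call(cell) } }
end

# OR: at least one filter must accept the cell.
def any_of(*filters)
  ->(cell) { filters.any? { |f| f.call(cell) } }
end

# ColumnPrefixFilter('s') AND (substring 123 OR substring 222)
SEARCH_HITS = all_of(
  column_prefix('s'),
  any_of(value_substring('123'), value_substring('222'))
)

p SAMPLE.select(&SEARCH_HITS).map(&:first)   # ["user1|ts3", "user2|ts6"]
```

The parentheses in the shell filter string play the same role as the nested `any_of` call here: they make the OR group explicit instead of relying on operator precedence.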

Rows whose rowkey starts with user1

scan 'test1', FILTER => "PrefixFilter ('user1')"

ROW           COLUMN+CELL
user1|ts1     column=sf:c1, timestamp=1409122354868, value=sku1
user1|ts2     column=sf:c1, timestamp=1409122354918, value=sku188
user1|ts3     column=sf:s1, timestamp=1409122354954, value=sku123

FirstKeyOnlyFilter: a row can hold several columns, and each column can hold several versions; this filter returns only the first version of the first column in each row.

KeyOnlyFilter: returns only the key of each cell, not its value.

scan 'test1', FILTER=>"FirstKeyOnlyFilter() AND ValueFilter(=,'binary:sku188') AND KeyOnlyFilter()"

ROW           COLUMN+CELL
user1|ts2     column=sf:c1, timestamp=1409122354918, value=
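As a rough mental model of what the two filters do on their own, consider the plain-Ruby sketch below. The row data is hypothetical (the sample table has only one column per row, so a second column is invented here to make the effect visible), and `first_key_only` and `key_only` are illustrative names, not HBase calls. It does not attempt to model how the filters interact inside a combined filter string.

```ruby
# Hypothetical rows; each row lists its cells in column order as [column, value].
ROWS = {
  'user1|ts2' => [['sf:c1', 'sku188'], ['sf:s1', 'sku123']],
  'user2|ts5' => [['sf:c2', 'sku288']]
}.freeze

# Models FirstKeyOnlyFilter: keep only the first cell of every row.
def first_key_only(rows)
  rows.transform_values { |cells| cells.first(1) }
end

# Models KeyOnlyFilter: keep row and column, replace the value with ''.
def key_only(rows)
  rows.transform_values do |cells|
    cells.map { |column, _value| [column, ''] }
  end
end

key_only(first_key_only(ROWS)).each do |rowkey, cells|
  cells.each { |column, value| puts "#{rowkey} column=#{column}, value=#{value}" }
end
```

This combination is commonly used when only the existence of a row matters, for example when counting rows, because far less data has to be sent back to the client.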

Starting from user1|ts2, find all rows whose rowkey starts with user1

scan 'test1', {STARTROW=>'user1|ts2', FILTER => "PrefixFilter ('user1')"}

ROW           COLUMN+CELL
user1|ts2     column=sf:c1, timestamp=1409122354918, value=sku188
user1|ts3     column=sf:s1, timestamp=1409122354954, value=sku123

Starting from user1|ts2, find all rows up to (but not including) the first rowkey that starts with user2

scan 'test1', {STARTROW=>'user1|ts2', STOPROW=>'user2'}

ROW           COLUMN+CELL
user1|ts2     column=sf:c1, timestamp=1409122354918, value=sku188
user1|ts3     column=sf:s1, timestamp=1409122354954, value=sku123
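The STARTROW/STOPROW behavior follows from rowkeys being stored in sorted order: the scan covers the half-open range from the start row (inclusive) to the stop row (exclusive). A minimal plain-Ruby sketch of that rule, with the hypothetical helper `range_scan` standing in for the scan:

```ruby
# Rowkeys of the sample table.
ROWKEYS = %w[user1|ts1 user1|ts2 user1|ts3 user2|ts4 user2|ts5 user2|ts6].freeze

# Models a scan over sorted rowkeys: start is inclusive, stop is exclusive.
# A nil stop_row means "scan to the end of the table".
def range_scan(rowkeys, start_row, stop_row = nil)
  rowkeys.sort.select do |key|
    key >= start_row && (stop_row.nil? || key < stop_row)
  end
end

p range_scan(ROWKEYS, 'user1|ts2', 'user2')
# ["user1|ts2", "user1|ts3"]
p range_scan(ROWKEYS, 'user1|ts2')
# ["user1|ts2", "user1|ts3", "user2|ts4", "user2|ts5", "user2|ts6"]
```

Every key such as `user2|ts4` sorts after the bare string `user2`, which is why a stop row of `user2` cuts the scan off before any user2 row is returned.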

Rows whose rowkey contains ts3

scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts3'))}

ROW           COLUMN+CELL
user1|ts3     column=sf:s1, timestamp=1409122354954, value=sku123

Rows whose rowkey contains ts

scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts'))}

ROW           COLUMN+CELL
user1|ts1     column=sf:c1, timestamp=1409122354868, value=sku1
user1|ts2     column=sf:c1, timestamp=1409122354918, value=sku188
user1|ts3     column=sf:s1, timestamp=1409122354954, value=sku123
user2|ts4     column=sf:c1, timestamp=1409122354998, value=sku2
user2|ts5     column=sf:c2, timestamp=1409122355030, value=sku288
user2|ts6     column=sf:s1, timestamp=1409122355970, value=sku222

Add one test row

put 'test1', 'user2|err', 'sf:s1', 'sku999'

Find rows whose rowkey has the form user<digits>|ts<digits>. The newly added test row does not match the regular expression, so it is not returned. The pipe is written as \| so that it matches a literal | character rather than acting as the regex alternation operator.

scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new('^user\d+\|ts\d+$'))}

ROW           COLUMN+CELL
user1|ts1     column=sf:c1, timestamp=1409122354868, value=sku1
user1|ts2     column=sf:c1, timestamp=1409122354918, value=sku188
user1|ts3     column=sf:s1, timestamp=1409122354954, value=sku123
user2|ts4     column=sf:c1, timestamp=1409122354998, value=sku2
user2|ts5     column=sf:c2, timestamp=1409122355030, value=sku288
user2|ts6     column=sf:s1, timestamp=1409122355970, value=sku222
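Because `|` is the alternation operator in regular expressions, a literal pipe inside the rowkey has to be escaped as `\|`. The plain-Ruby sketch below compares the two spellings on three of the rowkeys. Ruby's regex engine stands in for the Java one here, and the sketch assumes the comparator looks for a match anywhere in the rowkey rather than requiring the whole key to match; both points are assumptions, and `matching` is a hypothetical helper.

```ruby
# Without the backslash the pattern means: "^user\d+" OR "ts\d+$".
UNESCAPED = /^user\d+|ts\d+$/
# With the backslash the pipe is a literal character between the two parts.
ESCAPED = /^user\d+\|ts\d+$/

KEYS = ['user1|ts1', 'user2|ts6', 'user2|err'].freeze

# Return the keys in which the pattern finds a match.
def matching(keys, pattern)
  keys.select { |key| key.match?(pattern) }
end

p matching(KEYS, UNESCAPED)   # ["user1|ts1", "user2|ts6", "user2|err"]
p matching(KEYS, ESCAPED)     # ["user1|ts1", "user2|ts6"]
```

Under these assumptions only the escaped pattern excludes `user2|err`; the unescaped one accepts it because the first alternative, `^user\d+`, already matches the start of the key.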

Add another test row

put 'test1', 'user1|ts9', 'sf:b1', 'sku1'

Cells in columns whose name starts with b1 and whose value is sku1

scan 'test1', FILTER=>"ColumnPrefixFilter('b1') AND ValueFilter(=,'binary:sku1')"

ROW           COLUMN+CELL
user1|ts9     column=sf:b1, timestamp=1409124908668, value=sku1

Using SingleColumnValueFilter: rows whose sf:b1 column has the value sku1

scan 'test1', {COLUMNS => 'sf:b1', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('sf'), Bytes.toBytes('b1'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('sku1'))}

ROW           COLUMN+CELL
user1|ts9     column=sf:b1, timestamp=1409124908668, value=sku1
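Unlike ValueFilter, which judges each cell on its own, SingleColumnValueFilter uses one column to decide whether a whole row is kept. A point that is easy to miss is that, by default, rows that do not contain the tested column are let through unless the filter is told to drop them (the Java API exposes this as `setFilterIfMissing`). The plain-Ruby sketch below models that rule; the row `user2|ts7` is invented for illustration, and `single_column_value` and its `filter_if_missing:` keyword are hypothetical names.

```ruby
# Hypothetical rows: rowkey => { column => value }.
USER_ROWS = {
  'user1|ts1' => { 'sf:c1' => 'sku1' },
  'user1|ts9' => { 'sf:b1' => 'sku1' },
  'user2|ts7' => { 'sf:b1' => 'sku7' }
}.freeze

# Models SingleColumnValueFilter with EQUAL: the tested column decides
# whether the whole row is kept. Rows without the column pass unless
# filter_if_missing is true.
def single_column_value(rows, column, expected, filter_if_missing: false)
  rows.select do |_rowkey, cells|
    if cells.key?(column)
      cells[column] == expected
    else
      !filter_if_missing
    end
  end
end

p single_column_value(USER_ROWS, 'sf:b1', 'sku1').keys
# ["user1|ts1", "user1|ts9"]
p single_column_value(USER_ROWS, 'sf:b1', 'sku1', filter_if_missing: true).keys
# ["user1|ts9"]
```

In the shell command above, `COLUMNS => 'sf:b1'` restricts the scan to that one column, so rows without it have nothing to return and the default does not show up in the output.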