
HBase data queries and filters: detailed usage

Source: https://blog.csdn.net/m0_37739193/article/details/73615016

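This article covers simple data queries in HBase and the use of filters (a fairly complete set). I have personally run and verified all of the code and commands, and went to the trouble of compiling them here for easy reference later.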

Create the table and insert data:

  hbase(main):179:0> create 'scores','grade','course'
  hbase(main):180:0> put 'scores','zhangsan01','course:art','90'
  hbase(main):181:0> scan 'scores'
  ROW                 COLUMN+CELL
   zhangsan01         column=course:art, timestamp=1498003561726, value=90
  1 row(s) in 0.0150 seconds
  hbase(main):182:0> put 'scores','zhangsan01','course:math','99',1498003561726
  (Note: a manually set timestamp must not be greater than the current system time, otherwise the data cannot be deleted. I set the timestamp by hand here for the DependentColumnFilter experiment later on. Check the timestamp of the first cell you inserted, then insert the second cell with that same timestamp.)
  hbase(main):183:0> put 'scores','zhangsan01','grade:','101'
  Problem: after deleting this cell, running put 'scores','zhangsan01','grade:','101',1498003561726 succeeds, but scan 'scores' does not show the cell. Running put 'scores','zhangsan01','grade:','101' (no timestamp) afterwards makes the cell visible in scan 'scores' again. If you want to set the timestamp manually for this cell, do it the first time the cell is inserted, or truncate the table and insert again.
  hbase(main):184:0> put 'scores','zhangsan02','course:art','90'
  hbase(main):185:0> get 'scores','zhangsan02','course:art'
  COLUMN              CELL
   course:art         timestamp=1498003601365, value=90
  1 row(s) in 0.0080 seconds
  hbase(main):186:0> put 'scores','zhangsan02','grade:','102',1498003601365
  hbase(main):187:0> put 'scores','zhangsan02','course:math','66',1498003561726
  hbase(main):188:0> put 'scores','lisi01','course:math','89',1498003561726
  hbase(main):189:0> put 'scores','lisi01','course:art','89'
  hbase(main):190:0> put 'scores','lisi01','grade:','201',1498003561726
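
The problem noted in the listing is consistent with how HBase delete markers (tombstones) behave: a marker hides every version of a cell whose timestamp is at or below the marker's own timestamp, typically until a major compaction removes the marker. The following is a minimal sketch of that rule in plain Java. It is a simplified model, not the HBase client API: the class `TombstoneSketch`, its methods, and the two timestamps marked "assumed" are hypothetical and only illustrate the idea.

```java
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical model of the versions of one cell; not the HBase API. */
public class TombstoneSketch {
    // Versions of a single cell, keyed by timestamp (largest key = newest).
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Timestamp of the newest delete marker; versions at or below it are hidden.
    private long deleteMarkerTs = Long.MIN_VALUE;

    public void put(long ts, String value) {
        versions.put(ts, value);
    }

    public void delete(long ts) {
        deleteMarkerTs = Math.max(deleteMarkerTs, ts);
    }

    /** Returns the newest visible value, or null if every version is masked. */
    public String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null || newest.getKey() <= deleteMarkerTs) {
            return null;
        }
        return newest.getValue();
    }

    public static void main(String[] args) {
        TombstoneSketch cell = new TombstoneSketch();
        cell.put(1498003593575L, "101"); // first put, timestamp taken from the scan output
        cell.delete(1498003600000L);     // delete issued later (assumed timestamp)
        cell.put(1498003561726L, "101"); // manual, older timestamp: masked by the marker
        System.out.println(cell.get());  // prints "null"
        cell.put(1498003700000L, "101"); // put without explicit timestamp, i.e. "now" (assumed)
        System.out.println(cell.get());  // prints "101"
    }
}
```

Under the same model, a cell written with a timestamp in the future is not covered by a delete issued "now", because the marker's timestamp is smaller than the cell's. That matches the warning about not setting a timestamp later than the system time.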

Querying data:

Query by rowkey:

  hbase(main):187:0> get 'scores','zhangsan01'
  COLUMN              CELL
   course:art         timestamp=1498003561726, value=90
   course:math        timestamp=1498003561726, value=99
   grade:             timestamp=1498003593575, value=101
  3 row(s) in 0.0160 seconds

Query by column name:
  hbase(main):188:0> scan 'scores',{COLUMNS=>'course:art'}
  ROW                 COLUMN+CELL
   lisi01             column=course:art, timestamp=1498003655021, value=89
   zhangsan01         column=course:art, timestamp=1498003561726, value=90
   zhangsan02         column=course:art, timestamp=1498003601365, value=90
  3 row(s) in 0.0120 seconds

Query the data between two rowkeys:
  hbase(main):205:0> scan 'scores',{STARTROW=>'zhangsan01',STOPROW=>'zhangsan02'}
  ROW                 COLUMN+CELL
   zhangsan01         column=course:art, timestamp=1498003561726, value=90
   zhangsan01         column=course:math, timestamp=1498003561726, value=99
   zhangsan01         column=grade:, timestamp=1498003593575, value=101
  1 row(s) in 0.0140 seconds
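
The scan above returns only zhangsan01 because a scan range is half-open: STARTROW is included and STOPROW is excluded. Here is a minimal sketch of that rule using a sorted map from the JDK rather than the HBase client. `RowRangeSketch` and its methods are hypothetical names, and Java string ordering is assumed to agree with HBase's byte ordering for these ASCII rowkeys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical illustration of a half-open rowkey range; not the HBase API. */
public class RowRangeSketch {
    /** Builds a sorted map holding the three rowkeys used in this article. */
    public static TreeMap<String, String> sampleRows() {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("zhangsan01", "course:art=90");
        rows.put("zhangsan02", "course:art=90");
        rows.put("lisi01", "course:art=89");
        return rows;
    }

    /** Returns the rowkeys in [startRow, stopRow): start inclusive, stop exclusive. */
    public static List<String> scan(TreeMap<String, String> rows, String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = sampleRows();
        System.out.println(scan(rows, "zhangsan01", "zhangsan02")); // [zhangsan01]
        System.out.println(scan(rows, "zhangsan01", "zhangsan09")); // [zhangsan01, zhangsan02]
    }
}
```

Choosing a stop key that sorts after the last wanted rowkey, such as zhangsan09 here, is one way to include zhangsan02 in the result.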

Query between two rowkeys, restricted to a column name:
  hbase(main):206:0> scan 'scores',{COLUMNS=>'course:art',STARTROW=>'zhangsan01',STOPROW=>'zhangsan02'}
  ROW                 COLUMN+CELL
   zhangsan01         column=course:art, timestamp=1498003561726, value=90
  1 row(s) in 0.0110 seconds

Query by column name from a given rowkey to the end (here STOPROW is set to a key past the last matching rowkey):
  hbase(main):207:0> scan 'scores',{COLUMNS=>'course:art',STARTROW=>'zhangsan01',STOPROW=>'zhangsan09'}
  ROW                 COLUMN+CELL
   zhangsan01         column=course:art, timestamp=1498003561726, value=90
   zhangsan02         column=course:art, timestamp=1498003601365, value=90
  2 row(s) in 0.0310 seconds

Using filters:

Introduction -- parameter basics

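Two parameter types appear in many of the Filter classes, so they are introduced together here:
(1) Comparison operator: CompareOp
The comparison operator defines the comparison relation. The following values are available:
EQUAL                      equal to
GREATER                    greater than
GREATER_OR_EQUAL           greater than or equal to
LESS                       less than
LESS_OR_EQUAL              less than or equal to
NOT_EQUAL                  not equal to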
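
These operators are applied to raw bytes, which HBase compares lexicographically as unsigned values, so numeric strings such as '101' and '90' do not compare numerically. The following is a minimal plain-Java sketch of that behaviour. `CompareOpSketch`, its `Op` enum and its helper methods are local stand-ins written for illustration and are not the HBase classes.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java stand-in for illustration; this is not the HBase CompareOp class. */
public class CompareOpSketch {
    /** Same six constants as the CompareOp values listed above. */
    public enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison that treats each byte as an unsigned value. */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** True if "cellValue op expected" holds under byte-wise ordering. */
    public static boolean matches(Op op, byte[] cellValue, byte[] expected) {
        int c = compareBytes(cellValue, expected);
        switch (op) {
            case LESS:             return c < 0;
            case LESS_OR_EQUAL:    return c <= 0;
            case EQUAL:            return c == 0;
            case NOT_EQUAL:        return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER:          return c > 0;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Op.GREATER, bytes("99"), bytes("90")));  // true
        System.out.println(matches(Op.GREATER, bytes("101"), bytes("90"))); // false: '1' sorts before '9'
    }
}
```

If numeric ordering matters, one common approach is to store values with a fixed width (for example zero-padded), so that byte order and numeric order agree.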

(2) Comparator: ByteArrayComparable
Comparators make it possible to match targets in different ways. The following subclasses are available:
BinaryComparator           matches the complete byte array
BinaryPrefixComparator     matches a prefix of the byte array
BitComparator              performs a bitwise comparison, using a BitwiseOp of AND, OR, or XOR
NullComparator             does not compare against an actual value; checks whether a given value is null or not null
RegexStringComparator      regular-expression match
SubstringComparator        substring match
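
To make the difference between the four most commonly used comparators concrete, here is a minimal plain-Java sketch of the matching rule each one is described as applying. `ComparatorSketch` and its methods are hypothetical and do not use the HBase classes. Two details are assumptions based on my understanding of HBase's behaviour: the regex comparator looks for a match anywhere in the value, and the substring comparator ignores case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical illustration of comparator matching rules; not the HBase API. */
public class ComparatorSketch {
    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** BinaryComparator-style check: the whole byte array must be identical. */
    public static boolean binaryEquals(byte[] value, byte[] expected) {
        return Arrays.equals(value, expected);
    }

    /** BinaryPrefixComparator-style check: only the leading bytes are compared. */
    public static boolean hasPrefix(byte[] value, byte[] prefix) {
        if (value.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(value, prefix.length), prefix);
    }

    /** RegexStringComparator-style check: the pattern may match anywhere (assumed). */
    public static boolean regexMatches(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    /** SubstringComparator-style check: case-insensitive containment (assumed). */
    public static boolean containsSubstring(String value, String substring) {
        return value.toLowerCase(Locale.ROOT).contains(substring.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(binaryEquals(bytes("zhangsan01"), bytes("zhangsan01"))); // true
        System.out.println(binaryEquals(bytes("zhangsan01"), bytes("zhangsan")));   // false
        System.out.println(hasPrefix(bytes("zhangsan01"), bytes("zhangsan")));      // true
        System.out.println(regexMatches("zhangsan01", "san0\\d"));                  // true
        System.out.println(containsSubstring("zhangsan01", "SAN"));                 // true
    }
}
```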

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.Cell;
  import org.apache.hadoop.hbase.CellUtil;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.filter.BinaryComparator;
  import org.apache.hadoop.hbase.filter.ColumnCountGetFilter;
  import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
  import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
  import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
  import org.apache.hadoop.hbase.filter.DependentColumnFilter;
  import org.apache.hadoop.hbase.filter.FamilyFilter;
  import org.apache.hadoop.hbase.filter.Filter;
  import org.apache.hadoop.hbase.filter.FilterList;
  import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
  import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
  import org.apache.hadoop.hbase.filter.InclusiveStopFilter;
  import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
  import org.apache.hadoop.hbase.filter.MultipleColumnPrefixFilter;
  import org.apache.hadoop.hbase.filter.PageFilter;
  import org.apache.hadoop.hbase.filter.PrefixFilter;
  import org.apache.hadoop.hbase.filter.QualifierFilter;
  import org.apache.hadoop.hbase.filter.RandomRowFilter;
  import org.apache.hadoop.hbase.filter.RegexStringComparator;
  import org.apache.hadoop.hbase.filter.RowFilter;
  import org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.filter.SkipFilter;
  import org.apache.hadoop.hbase.filter.SubstringComparator;
  import org.apache.hadoop.hbase.filter.TimestampsFilter;
  import org.apache.hadoop.hbase.filter.ValueFilter;
  import org.apache.hadoop.hbase.filter.WhileMatchFilter;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.hbase.util.Pair;
  public class HbaseUtils {
      public static Admin admin = null;  <