HBase Source Code Analysis -- HBase Region Split

阿新 • Published: 2018-12-31

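Code version: hbase-1.2.6

Module: hbase-server

Class: org.apache.hadoop.hbase.regionserver.HRegion

Questions to answer:

1. When is a split triggered?

2. What is the split policy?

1. Deciding whether a split is needed

Method: checkSplit

Return value: the split point

After a few checks, the method essentially calls: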

byte[] ret = splitPolicy.getSplitPoint();
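
The guard-then-delegate shape described above can be sketched roughly as follows. This is a simplified model, not the HBase source: `CheckSplitSketch` and its nested `SplitPolicy` interface are hypothetical stand-ins, and the only guard shown is `shouldSplit()`, which the Javadoc of `getSplitPoint` names as its precondition. The real `checkSplit` performs further checks that are omitted here.

```java
import java.nio.charset.StandardCharsets;

final class CheckSplitSketch {
    // Hypothetical stand-in for RegionSplitPolicy; only the two calls discussed here.
    interface SplitPolicy {
        boolean shouldSplit();
        byte[] getSplitPoint();
    }

    // Returns the split point, or null when the region should not be split.
    static byte[] checkSplit(SplitPolicy policy) {
        if (!policy.shouldSplit()) {
            return null; // the policy says no split is needed
        }
        return policy.getSplitPoint(); // may still be null if no usable key exists
    }

    public static void main(String[] args) {
        SplitPolicy always = new SplitPolicy() {
            public boolean shouldSplit() { return true; }
            public byte[] getSplitPoint() { return "row-500".getBytes(StandardCharsets.UTF_8); }
        };
        System.out.println(new String(checkSplit(always), StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is only the ordering: the policy is asked whether to split before it is asked where to split.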

2. The split policy

org.apache.hadoop.hbase.regionserver.RegionSplitPolicy

  /**
   * @return the key at which the region should be split, or null
   * if it cannot be split. This will only be called if shouldSplit
   * previously returned true.
   */
  protected byte[] getSplitPoint() {
    byte[] explicitSplitPoint = this.region.getExplicitSplitPoint();
    if (explicitSplitPoint != null) {
      return explicitSplitPoint;
    }
    List<Store> stores = region.getStores();

    byte[] splitPointFromLargestStore = null;
    long largestStoreSize = 0;
    for (Store s : stores) {
      byte[] splitPoint = s.getSplitPoint();
      long storeSize = s.getSize();
      if (splitPoint != null && largestStoreSize < storeSize) {
        splitPointFromLargestStore = splitPoint;
        largestStoreSize = storeSize;
      }
    }

    return splitPointFromLargestStore;
  }

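From the code above, if explicitSplitPoint is not null it is returned directly; tracing further up the call chain, this value is assigned by forceSplit.

If explicitSplitPoint is null, the method iterates over region.getStores() and takes the split point of the largest store (by storeSize) that offers a non-null split point.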
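
To make the selection rule concrete, here is a small self-contained sketch that mirrors the loop in `getSplitPoint`. `LargestStoreSketch` and `StoreInfo` are hypothetical names used only for illustration (they are not HBase classes), and the sizes and row keys are made-up values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class LargestStoreSketch {
    // Hypothetical stand-in for a Store: a size plus its own candidate split point.
    static final class StoreInfo {
        final long size;
        final byte[] splitPoint; // null means this store cannot offer a split point

        StoreInfo(long size, String splitPoint) {
            this.size = size;
            this.splitPoint = splitPoint == null ? null : splitPoint.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Same rule as the loop above: an explicit split point wins outright;
    // otherwise the largest store with a non-null split point decides.
    static byte[] pickSplitPoint(byte[] explicitSplitPoint, List<StoreInfo> stores) {
        if (explicitSplitPoint != null) {
            return explicitSplitPoint;
        }
        byte[] best = null;
        long largest = 0;
        for (StoreInfo s : stores) {
            if (s.splitPoint != null && largest < s.size) {
                best = s.splitPoint;
                largest = s.size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<StoreInfo> stores = Arrays.asList(
            new StoreInfo(100, "row-a"),
            new StoreInfo(900, null),   // largest store, but it has no split point
            new StoreInfo(400, "row-m"));
        // prints "row-m"
        System.out.println(new String(pickSplitPoint(null, stores), StandardCharsets.UTF_8));
    }
}
```

Note that the 900-byte store is skipped even though it is the largest, because a store without a split point never updates `largest`.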

The call s.getSplitPoint() resolves to HStore's getSplitPoint method:

  @Override
  public byte[] getSplitPoint() {
    this.lock.readLock().lock();
    try {
      // Should already be enforced by the split policy!
      assert !this.getRegionInfo().isMetaRegion();
      // Not split-able if we find a reference store file present in the store.
      if (hasReferences()) {
        return null;
      }
      return this.storeEngine.getStoreFileManager().getSplitPoint();
    } catch(IOException e) {
      LOG.warn("Failed getting store size for " + this, e);
    } finally {
      this.lock.readLock().unlock();
    }
    return null;
  }

This in turn delegates to DefaultStoreFileManager, which picks the largest store file:

  @Override
  public final byte[] getSplitPoint() throws IOException {
    if (this.storefiles.isEmpty()) {
      return null;
    }
    return StoreUtils.getLargestFile(this.storefiles).getFileSplitPoint(this.kvComparator);
  }

And finally to StoreFile:

  /**
   * Gets the approximate mid-point of this file that is optimal for use in splitting it.
   * @param comparator Comparator used to compare KVs.
   * @return The split point row, or null if splitting is not possible, or reader is null.
   */
  @SuppressWarnings("deprecation")
  byte[] getFileSplitPoint(KVComparator comparator) throws IOException {
    if (this.reader == null) {
      LOG.warn("Storefile " + this + " Reader is null; cannot get split point");
      return null;
    }
    // Get first, last, and mid keys.  Midkey is the key that starts block
    // in middle of hfile.  Has column and timestamp.  Need to return just
    // the row we want to split on as midkey.
    byte [] midkey = this.reader.midkey();
    if (midkey != null) {
      KeyValue mk = KeyValue.createKeyValueFromKey(midkey, 0, midkey.length);
      byte [] fk = this.reader.getFirstKey();
      KeyValue firstKey = KeyValue.createKeyValueFromKey(fk, 0, fk.length);
      byte [] lk = this.reader.getLastKey();
      KeyValue lastKey = KeyValue.createKeyValueFromKey(lk, 0, lk.length);
      // if the midkey is the same as the first or last keys, we cannot (ever) split this region.
      if (comparator.compareRows(mk, firstKey) == 0 || comparator.compareRows(mk, lastKey) == 0) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("cannot split because midkey is the same as first or last row");
        }
        return null;
      }
      return mk.getRow();
    }
    return null;
  }
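
The boundary check at the end of `getFileSplitPoint` can be illustrated with a minimal sketch. It assumes the row portion of each key has already been extracted, so plain byte arrays stand in for the `KeyValue` objects, and a hand-written unsigned byte comparison stands in for the comparator's `compareRows`. `MidKeySketch` is a hypothetical name, not an HBase class.

```java
import java.nio.charset.StandardCharsets;

final class MidKeySketch {
    // Unsigned lexicographic comparison of two row keys.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns the mid row as the split point, or null when it coincides
    // with the first or last row of the file (or when there is no mid row).
    static byte[] fileSplitPoint(byte[] firstRow, byte[] midRow, byte[] lastRow) {
        if (midRow == null) {
            return null;
        }
        if (compareRows(midRow, firstRow) == 0 || compareRows(midRow, lastRow) == 0) {
            return null; // mid row is a boundary row, so it is rejected
        }
        return midRow;
    }

    static byte[] row(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ok = fileSplitPoint(row("a"), row("m"), row("z"));
        System.out.println(new String(ok, StandardCharsets.UTF_8)); // prints "m"
        System.out.println(fileSplitPoint(row("a"), row("a"), row("z")) == null); // prints "true"
    }
}
```

The second call models a file whose middle block still begins with the first row: that row is rejected as a split point, and the method returns null just as the original does.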
