HBase Region Merge Analysis

阿新 • Published: 2018-12-09

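1. Overview

In HBase, the basic unit of a table is the Region. When you operate on a table through the HBase API, the data you interact with is also presented in terms of Regions. A table can have any number of Regions. In this post I'll share some problems you may hit when merging Regions, and how to solve them.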

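2. Content

Before analyzing Region merging, let's first look at how a Region is structured, as shown in the figure below:

[Figure: Region architecture]

From the figure, we can summarize the following points:

HRegion: a Region can contain multiple Stores;
Store: each Store contains one MemStore and a number of StoreFiles;
StoreFile: where table data is actually stored; HFile is the file format for table data on HDFS.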
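
To make the hierarchy concrete, here is a minimal toy model in Ruby. The struct names, fields, and sizes are illustrative assumptions and not HBase classes; the sketch only shows that a Region's on-disk size is the sum of the StoreFiles across its Stores.

```ruby
# Toy model of the Region -> Store -> StoreFile hierarchy.
# These structs are illustrative only; they are not HBase classes.
StoreFile = Struct.new(:path, :size_mb)

Store = Struct.new(:family, :memstore_mb, :store_files) do
  # On-disk size of this Store: the sum of its StoreFiles.
  def storefile_size_mb
    store_files.sum(&:size_mb)
  end
end

Region = Struct.new(:encoded_name, :stores) do
  # On-disk size of the Region: the sum over all of its Stores.
  def storefile_size_mb
    stores.sum(&:storefile_size_mb)
  end
end

region = Region.new('region-1', [
  Store.new('cf1', 12, [StoreFile.new('hfile-a', 300), StoreFile.new('hfile-b', 200)]),
  Store.new('cf2', 4,  [StoreFile.new('hfile-c', 100)])
])
puts region.storefile_size_mb  # => 600
```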

To inspect an HFile, you can use a command that HBase provides:

hbase hfile -p -f /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d/7ec738167e9f4d4386316e5e702c8d3d

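Running it produces output like the figure below:

[Figure: output of the hbase hfile command]

2.1 Why merge Regions

So why merge Regions at all? It starts with Region Split. As data is continuously written to a Region and it reaches the split threshold (set by the property hbase.hregion.max.filesize, 10GB by default), the Region is split into 2 new Regions. As business data keeps growing, Regions keep splitting, and the number of Regions keeps climbing.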
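
As a rough back-of-the-envelope check, the sketch below bounds how many Regions a table ends up with under purely size-based splitting. It assumes every Region sits somewhere between half the threshold (just split) and the full threshold (about to split), and ignores pre-splitting, manual merges, and split policies that split earlier; the 45,000GB table size is a made-up figure.

```ruby
# Rough bounds on the Region count of a table under size-based splitting.
# Assumption: each Region holds between threshold/2 and threshold of data.
def region_count_bounds(table_size_gb, max_filesize_gb)
  lower = (table_size_gb.to_f / max_filesize_gb).ceil          # all Regions full
  upper = (table_size_gb.to_f / (max_filesize_gb / 2.0)).ceil  # all Regions just split
  [lower, upper]
end

low, high = region_count_bounds(45_000, 10)
puts "between #{low} and #{high} Regions"  # => between 4500 and 9000 Regions
```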

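The more Regions a business table has, the more pressure the cluster is under during reads and writes, or when running a Compaction on that table. I collected some statistics in production: once a business table reached 9000+ Regions, every Compaction on that table noticeably increased the cluster load. This also indirectly affects application reads and writes. When one table has too many Regions, the total number of Regions in the cluster inevitably grows too, and after load balancing each RegionServer ends up hosting more Regions.

In this situation, merging Regions is well worth doing. For example, if the Split threshold is currently set to 30GB, we can merge Regions that are 10GB or smaller. This reduces the number of Regions per business table, lowers the total Region count in the cluster, and eases the Region pressure on each RegionServer.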
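
The selection rule in that example can be sketched as a simple filter. The hash of Region names to sizes is hypothetical input; in a real cluster the sizes would come from the RegionServer load metrics.

```ruby
# region_sizes_mb: hypothetical map of encoded Region name => store file size (MB).
# Returns the names of Regions at or below the merge threshold.
def merge_candidates(region_sizes_mb, max_size_mb)
  region_sizes_mb.select { |_name, size| size <= max_size_mb }.keys
end

sizes = { 'r1' => 2_048, 'r2' => 10_240, 'r3' => 25_600, 'r4' => 512 }
puts merge_candidates(sizes, 10 * 1024).inspect  # => ["r1", "r2", "r4"]
```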

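2.2 How to merge Regions

So how do we merge Regions? HBase provides a shell command for this:

# Merge two adjacent Regions
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
# Force-merge two Regions, even if they are not adjacent
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true

However, this approach has a problem: it can only merge 2 Regions at a time. If there are several thousand Regions to merge, doing it by hand is not practical.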
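
To see how quickly the manual approach gets out of hand, the sketch below turns a list of encoded Region names into merge_region commands, one per pair. It assumes the list is already sorted by start key so that neighbours in the list are adjacent Regions; the names are placeholders.

```ruby
# encoded_names: Region encoded names, assumed sorted by start key so that
# consecutive entries are adjacent Regions. A trailing unpaired Region is skipped.
def merge_commands(encoded_names)
  encoded_names.each_slice(2).select { |pair| pair.size == 2 }.map do |a, b|
    "merge_region '#{a}', '#{b}'"
  end
end

puts merge_commands(%w[aaa bbb ccc ddd eee])
# merge_region 'aaa', 'bbb'
# merge_region 'ccc', 'ddd'
```

With several thousand Regions, a single pass already needs thousands of commands, which is why a scripted approach is preferable.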

2.2.1 Batch merging

One way to merge in batches is to write a script (merge_small_regions.rb). The code is as follows:

# Test Mode:
#
# hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> <merge?>
#
# Non Test - ie actually do the merge:
#
# hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> merge
#
# Note: Please replace namespace.tablename with your namespace and table, eg NS1.MyTable. This value is case sensitive.

require 'digest'
require 'java'
java_import org.apache.hadoop.hbase.HBaseConfiguration
java_import org.apache.hadoop.hbase.client.HBaseAdmin
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.HRegionInfo;
java_import org.apache.hadoop.hbase.client.Connection
java_import org.apache.hadoop.hbase.client.ConnectionFactory
java_import org.apache.hadoop.hbase.client.Table
java_import org.apache.hadoop.hbase.util.Bytes

def list_bigger_regions(admin, table, low_size)
  cluster_status = admin.getClusterStatus()
  master = cluster_status.getMaster()
  biggers = []
  cluster_status.getServers.each do |s|
    cluster_status.getLoad(s).getRegionsLoad.each do |r|
      # getRegionsLoad returns an array of arrays, where each array
      # is 2 elements

      # Filter out any regions that don't match the requested
      # tablename
      next unless r[1].get_name_as_string =~ /#{table}\,/
      if r[1].getStorefileSizeMB() > low_size
        if r[1].get_name_as_string =~ /\.([^\.]+)\.$/
          biggers.push $1
        else
          raise "Failed to get the encoded name for #{r[1].get_name_as_string}"
        end
      end
    end
  end
  biggers
end

# Handle command line parameters
table_name = ARGV[0]
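# Regions larger than low_size (MB) are left alone; the floor is 1024 MB
low_size = 1024
if ARGV[1].to_i >= low_size
  low_size = ARGV[1].to_i
end

# Maximum number of regions to consider per run (default 1000).
# Any positive value is accepted; a missing or non-numeric argument
# falls back to the default instead of becoming 0.
limit_batch = 1000
if ARGV[2].to_i > 0
  limit_batch = ARGV[2].to_i
end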
do_merge = false
if ARGV[3] == 'merge'
  do_merge = true
end

config = HBaseConfiguration.create();
connection = ConnectionFactory.createConnection(config);
admin = HBaseAdmin.new(connection);

bigger_regions = list_bigger_regions(admin, table_name, low_size)
regions = admin.getTableRegions(Bytes.toBytes(table_name));

puts "Total Table Regions: #{regions.length}"
puts "Total bigger regions: #{bigger_regions.length}"

filtered_regions = regions.reject do |r|
  bigger_regions.include?(r.get_encoded_name)
end

puts "Total regions to consider for Merge: #{filtered_regions.length}"

filtered_regions_limit = filtered_regions

if filtered_regions.length < 2
  puts "There are not enough regions to merge"
  filtered_regions_limit = filtered_regions
end

if filtered_regions.length > limit_batch
  filtered_regions_limit = filtered_regions[0, limit_batch]
  puts "Only #{filtered_regions_limit.length} regions will be considered because of the batch limit parameter"
end


r1, r2 = nil
filtered_regions_limit.each do |r|
  if r1.nil?
    r1 = r
    next
  end
  if r2.nil?
    r2 = r
  end
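  # Skip any region that is a split region.
  # Log the name before reassigning, so the message names the skipped
  # region and we never call a method on nil.
  if r1.is_split()
    puts "Skip #{r1.get_encoded_name} because it is splitting!"
    r1 = r2
    r2 = nil
    next
  end
  if r2.is_split()
    puts "Skip #{r2.get_encoded_name} because it is splitting!"
    r2 = nil
    next
  end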
  if HRegionInfo.are_adjacent(r1, r2)
    # only merge regions that are adjacent
    puts "#{r1.get_encoded_name} is adjacent to #{r2.get_encoded_name}"
    if do_merge
      # mergeRegions only submits the request; the merge itself runs asynchronously
      admin.mergeRegions(r1.getEncodedNameAsBytes, r2.getEncodedNameAsBytes, false)
      puts "Submitted merge of #{r1.get_encoded_name} with #{r2.get_encoded_name}"
      sleep 2
    end
    r1, r2 = nil
  else
    puts "Regions are not adjacent, so drop the first one and continue with #{r2.get_encoded_name}"
    r1 = r2
    r2 = nil
  end
end
admin.close
connection.close
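
The pairing walk at the heart of the script can be tried out without a cluster. The sketch below is a simplified stand-in, not the script itself: FakeRegion replaces HRegionInfo, adjacent? approximates the adjacency check by comparing the end key of the lower Region with the start key of the higher one, and plan_merges only returns the pairs instead of calling the admin API. All names here are hypothetical.

```ruby
# Stand-in for HRegionInfo: only the fields the pairing logic needs.
FakeRegion = Struct.new(:name, :start_key, :end_key, :split)

# Two Regions are adjacent when the one with the smaller start key
# ends exactly where the other one starts.
def adjacent?(a, b)
  lo, hi = [a, b].sort_by(&:start_key)
  lo.end_key == hi.start_key
end

# Walk the candidate list once and return the pairs that would be merged.
def plan_merges(regions)
  pairs = []
  first = nil
  regions.each do |r|
    next if r.split                 # skip Regions that are splitting
    if first.nil?
      first = r
    elsif adjacent?(first, r)
      pairs << [first.name, r.name]
      first = nil                   # both consumed; start a new pair
    else
      first = r                     # not adjacent: slide forward
    end
  end
  pairs
end

regions = [
  FakeRegion.new('r1', '',  'b', false),
  FakeRegion.new('r2', 'b', 'd', false),
  FakeRegion.new('r3', 'd', 'f', true),   # splitting, skipped
  FakeRegion.new('r4', 'f', 'h', false),
  FakeRegion.new('r5', 'k', 'm', false),  # gap: the Region in between was too big
  FakeRegion.new('r6', 'm', '',  false)
]
puts plan_merges(regions).inspect  # => [["r1", "r2"], ["r5", "r6"]]
```

Because each Region takes part in at most one merge per pass, a single run can at best halve the number of candidates.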

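By default, merge_small_regions.rb merges Regions of up to 1GB and considers at most 1000 Regions per run. To merge Regions smaller than 10GB, with up to 4000 Regions per run, use a wrapper script (merging-region.sh) like this: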

#! /bin/bash

num=$1

echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : RegionServer Start Merging..."
if [ ! -n "$num" ]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Default Merging 10 Times."
    num=10
elif [[ $num == *[!0-9]* ]]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Input [$num] Times Must Be Number."
    exit 1
else
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : User-Defined Merging [$num] Times."
fi

for (( i=1; i<=$num; i++ ))
do
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Merging [$i] Times,Total [$num] Times."
    hbase org.jruby.Main merge_small_regions.rb namespace.tablename 10240 4000 merge
    sleep 5
done

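merging-region.sh takes a parameter that controls how many times the batch merge script is run in a loop. In practice, after one batch merge there may still be many Regions left (new Regions may have been created in the meantime). In that case we can use merging-region.sh to run the batch merge several times, like this:

# Loops 10 times by default; this example runs 5 iterations.
# The script uses bash-only syntax ([[ ]] and (( ))), so run it with bash.
bash merging-region.sh 5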
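
To get a feel for how many passes are needed, here is a small simulation. It rests on optimistic assumptions: every candidate has an adjacent partner, merged Regions remain candidates, no new Regions appear between passes, and each pass considers at most `batch` Regions. The starting count of 9000 is only an example.

```ruby
# Simulate the Region count after each merge pass.
# Each pass considers at most `batch` Regions and merges them pairwise.
def region_counts_per_pass(initial, batch, passes)
  counts = [initial]
  passes.times do
    considered = [counts.last, batch].min
    counts << counts.last - considered / 2  # one Region fewer per merged pair
  end
  counts
end

puts region_counts_per_pass(9_000, 4_000, 5).inspect
# => [9000, 7000, 5000, 3000, 1500, 750]
```

In a real cluster the count drops more slowly, since non-adjacent candidates are skipped and merged Regions eventually exceed the size limit.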

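2.3 What to do if a permanent RIT appears while merging Regions

What if a Region gets stuck in RIT (Region-In-Transition) permanently during a merge? I ran into exactly this in production: during a batch merge, a Region stayed in the MERGING_NEW state indefinitely. This does not affect the cluster's ability to serve requests, but if a node in the cluster restarts, the Regions from that RegionServer may not get rebalanced. While any Region is in RIT, HBase will not run Region load balancing, and even running the balancer command manually has no effect.

If this RIT is left unresolved and more HBase nodes restart one after another, Regions across the cluster become severely unbalanced, which is critical and will hurt cluster performance badly. Searching the HBase JIRA shows that this permanent MERGING_NEW RIT is caused by the bug tracked in HBASE-17682, and the patch needs to be applied to fix it. In short, the HBase source does not check for the MERGING_NEW state in this piece of logic, so it falls straight through to the else branch. The source is as follows: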

for (RegionState state : regionsInTransition.values()) {
        HRegionInfo hri = state.getRegion();
        if (assignedRegions.contains(hri)) {
          // Region is open on this region server, but in transition.
          // This region must be moving away from this server, or splitting/merging.
          // SSH will handle it, either skip assigning, or re-assign.
          LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
        } else if (sn.equals(state.getServerName())) {
          // Region is in transition on this region server, and this
          // region is not open on this server. So the region must be
          // moving to this server from another one (i.e. opening or
          // pending open on this server, was open on another one.
          // Offline state is also kind of pending open if the region is in
          // transition. The region could be in failed_close state too if we have
          // tried several times to open it while this region server is not reachable)
          if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
            LOG.info("Found region in " + state +
              " to be reassigned by ServerCrashProcedure for " + sn);
            rits.add(hri);
          } else if(state.isSplittingNew()) {
            regionsToCleanIfNoMetaEntry.add(state.getRegion());
          } else {
            LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
          }
        }
      }

The code after the fix is as follows:

for (RegionState state : regionsInTransition.values()) {
        HRegionInfo hri = state.getRegion();
        if (assignedRegions.contains(hri)) {
          // Region is open on this region server, but in transition.
          // This region must be moving away from this server, or splitting/merging.
          // SSH will handle it, either skip assigning, or re-assign.
          LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
        } else if (sn.equals(state.getServerName())) {
          // Region is in transition on this region server, and this
          // region is not open on this server. So the region must be
          // moving to this server from another one (i.e. opening or
          // pending open on this server, was open on another one.
          // Offline state is also kind of pending open if the region is in
          // transition. The region could be in failed_close state too if we have
          // tried several times to open it while this region server is not reachable)
          if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
            LOG.info("Found region in " + state +
              " to be reassigned by ServerCrashProcedure for " + sn);
            rits.add(hri);
          } else if (isOneOfStates(state, State.SPLITTING_NEW, State.MERGING_NEW)) {
            // Handles SPLITTING_NEW as before, plus the previously unhandled MERGING_NEW
            regionsToCleanIfNoMetaEntry.add(state.getRegion());
          } else {
            LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
          }
        }
      }
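
The difference between the two versions comes down to one branch. The sketch below is a simplified Ruby model of that inner decision only; the symbols and the action_for helper are hypothetical and merely mirror the Java branches above.

```ruby
# States that lead to reassignment by ServerCrashProcedure in both versions.
REASSIGN_STATES = %i[pending_open opening failed_close offline].freeze

# Returns the action taken for a Region in transition on the crashed server.
# patched: false models the original code, patched: true the fixed code.
def action_for(state, patched:)
  return :reassign if REASSIGN_STATES.include?(state)

  cleanable = patched ? %i[splitting_new merging_new] : %i[splitting_new]
  cleanable.include?(state) ? :clean_if_no_meta_entry : :warn_unexpected
end

puts action_for(:merging_new, patched: false)  # => warn_unexpected
puts action_for(:merging_new, patched: true)   # => clean_if_no_meta_entry
```

In the unpatched path the MERGING_NEW Region is only logged and never cleaned up, which matches the permanent RIT described above.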

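There is a catch, though. The JIRA issue only says to fix the bug by applying the patch. In production, however, we cannot stop the cluster for a long time and disrupt application reads and writes just to deal with this RIT. So is there a temporary workaround that clears the current permanent MERGING_NEW RIT first, leaving the HBase version upgrade for later?

There is. After analyzing the merge flow, I found that when HBase merges Regions it first creates the new Region in an initial MERGING_NEW state. The overall Region merge flow is as follows:

[Figure: Region merge flow]

As the flow chart shows, MERGING_NEW is an initial state that lives in the active Master's memory, while the Master in Backup state has no MERGING_NEW state for this new Region in its memory. So we can perform a Master failover (switching active and backup) to clear the permanent RIT temporarily. Since HBase is a highly available cluster, the failover is transparent to user applications. A Master failover is therefore a workable temporary fix for a permanent MERGING_NEW RIT. Afterwards, we fix the bug properly by applying the patch and upgrading HBase.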
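
The reasoning behind the failover workaround can be pictured with a toy model. This is only an illustration of the argument above, under the assumption that a newly active Master rebuilds its view from persisted state; ToyMaster and its methods are invented names, not HBase internals.

```ruby
# Toy model: each Master keeps its own in-memory map of Regions in transition.
class ToyMaster
  attr_reader :regions_in_transition

  def initialize(persisted_states)
    # Assumption: a newly active Master starts from persisted state only,
    # so anything that is not :open there counts as in transition.
    @regions_in_transition = persisted_states.reject { |_region, state| state == :open }
  end

  def begin_merge(new_region)
    # MERGING_NEW is recorded in this Master's memory, not in persisted state.
    @regions_in_transition[new_region] = :merging_new
  end
end

persisted = { 'regionA' => :open, 'regionB' => :open }
active = ToyMaster.new(persisted)
active.begin_merge('regionAB')
puts active.regions_in_transition.inspect  # => {"regionAB"=>:merging_new}

backup = ToyMaster.new(persisted)          # failover: the backup becomes active
puts backup.regions_in_transition.empty?   # => true
```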

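3. Summary

RIT problems are fairly common in HBase. When you hit one, calmly analyze the cause first: check the Master logs, read the RIT error description on the HBase Web UI carefully, use hbck to check Regions, use fsck to check HDFS blocks, and so on. Once the specific cause is identified, apply the matching remedy: hypothesize boldly, verify carefully.

4. Closing remarks

That's all for this post. If you run into any problems while studying this topic, you can join the discussion group or send me an email, and I'll do my best to answer. Let's encourage each other!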

