Consistent Hashing

This article is also published at https://github.com/zhangyachen/zhangyachen.github.io/issues/74

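Why use consistent hashing

The traditional way to map requests onto caches is hash(object) % N, where N is the number of cache machines, object can be, for example, the requester's IP, and hash is some hash function such as crc32, md5, or a custom function.

[Figure: objects mapped to the caches by hash(object) % N]

If we now add one more cache, N becomes 4, and the result looks like this:

[Figure: the same objects remapped after N becomes 4]

As the figure shows, once the new cache is added, most of the existing cache entries can no longer be found where they were stored, so the traffic falls through to the DB and puts it under heavy pressure.
What we want instead is that adding a new cache does not disturb the objects that have already been assigned. In other words, the mapping must be monotonic:

Monotonicity means that if some content has already been assigned to caches through hashing and a new cache then joins the system, the hash result must guarantee that previously assigned content either stays where it is or moves to the new cache. It must never be remapped to a different cache in the old set.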
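
To see how far hash(object) % N is from being monotonic, the following minimal sketch counts how many keys change cache when N goes from 3 to 4. The key names and the helper countMoved are made up for illustration, and crc32() is assumed to return a non-negative integer (64-bit PHP). For evenly spread hash values, a key keeps its cache only when hash % 12 is 0, 1 or 2, so roughly three out of four keys move.

```php
<?php
// Counts how many hash values land on a different cache when the
// number of caches changes from $oldN to $newN under hash % N.
function countMoved(array $hashes, int $oldN, int $newN): int
{
    $moved = 0;
    foreach ($hashes as $hash) {
        if ($hash % $oldN !== $hash % $newN) {
            $moved++;
        }
    }
    return $moved;
}

// Hypothetical keys "object0" .. "object9999", hashed with crc32.
$hashes = [];
for ($i = 0; $i < 10000; $i++) {
    $hashes[] = crc32("object" . $i);
}

$moved = countMoved($hashes, 3, 4);
printf("%d of %d keys changed cache\n", $moved, count($hashes));
```

Every key that moves is a cache miss on its first request after the change, which is exactly the load spike on the DB described above.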

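How consistent hashing works

We map hash values onto the range 0 to 2^32-1 and picture this range as a ring. When a request for object1 arrives, we hash it to an integer and place it at the corresponding position on the ring, say point A. There is no cache machine at point A, so we walk clockwise along the ring until we reach a cache, here CacheA; object1 is therefore served by CacheA. object2 and object3 are handled the same way.
Note: each cache machine also has an IP address, and it is placed on the ring by hashing that address with the same hash function used for the objects.

[Figure: object1, object2 and object3 on the hash ring with CacheA, CacheB and CacheC]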
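
The clockwise walk can be sketched in a few lines of PHP. This is only an illustration under assumptions: the IP addresses are invented, crc32 stands in for the hash function, and the helper names (buildRing, findClockwise) are hypothetical. Like the Flexihash code later in this article, it picks the first cache whose position is strictly greater than the object's position.

```php
<?php
// Places each cache on the ring at crc32(ip) and sorts by position.
function buildRing(array $cacheIps): array
{
    $ring = [];
    foreach ($cacheIps as $ip) {
        $ring[crc32($ip)] = $ip;
    }
    ksort($ring);
    return $ring;
}

// Walks clockwise from $position over a ring sorted by position.
function findClockwise(array $ring, int $position): string
{
    if (empty($ring)) {
        throw new InvalidArgumentException('The ring has no caches');
    }
    foreach ($ring as $nodePosition => $ip) {
        if ($nodePosition > $position) {
            return $ip;
        }
    }
    // Passed the last cache: wrap around to the first one on the ring.
    return reset($ring);
}

$ips = ['10.0.0.1', '10.0.0.2', '10.0.0.3']; // hypothetical cache IPs
$ring = buildRing($ips);
$cache = findClockwise($ring, crc32('object1'));
echo "object1 -> $cache\n";
```

The wrap-around return at the end is what turns the sorted array into a ring.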

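Now let's see what happens when a machine is added:

[Figure: CacheD added to the ring between CacheB and CacheC]

We added CacheD between CacheB and CacheC. The requests for object1, object2 and object3 still go to their original cache machines, because the hash of each object is unchanged, so its position on the ring is unchanged, and the existing caches have not moved either. Only the red region in the figure is affected: it used to map to CacheC and now maps to CacheD.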
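
The claim that only keys in the new cache's arc are affected can be checked with a small experiment. The sketch below is an assumption-laden illustration: caches are placed at crc32 of their names rather than of real IPs, the keys are invented, and placeOnRing and ownerOf are hypothetical helpers. It counts the keys whose cache changes after CacheD joins and how many of those ended up anywhere other than CacheD.

```php
<?php
// Places each node at crc32(name) and sorts the ring by position.
function placeOnRing(array $nodes): array
{
    $ring = [];
    foreach ($nodes as $node) {
        $ring[crc32($node)] = $node;
    }
    ksort($ring);
    return $ring;
}

// Returns the first node clockwise of the key's position.
function ownerOf(array $ring, string $key): string
{
    $position = crc32($key);
    foreach ($ring as $nodePosition => $node) {
        if ($nodePosition > $position) {
            return $node;
        }
    }
    return reset($ring); // wrapped past the largest position
}

$before = placeOnRing(['CacheA', 'CacheB', 'CacheC']);
$after  = placeOnRing(['CacheA', 'CacheB', 'CacheC', 'CacheD']);

$moved = 0;
$movedElsewhere = 0;
for ($i = 0; $i < 10000; $i++) {
    $key = "object" . $i;
    $old = ownerOf($before, $key);
    $new = ownerOf($after, $key);
    if ($old !== $new) {
        $moved++;
        if ($new !== 'CacheD') {
            $movedElsewhere++;
        }
    }
}
printf("%d keys moved, %d of them to a cache other than CacheD\n", $moved, $movedElsewhere);
```

The second number is always zero: a key either keeps its cache or moves to the newly added one, which is the monotonicity property.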

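What happens when we remove a cache machine, for example CacheB?

[Figure: the ring after CacheB is removed]

object2, which was assigned to CacheB, is now assigned to CacheD. Only the arc between CacheA and CacheB (the red region in the figure) is affected; object1 and object3 stay on the cache machines they were already assigned to.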
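
The removal case can be replayed with hand-picked numbers. In the sketch below the ring positions are invented for illustration (they are not real hash values), chosen only so that object2 sits between CacheA and CacheB as in the description above; clockwiseOwner and assignAll are hypothetical helper names.

```php
<?php
// Hypothetical ring positions, chosen by hand for readability.
$ring = [1000 => 'CacheA', 2000 => 'CacheB', 3000 => 'CacheD', 4000 => 'CacheC'];
$objects = ['object1' => 500, 'object2' => 1500, 'object3' => 4500];

// Returns the first cache clockwise of $position.
function clockwiseOwner(array $ring, int $position): string
{
    ksort($ring);
    foreach ($ring as $nodePosition => $node) {
        if ($nodePosition > $position) {
            return $node;
        }
    }
    return reset($ring); // wrap around to the first cache
}

// Maps every object name to the cache that owns its position.
function assignAll(array $ring, array $objects): array
{
    $assignment = [];
    foreach ($objects as $name => $position) {
        $assignment[$name] = clockwiseOwner($ring, $position);
    }
    return $assignment;
}

$before = assignAll($ring, $objects);
unset($ring[2000]); // remove CacheB from the ring
$after = assignAll($ring, $objects);

print_r($before);
print_r($after);
```

With these positions only object2 changes owner, from CacheB to CacheD; object1 and object3 remain on CacheA before and after the removal.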

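Virtual nodes

In the figure above, CacheC is idle: no requests land on it. What we want is for requests to be spread evenly over all cache machines. To address this, we introduce virtual nodes:

A virtual node is a replica of a real node in the hash space. One real node corresponds to several virtual nodes, and that number is called the "replica count". Virtual nodes are placed in the hash space according to their hash values.

We first choose the replica count, say 2, so CacheA has two virtual nodes, CacheA1 and CacheA2, and likewise for CacheB.

[Figure: the ring with virtual nodes CacheA1, CacheA2, CacheB1 and CacheB2]

The red markers in the figure are virtual nodes. object2 maps to CacheA1, and CacheA1 is a virtual node of CacheA, so requests for object2 end up on CacheA. Likewise, object3 maps to CacheB1, a virtual node of CacheB, so requests for object3 end up on CacheB.
Note that CacheA1, CacheA2, CacheB1 and CacheB2 are not real cache machines; they are only virtual positions on the ring, obtained by hashing strings derived from the real machine's IP.

The hash of a virtual node can be computed from the real node's IP address plus a numeric suffix. For example, suppose the IP address of cache A is 202.168.14.241.
Before virtual nodes are introduced, the hash of cache A is computed as:
Hash("202.168.14.241");
After virtual nodes are introduced, the hashes of the virtual nodes cache A1 and cache A2 are computed as:
Hash("202.168.14.2411"); // cache A1
Hash("202.168.14.2412"); // cache A2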
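
The suffix scheme can be sketched as follows. This is a minimal illustration, not the Flexihash code: crc32 stands in for Hash, the second IP address is invented, and addVirtualNodes is a hypothetical helper. Each ring slot stores the real machine's IP, so a lookup that lands on a virtual node resolves directly to the real cache.

```php
<?php
// Adds $replicas virtual nodes for one real cache. The ring key is the
// hash of "ip + suffix"; the value is always the real machine's IP.
function addVirtualNodes(array $ring, string $ip, int $replicas): array
{
    for ($i = 1; $i <= $replicas; $i++) {
        $ring[crc32($ip . $i)] = $ip; // e.g. "202.168.14.2411", "202.168.14.2412"
    }
    ksort($ring);
    return $ring;
}

$ring = addVirtualNodes([], '202.168.14.241', 2);    // cache A from the text
$ring = addVirtualNodes($ring, '202.168.14.242', 2); // hypothetical cache B

foreach ($ring as $position => $ip) {
    printf("%10d -> %s\n", $position, $ip);
}
```

One caveat of plain concatenation is that different IP and suffix pairs can produce the same string (for instance "202.168.14.24" with suffix 11 and "202.168.14.241" with suffix 1). Inserting a separator such as "#" before the suffix would avoid that; this is a suggestion, not something the original scheme specifies.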

Below is a PHP implementation of consistent hashing (Flexihash), reproduced here:

<?php
/**
* Flexihash - A simple consistent hashing implementation for PHP.
*
* The MIT License
*
* Copyright (c) 2008 Paul Annesley
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* @author Paul Annesley
* @link http://paul.annesley.cc/
* @copyright Paul Annesley, 2008
* @comment by MyZ (http://blog.csdn.net/mayongzhan)
*/

/**
* A simple consistent hashing implementation with pluggable hash algorithms.
*
* @author Paul Annesley
* @package Flexihash
* @licence http://www.opensource.org/licenses/mit-license.php
*/
class Flexihash
{

     /**
     * The number of positions to hash each target to.
     *
     * @var int
     * @comment Number of virtual nodes; addresses uneven node distribution
     */
     private $_replicas = 64;

     /**
     * The hash algorithm, encapsulated in a Flexihash_Hasher implementation.
     * @var object Flexihash_Hasher
     * @comment Hash method in use: md5 or crc32
     */
     private $_hasher;

     /**
     * Internal counter for current number of targets.
     * @var int
     * @comment Node counter
     */
     private $_targetCount = 0;

     /**
     * Internal map of positions (hash outputs) to targets
     * @var array { position => target, ... }
     * @comment Position-to-node map, used by lookup to find the node for a position
     */
     private $_positionToTarget = array();

     /**
     * Internal map of targets to lists of positions that target is hashed to.
     * @var array { target => [ position, position, ... ], ... }
     * @comment Node-to-positions map, used when removing a node
     */
     private $_targetToPositions = array();

     /**
     * Whether the internal map of positions to targets is already sorted.
     * @var boolean
     * @comment Whether the positions are already sorted
     */
     private $_positionToTargetSorted = false;

     /**
     * Constructor
     * @param object $hasher Flexihash_Hasher
     * @param int $replicas Amount of positions to hash each target to.
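     * @comment Constructor: sets the hash method and the number of virtual nodes. More virtual nodes give a more even distribution, but the slower the distribution computation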
     */
     public function __construct(Flexihash_Hasher $hasher = null, $replicas = null)
     {
          $this->_hasher = $hasher ? $hasher : new Flexihash_Crc32Hasher();
          if (!empty($replicas)) $this->_replicas = $replicas;
     }

     /**
     * Add a target.
     * @param string $target
     * @chainable
     * @comment Adds a node, spreading it over multiple virtual positions according to the replica count
     */
     public function addTarget($target)
     {
          if (isset($this->_targetToPositions[$target]))
          {
               throw new Flexihash_Exception("Target '$target' already exists.");
          }

          $this->_targetToPositions[$target] = array();

          // hash the target into multiple positions
          for ($i = 0; $i < $this->_replicas; $i++)
          {
               $position = $this->_hasher->hash($target . $i);
               $this->_positionToTarget[$position] = $target; // lookup
               $this->_targetToPositions[$target] []= $position; // target removal
          }

          $this->_positionToTargetSorted = false;
          $this->_targetCount++;

          return $this;
     }

     /**
     * Add a list of targets.
     * @param array $targets
     * @chainable
     */
     public function addTargets($targets)
     {
          foreach ($targets as $target)
          {
               $this->addTarget($target);
          }

          return $this;
     }

     /**
     * Remove a target.
     * @param string $target
     * @chainable
     */
     public function removeTarget($target)
     {
          if (!isset($this->_targetToPositions[$target]))
          {
               throw new Flexihash_Exception("Target '$target' does not exist.");
          }

          foreach ($this->_targetToPositions[$target] as $position)
          {
               unset($this->_positionToTarget[$position]);
          }

          unset($this->_targetToPositions[$target]);

          $this->_targetCount--;

          return $this;
     }

     /**
     * A list of all potential targets
     * @return array
     */
     public function getAllTargets()
     {
          return array_keys($this->_targetToPositions);
     }

     /**
     * Looks up the target for the given resource.
     * @param string $resource
     * @return string
     */
     public function lookup($resource)
     {
          $targets = $this->lookupList($resource, 1);
          if (empty($targets)) throw new Flexihash_Exception('No targets exist');
          return $targets[0];
     }

     /**
     * Get a list of targets for the resource, in order of precedence.
     * Up to $requestedCount targets are returned, less if there are fewer in total.
     *
     * @param string $resource
     * @param int $requestedCount The length of the list to return
     * @return array List of targets
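     * @comment Finds the nodes for the given resource.
     *          Returns an empty list if there are no nodes, and the single node if there is only one.
     *          Otherwise hashes the resource, sorts all positions, and searches the sorted positions for the resource's position.
     *          If no larger position is found, continues from the first sorted position (forming a ring).
     *          Returns the nodes that were found.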
     */
     public function lookupList($resource, $requestedCount)
     {
          if (!$requestedCount)
               throw new Flexihash_Exception('Invalid count requested');

          // handle no targets
          if (empty($this->_positionToTarget))
               return array();

          // optimize single target
          if ($this->_targetCount == 1)
               return array_unique(array_values($this->_positionToTarget));

          // hash resource to a position
          $resourcePosition = $this->_hasher->hash($resource);

          $results = array();
          $collect = false;

          $this->_sortPositionTargets();

          // search values above the resourcePosition
          foreach ($this->_positionToTarget as $key => $value)
          {
               // start collecting targets after passing resource position
               if (!$collect && $key > $resourcePosition)
               {
                    $collect = true;
               }

               // only collect the first instance of any target
               if ($collect && !in_array($value, $results))
               {
                    $results []= $value;
               }

               // return when enough results, or list exhausted
               if (count($results) == $requestedCount || count($results) == $this->_targetCount)
               {
                    return $results;
               }
          }

          // loop to start - search values below the resourcePosition
          foreach ($this->_positionToTarget as $key => $value)
          {
               if (!in_array($value, $results))
               {
                    $results []= $value;
               }

               // return when enough results, or list exhausted
               if (count($results) == $requestedCount || count($results) == $this->_targetCount)
               {
                    return $results;
               }
          }

          // return results after iterating through both "parts"
          return $results;
     }

     public function __toString()
     {
          return sprintf(
               '%s{targets:[%s]}',
               get_class($this),
               implode(',', $this->getAllTargets())
          );
     }

     // ----------------------------------------
     // private methods

     /**
     * Sorts the internal mapping (positions to targets) by position
     */
     private function _sortPositionTargets()
     {
          // sort by key (position) if not already
          if (!$this->_positionToTargetSorted)
          {
               ksort($this->_positionToTarget, SORT_REGULAR);
               $this->_positionToTargetSorted = true;
          }
     }

}


/**
* Hashes given values into a sortable fixed size address space.
*
* @author Paul Annesley
* @package Flexihash
* @licence http://www.opensource.org/licenses/mit-license.php
*/
interface Flexihash_Hasher
{

     /**
     * Hashes the given string into a 32bit address space.
     *
     * Note that the output may be more than 32bits of raw data, for example
     * hexadecimal characters representing a 32bit value.
     *
     * The data must have 0xFFFFFFFF possible values, and be sortable by
     * PHP sort functions using SORT_REGULAR.
     *
     * @param string
     * @return mixed A sortable format with 0xFFFFFFFF possible values
     */
     public function hash($string);

}


/**
* Uses CRC32 to hash a value into a signed 32bit int address space.
* Under 32bit PHP this (safely) overflows into negative ints.
*
* @author Paul Annesley
* @package Flexihash
* @licence http://www.opensource.org/licenses/mit-license.php
*/
class Flexihash_Crc32Hasher
     implements Flexihash_Hasher
{

     /* (non-phpdoc)
     * @see Flexihash_Hasher::hash()
     */
     public function hash($string)
     {
          return crc32($string);
     }

}


/**
* Uses MD5 to hash a value into a 32bit address space (the first 8 hex characters of the digest).
*
* @author Paul Annesley
* @package Flexihash
* @licence http://www.opensource.org/licenses/mit-license.php
*/
class Flexihash_Md5Hasher
     implements Flexihash_Hasher
{

     /* (non-phpdoc)
     * @see Flexihash_Hasher::hash()
     */
     public function hash($string)
     {
          return substr(md5($string), 0, 8); // 8 hexits = 32bit

          // 4 bytes of binary md5 data could also be used, but
          // performance seems to be the same.
     }

}


/**
* An exception thrown by Flexihash.
*
* @author Paul Annesley
* @package Flexihash
* @licence http://www.opensource.org/licenses/mit-license.php
*/
class Flexihash_Exception extends Exception
{
}

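As shown above, when a machine is added, the code first computes the hash of each of its virtual nodes according to the replica count and stores them in an array whose keys are the hash values and whose values are the machine's IP (the IP of the real machine being added, not the suffixed string used for the virtual node).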

// hash the target into multiple positions
for ($i = 0; $i < $this->_replicas; $i++)
{
     $position = $this->_hasher->hash($target . $i);
     $this->_positionToTarget[$position] = $target; // lookup
     $this->_targetToPositions[$target] []= $position; // target removal
}

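When a request arrives, its hash is computed and the sorted _positionToTarget array is traversed; the first node whose position is larger than the request's hash is the cache machine for that request.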

public function lookupList($resource, $requestedCount)
     {
          if (!$requestedCount)
               throw new Flexihash_Exception('Invalid count requested');

          // handle no targets
          if (empty($this->_positionToTarget))
               return array();

          // optimize single target
          if ($this->_targetCount == 1)
               return array_unique(array_values($this->_positionToTarget));

          // hash resource to a position
          $resourcePosition = $this->_hasher->hash($resource);

          $results = array();
          $collect = false;

          $this->_sortPositionTargets();

          // search values above the resourcePosition
          foreach ($this->_positionToTarget as $key => $value)
          {
               // start collecting targets after passing resource position
               if (!$collect && $key > $resourcePosition)
               {
                    $collect = true;
               }

               // only collect the first instance of any target
               if ($collect && !in_array($value, $results))
               {
                    $results []= $value;
               }

               // return when enough results, or list exhausted
               if (count($results) == $requestedCount || count($results) == $this->_targetCount)
               {
                    return $results;
               }
          }

          // loop to start - search values below the resourcePosition
          foreach ($this->_positionToTarget as $key => $value)
          {
               if (!in_array($value, $results))
               {
                    $results []= $value;
               }

               // return when enough results, or list exhausted
               if (count($results) == $requestedCount || count($results) == $this->_targetCount)
               {
                    return $results;
               }
          }

          // return results after iterating through both "parts"
          return $results;
     }

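Why are there two foreach loops? Because the request's hash may fall between the last cache node on the ring and 2^32-1, like point B in the figure below:

[Figure: point B lying between the last cache node and 2^32-1 on the ring]

In that case the first foreach cannot satisfy the request, because no key in the array is larger than the request's hash. The second loop therefore wraps around to the start of the ring and adds targets from the beginning of _positionToTarget to results until $requestedCount distinct targets have been collected.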
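
The two-pass idea can be isolated in a small standalone function. The sketch below is a simplified illustration rather than the Flexihash method itself: lookupTargets is a hypothetical name, the ring is passed in already sorted, and the positions are hand-picked numbers. The first pass collects targets clockwise of the position, and the second pass wraps around to the positions at or before it.

```php
<?php
// Returns up to $count distinct targets, starting clockwise of $position
// and wrapping around to the start of the (already sorted) ring.
function lookupTargets(array $sortedRing, int $position, int $count): array
{
    $results = [];
    foreach ([true, false] as $firstPass) {
        foreach ($sortedRing as $nodePosition => $target) {
            // Pass 1: positions after $position. Pass 2: the remaining ones.
            $inPass = $firstPass ? $nodePosition > $position : $nodePosition <= $position;
            if ($inPass && !in_array($target, $results, true)) {
                $results[] = $target;
                if (count($results) === $count) {
                    return $results;
                }
            }
        }
    }
    return $results; // fewer distinct targets exist than were requested
}

// Hypothetical ring: target A has two virtual positions (10 and 30).
$ring = [10 => 'A', 20 => 'B', 30 => 'A', 40 => 'C'];

print_r(lookupTargets($ring, 45, 2)); // position beyond the last node
print_r(lookupTargets($ring, 25, 3));
```

For position 45 no key is larger, so both results come from the wrap-around pass (A, then B). For position 25 the first pass yields A and C, and the wrap-around pass adds B.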
