A Brief Look at the Rehash Mechanism in Redis

阿新 • Published: 2018-12-31

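It has been a long time since I last wrote pure C. I have been studying Redis recently and am impressed by how powerful and elegant it is. In my spare time I have also been reading its source code: the structure is very clear, each module has a well-defined responsibility, and it is well suited to reading and learning.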

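Redis supports many data structures, and dict is one of the most frequently used and most practical. In its implementation, Redis uses a mechanism called incremental rehashing to make resizing a dict more efficient. Reading this part of the source, I was genuinely struck by its elegance.
There are already plenty of articles on incremental rehashing, but I decided to write my own anyway, partly to reorganize my thoughts and partly to reinforce my understanding.
Before looking at the body of the rehash function, let's see how the data structures behind dict are defined: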


/* Hash table entry */
typedef struct dictEntry {
    // Key
    void *key;
    // Value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    // Next entry in the same bucket, forming a linked list
    struct dictEntry *next;
} dictEntry;

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    // Bucket array: each slot holds the head of a linked list of entries
    // (collisions are resolved by separate chaining)
    dictEntry **table;
    // Number of buckets in the table
    unsigned long size;
    // Size mask used to compute a bucket index;
    // always equal to size - 1
    unsigned long sizemask;
    // Number of entries currently stored in the table
    unsigned long used;
} dictht;

/* Dictionary */
typedef struct dict {
    // Type-specific functions
    dictType *type;
    // Private data
    void *privdata;
    // The two hash tables
    dictht ht[2];
    // Rehash index
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // Number of safe iterators currently running
    int iterators; /* number of iterators currently running */
} dict;

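That is roughly the structure of dict. Let's look at its most important members:

dictht::table: the table inside a hash table resolves collisions by separate chaining. At first the double pointer puzzled me: is this a two-dimensional array? It is actually a pointer to an array, and each element of that array is a pointer to the head of a linked list of entries.
dictht ht[2]: a dict maintains two hash tables that work like a pair of rolling arrays: one is the old table and the other is the new one. When the hash table needs to be resized, entries migrate from the old table (ht[0]) into the newly allocated one (ht[1]). At the next resize, the table that is new now plays the role of the old one, so the same two slots are reused and resizing stays cheap.
rehashidx: because rehashing is incremental, migration is not completed in a single step, so an index is needed to record how far the rehash has progressed. It points at the next bucket of the old table to migrate; a value of -1 means no rehash is in progress.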
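
To make the role of sizemask concrete, here is a minimal sketch of how a bucket index is derived from a hash value. The helper name bucket_index is hypothetical and not part of Redis; it assumes the table size is a power of two, which is what makes size - 1 usable as a bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Redis function): maps a hash value to a bucket
 * index in a table with `size` buckets. Assumes `size` is a power of two. */
unsigned long bucket_index(uint64_t hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0); /* power of two only */
    unsigned long sizemask = size - 1;             /* e.g. 8 -> binary 111 */
    return (unsigned long)(hash & sizemask);       /* keep the low bits */
}
```

The same hash can land in different buckets once the table grows, because the mask exposes one more bit. This is why every entry has to be re-indexed against ht[1].sizemask during migration instead of being copied bucket for bucket.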

Next, the body of the rehash function (taken directly from the latest version on GitHub at the time of writing):

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

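The best entry point for understanding a function is its comment. From it we learn roughly the following:

Rehashing migrates data incrementally with the bucket as its basic unit: each step moves one bucket, until all data has been migrated. A bucket corresponds to one linked list of entries in the hash table array. Newer versions of dictRehash() also cap the number of empty buckets visited per call (empty_visits, set to n*10) to further bound how long a call may block.

Now let's dig into the implementation.

First it checks whether the dict is currently rehashing. Only if it is does the function continue; otherwise there is nothing to migrate and it returns 0 immediately.
Next comes the main loop, which performs up to n steps (n is passed in as an argument). The loop condition also checks ht[0].used, the number of entries remaining in the old table, so the loop stops as soon as the old table is empty.
A runtime assertion guarantees that the rehash index stays within the bounds of the old table.
The small inner while loop skips empty buckets and decrements the remaining budget of empty-bucket visits; when empty_visits reaches 0, the function returns 1 without moving anything further.
Having arrived at a non-empty bucket, the while(de) loop moves every entry in it to ht[1]. The new index is the key's hash ANDed with the size mask of ht[1], so it cannot go out of range. The used counters of both tables are updated along the way.
At the end of each step, the migrated bucket in the old table is set to NULL and the index is incremented.
Finally it checks whether the old table has been fully migrated. If so, it frees the old bucket array, moves ht[1] into ht[0], resets ht[1], sets rehashidx back to -1, and returns 0; otherwise it returns 1 to tell the caller that the dict still has data to migrate.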
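
The steps above can be sketched with a stripped-down model. This is not the Redis code: the types and functions (node, table, mini_dict, mini_add, mini_start_rehash, mini_rehash) are hypothetical names invented for illustration, keys are plain integers that double as their own hash, the empty_visits limit is left out, and allocation failures are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature of dictEntry / dictht / dict. */
typedef struct node { uint64_t key; struct node *next; } node;
typedef struct table { node **buckets; unsigned long size, used; } table;
typedef struct mini_dict { table ht[2]; long rehashidx; } mini_dict;

void table_init(table *t, unsigned long size) { /* size: a power of two */
    t->buckets = calloc(size, sizeof(node *));
    t->size = size;
    t->used = 0;
}

/* Link a node at the head of its bucket (the key is its own hash here). */
void table_push(table *t, node *n) {
    unsigned long h = (unsigned long)(n->key & (t->size - 1));
    n->next = t->buckets[h];
    t->buckets[h] = n;
    t->used++;
}

/* While a rehash is running, new keys go straight into the new table. */
void mini_add(mini_dict *d, uint64_t key) {
    node *n = malloc(sizeof(*n));
    n->key = key;
    table_push(d->rehashidx != -1 ? &d->ht[1] : &d->ht[0], n);
}

void mini_start_rehash(mini_dict *d, unsigned long new_size) {
    table_init(&d->ht[1], new_size);
    d->rehashidx = 0;
}

/* Migrate up to n buckets. Returns 1 if entries remain in the old table. */
int mini_rehash(mini_dict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n-- && d->ht[0].used != 0) {
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        /* used != 0 guarantees a non-empty bucket lies ahead. */
        while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;
        node *de = d->ht[0].buckets[d->rehashidx];
        while (de) {                      /* move the whole chain */
            node *next = de->next;
            table_push(&d->ht[1], de);    /* re-index with the new mask */
            d->ht[0].used--;
            de = next;
        }
        d->ht[0].buckets[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->ht[0].used == 0) {             /* done: new table takes over */
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        d->ht[1] = (table){0};
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

A caller can drive this with a loop such as while (mini_rehash(&d, 1)) { /* do other work */ }, which is the point of the return value: the work is paused and resumed at bucket granularity rather than done in one blocking pass.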

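The essence of incremental rehashing is that migration does not happen all at once. It is split into steps through dictRehash(), and the caller always knows whether more work remains. If a dict holds a huge amount of data, migrating it in one go would inevitably hurt Redis's performance. Remember that Redis executes commands on a single thread, so in latency-sensitive scenarios such a stall could be fatal. Incremental rehashing spreads this cost out in a controlled way: a step can be run whenever the dict handles an insert, delete, or update, which keeps the cost of migration to a minimum.
During migration, whether an entry currently lives in the old table or the new one is not a pressing concern. No data is lost along the way: if a key is not found in the old table, it is simply looked up in the new one.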

That is why the dict functions that follow are full of code like this:

if (dictIsRehashing(d))
    _dictRehashStep(d);  // perform a single rehash step
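
The earlier point about lookups during migration can be illustrated with a small sketch. It is a simplified model rather than the Redis lookup function: the names entry, htable, two_table_dict, and two_table_find are hypothetical, and integer keys are used as their own hash values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types: integer keys, key used as its own hash. */
typedef struct entry { uint64_t key; int value; struct entry *next; } entry;
typedef struct htable { entry **buckets; unsigned long size; } htable;
typedef struct two_table_dict { htable ht[2]; long rehashidx; } two_table_dict;

/* Search the old table first, then the new one if a rehash is in progress. */
entry *two_table_find(const two_table_dict *d, uint64_t key) {
    assert(d != NULL);
    for (int t = 0; t <= 1; t++) {
        const htable *ht = &d->ht[t];
        if (ht->size != 0) {
            entry *e = ht->buckets[key & (ht->size - 1)];
            for (; e != NULL; e = e->next)
                if (e->key == key) return e;
        }
        /* Not rehashing: ht[1] holds nothing, so stop after ht[0]. */
        if (d->rehashidx == -1) break;
    }
    return NULL;
}
```

Checking both tables costs at most one extra bucket scan per lookup, and only while a rehash is running. In exchange, a key stays reachable no matter how far the migration has progressed.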

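Finally, the diagrams in Redis Design and Implementation (《Redis設計與實現》) walk through the whole incremental rehash process step by step and make it easier to picture.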

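Summary

Redis takes many measures to achieve high performance, including plenty of elegant and impressive engineering techniques that are well worth studying. Another thing that left a deep impression on me while reading the source is how meticulously Redis manages memory, though perhaps that is only because I had not read a pure C project in a long time. Either way, I learned a lot, and I hope we can all keep improving together.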
