Redis Source Code Analysis (Part 2): A Hash Table That Doesn't Block Even While Rehashing

阿新 • Published: 2019-02-02

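Redis's architecture is quite clever: it drops the mainstream multi-threaded design and, unconventionally, runs on a single thread. To be honest, for a key-value store I initially assumed that concurrent multi-threaded access would be the default choice, but Redis's performance shows in practice that it clearly isn't. I'll cover the single-threaded event system in a separate post; today we'll mainly look at this interesting hash table.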

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */
} dict;

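This is the data structure Redis uses to store a hash table. The actual hash tables are the two dictht members: ht[0] is one hash table and ht[1] is another. The two-table design exists mainly to support one operation, rehash, and specifically a rehash that doesn't block.
Rehash is the most expensive operation on a hash table. Being a single-threaded creature, Redis won't spin up another thread for it: inserts, deletes, updates, lookups, and rehashing all run on the same thread. So how does it keep the rehash from getting in the way of the other operations?
Let's pick any hash table operation; take the lookup function, for example: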

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */
    if (dictIsRehashing(d)) _dictRehashStep(d); /* NOTE: this is the line to look at */
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (dictCompareHashKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

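If you read my previous post, you have already seen this function. Again, there's no need to read the whole thing; just look at the one line I marked. Its meaning is clear: is this hash table currently rehashing? If so, call _dictRehashStep (the leading _ makes it a pretend-private function). So what does that function do?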

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

The dictRehash call inside it is where the rehash actually happens, so let's step straight in:

int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;

    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            _dictFree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        de = d->ht[0].table[d->rehashidx];
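        /* Move all the keys in this bucket from the old to the new hash HT.
         * In short: find the bucket we are supposed to move, empty it, and
         * we are done; each pass of the outer loop moves exactly one bucket. */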
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}

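The note I added inside the code above should stand out (code is still harder to follow than plain language). The function finds the first bucket in the table that still needs to be moved, re-hashes every entry in that bucket one by one, and moves each into the second table, that is, from ht[0] to ht[1] in the dict. Once ht[0] is empty, its bucket array is freed, ht[1] is assigned to ht[0], and ht[1] is reset.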
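To make the "re-hash every entry" step a bit more concrete, here is a minimal sketch of the index computation, `hash & sizemask`. It assumes that table sizes are powers of two, that sizemask equals size - 1, and that the new table is twice the size of the old one; none of that is shown in the excerpts above, and `bucket_index` and `stays_or_moves_by_old_size` are hypothetical helper names, not Redis functions.

```c
#include <assert.h>

/* Hypothetical helper: bucket index of hash h in a table with `size`
 * buckets. Assumes size is a power of two, so the mask is size - 1. */
static unsigned bucket_index(unsigned h, unsigned size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return h & (size - 1);
}

/* Assuming the new table has exactly twice as many buckets, an entry
 * either keeps its old index or lands at old index + old_size, because
 * the wider mask exposes exactly one more bit of the hash. */
static int stays_or_moves_by_old_size(unsigned h, unsigned old_size) {
    unsigned old_idx = bucket_index(h, old_size);
    unsigned new_idx = bucket_index(h, old_size * 2);
    return new_idx == old_idx || new_idx == old_idx + old_size;
}
```

Under these assumptions, the entries of one old bucket can split across two new buckets, which is why each entry's index has to be recomputed individually instead of moving the chain as a whole.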
Now that we know what this mover function does, let's see which functions call it:
dictAdd
dictFind
dictGenericDelete
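It shows up in add, delete, update, and lookup (an update is a delete followed by an add). So as the online stream of operations keeps flowing, the rehash quietly finishes along the way: the O(n) cost of a rehash is spread over many operations, each paying for one bucket, which makes it amortized O(1) per operation, so of course it doesn't block.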
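To see the whole idea in one place, here is a stripped-down sketch of a two-table set in which every lookup and insert pays for one bucket of migration. It is not Redis code: all the toy_* names are made up for illustration, keys are plain unsigned integers used directly as their own hash, growth is triggered when the table is full and doubles the size (both assumptions), and error handling and cleanup are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy set of unsigned keys with two tables. Every toy_* name is
 * hypothetical; the key doubles as its own hash value. */
typedef struct toy_entry {
    unsigned key;
    struct toy_entry *next;
} toy_entry;

typedef struct toy_table {
    toy_entry **buckets;
    unsigned size;              /* power of two, or 0 if unused */
    unsigned used;
} toy_table;

typedef struct toy_dict {
    toy_table ht[2];
    int rehashidx;              /* -1 when no rehash is in progress */
} toy_dict;

static void toy_table_set(toy_table *t, unsigned size) {
    t->buckets = size ? calloc(size, sizeof(toy_entry *)) : NULL;
    assert(size == 0 || t->buckets != NULL);
    t->size = size;
    t->used = 0;
}

static void toy_init(toy_dict *d) {
    toy_table_set(&d->ht[0], 4);
    toy_table_set(&d->ht[1], 0);
    d->rehashidx = -1;
}

/* Move at most one bucket from ht[0] to ht[1]. */
static void toy_rehash_step(toy_dict *d) {
    if (d->rehashidx == -1) return;
    if (d->ht[0].used == 0) {           /* everything moved: promote ht[1] */
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        toy_table_set(&d->ht[1], 0);
        d->rehashidx = -1;
        return;
    }
    /* used != 0, so a non-empty bucket exists at or after rehashidx */
    while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;
    toy_entry *e = d->ht[0].buckets[d->rehashidx];
    while (e) {
        toy_entry *next = e->next;
        unsigned idx = e->key & (d->ht[1].size - 1);
        e->next = d->ht[1].buckets[idx];
        d->ht[1].buckets[idx] = e;
        d->ht[0].used--;
        d->ht[1].used++;
        e = next;
    }
    d->ht[0].buckets[d->rehashidx++] = NULL;
}

/* Returns 1 if key is present. Pays for one rehash step first. */
static int toy_find(toy_dict *d, unsigned key) {
    toy_rehash_step(d);
    for (int t = 0; t <= 1; t++) {
        toy_entry *e = d->ht[t].buckets[key & (d->ht[t].size - 1)];
        for (; e; e = e->next)
            if (e->key == key) return 1;
        if (d->rehashidx == -1) break;  /* ht[1] is only live mid-rehash */
    }
    return 0;
}

/* Insert key if absent; new keys go to ht[1] while a rehash is running. */
static void toy_add(toy_dict *d, unsigned key) {
    if (toy_find(d, key)) return;       /* toy_find already did one step */
    if (d->rehashidx == -1 && d->ht[0].used >= d->ht[0].size) {
        toy_table_set(&d->ht[1], d->ht[0].size * 2);
        d->rehashidx = 0;
    }
    toy_table *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    toy_entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->key = key;
    e->next = t->buckets[key & (t->size - 1)];
    t->buckets[key & (t->size - 1)] = e;
    t->used++;
}
```

A caller would run toy_init once and then mix toy_add and toy_find freely; no single call ever walks more than one old bucket, yet the table still ends up fully migrated after enough operations.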
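Redis is an online service, and its data structures are designed around that fact. Spreading one large operation across many small ones to lower the amortized cost is not a rare idea: segment trees with lazy propagation, splay trees, and the STL vector are all analyzed with amortized complexity. The approach feels a little like cheating, but it is extremely practical.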
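The vector case is easy to check by counting. Below is a small sketch of a growable int array that doubles its capacity when full and records how many elements it has copied during reallocations. The doubling factor is an assumption (real std::vector implementations choose their own growth factor), and `int_vec`, `vec_push`, and `copies_for_pushes` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array that counts elements copied on growth. */
typedef struct int_vec {
    int *data;
    size_t len, cap;
    size_t copies;      /* total elements moved by reallocations */
} int_vec;

static void vec_push(int_vec *v, int x) {
    if (v->len == v->cap) {
        /* Assumed policy: double the capacity (start at 1). */
        size_t ncap = v->cap ? v->cap * 2 : 1;
        int *nd = malloc(ncap * sizeof(int));
        assert(nd != NULL);
        if (v->len) memcpy(nd, v->data, v->len * sizeof(int));
        v->copies += v->len;
        free(v->data);
        v->data = nd;
        v->cap = ncap;
    }
    v->data[v->len++] = x;
}

/* Push n values and report how many element copies growth caused. */
static size_t copies_for_pushes(size_t n) {
    int_vec v = {NULL, 0, 0, 0};
    for (size_t i = 0; i < n; i++) vec_push(&v, (int)i);
    size_t copies = v.copies;
    free(v.data);
    return copies;
}
```

With this policy, 1000 pushes cause 1 + 2 + 4 + ... + 512 = 1023 copies in total, so the copying cost stays below 2 per push on average even though a single push occasionally copies hundreds of elements. The difference from the Redis table is that the vector still pays each big copy in one go, while the hash table slices it into bucket-sized pieces.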
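Next time I'll cover Redis's event system. It makes Redis extremely efficient on the one hand and removes a lot of coding complexity on the other; it is another elegant piece of design.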
