Redis Source Code Study: dict

阿新 • Published: 2020-12-18

Reference: 《Redis 設計與實現》 (Redis Design and Implementation, based on Redis 3.0.0), by 黃健宏 (Huang Jianhong)
Source version studied here: Redis 3.2.13

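Introduction
Structure of the dictionary
Key-value node: dictEntry
Hash table structure: dictht
Dictionary structure: dict
Hash and rehash
Causes and handling of key collisions
rehash
Overview of the rehash process
Expanding and shrinking
Why the expand/shrink conditions are checked first
Computing the capacity
When to expand or shrink
Prelude to rehash: preparing ht[1]
rehashing
Hash table operations during incremental rehash
Iterators
Structure of the iterator
Obtaining and releasing an iterator
How the iterator advances
Traversing with an iterator
Summary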

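Introduction

A dictionary is an abstract data structure that stores key-value pairs. Each key is associated with a value, and a value can be looked up by its key.
A dictionary can be implemented in many ways: with an array, a hash table, or a tree. In C++, for example, the ordered map and the unordered unordered_map are typically implemented with a red-black tree and a hash table respectively.
Redis implements its dictionary with a hash table.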

Structure of the dictionary

Key-value node: dictEntry

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    //used to resolve key collisions: points to the next node in the chain
    struct dictEntry *next;
} dictEntry;

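To support several value types while saving space, Redis represents the value as a union.
Redis resolves key collisions by separate chaining: each node holds a pointer to the next node that collided into the same bucket.

Hash table structure: dictht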

typedef struct dictht {
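    //array of dictEntry * (each element is also called a bucket)
    dictEntry **table;
    //size of the array; must be a power of two (0 initially)
    unsigned long size;
    //mask used to compute the array index; always equal to size - 1
    unsigned long sizemask;
    //total number of nodes currently stored in this dictht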
    unsigned long used;
} dictht;

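Note that size and used are independent of each other: size is the length of the array, while used is the number of key-value nodes stored in the dictht, counting both the nodes sitting directly in the array slots and the nodes chained behind them after collisions. used can therefore exceed size.

size must be a power of two and sizemask must equal size - 1 so that the array index can be computed from the hash with a bitwise AND instead of a modulo, which is faster.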

//how the index is computed
index = hash % size
//when size is a power of two,
hash % size == hash & (size - 1) == hash & sizemask
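The equivalence above can be checked with a small standalone sketch. The helper name bucket_index is hypothetical (it is not part of dict.c) and only illustrates the masking step, assuming the size is a non-zero power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from dict.c: bucket index for a table whose
 * size is a power of two. */
static unsigned long bucket_index(uint64_t hash, unsigned long size) {
    /* A power of two has exactly one bit set, so size & (size - 1) is 0. */
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long sizemask = size - 1;
    /* Keeping only the low bits gives the same result as hash % size. */
    return (unsigned long)(hash & sizemask);
}
```

For a size of 8 the mask is 7 (binary 111), so a hash of 13 (binary 1101) maps to bucket 5, the same as 13 % 8.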

Dictionary structure: dict

typedef struct dictType {
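    //function pointer to the hash function
    unsigned int (*hashFunction)(const void *key);
    //function pointers for duplicating (deep-copying) keys and values
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    //function pointer for comparing keys
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    //function pointer for destroying a key
    void (*keyDestructor)(void *privdata, void *key);
    //function pointer for destroying a value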
    void (*valDestructor)(void *privdata, void *obj);
} dictType;
...
typedef struct dict {
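    //structure holding the custom key/value operations
    dictType *type;
    //private data passed in when the dictionary is created; available to the functions in dictType
    void *privdata;
    //two hash tables: table 0 is used normally, table 1 only during rehash
    dictht ht[2];
    //needed for incremental rehash: the index in the dictEntry * array that the rehash has reached; -1 means no rehash in progress (or rehash finished)
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    //number of (safe) iterators currently running on this dict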
    int iterators; /* number of iterators currently running */
} dict;

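Example of a dictionary with no rehash in progress and no key collisions (the example from the book; figure not reproduced here).

Hash and rehash

Causes and handling of key collisions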

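Hash functions inevitably collide: two different keys can produce the same hash value. That collisions are unavoidable follows from the pigeonhole principle.
There are more nodes than slots in the dictEntry * array: even when the hash values differ, taking them modulo the array length can yield the same index, so once the nodes outnumber the slots a collision is certain.

Redis handles collisions by separate chaining, that is, a linked list holds all the nodes that collide at the same index. To store a key-value pair quickly, Redis inserts the new pair directly at the head of the list.
However, as the number of nodes grows, the lists get longer and longer and seriously hurt the dictionary's performance, so something is needed to deal with this.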
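Head insertion into a bucket's chain can be sketched as follows. This is a simplified illustration under stated assumptions, not the dict.c code: the node type and the function chain_insert_head are hypothetical, keys are plain integers, and values are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node: integer key, no value. */
typedef struct node {
    int key;
    struct node *next;
} node;

/* Insert n at the head of the chain stored in table[index].
 * This is O(1): the existing chain is never walked. */
static void chain_insert_head(node **table, unsigned long index, node *n) {
    assert(table != NULL && n != NULL);
    n->next = table[index];  /* new node points at the old head (or NULL) */
    table[index] = n;        /* bucket now starts with the new node */
}
```

The most recently inserted node always ends up first in its bucket, which is why insertion stays constant-time even when a chain is long; only lookups pay for the chain length.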

rehash

Overview of the rehash process

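1. Before a rehash, Redis decides whether to expand or shrink based on the number of existing nodes and the size of the dictEntry * array:
   when a node is added to the dictionary, the expansion condition is checked;
   a Redis background job periodically checks the shrink condition.
2. If the condition is met, enough space is allocated for ht[1]; from this point the dictionary holds two dictEntry * arrays.
3. rehashidx is set to 0 to mark the start of the rehash; the bucket at index 0 of the ht[0] array will be migrated first.
4. For every node in the bucket at index rehashidx of ht[0], the index in ht[1] is recomputed and the node is moved there; rehashidx is then advanced for the next step.
5. Once all nodes of ht[0] have been migrated, the ht[0] array is freed, ht[1] replaces ht[0], ht[1] is reset, and rehashidx is set to -1 to mark the end of the rehash.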

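Redis does not migrate the dictEntry * array all at once; it migrates it piece by piece:

Lookups, insertions, updates and deletions each perform a single migration step internally, moving only one element (bucket) of the dictEntry * array at a time (the whole chain hanging off that bucket is migrated).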

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);    //perform one migration step only when no safe iterator exists
}

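The Redis background job migrates through dictRehashMilliseconds. This mode runs for about 1 millisecond, trying to migrate 100 buckets per call to dictRehash (the whole chain of each bucket is migrated), and stops after the batch in which the time limit is exceeded.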

//dict.c
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
//server.c
int incrementallyRehash(int dbid) {
...
dictRehashMilliseconds(server.db[dbid].dict,1);
dictRehashMilliseconds(server.db[dbid].expires,1);
...    
}

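Expanding and shrinking

Why the expand/shrink conditions are checked first

Expansion check
As noted above, Redis dictionaries have key collisions: the more nodes a dictionary holds, the more likely indices repeat and the longer the chains can grow.
A larger array is needed so that, after the hashes of the existing nodes are recomputed and reduced modulo the new size, the nodes land in empty slots as far as possible, bringing lookups from O(N) in the worst case back toward the original O(1).
Shrink check
Because the number of nodes grows and shrinks dynamically while the program runs, expanding without ever shrinking would waste memory. The dictionary's space therefore also needs to be scaled back down.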

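Computing the capacity

To allow the modulo to be done with a bitwise operation, the capacity (the divisor) is always a power of two.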

static unsigned long _dictNextPower(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE;    //#define DICT_HT_INITIAL_SIZE     4

    if (size >= LONG_MAX) return LONG_MAX + 1LU;
    while(1) {
        if (i >= size)
            return i;
        i *= 2;
    }
}
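A few concrete inputs make the rounding behaviour easier to see. The sketch below restates the same loop as a standalone function; the names next_power and INITIAL_SIZE are stand-ins chosen for this example so that it compiles without dict.c.

```c
#include <assert.h>
#include <limits.h>

#define INITIAL_SIZE 4  /* stand-in mirroring DICT_HT_INITIAL_SIZE */

/* Smallest power of two that is >= size, never smaller than INITIAL_SIZE. */
static unsigned long next_power(unsigned long size) {
    unsigned long i = INITIAL_SIZE;

    if (size >= LONG_MAX) return LONG_MAX + 1LU;
    while (i < size) i *= 2;      /* double until the request fits */
    assert((i & (i - 1)) == 0);   /* the result is always a power of two */
    return i;
}
```

Requests of 0 to 4 all yield 4, a request of 5 yields 8, a request of 8 stays 8, and a request of 9 yields 16.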

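When to expand or shrink

Expansion conditions

A node is being added and the dictEntry * array is empty (the dictionary was just created).
The number of nodes in ht[0] is greater than or equal to the array size, and either the resize flag is enabled or the number of nodes divided by the array size is greater than 5, i.e. the load factor exceeds 5.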

static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    //no need to expand while an incremental rehash is in progress: a rehash can only start after the new table has been allocated
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||                                                            //static int dict_can_resize = 1;
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))    //static unsigned int dict_force_resize_ratio = 5;
    {
        return dictExpand(d, d->ht[0].used*2);    //request twice the current node count; dictExpand rounds it up to a power of two
    }
    return DICT_OK;
}

The resize flag is controlled by updateDictResizePolicy in server.c.

/* This function is called once a background process of some kind terminates,
 * as we want to avoid resizing the hash tables when there is a child in order
 * to play well with copy-on-write (otherwise when a resize happens lots of
 * memory pages are copied). The goal of this function is to update the ability
 * for dict.c to resize the hash tables accordingly to the fact we have o not
 * running childs. */
void updateDictResizePolicy(void) {
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1)
        dictEnableResize();
    else
        dictDisableResize();
}

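Taking the comment and the code together: because Redis wants to make good use of copy-on-write, this function is called when a background child process starts or finishes writing an RDB file or rewriting the AOF file, to disable or re-enable dictionary resizing accordingly.
In summary, expansion is triggered when either of the following holds:
The server is not running BGSAVE or BGREWRITEAOF, and the load factor of the hash table is greater than or equal to 1
The server is running BGSAVE or BGREWRITEAOF, and the load factor of the hash table is greater than 5 (the code tests used/size > 5 with integer division, so in practice used must reach 6 times size)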
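The two cases can be condensed into a single predicate. This is a sketch of the condition only, assuming a table that is non-empty and not already rehashing; should_expand and FORCE_RESIZE_RATIO are hypothetical names mirroring the check in _dictExpandIfNeeded and dict_force_resize_ratio.

```c
#include <assert.h>

#define FORCE_RESIZE_RATIO 5  /* stand-in mirroring dict_force_resize_ratio */

/* Returns 1 when a non-empty, non-rehashing table would be expanded.
 * can_resize plays the role of the dict_can_resize flag. */
static int should_expand(unsigned long used, unsigned long size, int can_resize) {
    assert(size > 0);
    /* used / size is an integer division, as in the original check. */
    return used >= size && (can_resize || used / size > FORCE_RESIZE_RATIO);
}
```

With resizing disabled and size 4, a table holding 20 or even 23 nodes is not expanded, because 23 / 4 evaluates to 5; the forced expansion starts at 24 nodes.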
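Shrink condition
The ratio of nodes to array size is below 10%, i.e. shrinking happens when the load factor is less than 0.1 (and the array is larger than the initial size of 4).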

int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));    //#define HASHTABLE_MIN_FILL        10      /* Minimal hash table fill 10% */
}

The shrink check is called periodically by Redis.
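The integer arithmetic in the shrink check is easy to misread, so here is a standalone restatement with sample values. needs_shrink, INITIAL_SIZE and MIN_FILL are hypothetical names mirroring htNeedsResize, DICT_HT_INITIAL_SIZE and HASHTABLE_MIN_FILL; the slot and node counts are passed in directly instead of being read from a dict.

```c
#include <assert.h>

#define INITIAL_SIZE 4   /* stand-in mirroring DICT_HT_INITIAL_SIZE */
#define MIN_FILL 10      /* stand-in mirroring HASHTABLE_MIN_FILL (percent) */

/* Returns 1 when a table with `size` slots and `used` nodes is under 10% full.
 * The size check comes first, so size == 0 never reaches the division. */
static int needs_shrink(long long size, long long used) {
    assert(size >= 0 && used >= 0);
    return size > INITIAL_SIZE && (used * 100 / size < MIN_FILL);
}
```

A table of 16 slots shrinks with 1 node (100 / 16 is 6) but not with 2 nodes (200 / 16 is 12), and a table at the initial size of 4 is never shrunk.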

Prelude to rehash: preparing ht[1]

Allocating the array for ht[1] and setting the flag that starts the rehash are both handled by dictExpand.

#define dictIsRehashing(d) ((d)->rehashidx != -1)
...
int dictExpand(dict *d, unsigned long size)
{
    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size);    //smallest power of two >= size, searching upward from 4, used as the new size

    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    //while the dict is rehashing, both ht[0] and ht[1] may hold nodes; the assignments below could lose nodes, so expanding is not allowed
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;    //ht[1] is ready; buckets can now be moved from ht[0], starting at d->rehashidx, into ht[1]
    return DICT_OK;
}

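A newly created dictionary has no capacity at all in its dictEntry * array; dictExpand uses whether that array is empty to decide between initializing a new dictionary and preparing a rehash.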

rehashing

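The rehash itself is implemented by dictRehash, which performs N steps of incremental rehash; N determines how many buckets (elements of the dictEntry * array) are processed per call.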

int dictRehash(dict *d, int n) {
    //a call visits at most n*10 empty buckets before returning; empty_visits is the remaining budget of empty buckets
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    //cannot rehash unless ht[1] has been prepared
    if (!dictIsRehashing(d)) return 0;
    //n rehash steps
    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;    //at most n*10 empty buckets are visited per call
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            //put the node at index h of the new hash table (head insertion if a chain forms)
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;    //next position to rehash
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;    //rehash finished
        return 0;
    }

    /* More to rehash... */
    return 1;
}

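Hash table operations during incremental rehash

Because the dictionary uses both ht[0] and ht[1] during an incremental rehash, delete, find and update operations are carried out on both hash tables while it is in progress. For example, to look up a key, the program first searches ht[0] and, if the key is not found there, continues the search in ht[1].
In addition, while an incremental rehash is running, every newly added key-value pair is stored in ht[1], and nothing more is added to ht[0]. This guarantees that the number of pairs in ht[0] only decreases, so that ht[0] eventually becomes empty as the rehash proceeds.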
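The lookup order and the insert target described above can be sketched with a toy two-table structure. Everything here is hypothetical and heavily simplified (toy_node, toy_table, toy_dict, toy_find and toy_insert_table are not Redis names); keys are unsigned integers that double as their own hash.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_node { unsigned long key; struct toy_node *next; } toy_node;
typedef struct toy_table { toy_node **buckets; unsigned long sizemask; } toy_table;
typedef struct toy_dict { toy_table ht[2]; long rehashidx; } toy_dict;

/* Search ht[0] first; search ht[1] only while a rehash is in progress. */
static toy_node *toy_find(toy_dict *d, unsigned long key) {
    assert(d != NULL);
    for (int t = 0; t <= 1; t++) {
        toy_table *ht = &d->ht[t];
        if (ht->buckets != NULL) {
            toy_node *n = ht->buckets[key & ht->sizemask];
            for (; n != NULL; n = n->next)
                if (n->key == key) return n;
        }
        if (d->rehashidx == -1) break;  /* not rehashing: ht[1] is unused */
    }
    return NULL;
}

/* Index of the table that receives new entries: ht[1] during a rehash. */
static int toy_insert_table(const toy_dict *d) {
    return d->rehashidx != -1 ? 1 : 0;
}
```

Routing every insertion to ht[1] is what lets ht[0] drain monotonically: each rehash step removes nodes from it and nothing ever adds them back.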

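Iterators

dict has two kinds of iterators: safe iterators and unsafe iterators.

While a safe iterator exists, lookups, insertions, updates and deletions on the dict do not perform a rehash step, which keeps rehashing from disturbing the iteration order. The dict may still be added to, updated, searched and deleted from while a safe iterator exists.
While an unsafe iterator is in use, the dictionary may only be iterated; if it is modified, the fingerprint taken at the start of the iteration no longer matches the one computed at release, and an assertion is triggered.

Structure of the iterator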

typedef struct dictIterator {
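    //the dictionary being iterated
    dict *d;
    //position of the bucket (index into the dictEntry * array)
    long index;
    //table is the index of the ht being iterated (the dict may already be rehashing when the iterator is created, so both tables have to be traversed)
    //safe marks the kind of iterator (safe or unsafe)
    int table, safe;
    //the node currently being iterated and the node to iterate next
    //in dictNext, the current entry is returned to the caller and may be deleted by the caller; nextEntry is kept so the position is not lost
    dictEntry *entry, *nextEntry;
    /* unsafe iterator fingerprint for misuse detection. */
    long long fingerprint;    //fingerprint used by unsafe iterators for verification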
} dictIterator;

Obtaining and releasing an iterator

Obtaining

dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0;
    iter->entry = NULL;
    iter->nextEntry = NULL;
    return iter;
}

dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);

    i->safe = 1;
    return i;
}

Releasing

void dictReleaseIterator(dictIterator *iter)
{
    if (!(iter->index == -1 && iter->table == 0)) {
        if (iter->safe)
            iter->d->iterators--;
        else
            assert(iter->fingerprint == dictFingerprint(iter->d));    //verify the fingerprint when the iterator is released after iteration
    }
    zfree(iter);
}

How the iterator advances

dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
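        //iter is a brand-new iterator, or it has finished the chain in one bucket
        if (iter->entry == NULL) {
            //points to the ht currently being iterated
            dictht *ht = &iter->d->ht[iter->table];
            //first call on a brand-new iterator: a safe iterator increments the iterator count of the dict, an unsafe one records the dict's fingerprint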
            if (iter->index == -1 && iter->table == 0) {
                if (iter->safe)
                    iter->d->iterators++;
                else
                    iter->fingerprint = dictFingerprint(iter->d);
            }
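            //the chain in the current bucket is finished, so increment index to move to the next bucket
            //for a new iterator, index is incremented as well so that it points to the first bucket
            iter->index++;
            //the index of the bucket to visit is past the end of the current ht; two cases
            if (iter->index >= (long) ht->size) {
                //1. a rehash is in progress, so there are two tables; ht[0] has been fully iterated and ht[1] comes next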
                if (dictIsRehashing(iter->d) && iter->table == 0) {
                    iter->table++;
                    iter->index = 0;
                    ht = &iter->d->ht[1];
                } else {
                    //2. not rehashing (or ht[1] is finished as well): iteration is complete
                    break;    
                }
            }
            //move to the head of the chain in the next (or first) bucket
            iter->entry = ht->table[iter->index];
        } else {
            //advancing within the chain of a bucket
            iter->entry = iter->nextEntry;
        }
        //not at the end of the chain: save the next node first, then return the current one, because the caller may delete the returned entry
        if (iter->entry) {
            /* We need to save the 'next' here, the iterator user
             * may delete the entry we are returning. */
            iter->nextEntry = iter->entry->next;
            return iter->entry;
        }
    }
    return NULL;
}
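Why dictNext saves nextEntry before returning can be shown on a plain linked list. The sketch below is not Redis code: lnode, push_front and free_chain are hypothetical names, and free_chain plays the role of a caller that deletes every entry it is handed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct lnode { int key; struct lnode *next; } lnode;

/* Allocate a node and put it in front of head; returns the new head. */
static lnode *push_front(lnode *head, int key) {
    lnode *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    n->next = head;
    return n;
}

/* Frees every node in the chain and returns how many were freed. */
static int free_chain(lnode *head) {
    int freed = 0;
    lnode *entry = head, *next_entry;
    while (entry != NULL) {
        next_entry = entry->next;  /* save the successor before the node goes away */
        free(entry);               /* entry->next must not be read after this */
        freed++;
        entry = next_entry;
    }
    return freed;
}
```

Reading entry->next after free(entry) would be undefined behaviour; remembering the successor first is the same precaution the iterator takes on behalf of its caller.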

Traversing with an iterator

while((de = dictNext(di)) != NULL) {
    //   doSomethingWith(de);    
}

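Summary

Because the hash table has limited space, key collisions are certain once the number of nodes exceeds the number of slots; separate chaining resolves them.
Separate chaining resolves collisions but also raises the lookup cost, which rehashing into a larger table addresses.
Rehashing into a smaller table also avoids wasted space.
Incremental rehash avoids the blocking that migrating too many nodes at once would cause.
Timed rehash keeps a dictionary from holding two hash tables for too long and losing performance.
Using bitwise operations where appropriate gives better performance.
When memory is shared between two processes through the Linux copy-on-write (COW) mechanism, avoiding large modifications reduces the amount of memory copied and makes better use of COW.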
