Redis Cluster Analysis (31)

Axin • Published: 2020-12-28

1. Leader election

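Part (30) analyzed how the Sentinel servers talk to each other during a leader election. Here we continue with how the votes are counted and how the leader is confirmed.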

Part (30) referred to the following code (shown as a screenshot in the original post):

[Image: code excerpt from part (30)]

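Part (30) covered sentinelAskMasterStateToOtherSentinels, the function used for the exchange between Sentinels. Here we move on to the function called right after it, sentinelFailoverStateMachine, which looks like this: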

void sentinelFailoverStateMachine(sentinelRedisInstance *ri) {
    serverAssert(ri->flags & SRI_MASTER);

    if (!(ri->flags & SRI_FAILOVER_IN_PROGRESS)) return;

    switch(ri->failover_state) {
        case SENTINEL_FAILOVER_STATE_WAIT_START:
            sentinelFailoverWaitStart(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
            sentinelFailoverSelectSlave(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
            sentinelFailoverSendSlaveOfNoOne(ri);
            break;
        case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
            sentinelFailoverWaitPromotion(ri);
            break;
        case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
            sentinelFailoverReconfNextSlave(ri);
            break;
    }
}

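This function is simple: it is a single switch statement on ri->failover_state, a field already mentioned in part (30). Before looking at how this field gets its value, first look at the values it can take: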
SENTINEL_FAILOVER_STATE_WAIT_START
SENTINEL_FAILOVER_STATE_SELECT_SLAVE
SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE
SENTINEL_FAILOVER_STATE_WAIT_PROMOTION
SENTINEL_FAILOVER_STATE_RECONF_SLAVES

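They all share the prefix SENTINEL_FAILOVER_STATE. Without the prefix they become WAIT_START, SELECT_SLAVE, SEND_SLAVEOF_NOONE, WAIT_PROMOTION and RECONF_SLAVES. Start with the first case in the switch, WAIT_START, which means "waiting to start". To see what it is waiting for, we first need to go over the overall failover procedure.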

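For Sentinel, a failover starts from the subjective and objective down states, whose handling was analyzed earlier. Once a Sentinel decides that the master is objectively down, the master has to be failed over. But Sentinel runs as a group of several machines, and the failover only needs to be carried out by one of them. So before the real failover begins, one Sentinel has to be chosen, and that choice is made through leader election. WAIT_START therefore means: wait for the leader election to finish, then start the real failover.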

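The second case is SELECT_SLAVE, which means "select a replica" (a slave, in the code's terminology). The real failover is conceptually simple: pick one of the replicas that are still alive, make it the new master, and let it serve clients. That is why the second step is SELECT_SLAVE.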

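The third case is SEND_SLAVEOF_NOONE, which means "send the SLAVEOF NO ONE command". This step turns the chosen replica into a master. The SLAVEOF command was explained in the analysis of Redis master-replica replication, where we noted that NO ONE converts a replica into a master.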

The fourth case is WAIT_PROMOTION, which means "wait for the promotion to succeed".

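The last case is RECONF_SLAVES, which means "reconfigure the replicas". Once there is a new master, the remaining replicas have to be set up as replicas of that new master.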
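To make the order of the five steps easier to see, here is a minimal sketch of the progression as a plain C enum. The names (failover_state, failover_next, failover_name) and the FO_DONE terminal state are made up for this illustration and are not Redis identifiers; the real code advances ri->failover_state inside each handler rather than through a helper like this.

```c
#include <assert.h>

/* Simplified model of the failover states discussed above. The enum
 * names and numeric values are illustrative, not the ones Redis uses. */
typedef enum {
    FO_WAIT_START,          /* waiting for the leader election to finish */
    FO_SELECT_SLAVE,        /* pick the replica to promote */
    FO_SEND_SLAVEOF_NOONE,  /* send SLAVEOF NO ONE to that replica */
    FO_WAIT_PROMOTION,      /* wait until the promotion has succeeded */
    FO_RECONF_SLAVES,       /* repoint the remaining replicas */
    FO_DONE                 /* hypothetical terminal state for this sketch */
} failover_state;

/* Return the state that follows `s` once the current step succeeds. */
failover_state failover_next(failover_state s) {
    assert(s <= FO_DONE);
    return s == FO_DONE ? FO_DONE : (failover_state)(s + 1);
}

/* Human-readable name, handy when logging transitions. */
const char *failover_name(failover_state s) {
    static const char *names[] = {
        "WAIT_START", "SELECT_SLAVE", "SEND_SLAVEOF_NOONE",
        "WAIT_PROMOTION", "RECONF_SLAVES", "DONE"
    };
    assert(s <= FO_DONE);
    return names[s];
}
```

Calling failover_next repeatedly from FO_WAIT_START walks through the steps in exactly the order described above.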

Now back to how ri->failover_state gets its value. As analyzed in part (30), sentinelStartFailoverIfNeeded calls sentinelStartFailover when the master is objectively down, and that function sets ri->failover_state to SENTINEL_FAILOVER_STATE_WAIT_START. So we first look at the SENTINEL_FAILOVER_STATE_WAIT_START branch of the switch.
The handling there is also simple: it calls sentinelFailoverWaitStart, shown below:

/* ---------------- Failover state machine implementation ------------------- */
void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
    char *leader;
    int isleader;

    /* Check if we are the leader for the failover epoch. */
    leader = sentinelGetLeader(ri, ri->failover_epoch);
    isleader = leader && strcasecmp(leader,sentinel.myid) == 0;
    sdsfree(leader);

    /* If I'm not the leader, and it is not a forced failover via
     * SENTINEL FAILOVER, then I can't continue with the failover. */
    if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
        int election_timeout = SENTINEL_ELECTION_TIMEOUT;

        /* The election timeout is the MIN between SENTINEL_ELECTION_TIMEOUT
         * and the configured failover timeout. */
        if (election_timeout > ri->failover_timeout)
            election_timeout = ri->failover_timeout;
        /* Abort the failover if I'm not the leader after some time. */
        if (mstime() - ri->failover_start_time > election_timeout) {
            sentinelEvent(LL_WARNING,"-failover-abort-not-elected",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }
    sentinelEvent(LL_WARNING,"+elected-leader",ri,"%@");
    if (sentinel.simfailure_flags & SENTINEL_SIMFAILURE_CRASH_AFTER_ELECTION)
        sentinelSimFailureCrash();
    ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
    ri->failover_state_change_time = mstime();
    sentinelEvent(LL_WARNING,"+failover-state-select-slave",ri,"%@");
}

The first thing the function does is call sentinelGetLeader, which tallies the result of the vote. Its code is as follows:

/* Scan all the Sentinels attached to this master to check if there
 * is a leader for the specified epoch.
 *
 * To be a leader for a given epoch, we should have the majority of
 * the Sentinels we know (ever seen since the last SENTINEL RESET) that
 * reported the same instance as leader for the same epoch. */
char *sentinelGetLeader(sentinelRedisInstance *master, uint64_t epoch) {
    dict *counters;
    dictIterator *di;
    dictEntry *de;
    unsigned int voters = 0, voters_quorum;
    char *myvote;
    char *winner = NULL;
    uint64_t leader_epoch;
    uint64_t max_votes = 0;

    serverAssert(master->flags & (SRI_O_DOWN|SRI_FAILOVER_IN_PROGRESS));
    counters = dictCreate(&leaderVotesDictType,NULL);

    voters = dictSize(master->sentinels)+1; /* All the other sentinels and me.*/

    /* Count other sentinels votes */
    di = dictGetIterator(master->sentinels);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        if (ri->leader != NULL && ri->leader_epoch == sentinel.current_epoch)
            sentinelLeaderIncr(counters,ri->leader);
    }
    dictReleaseIterator(di);

    /* Check what's the winner. For the winner to win, it needs two conditions:
     * 1) Absolute majority between voters (50% + 1).
     * 2) And anyway at least master->quorum votes. */
    di = dictGetIterator(counters);
    while((de = dictNext(di)) != NULL) {
        uint64_t votes = dictGetUnsignedIntegerVal(de);

        if (votes > max_votes) {
            max_votes = votes;
            winner = dictGetKey(de);
        }
    }
    dictReleaseIterator(di);

    /* Count this Sentinel vote:
     * if this Sentinel did not voted yet, either vote for the most
     * common voted sentinel, or for itself if no vote exists at all. */
    if (winner)
        myvote = sentinelVoteLeader(master,epoch,winner,&leader_epoch);
    else
        myvote = sentinelVoteLeader(master,epoch,sentinel.myid,&leader_epoch);

    if (myvote && leader_epoch == epoch) {
        uint64_t votes = sentinelLeaderIncr(counters,myvote);

        if (votes > max_votes) {
            max_votes = votes;
            winner = myvote;
        }
    }

    voters_quorum = voters/2+1;
    if (winner && (max_votes < voters_quorum || max_votes < master->quorum))
        winner = NULL;

    winner = winner ? sdsnew(winner) : NULL;
    sdsfree(myvote);
    dictRelease(counters);
    return winner;
}

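This function counts the votes and determines the result of the election. It first creates a dictionary named counters, whose keys are the runids of the candidate Sentinels and whose values are their vote counts. Next comes voters, the total number of voters: every other known Sentinel plus this one. The block under the comment "Count other sentinels votes" then tallies the votes. It obtains an iterator over master->sentinels (introduced when we analyzed how a Sentinel discovers other Sentinels; it holds the other Sentinels this one has discovered) and walks through all of them in a while loop. For each Sentinel it checks whether leader is set and whether leader_epoch matches the current epoch; if so, it calls sentinelLeaderIncr to record the vote.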

sentinelLeaderIncr looks like this:

/* Helper function for sentinelGetLeader, increment the counter
 * relative to the specified runid. */
int sentinelLeaderIncr(dict *counters, char *runid) {
    dictEntry *existing, *de;
    uint64_t oldval;

    de = dictAddRaw(counters,runid,&existing);
    if (existing) {
        oldval = dictGetUnsignedIntegerVal(existing);
        dictSetUnsignedIntegerVal(existing,oldval+1);
        return oldval+1;
    } else {
        serverAssert(de != NULL);
        dictSetUnsignedIntegerVal(de,1);
        return 1;
    }
}

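This function is simple: if the given runid already exists in counters, its count is incremented by one; otherwise a new entry is created with the value 1. In both cases the new count is returned.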

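That is all there is to the counting code. To understand it better, it helps to briefly recap how votes are cast. Again starting from the subjective and objective down states: once a Sentinel decides the master is objectively down, it immediately calls sentinelAskMasterStateToOtherSentinels, which we analyzed earlier. That function sends a vote request to the other Sentinels and registers sentinelReceiveIsMasterDownReply as the callback that handles their replies. When another Sentinel receives the request, it votes for the requester if it has not voted yet; if it has already voted, it replies with the runid of the Sentinel it voted for. When the requesting Sentinel receives the reply, it records the data in the instance (ri) that represents the replying Sentinel, and those instances are stored in master->sentinels.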
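The reply rule described above (grant the vote if none has been cast yet, otherwise report the existing vote) can be sketched as follows. This is a simplified model under stated assumptions: vote_record and handle_vote_request are hypothetical names, epochs are assumed to start at 1, and other bookkeeping performed by the real sentinelVoteLeader is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RUNID_LEN 40  /* length assumed for a runid in this sketch */

/* Hypothetical per-master voting record kept by the Sentinel that answers. */
typedef struct {
    char leader[RUNID_LEN + 1];  /* runid voted for, "" if no vote yet */
    uint64_t leader_epoch;       /* epoch in which that vote was cast */
} vote_record;

/* Handle a vote request from `candidate` for `req_epoch`.
 * The vote is granted only if no vote was cast for this epoch or a
 * later one; otherwise the record is left unchanged. Either way the
 * runid currently voted for is returned to the requester. */
const char *handle_vote_request(vote_record *rec, const char *candidate,
                                uint64_t req_epoch) {
    assert(strlen(candidate) <= RUNID_LEN);
    if (rec->leader_epoch < req_epoch) {
        strcpy(rec->leader, candidate);
        rec->leader_epoch = req_epoch;
    }
    return rec->leader;
}
```

With this rule a second candidate asking in the same epoch gets back the first candidate's runid, which is what lets every Sentinel learn who the others voted for.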

The counting code then only has to walk through all those Sentinels and add up the votes they reported, which is exactly what sentinelLeaderIncr is used for.

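Next in sentinelGetLeader is the loop under the comment "Check what's the winner". It simply iterates over the tally and picks out the runid with the most votes together with its count: winner is that runid and max_votes is the number of votes it received.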
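The two steps covered so far, incrementing a per-runid counter and then scanning for the highest count, can be tried out in isolation. The sketch below replaces the Redis dict with a small fixed-size array; vote_tally, tally_incr, tally_winner and MAX_CANDIDATES are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CANDIDATES 16   /* arbitrary capacity for this sketch */

typedef struct {
    const char *runid[MAX_CANDIDATES];
    uint64_t votes[MAX_CANDIDATES];
    int count;                              /* number of distinct runids */
} vote_tally;

/* Add one vote for `runid` and return its new total, mirroring what
 * sentinelLeaderIncr does with a dict. */
uint64_t tally_incr(vote_tally *t, const char *runid) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->runid[i], runid) == 0)
            return ++t->votes[i];           /* existing candidate */
    }
    assert(t->count < MAX_CANDIDATES);
    t->runid[t->count] = runid;             /* first vote for this runid */
    t->votes[t->count] = 1;
    t->count++;
    return 1;
}

/* Return the runid with the most votes (NULL if the tally is empty)
 * and store its vote count in *max_votes. */
const char *tally_winner(const vote_tally *t, uint64_t *max_votes) {
    const char *winner = NULL;
    *max_votes = 0;
    for (int i = 0; i < t->count; i++) {
        if (t->votes[i] > *max_votes) {
            *max_votes = t->votes[i];
            winner = t->runid[i];
        }
    }
    return winner;
}
```

As in the original loop, the comparison is strict, so when two runids are tied the one encountered first stays the winner.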

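Then comes the block under the comment "Count this Sentinel vote", which adds the vote of the current Sentinel itself. If there is already a front-runner it votes for that one, otherwise it votes for itself. The sentinelVoteLeader function called in both branches was analyzed earlier: it uses the epoch to decide whether a vote has already been cast, so a Sentinel never votes twice in the same epoch.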

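Finally there is the if statement after voters_quorum is computed. It checks two conditions: 1. the winner has an absolute majority, that is, at least voters/2+1 votes; 2. the winner has at least quorum votes (quorum is the number of Sentinels, set when configuring Sentinel, that must agree before the master is considered objectively down). If either condition fails, the election is not valid and winner is set to NULL.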
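The two conditions are easy to get wrong by one, so here is the check on its own as a small function. election_won is a name made up for this sketch; the arithmetic follows the voters/2+1 and master->quorum comparison in the listing above.

```c
#include <assert.h>
#include <stdbool.h>

/* True when `votes` is enough to win among `voters` Sentinels:
 * an absolute majority (voters/2 + 1) and at least `quorum` votes. */
bool election_won(unsigned votes, unsigned voters, unsigned quorum) {
    assert(voters > 0);
    unsigned majority = voters / 2 + 1;
    return votes >= majority && votes >= quorum;
}
```

For example, with 5 Sentinels the majority is 5/2+1 = 3. With quorum 2, two votes are not enough because the majority is missing, while three votes win. With quorum 4, three votes are a majority but still fall short of the quorum.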

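That completes sentinelGetLeader, the function that tallies the vote. Now back to its caller, sentinelFailoverWaitStart. After obtaining leader from sentinelGetLeader, the function compares it with its own runid (sentinel.myid) to decide whether it is the leader itself. If it is not the leader, and the failover was not forced through SENTINEL FAILOVER, the body of the if statement runs: it returns without advancing the state, and once the election timeout has elapsed it aborts the failover through sentinelAbortFailover. If it is the leader, the code after the if statement runs and the failover continues. The key line there sets ri->failover_state to SENTINEL_FAILOVER_STATE_SELECT_SLAVE.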
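The timeout rule in the non-leader branch (use the smaller of the election timeout and the configured failover timeout) can be isolated like this. election_timed_out is a hypothetical helper, and the 10000 ms value of ELECTION_TIMEOUT_MS is an assumption for this sketch; check the definition of SENTINEL_ELECTION_TIMEOUT in your Redis version.

```c
#include <assert.h>
#include <stdbool.h>

#define ELECTION_TIMEOUT_MS 10000  /* assumed value for this sketch */

/* True when a Sentinel that has not been elected should give up.
 * All arguments are in milliseconds. */
bool election_timed_out(long long now_ms, long long start_ms,
                        long long failover_timeout_ms) {
    assert(now_ms >= start_ms);
    long long limit = ELECTION_TIMEOUT_MS;
    /* The effective limit is the MIN of the two timeouts. */
    if (limit > failover_timeout_ms)
        limit = failover_timeout_ms;
    return now_ms - start_ms > limit;
}
```

Note the strict comparison: at exactly the limit the Sentinel keeps waiting, and it aborts only once the elapsed time exceeds the limit.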
