Redis Source Code Analysis -- Sentinel (4): The Acting Half of Instance Handling
阿新 • Published: 2022-02-07
Acting half:
1. Before entering the failover state:
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    // ...
    // ...
    /* ============== ACTING HALF ============= */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
        /* If everything has been normal for 30 seconds, exit TILT mode */
        sentinel.tilt = 0;
        sentinelEvent(REDIS_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        /* Messages are sent and received asynchronously, so a result cannot always be determined right here;
         * this sentinel decides whether the master is objectively down over successive timer calls */
        sentinelCheckObjectivelyDown(ri);
        /* Decide whether to start a failover. If one starts, the change in master->failover_state triggers one
         * command send: this sentinel asks the other sentinels to elect it as leader */
        if (sentinelStartFailoverIfNeeded(ri))
            /* This call is this sentinel asking to be elected leader Sentinel; it is triggered only once */
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        /* The message sent here is not fixed: it may ask about the objective-down state,
         * or it may request an election (if the initial election did not produce a leader,
         * this line keeps firing on later timer calls until a leader is produced) */
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}
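- L16: sentinelCheckSubjectivelyDown checks the subjectively-down state;
- L27: sentinelCheckObjectivelyDown checks the objectively-down state; note the guard at L24: only masters;
- L32, L37: Section 2 analyzes sentinelAskMasterStateToOtherSentinels;
- L30: Section 3 analyzes the conditions checked by sentinelStartFailoverIfNeeded for entering the failover phase;
- L33: sentinelFailoverStateMachine enters the failover phase; the next article analyzes the failover state machine;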
2. Sending the command: sentinelAskMasterStateToOtherSentinels:
This function is unusual, so it gets a section of its own. First, note that it sends this command:
SENTINEL IS-MASTER-DOWN-BY-ADDR <ip> <port> <current_epoch> <runid>
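Depending on the runid argument, the command serves two different purposes:
- First purpose: runid is '*'. The local sentinel considers the current master subjectively down and asks the other sentinels whether they also consider that master down (the answer may reflect a subjective or an objective judgment, because different Sentinels apply different criteria; //todo analyze);
- Second purpose: runid is the local sentinel's own runid. The local sentinel has already determined that the current master is objectively down and asks the other sentinels to elect it as the leader Sentinel;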
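The choice between the two purposes comes down to a single ternary on the failover state. The sketch below shows that selection and the resulting command text in isolation; `pick_runid_arg`, `format_is_master_down`, and the two enum constants are hypothetical helpers written for illustration, not part of sentinel.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the failover states; only "none" vs "not none" matters here. */
enum { FAILOVER_STATE_NONE = 0, FAILOVER_STATE_WAIT_START = 1 };

/* "*" only asks for the peer's view of the master;
 * our own runid additionally asks the peer to vote for us. */
const char *pick_runid_arg(int failover_state, const char *my_runid) {
    return failover_state > FAILOVER_STATE_NONE ? my_runid : "*";
}

/* Writes the command text into buf and returns the snprintf result. */
int format_is_master_down(char *buf, size_t len, const char *ip, int port,
                          unsigned long long epoch, int failover_state,
                          const char *my_runid) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "SENTINEL is-master-down-by-addr %s %d %llu %s",
                    ip, port, epoch,
                    pick_runid_arg(failover_state, my_runid));
}
```

Before a failover has started the last argument is `*`; once `failover_state` has left the "none" state the same call carries the sender's runid, which turns the query into a vote request.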
void sentinelAskMasterStateToOtherSentinels(sentinelRedisInstance *master, int flags) {
    dictIterator *di;
    dictEntry *de;

    di = dictGetIterator(master->sentinels);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        mstime_t elapsed = mstime() - ri->last_master_down_reply_time;
        char port[32];
        int retval;

        /* If the master state from other sentinel is too old, we clear it. */
        if (elapsed > SENTINEL_ASK_PERIOD*5) {
            ri->flags &= ~SRI_MASTER_DOWN;
            sdsfree(ri->leader);
            ri->leader = NULL;
        }

        /* Only ask if master is down to other sentinels if:
         *
         * 1) We believe it is down, or there is a failover in progress.
         * 2) Sentinel is connected.
         * 3) We did not receive the info within SENTINEL_ASK_PERIOD ms. */
        if ((master->flags & SRI_S_DOWN) == 0) continue;
        if (ri->flags & SRI_DISCONNECTED) continue;
        if (!(flags & SENTINEL_ASK_FORCED) &&
            mstime() - ri->last_master_down_reply_time < SENTINEL_ASK_PERIOD)
            continue;

        /* Ask */
        ll2string(port,sizeof(port),master->addr->port);
        retval = redisAsyncCommand(ri->cc,
                    /**/
                    sentinelReceiveIsMasterDownReply, NULL,
                    "SENTINEL is-master-down-by-addr %s %s %llu %s",
                    master->addr->ip, port,
                    sentinel.current_epoch,
                    /* The master's current failover_state decides whether this asks about the down state or requests election as leader */
                    (master->failover_state > SENTINEL_FAILOVER_STATE_NONE) ?
                    server.runid : "*");
        if (retval == REDIS_OK) ri->pending_commands++;
    }
    dictReleaseIterator(di);
}
- Take a look at the official comment:
If we think the master is down, we start sending SENTINEL IS-MASTER-DOWN-BY-ADDR requests(1) to other sentinels in order to get the replies(2) that allow to reach the quorum needed to mark(3) the master in ODOWN state and trigger(4) a failover
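- (1): this is the first purpose of the function;
- (2): L34: the callback sentinelReceiveIsMasterDownReply receives the reply and, depending on its content, sets SRI_MASTER_DOWN in ri->flags;
- (3): each sentinel with SRI_MASTER_DOWN set in ri->flags makes sentinelCheckObjectivelyDown increment its quorum count (quorum++);
- (4): once the objectively-down requirement is met, sentinelStartFailoverIfNeeded from Section 1 enters the failover phase, i.e. "trigger a failover";
- L34: Section 3 analyzes sentinelReceiveIsMasterDownReply, the callback for the SENTINEL is-master-down-by-addr command;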
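Steps (2) to (4) amount to counting agreeing sentinels and comparing against the configured quorum. A minimal sketch of that count follows; `reaches_quorum` is a hypothetical helper, and it assumes the caller already sees the master as subjectively down, which is why the tally starts at one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* peer_says_down[i] plays the role of SRI_MASTER_DOWN on peer i.
 * Assumption: the local sentinel already considers the master
 * subjectively down, so its own vote is counted up front. */
bool reaches_quorum(const bool *peer_says_down, size_t n_peers,
                    unsigned quorum) {
    assert(peer_says_down != NULL || n_peers == 0);
    unsigned votes = 1; /* our own opinion */
    for (size_t i = 0; i < n_peers; i++) {
        if (peer_says_down[i]) votes++;
    }
    return votes >= quorum;
}
```

With three peers of which two report the master down, the tally is three, so a quorum of 3 is reached while a quorum of 4 is not.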
3. Conditions for entering failover:
int sentinelStartFailoverIfNeeded(sentinelRedisInstance *master) {
    /* We can't failover if the master is not in O_DOWN state. */
    if (!(master->flags & SRI_O_DOWN)) return 0;

    /* Failover already in progress? */
    if (master->flags & SRI_FAILOVER_IN_PROGRESS) return 0;

    /* Last failover attempt started too little time ago? */
    if (mstime() - master->failover_start_time <
        master->failover_timeout*2) return 0;
    /* Start the failover */
    sentinelStartFailover(master);
    return 1;
}
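- L3: the master must already be judged objectively down (SRI_O_DOWN) before failover can begin;
- L6: the master must not already be in a failover (SRI_FAILOVER_IN_PROGRESS);
- L9: the previous failover attempt must have started at least failover_timeout*2 ago;
- L12: sentinelStartFailover starts the failover; its source is analyzed next: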
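The preconditions above can be condensed into one predicate over plain values, which makes the boundary of the time check easy to see. This is a sketch with a hypothetical name (`can_start_failover`) and plain parameters in place of the sentinelRedisInstance fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a new failover may be started.
 * All parameters are stand-ins for the corresponding master fields and flags. */
bool can_start_failover(bool objectively_down, bool failover_in_progress,
                        long long now_ms, long long last_start_ms,
                        long long failover_timeout_ms) {
    assert(failover_timeout_ms >= 0);
    if (!objectively_down) return false;    /* not in O_DOWN state */
    if (failover_in_progress) return false; /* already failing over */
    if (now_ms - last_start_ms < failover_timeout_ms * 2)
        return false;                       /* last attempt too recent */
    return true;
}
```

The comparison is strict, so an attempt made exactly `failover_timeout_ms * 2` after the previous one is allowed, and one made a millisecond earlier is not.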
void sentinelStartFailover(sentinelRedisInstance *master) {
    redisAssert(master->flags & SRI_MASTER);
    /* Set the FAILOVER_STATE_WAIT_START state; in
     * sentinelAskMasterStateToOtherSentinels this state lets
     * this sentinel ask the other Sentinels to elect it as leader */
    master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;
    master->flags |= SRI_FAILOVER_IN_PROGRESS;
    /* Increment the epoch */
    master->failover_epoch = ++sentinel.current_epoch;
    sentinelEvent(REDIS_WARNING,"+new-epoch",master,"%llu",
        (unsigned long long) sentinel.current_epoch);
    sentinelEvent(REDIS_WARNING,"+try-failover",master,"%@");
    /* Update the timing information */
    master->failover_start_time = mstime();
    master->failover_state_change_time = mstime();
}
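- L6, L7: both the master's failover_state and its flags are updated; the logic only actually enters failover the next time sentinelHandleRedisInstance reaches the failover state machine function;
- L9: the epoch is incremented (++sentinel.current_epoch); this is not necessarily the first epoch;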