Redis Cluster Analysis (33)
1. Failover
Part (32) analyzed the code for the failover's SELECT_SLAVE state and noted that sentinelFailoverSelectSlave changes failover_state to SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE. In this state, sentinelFailoverStateMachine calls sentinelFailoverSendSlaveOfNoOne, whose code is as follows:
void sentinelFailoverSendSlaveOfNoOne(sentinelRedisInstance *ri) {
    int retval;

    /* We can't send the command to the promoted slave if it is now
     * disconnected. Retry again and again with this state until the timeout
     * is reached, then abort the failover. */
    if (ri->promoted_slave->link->disconnected) {
        if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
            sentinelEvent(LL_WARNING,"-failover-abort-slave-timeout",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }

    /* Send SLAVEOF NO ONE command to turn the slave into a master.
     * We actually register a generic callback for this command as we don't
     * really care about the reply. We check if it worked indirectly observing
     * if INFO returns a different role (master instead of slave). */
    retval = sentinelSendSlaveOf(ri->promoted_slave,NULL,0);
    if (retval != C_OK) return;
    sentinelEvent(LL_NOTICE, "+failover-state-wait-promotion",
        ri->promoted_slave,"%@");
    ri->failover_state = SENTINEL_FAILOVER_STATE_WAIT_PROMOTION;
    ri->failover_state_change_time = mstime();
}
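First comes the outer if statement, which checks whether the link to the promoted slave is still connected. If the link is down, the method returns without sending anything and the state machine retries on a later pass. Only once more than failover_timeout has elapsed since failover_state_change_time does it log -failover-abort-slave-timeout and call sentinelAbortFailover to abort the failover.
Next, sentinelSendSlaveOf is called with a NULL host to send the SLAVEOF NO ONE command to the promoted slave. If the send fails (retval != C_OK), the method returns immediately and the state is left unchanged. The remaining code logs the +failover-state-wait-promotion event and updates two fields. The key assignment sets failover_state to SENTINEL_FAILOVER_STATE_WAIT_PROMOTION; failover_state_change_time is also reset to the current time.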
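The three possible outcomes of this step can be summarized in a small standalone sketch. It is not Redis code: the enum, the function name, and the parameters are hypothetical names introduced only for illustration, and the real method works on sentinelRedisInstance fields instead of plain arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels, not Redis identifiers. */
typedef enum {
    STEP_RETRY_LATER,   /* link down, timeout not yet reached: stay in this state */
    STEP_ABORT,         /* link down and the failover timeout has expired */
    STEP_SEND_COMMAND   /* link up: SLAVEOF NO ONE can be sent */
} step_outcome;

/* Mirrors the decision order described above: the connection check comes
 * first, and the timeout only matters while the link is disconnected. */
step_outcome send_slaveof_noone_step(bool disconnected, long long now_ms,
                                     long long state_change_ms,
                                     long long timeout_ms) {
    assert(timeout_ms >= 0);
    if (disconnected) {
        if (now_ms - state_change_ms > timeout_ms) return STEP_ABORT;
        return STEP_RETRY_LATER;
    }
    return STEP_SEND_COMMAND;
}
```

With an example timeout of 180000 ms, a disconnected link at 1000 ms yields STEP_RETRY_LATER, while the same link at 200000 ms yields STEP_ABORT.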
The sentinelSendSlaveOf method, which sends the command, is as follows:
int sentinelSendSlaveOf(sentinelRedisInstance *ri, char *host, int port) {
    char portstr[32];
    int retval;

    ll2string(portstr,sizeof(portstr),port);

    /* If host is NULL we send SLAVEOF NO ONE that will turn the instance
     * into a master. */
    if (host == NULL) {
        host = "NO";
        memcpy(portstr,"ONE",4);
    }

    /* In order to send SLAVEOF in a safe way, we send a transaction performing
     * the following tasks:
     * 1) Reconfigure the instance according to the specified host/port params.
     * 2) Rewrite the configuration.
     * 3) Disconnect all clients (but this one sending the command) in order
     *    to trigger the ask-master-on-reconnection protocol for connected
     *    clients.
     *
     * Note that we don't check the replies returned by commands, since we
     * will observe instead the effects in the next INFO output. */
    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s",
        sentinelInstanceMapCommand(ri,"MULTI"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s %s %s",
        sentinelInstanceMapCommand(ri,"SLAVEOF"),
        host, portstr);
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s REWRITE",
        sentinelInstanceMapCommand(ri,"CONFIG"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    /* CLIENT KILL TYPE <type> is only supported starting from Redis 2.8.12,
     * however sending it to an instance not understanding this command is not
     * an issue because CLIENT is variadic command, so Redis will not
     * recognized as a syntax error, and the transaction will not fail (but
     * only the unsupported command will fail). */
    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s KILL TYPE normal",
        sentinelInstanceMapCommand(ri,"CLIENT"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    retval = redisAsyncCommand(ri->link->cc,
        sentinelDiscardReplyCallback, ri, "%s",
        sentinelInstanceMapCommand(ri,"EXEC"));
    if (retval == C_ERR) return retval;
    ri->link->pending_commands++;

    return C_OK;
}
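First, the method checks the host argument: if host is NULL, the arguments of the SLAVEOF command are set to NO ONE (host becomes "NO" and portstr becomes "ONE"). The call in sentinelFailoverSendSlaveOfNoOne shown above passes exactly NULL as the host.
The rest of the method consists of five nearly identical blocks that send five commands to the slave: MULTI, SLAVEOF, CONFIG, CLIENT, and EXEC. All five are sent through redisAsyncCommand, and each registers sentinelDiscardReplyCallback as its reply handler. If any call returns C_ERR the method returns at once; otherwise pending_commands on the link is incremented. The callback is as follows: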
/* Just discard the reply. We use this when we are not monitoring the return
 * value of the command but its effects directly. */
void sentinelDiscardReplyCallback(redisAsyncContext *c, void *reply, void *privdata) {
    instanceLink *link = c->data;
    UNUSED(reply);
    UNUSED(privdata);

    if (link) link->pending_commands--;
}
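As the code shows, this method does not process the reply at all; it only decrements pending_commands. According to the comment in sentinelSendSlaveOf, the outcome of these commands is instead confirmed from the output of the INFO command that the sentinel sends to the instance periodically.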
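The bookkeeping around pending_commands can be illustrated with a minimal standalone sketch. The demo_link type and both function names are hypothetical stand-ins for instanceLink and the real send/callback pair; no hiredis calls are made here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the link structure; only the counter is modeled. */
typedef struct {
    int pending_commands;
} demo_link;

/* Called after a command has been queued successfully. */
void demo_command_queued(demo_link *link) {
    assert(link != NULL);
    link->pending_commands++;
}

/* Reply handler: ignores the reply content and only settles the counter,
 * tolerating a link that is already gone. */
void demo_discard_reply(demo_link *link, const void *reply) {
    (void)reply;
    if (link) link->pending_commands--;
}
```

After five queued commands and five discarded replies the counter is back at zero, which is the property the real code keeps for the five transaction commands.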
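Now for the five commands themselves. MULTI marks the start of a transaction block. The commands inside the block are queued in order and are finally executed atomically by EXEC. SLAVEOF was analyzed in an earlier article in this series. CONFIG is sent as CONFIG REWRITE, which rewrites the configuration file. CLIENT is sent as CLIENT KILL TYPE normal, which disconnects the normal client connections so that clients ask for the master again when they reconnect. Finally, EXEC executes the commands queued in the transaction block.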
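To make the resulting command sequence concrete, here is a small standalone sketch that builds the five command strings as text. It is an illustration only: build_slaveof_transaction, TXN_COMMANDS, and CMD_MAX are hypothetical names, snprintf stands in for ll2string, and command renaming through sentinelInstanceMapCommand is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TXN_COMMANDS 5
#define CMD_MAX 128

/* Fills out[] with the five commands in the order they are queued and
 * returns the number of commands. A NULL host selects SLAVEOF NO ONE. */
int build_slaveof_transaction(const char *host, int port,
                              char out[TXN_COMMANDS][CMD_MAX]) {
    char portstr[32];

    assert(out != NULL);
    snprintf(portstr, sizeof(portstr), "%d", port);
    if (host == NULL) {
        host = "NO";
        memcpy(portstr, "ONE", 4); /* three characters plus the terminating NUL */
    }
    snprintf(out[0], CMD_MAX, "MULTI");
    snprintf(out[1], CMD_MAX, "SLAVEOF %s %s", host, portstr);
    snprintf(out[2], CMD_MAX, "CONFIG REWRITE");
    snprintf(out[3], CMD_MAX, "CLIENT KILL TYPE normal");
    snprintf(out[4], CMD_MAX, "EXEC");
    return TXN_COMMANDS;
}
```

Passing NULL as the host produces "SLAVEOF NO ONE" in the second slot, while a host such as "10.0.0.5" with port 6379 (example values) produces "SLAVEOF 10.0.0.5 6379".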
That completes the analysis of the SEND_SLAVEOF_NOONE state. Next comes the SENTINEL_FAILOVER_STATE_WAIT_PROMOTION state, in which the state machine calls sentinelFailoverWaitPromotion:
void sentinelFailoverWaitPromotion(sentinelRedisInstance *ri) {
    /* Just handle the timeout. Switching to the next state is handled
     * by the function parsing the INFO command of the promoted slave. */
    if (mstime() - ri->failover_state_change_time > ri->failover_timeout) {
        sentinelEvent(LL_WARNING,"-failover-abort-slave-timeout",ri,"%@");
        sentinelAbortFailover(ri);
    }
}
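This code contains a single if statement that checks for a timeout: if more than failover_timeout has elapsed since failover_state_change_time, the failover is aborted. As the comment notes, the switch to the next state does not happen here; it is handled by the function that parses the INFO output of the promoted slave.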
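The WAIT_PROMOTION state therefore has two independent exits: the timer check shown above, and the INFO parsing mentioned in the comment. A minimal sketch of that split follows; the demo_state enum and both functions are hypothetical names for illustration and do not reproduce the real INFO-handling code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state labels, not Redis identifiers. */
typedef enum { DEMO_WAIT_PROMOTION, DEMO_PROMOTED, DEMO_ABORTED } demo_state;

/* Periodic check: it can only abort a failover that is still waiting. */
demo_state demo_on_timer(demo_state s, long long now_ms,
                         long long state_change_ms, long long timeout_ms) {
    assert(timeout_ms >= 0);
    if (s == DEMO_WAIT_PROMOTION && now_ms - state_change_ms > timeout_ms)
        return DEMO_ABORTED;
    return s;
}

/* INFO handling: it advances the state once the promoted slave reports
 * the master role. */
demo_state demo_on_info(demo_state s, bool reports_master) {
    if (s == DEMO_WAIT_PROMOTION && reports_master) return DEMO_PROMOTED;
    return s;
}
```

In this sketch, whichever event arrives first wins: an INFO reply reporting the master role moves the state forward, and a later timer tick no longer aborts anything.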