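How Redis Sentinel Works: A Brief Introduction
Notes on my own understanding of Redis Sentinel.
Whatever the middleware, deploying it on a single node carries the risk of a single point of failure, so the obvious first step is a master-replica (master-slave) architecture.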
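Redis master-replica architecture
How Redis master-replica replication works
Replication runs in one of two modes, full synchronization or partial synchronization: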
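- Full synchronization: when a replica connects to the master, it sends the master a SYNC command (PSYNC since Redis 2.8). The master runs BGSAVE to generate an RDB file and, at the same time, starts buffering every write command it executes. When the RDB file is ready, the master sends it to the replica, which saves it to its local disk and then loads it into memory. Finally, the master sends the buffered write commands to the replica, encoded in the Redis protocol (RESP), the same format as the regular replication stream.
- Partial synchronization: the replica can send PSYNC master_run_id offset to request a partial resync (since Redis 4.0 the first argument is the master's replication ID, shown as master_replid in INFO, rather than its run ID). Both the master and its replicas keep track of the replication offset. If the data from the requested offset onward is still available on the master, in its replication backlog, the master sends only that part to the replica; otherwise it falls back to a full synchronization.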
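To make the "is that offset still on the master" check concrete, here is a simplified Python model of the decision a Redis 4.0+ master makes when it receives PSYNC. It is a sketch, not Redis source code: the function and parameter names are my own, chosen to mirror the INFO replication fields (master_replid, master_replid2, second_repl_offset, repl_backlog_active, repl_backlog_first_byte_offset, repl_backlog_histlen). The sample numbers are copied from the INFO replication output of 6381 in section 3.4; its master_replid2 is the old master's ID, the same one that appears in 6380's "Full resync from master" log line below.

```python
def can_partial_resync(psync_replid: str, psync_offset: int,
                       master_replid: str, master_replid2: str,
                       second_repl_offset: int, backlog_active: bool,
                       backlog_first_byte_offset: int, backlog_histlen: int) -> bool:
    """Simplified model of how a Redis 4.0+ master answers PSYNC (not Redis source).

    psync_offset is the first byte the replica is missing, i.e. its own
    replication offset + 1. The other arguments mirror INFO replication fields.
    """
    # 1. Same history: the current replication ID, or the previous one
    #    (master_replid2) if the replica has not gone past the point where
    #    that history ended (second_repl_offset).
    same_history = psync_replid == master_replid or (
        psync_replid == master_replid2 and psync_offset <= second_repl_offset)
    if not same_history or not backlog_active:
        return False  # the master answers +FULLRESYNC instead
    # 2. The requested bytes must still be inside the replication backlog.
    backlog_end = backlog_first_byte_offset + backlog_histlen
    return backlog_first_byte_offset <= psync_offset <= backlog_end


# Values from 6381's INFO replication output after the failover.
NEW_ID = "d349582dc829f56d1da32e2d2f1434c6f2c44802"  # master_replid
OLD_ID = "46ef90de89e6771b67bc2b43371da2f97a03b4d1"  # master_replid2 (old master's ID)
state = dict(master_replid=NEW_ID, master_replid2=OLD_ID, second_repl_offset=65161,
             backlog_active=True, backlog_first_byte_offset=631, backlog_histlen=78924)

print(can_partial_resync(OLD_ID, 65161, **state))  # True: old history, offset within it
print(can_partial_resync(OLD_ID, 70000, **state))  # False: past where the old history ended
print(can_partial_resync(NEW_ID, 100, **state))    # False: no longer in the backlog
```

The two-step structure reflects the two reasons a partial resync can fail: the replica followed a different history, or the history is right but the backlog has already been overwritten.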
A replica asking the master for a full synchronization:
Master log:
38233:M 01 Sep 2020 16:55:02.260 * Replica 127.0.0.1:6380 asks for synchronization
38233:M 01 Sep 2020 16:55:02.260 * Full resync requested by replica 127.0.0.1:6380
38233:M 01 Sep 2020 16:55:02.260 * Starting BGSAVE for SYNC with target: disk
38233:M 01 Sep 2020 16:55:02.261 * Background saving started by pid 38299
38299:C 01 Sep 2020 16:55:02.323 * DB saved on disk
38299:C 01 Sep 2020 16:55:02.323 * RDB: 4 MB of memory used by copy-on-write
38233:M 01 Sep 2020 16:55:02.378 * Background saving terminated with success
38233:M 01 Sep 2020 16:55:02.378 * Synchronization with replica 127.0.0.1:6380 succeeded
Log of replica 6380:
$ tail -f 6380.log
38295:S 01 Sep 2020 16:55:02.259 * Connecting to MASTER 127.0.0.1:6379
38295:S 01 Sep 2020 16:55:02.259 * MASTER <-> REPLICA sync started
38295:S 01 Sep 2020 16:55:02.259 * Non blocking connect for SYNC fired the event.
38295:S 01 Sep 2020 16:55:02.260 * Master replied to PING, replication can continue...
38295:S 01 Sep 2020 16:55:02.260 * Partial resynchronization not possible (no cached master)
38295:S 01 Sep 2020 16:55:02.262 * Full resync from master: 46ef90de89e6771b67bc2b43371da2f97a03b4d1:0
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: receiving 175 bytes from master
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Flushing old data
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Loading DB in memory
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Finished with success
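The logs above follow the full-sync steps: BGSAVE on the master, then "Flushing old data" and "Loading DB in memory" on the replica. The toy Python model below shows why the master must buffer writes while BGSAVE runs: the replica loads the snapshot, replays the buffered writes, and ends up with the same data as the master. The class and function names are hypothetical, a dict stands in for the dataset, and no real RDB file or networking is involved.

```python
from typing import Dict, List, Optional, Tuple

Write = Tuple[str, str, str]  # ("SET", key, value)


class ToyMaster:
    """Toy model of a master during a full sync; a dict stands in for the dataset."""

    def __init__(self, data: Optional[Dict[str, str]] = None) -> None:
        self.data: Dict[str, str] = dict(data or {})
        self.sync_buffer: Optional[List[Write]] = None  # None = not buffering

    def start_bgsave(self) -> Dict[str, str]:
        # Point-in-time snapshot (Redis forks a child and relies on copy-on-write);
        # from now on every write is also appended to the sync buffer.
        self.sync_buffer = []
        return dict(self.data)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.sync_buffer is not None:
            self.sync_buffer.append(("SET", key, value))

    def drain_sync_buffer(self) -> List[Write]:
        # Hand the buffered writes over to the replica and stop buffering.
        buffered = self.sync_buffer or []
        self.sync_buffer = None
        return buffered


def full_sync(master: ToyMaster, writes_during_bgsave: List[Tuple[str, str]]) -> Dict[str, str]:
    snapshot = master.start_bgsave()            # 1. master produces the "RDB file"
    for key, value in writes_during_bgsave:     # 2. clients keep writing meanwhile
        master.set(key, value)
    replica = dict(snapshot)                    # 3. replica flushes old data, loads the snapshot
    for _cmd, key, value in master.drain_sync_buffer():
        replica[key] = value                    # 4. replica replays the buffered writes
    return replica


master = ToyMaster({"a": "1"})
replica = full_sync(master, [("b", "2"), ("a", "3")])
print(replica == master.data)  # True
```

Without step 4 the replica would still hold a=1 and miss b, which is exactly the gap the buffer closes.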
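Sentinel architecture
Replication removes the single point of failure, but promoting a replica when the master fails still has to be done by hand. Can the switchover happen automatically? It can: that is what Sentinel does.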
Master: 6379
Slaves: 6380, 6381
Sentinels: 26379, 26380, 26381
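The post does not show the sentinel configuration files. As a minimal sketch, assuming one config file per sentinel port, the Python helper below renders the few directives this setup needs. The master name mymaster and quorum 2 come from the +monitor line in the sentinel log in section 3.5; 30000 ms is Sentinel's default down-after-milliseconds; the logfile names and the sentinel-26379.conf style file names are assumptions.

```python
def sentinel_conf(port: int, master_name: str = "mymaster",
                  master_ip: str = "127.0.0.1", master_port: int = 6379,
                  quorum: int = 2, down_after_ms: int = 30000) -> str:
    """Render a minimal sentinel.conf for one sentinel of the test setup."""
    lines = [
        f"port {port}",
        f'logfile "{port}.log"',  # assumption: log file named after the port, e.g. 26379.log
        # Monitor the master; `quorum` sentinels must agree before it is ODOWN.
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        # How long a node may go without a valid reply before it is SDOWN.
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
    ]
    return "\n".join(lines) + "\n"


for port in (26379, 26380, 26381):
    print(f"# sentinel-{port}.conf")
    print(sentinel_conf(port))
```

Each rendered file would then be started with redis-sentinel sentinel-26379.conf (or redis-server sentinel-26379.conf --sentinel). Sentinel rewrites its config file to persist the replicas, sentinels and epochs it discovers, so the file must be writable.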
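How Sentinel works
1. Sentinels discover each other automatically
- Every 2 seconds, each sentinel publishes a message to the Pub/Sub channel __sentinel__:hello on every master and replica it monitors.
- Each sentinel also subscribes to the __sentinel__:hello channel of those masters and replicas, and learns about the other sentinels from the messages it receives.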
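Each hello message is a single comma-separated string with eight fields: the sentinel's IP, port, run ID and current epoch, followed by the master's name, IP, port and configuration epoch. A minimal Python sketch that splits one of the payloads captured below; Hello and parse_hello are illustrative names, not part of Redis or of any client library.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split a __sentinel__:hello payload into its eight fields."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    ip, port, runid, epoch, name, m_ip, m_port, cfg_epoch = fields
    return Hello(ip, int(port), runid, int(epoch), name, m_ip, int(m_port), int(cfg_epoch))


msg = "127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
hello = parse_hello(msg)
print(hello.sentinel_port, hello.master_name, hello.master_port)  # 26380 mymaster 6381
```

Because the message carries the master's address and configuration epoch as well as the sender's identity, it lets sentinels both find each other and converge on the same view of who the master is.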
Messages the sentinels publish on the __sentinel__:hello channel, captured by subscribing on 6381:
127.0.0.1:6381> PSUBSCRIBE *
Reading messages... (press Ctrl-C to quit)
"psubscribe"
"*"
(integer) 1
"pmessage"
"*"
"__sentinel__:hello"
"127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
"pmessage"
"*"
"__sentinel__:hello"
"127.0.0.1,26379,b60bd3e15db23a9862d213e7703001c72d48dc73,1,mymaster,127.0.0.1,6381,1"
Subscribing to the events published by a sentinel shows it discovering another sentinel automatically:
$ src/redis-cli -p 26379
127.0.0.1:26379> PSUBSCRIBE *
Reading messages... (press Ctrl-C to quit)
"psubscribe"
"*"
(integer) 1
"pmessage"
"*"
"+sentinel"
"sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379"
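2. How the replicas are discovered
A sentinel learns which replicas exist from the master: it periodically sends the INFO command to the master and reads the list of replicas from the reply.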
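INFO replication on a master lists each connected replica on a slaveN line of comma-separated key=value pairs (see the 6381 output in section 3.4). A minimal Python sketch of extracting those entries, roughly the information a sentinel needs about the replicas; parse_replicas is a hypothetical helper, not Sentinel's own code.

```python
import re
from typing import Dict, List

# Matches "slave0:...", "slave1:..." but not fields like "slave_repl_offset:...".
_SLAVE_LINE = re.compile(r"^slave\d+:(.*)$")


def parse_replicas(info_text: str) -> List[Dict[str, str]]:
    """Return one dict per slaveN line of an INFO replication reply."""
    replicas = []
    for line in info_text.splitlines():
        m = _SLAVE_LINE.match(line.strip())
        if m:
            replicas.append(dict(kv.split("=", 1) for kv in m.group(1).split(",")))
    return replicas


info = """# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0
master_repl_offset:79554
"""
print(parse_replicas(info))
# [{'ip': '127.0.0.1', 'port': '6380', 'state': 'online', 'offset': '79554', 'lag': '0'}]
```

Values are left as strings; a caller would convert port and offset with int() as needed.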
3. Walking through an automatic failover
3.1. The master goes down
The master's process was killed by hand.
3.2. The sentinels detect that the master is down
1 The sentinel's log:
$ tail -f 26379.log
(the master on 6379 was shut down by hand)
38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
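+sdown means this sentinel alone has had no valid reply from the master for longer than down-after-milliseconds (subjectively down). It then asks the other sentinels with SENTINEL is-master-down-by-addr, and once at least quorum sentinels agree it logs +odown (objectively down); #quorum 2/2 reads as "2 sentinels agree, the quorum is 2". A simplified Python model of the two checks; the function names are illustrative and 30000 ms is the default down-after-milliseconds.

```python
def is_sdown(last_valid_reply_ms: int, now_ms: int, down_after_ms: int = 30000) -> bool:
    """Subjectively down: no valid reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(sentinels_agreeing: int, quorum: int) -> bool:
    """Objectively down: at least `quorum` sentinels (this one included) say it is down."""
    return sentinels_agreeing >= quorum


# "#quorum 2/2" in the log: 2 sentinels agree, the configured quorum is 2.
print(is_sdown(0, 30001), is_odown(2, 2))  # True True
```

Only ODOWN can trigger a failover, so one sentinel with a bad network link cannot start a failover on its own.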
2 The events on the sentinel's Pub/Sub channels:
"pmessage"
"*"
"+sdown"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+odown"
"master mymaster 127.0.0.1 6379 #quorum 2/2"
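3.3. Sentinel leader election
The sentinels first vote to decide which of the three will carry out this failover. A sentinel becomes leader only if it collects votes from a majority of the sentinels, and at least as many votes as the quorum.
1 The sentinel's log: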
$ tail -f 26379.log
38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1
38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this sentinel, b60bd, votes for itself; the trailing 1 is the epoch)
38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (sentinel fc976b votes for sentinel b60bd in epoch 1)
38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379
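A simplified Python tally of one election round, assuming each sentinel casts at most one vote per epoch: the candidate with the most votes wins only if it has both a majority of all sentinels and at least quorum votes. The names are illustrative, and the real implementation also deals with election timeouts.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], num_sentinels: int, quorum: int) -> Optional[str]:
    """Simplified tally for one epoch. `votes` maps voter ID -> ID it voted for.

    Since each sentinel votes at most once per epoch, at most one candidate
    can reach a majority, so ties never produce a wrong winner.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = num_sentinels // 2 + 1
    return candidate if count >= max(majority, quorum) else None


# Shortened IDs from the log: b60bd voted for itself and fc976b voted for b60bd.
print(elect_leader({"b60bd": "b60bd", "fc976b": "b60bd"}, num_sentinels=3, quorum=2))  # b60bd
```

Requiring a majority, not just the quorum, is what keeps two partitions from electing two leaders in the same epoch.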
3.4. Choosing a suitable replica as the new master
- The sentinel's log:
$ tail -f 26379.log
38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is chosen as the new master)
38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (SLAVEOF NO ONE is sent to 6381 to make it a master)
38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
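- How the replica is chosen:
  - only replicas that are alive (reachable and not marked down) are candidates;
  - the one with the lowest replica-priority wins (replicas with priority 0 are never promoted);
  - on a tie, the one with the largest replication offset, i.e. the most up-to-date data;
  - on a further tie, the one with the smallest run ID.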
- 6381 has been promoted to master:
$ src/redis-cli -p 6381
127.0.0.1:6381> info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0
master_replid:d349582dc829f56d1da32e2d2f1434c6f2c44802
master_replid2:46ef90de89e6771b67bc2b43371da2f97a03b4d1
master_repl_offset:79554
second_repl_offset:65161
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:631
repl_backlog_histlen:78924
127.0.0.1:6381>
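The replica selection order described above can be sketched as a sort key. This is a simplified model with a hypothetical Replica type: besides liveness, real Sentinel also skips replicas that have been disconnected from the master for too long, which is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    run_id: str
    alive: bool        # reachable and not marked SDOWN
    priority: int      # replica-priority; 0 = never promote
    repl_offset: int   # replication offset this replica has processed


def select_replica(replicas: List[Replica]) -> Optional[Replica]:
    candidates = [r for r in replicas if r.alive and r.priority != 0]
    if not candidates:
        return None
    # Lowest priority first, then the largest offset, then the smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("b" * 40, alive=True, priority=100, repl_offset=65000),
    Replica("a" * 40, alive=True, priority=100, repl_offset=65160),
    Replica("c" * 40, alive=False, priority=100, repl_offset=65160),
]
print(select_replica(replicas).run_id[:1])  # a
```

Negating the offset lets a single min() express "lowest priority, highest offset, lowest run ID" in one tuple comparison.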
3.5. Complete logs for the steps above
- The sentinel's Pub/Sub events:
$ src/redis-cli -p 26379
127.0.0.1:26379> PSUBSCRIBE *
Reading messages... (press Ctrl-C to quit)
"psubscribe"
"*"
(integer) 1
"pmessage"
"*"
"+sdown"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+odown"
"master mymaster 127.0.0.1 6379 #quorum 2/2"
"pmessage"
"*"
"+new-epoch"
"1"
"pmessage"
"*"
"+try-failover"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+vote-for-leader"
"b60bd3e15db23a9862d213e7703001c72d48dc73 1"
"pmessage"
"*"
"+elected-leader"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-select-slave"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+selected-slave"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-send-slaveof-noone"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-wait-promotion"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"-role-change"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 new reported role is master"
"pmessage"
"*"
"+promoted-slave"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-reconf-slaves"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+slave-reconf-sent"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"-odown"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+slave-reconf-inprog"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+slave-reconf-done"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-end"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+switch-master"
"mymaster 127.0.0.1 6379 127.0.0.1 6381"
"pmessage"
"*"
"+slave"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381"
- The complete sentinel log:
$ tail -f 26379.log
38501:X 01 Sep 2020 17:15:48.851 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
38501:X 01 Sep 2020 17:15:48.851 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=38501, just started
38501:X 01 Sep 2020 17:15:48.851 # Configuration loaded
38502:X 01 Sep 2020 17:15:48.854 * Running mode=sentinel, port=26379.
38502:X 01 Sep 2020 17:15:48.855 # Sentinel ID is b60bd3e15db23a9862d213e7703001c72d48dc73
38502:X 01 Sep 2020 17:15:48.855 # +monitor master mymaster 127.0.0.1 6379 quorum 2
38502:X 01 Sep 2020 17:15:48.855 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (discovered a replica)
38502:X 01 Sep 2020 17:17:59.296 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379 (discovered a replica)
38502:X 01 Sep 2020 17:20:31.119 * +sentinel sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379 (discovered another sentinel)
38502:X 01 Sep 2020 17:22:42.715 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:23:42.558 * +reboot slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:23:42.659 # -sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
.............
............(the master on 6379 was shut down by hand)
38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1
38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this sentinel, b60bd, votes for itself; the trailing 1 is the epoch)
38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (sentinel fc976b votes for sentinel b60bd in epoch 1)
38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is chosen as the new master)
38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (SLAVEOF NO ONE is sent to 6381 to make it a master)
38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
38502:X 01 Sep 2020 17:27:10.258 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
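Once the failover ends, clients still have to find the new master. A client can ask any sentinel with SENTINEL get-master-addr-by-name mymaster, or subscribe to the +switch-master event, whose payload is the master name followed by the old and the new address. A minimal Python sketch of updating a client-side address map from that payload; apply_switch_master is a hypothetical helper, not part of any client library.

```python
from typing import Dict, Tuple


def apply_switch_master(masters: Dict[str, Tuple[str, int]], payload: str) -> None:
    """Update a client's name -> (ip, port) map from a +switch-master payload,
    formatted as: master-name old-ip old-port new-ip new-port."""
    name, _old_ip, _old_port, new_ip, new_port = payload.split()
    masters[name] = (new_ip, int(new_port))


masters = {"mymaster": ("127.0.0.1", 6379)}
apply_switch_master(masters, "mymaster 127.0.0.1 6379 127.0.0.1 6381")
print(masters)  # {'mymaster': ('127.0.0.1', 6381)}
```

Subscribing gives near-instant updates, while querying get-master-addr-by-name on reconnect covers any events the client missed.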