
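A Brief Introduction to How Redis Sentinel Works

These are my notes on how I understand Redis Sentinel.

Any middleware deployed as a single node carries the risk of a single point of failure, so the first architecture that comes to mind is master-replica (master-slave) replication.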

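Redis master-replica architecture

How Redis master-replica replication works

Replication comes in two forms, full synchronization and partial synchronization: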

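  1. Full synchronization: when a replica connects to the master, it sends a SYNC command (PSYNC in newer versions). The master runs BGSAVE to generate an RDB file and, at the same time, starts buffering the write commands it receives. Once the RDB file is ready, the master sends it to the replica, which saves it to local disk and then loads it into memory. The master then sends the buffered write commands to the replica as a stream of commands in the Redis protocol format.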

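  2. Partial synchronization: a replica can send PSYNC master_run_id offset to request a partial synchronization. The master and its replicas all track the replication offset. If the data from the requested offset onward is still available on the master (in its replication backlog), the master sends just that data to the replica; if it is not, a full synchronization is performed instead.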
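The choice between the two cases above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `MasterState` and `choose_sync` are hypothetical names, the field names loosely follow the `repl_backlog_*` fields of `INFO replication`, and details such as the secondary replication ID are ignored.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Hypothetical, simplified view of what a master tracks for replication."""
    replid: str                # replication ID (the master_run_id in the text above)
    master_repl_offset: int    # offset of the last byte in the replication stream
    backlog_first_offset: int  # oldest offset still held in the backlog
    backlog_histlen: int       # number of bytes currently held in the backlog


def choose_sync(master: MasterState, replica_replid: str, replica_offset: int) -> str:
    """Return "partial" if the master can continue from replica_offset, else "full"."""
    # A replica that followed a different history (or none at all) needs a full copy.
    if replica_replid != master.replid:
        return "full"
    # The requested offset must still be covered by the backlog.
    oldest = master.backlog_first_offset
    newest = master.backlog_first_offset + master.backlog_histlen
    if oldest <= replica_offset <= newest:
        return "partial"
    return "full"


if __name__ == "__main__":
    master = MasterState("d349582d", 79554, 631, 78924)
    print(choose_sync(master, "d349582d", 70000))  # partial
    print(choose_sync(master, "d349582d", 100))    # full, offset 100 left the backlog
    print(choose_sync(master, "?", -1))            # full, a fresh replica has no history
```

The last call mirrors a replica that has no cached master, which is the situation shown in the replica log further down ("Partial resynchronization not possible").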

A replica requesting a full synchronization from the master.
Master log:

38233:M 01 Sep 2020 16:55:02.260 * Replica 127.0.0.1:6380 asks for synchronization
38233:M 01 Sep 2020 16:55:02.260 * Full resync requested by replica 127.0.0.1:6380


38233:M 01 Sep 2020 16:55:02.260 * Starting BGSAVE for SYNC with target: disk
38233:M 01 Sep 2020 16:55:02.261 * Background saving started by pid 38299
38299:C 01 Sep 2020 16:55:02.323 * DB saved on disk
38299:C 01 Sep 2020 16:55:02.323 * RDB: 4 MB of memory used by copy-on-write
38233:M 01 Sep 2020 16:55:02.378 * Background saving terminated with success
38233:M 01 Sep 2020 16:55:02.378 * Synchronization with replica 127.0.0.1:6380 succeeded

Replica (6380) log:

$ tail -f 6380.log
38295:S 01 Sep 2020 16:55:02.259 * Connecting to MASTER 127.0.0.1:6379
38295:S 01 Sep 2020 16:55:02.259 * MASTER <-> REPLICA sync started
38295:S 01 Sep 2020 16:55:02.259 * Non blocking connect for SYNC fired the event.
38295:S 01 Sep 2020 16:55:02.260 * Master replied to PING, replication can continue...
38295:S 01 Sep 2020 16:55:02.260 * Partial resynchronization not possible (no cached master)
38295:S 01 Sep 2020 16:55:02.262 * Full resync from master: 46ef90de89e6771b67bc2b43371da2f97a03b4d1:0
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: receiving 175 bytes from master
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Flushing old data
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Loading DB in memory
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Finished with success

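Sentinel architecture

Replication removes the single point of failure, but switching over to a replica still has to be done by hand. Can the switch be automated? Of course it can!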

Master: 6379
Slaves: 6380, 6381
Sentinels: 26379, 26380, 26381

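How Sentinel works

1. Automatic discovery between Sentinels

  1. Every 2 seconds, each Sentinel publishes a message to the Pub/Sub channel __sentinel__:hello on every master and replica it monitors.
  2. Each Sentinel also subscribes to the __sentinel__:hello channel on those masters and replicas, and discovers the other Sentinels from the messages it receives there.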

Messages published by the Sentinels on the __sentinel__:hello channel:

127.0.0.1:6381> PSUBSCRIBE *

Reading messages... (press Ctrl-C to quit)

1) "psubscribe"
2) "*"
3) (integer) 1
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26379,b60bd3e15db23a9862d213e7703001c72d48dc73,1,mymaster,127.0.0.1,6381,1"
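Each hello payload above is a comma-separated record with eight fields. The sketch below splits one payload and keeps a table of Sentinels seen so far. The field names in `Hello` are my own labels for the layout (Sentinel address, Sentinel ID, current epoch, master name, master address, master config epoch), and `remember_sentinel` is a hypothetical helper, not part of Redis.

```python
from typing import Dict, NamedTuple, Tuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split one __sentinel__:hello payload into its eight comma-separated fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch), name, m_ip, int(m_port), int(m_epoch))


def remember_sentinel(known: Dict[str, Tuple[str, int]], payload: str, my_runid: str) -> bool:
    """Record the sender of a hello message; return True if it is newly discovered."""
    hello = parse_hello(payload)
    # Ignore our own messages and Sentinels we already know about.
    if hello.sentinel_runid == my_runid or hello.sentinel_runid in known:
        return False
    known[hello.sentinel_runid] = (hello.sentinel_ip, hello.sentinel_port)
    return True


if __name__ == "__main__":
    known: Dict[str, Tuple[str, int]] = {}
    msg = "127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
    me = "b60bd3e15db23a9862d213e7703001c72d48dc73"
    print(remember_sentinel(known, msg, me))  # True
    print(known)
```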

Pub/Sub events on a Sentinel node itself, showing that it discovered another Sentinel automatically:

$ src/redis-cli -p 26379

127.0.0.1:26379> PSUBSCRIBE *

Reading messages... (press Ctrl-C to quit)

1) "psubscribe"
2) "*"
3) (integer) 1
1) "pmessage"
2) "*"
3) "+sentinel"
4) "sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379"

2. How replicas are discovered

Sentinel learns which replicas exist from the master: it sends the INFO command to the master and reads the replicas listed in the reply.
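As a rough illustration of that step, the sketch below pulls the `slaveN:` entries out of the text of an `INFO replication` reply. The sample text is taken from the output shown later in this post; `parse_replicas` is a hypothetical helper, and a real client library would normally do this parsing for you.

```python
from typing import Dict, List


def parse_replicas(info_text: str) -> List[Dict[str, object]]:
    """Extract ip, port, state and offset from the slaveN lines of INFO replication."""
    replicas = []
    for raw in info_text.splitlines():
        key, sep, value = raw.strip().partition(":")
        # Only lines such as "slave0:..." describe a replica; skip everything else.
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        replicas.append({
            "ip": fields["ip"],
            "port": int(fields["port"]),
            "state": fields.get("state"),
            "offset": int(fields["offset"]),
        })
    return replicas


if __name__ == "__main__":
    sample = (
        "# Replication\n"
        "role:master\n"
        "connected_slaves:1\n"
        "slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0\n"
        "master_repl_offset:79554\n"
    )
    print(parse_replicas(sample))
```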

3. Walking through an automatic failover

3.1. The master goes down

Kill the master process by hand.

3.2. Sentinel detects that the master is down

1 Check the Sentinel log:

$ tail -f 26379.log
(master 6379 was shut down by hand)
38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2

2 Check the Sentinel Pub/Sub channel:

1) "pmessage"
2) "*"
3) "+sdown"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+odown"
4) "master mymaster 127.0.0.1 6379 #quorum 2/2"
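The two events in this log correspond to two levels of "down". A Sentinel marks the master subjectively down (+sdown) on its own, when it has had no valid reply for longer than its down-after-milliseconds setting; the master becomes objectively down (+odown) once at least quorum Sentinels agree. A minimal sketch, in which the function names and the report dictionary are hypothetical:

```python
from typing import Dict


def is_subjectively_down(last_valid_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """+sdown: this Sentinel alone has had no valid reply for longer than down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_objectively_down(down_reports: Dict[str, bool], quorum: int) -> bool:
    """+odown: at least `quorum` Sentinels (this one included) report the master as down."""
    return sum(1 for is_down in down_reports.values() if is_down) >= quorum


if __name__ == "__main__":
    # Keys are shortened Sentinel IDs from the logs in this post.
    reports = {"b60bd": True, "fc976b": True}
    print(is_objectively_down(reports, quorum=2))  # True, matching "#quorum 2/2"
```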

3.3. Sentinel leader election

One of the three Sentinels has to be chosen to carry out this failover, so the Sentinels first hold a vote.
1 Check the Sentinel log:

$ tail -f 26379.log
38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1
38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this Sentinel, b60bd, votes for itself)
38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (Sentinel fc976b casts its vote for Sentinel b60bd)
38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379
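The vote count behind +elected-leader can be sketched as follows. A candidate wins an epoch only if it collects a majority of all known Sentinels and also at least quorum votes. `elect_leader` and its arguments are hypothetical names, and the sketch ignores epochs and timeouts.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], total_sentinels: int, quorum: int) -> Optional[str]:
    """votes maps voter Sentinel ID -> candidate Sentinel ID for a single epoch."""
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = total_sentinels // 2 + 1
    # The winner needs both an absolute majority and at least `quorum` votes.
    if count >= majority and count >= quorum:
        return candidate
    return None


if __name__ == "__main__":
    # The two votes visible in the log above (IDs shortened).
    votes = {"b60bd": "b60bd", "fc976b": "b60bd"}
    print(elect_leader(votes, total_sentinels=3, quorum=2))  # b60bd
```

With only one vote out of three Sentinels, the same call returns None and the failover has to be retried in a later epoch.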

3.4. Selecting a suitable replica as the new master
  1. Check the Sentinel log:

$ tail -f 26379.log
38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is selected as the new master)

38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (SLAVEOF NO ONE is sent to 6381 to make it the new master)

38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381

38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381

38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
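The line a client cares about most in this log is +switch-master, which carries the master name followed by the old and the new address. A small sketch that extracts the new address from such a line (`parse_switch_master` and `SwitchMaster` are hypothetical names):

```python
from typing import NamedTuple, Optional


class SwitchMaster(NamedTuple):
    name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(log_line: str) -> Optional[SwitchMaster]:
    """Return the old and new master address if the line is a +switch-master event."""
    marker = "+switch-master "
    idx = log_line.find(marker)
    if idx == -1:
        return None
    fields = log_line[idx + len(marker):].split()
    if len(fields) < 5:
        return None
    name, old_ip, old_port, new_ip, new_port = fields[:5]
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


if __name__ == "__main__":
    line = ("38502:X 01 Sep 2020 17:26:40.186 # +switch-master "
            "mymaster 127.0.0.1 6379 127.0.0.1 6381")
    event = parse_switch_master(line)
    print(event.new_ip, event.new_port)  # 127.0.0.1 6381
```

The same function works on the payload of a Pub/Sub message if the event name is prepended to it, since only the text after the marker is inspected.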

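  2. Replica selection strategy:
     1. The replica must be alive
     2. Among those, prefer the largest replication offset
     3. Break ties with the smallest Run ID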
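The three rules can be written as a filter followed by a sort key. This is a simplified sketch with hypothetical names (`Replica`, `select_replica`); the real Sentinel also compares the configured replica-priority before the offset, which is left out here to match the list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    run_id: str
    alive: bool
    repl_offset: int


def select_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the promotion candidate: alive, then largest offset, then smallest run ID."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        return None
    # Negating the offset makes min() prefer the largest offset first,
    # then the lexicographically smallest run ID as a tie-breaker.
    return min(candidates, key=lambda r: (-r.repl_offset, r.run_id))


if __name__ == "__main__":
    pool = [
        Replica("bbb", alive=True, repl_offset=100),
        Replica("aaa", alive=True, repl_offset=100),
        Replica("000", alive=False, repl_offset=200),
    ]
    print(select_replica(pool).run_id)  # aaa
```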
  3. 6381 is promoted to master:

$ src/redis-cli -p 6381

127.0.0.1:6381> info Replication

# Replication

role:master

connected_slaves:1

slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0

master_replid:d349582dc829f56d1da32e2d2f1434c6f2c44802

master_replid2:46ef90de89e6771b67bc2b43371da2f97a03b4d1

master_repl_offset:79554

second_repl_offset:65161

repl_backlog_active:1

repl_backlog_size:1048576

repl_backlog_first_byte_offset:631

repl_backlog_histlen:78924

127.0.0.1:6381>

3.5. Complete logs for the steps above:
  1. Pub/Sub messages on the Sentinel:

$ src/redis-cli -p 26379

127.0.0.1:26379> PSUBSCRIBE *

Reading messages... (press Ctrl-C to quit)

1) "psubscribe"
2) "*"
3) (integer) 1
1) "pmessage"
2) "*"
3) "+sdown"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+odown"
4) "master mymaster 127.0.0.1 6379 #quorum 2/2"
1) "pmessage"
2) "*"
3) "+new-epoch"
4) "1"
1) "pmessage"
2) "*"
3) "+try-failover"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+vote-for-leader"
4) "b60bd3e15db23a9862d213e7703001c72d48dc73 1"
1) "pmessage"
2) "*"
3) "+elected-leader"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+failover-state-select-slave"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+selected-slave"
4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+failover-state-send-slaveof-noone"
4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+failover-state-wait-promotion"
4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "-role-change"
4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 new reported role is master"
1) "pmessage"
2) "*"
3) "+promoted-slave"
4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+failover-state-reconf-slaves"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+slave-reconf-sent"
4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "-odown"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+slave-reconf-inprog"
4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+slave-reconf-done"
4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+failover-end"
4) "master mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+switch-master"
4) "mymaster 127.0.0.1 6379 127.0.0.1 6381"
1) "pmessage"
2) "*"
3) "+slave"
4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381"

  2. Complete Sentinel log:

$ tail -f 26379.log

38501:X 01 Sep 2020 17:15:48.851 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

38501:X 01 Sep 2020 17:15:48.851 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=38501, just started

38501:X 01 Sep 2020 17:15:48.851 # Configuration loaded

38502:X 01 Sep 2020 17:15:48.854 * Running mode=sentinel, port=26379.

38502:X 01 Sep 2020 17:15:48.855 # Sentinel ID is b60bd3e15db23a9862d213e7703001c72d48dc73

38502:X 01 Sep 2020 17:15:48.855 # +monitor master mymaster 127.0.0.1 6379 quorum 2

38502:X 01 Sep 2020 17:15:48.855 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (a replica was discovered)

38502:X 01 Sep 2020 17:17:59.296 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379 (a replica was discovered)

38502:X 01 Sep 2020 17:20:31.119 * +sentinel sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379 (another Sentinel was discovered)

38502:X 01 Sep 2020 17:22:42.715 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:23:42.558 * +reboot slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:23:42.659 # -sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

.............
............

(master 6379 was shut down by hand)
38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2

38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1

38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this Sentinel, b60bd, votes for itself)

38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (Sentinel fc976b casts its vote for Sentinel b60bd)

38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is selected as the new master)

38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (SLAVEOF NO ONE is sent to 6381 to make it the new master)

38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379

38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381

38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381

38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

38502:X 01 Sep 2020 17:27:10.258 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381