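Redis Sentinel Cluster: Dual-Data-Center Disaster Recovery Implementation Steps
Goal
Keep Redis serving requests in a dual-data-center deployment when either data center becomes completely unavailable.
Architecture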
There are two data centers, A and B. A hosts one Master, one Slave and two Sentinels; B hosts only two Slaves, as shown in the figure below.
Initial plan
Data center A
192.168.71.213 Slave + Sentinel
192.168.71.214 Master + Sentinel
Data center B
192.168.70.214 Slave
192.168.70.215 Slave
Directory creation
-- Redis software directory
mkdir -p /home/redis
-- directory for the pidfile
mkdir -p /home/redis/redisrun/
Extract the Redis package into /home/redis.
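The configuration files later in this guide also reference directories that the two mkdir commands above do not create (the `dir`, `logfile` and `pidfile` paths). Redis will not start if its `dir` is missing, so it can help to list them up front. A minimal Python sketch, assuming POSIX paths; `required_dirs` and the sample text are illustrative and not part of the original procedure:

```python
import posixpath

# Directives whose value is a directory, and directives whose value is a file path.
DIR_DIRECTIVES = {"dir"}
FILE_DIRECTIVES = {"pidfile", "logfile"}

def required_dirs(conf_text):
    """Return the sorted directories a redis.conf / sentinel.conf expects to exist."""
    dirs = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip().strip('"')
        if not value:
            continue  # e.g. logfile "" means "log to stdout"
        if key in DIR_DIRECTIVES:
            dirs.add(posixpath.normpath(value))
        elif key in FILE_DIRECTIVES:
            parent = posixpath.dirname(posixpath.normpath(value))
            if parent:
                dirs.add(parent)
    return sorted(dirs)

sample = '''
pidfile "/home/redis/redisrun/redis_6379.pid"
logfile "/home/redis/redis.log"
dir "/home/redis/redisdb"
'''
for d in required_dirs(sample):
    print("mkdir -p", d)
```

Running the same function over sentinel.conf would list the Sentinel working and log directories as well.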
Cluster configuration
【Master】
Use 71.214 as the Master.
vi /home/redis/redis.conf
# run as a daemon (in the background)
daemonize yes
pidfile "/home/redis/redisrun/redis_6379.pid"
port 6379
timeout 0
tcp-keepalive 0
loglevel notice
logfile "/home/redis/redis.log"
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
dir "/home/redis/redisdb"
# for failover to work, every node (master and slaves) must set this password, and it must be identical everywhere
masterauth "123456"
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 98
# password of this Redis instance
requirepass "123456"
appendonly yes
# appendfsync always
appendfsync everysec
# appendfsync no
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
# Generated by CONFIG REWRITE
【Slave】
Use the remaining 3 nodes as Slaves.
vi /home/redis/redis.conf
daemonize yes
pidfile "/home/redis/redisrun/redis_6379.pid"
port 6379
timeout 0
tcp-keepalive 0
loglevel notice
logfile "/home/redis/redis.log"
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
dir "/home/redis/redisdb"
# password of the master node
masterauth "123456"
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 98
requirepass "123456"
appendonly yes
# appendfsync always
appendfsync everysec
# appendfsync no
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
# Generated by CONFIG REWRITE
# master node to replicate from
slaveof 192.168.71.214 6379
-- Check and adjust
daemonize yes
pidfile "/home/redis/redisrun/redis_6379.pid"
logfile "/home/redis/redis.log"
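The Master and Slave files above are identical except for the final slaveof line. A minimal sketch of deriving that role-specific line for each node of the initial plan; `NODES`, `MASTER` and `role_lines` are hypothetical names used only for illustration:

```python
MASTER = "192.168.71.214"
NODES = ["192.168.71.213", "192.168.71.214", "192.168.70.214", "192.168.70.215"]
PORT = 6379

def role_lines(node_ip, master_ip=MASTER, port=PORT):
    """Return the config lines that differ per node.

    The Master gets no extra line; every Slave gets one slaveof line.
    """
    if node_ip == master_ip:
        return []
    return ["slaveof {} {}".format(master_ip, port)]

for ip in NODES:
    print(ip, role_lines(ip) or "(master: no slaveof line)")
```

Keeping the shared part in one file and appending only this line per node avoids the two copies drifting apart.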
【sentinel.conf】
Use the 2 nodes in data center A as Sentinels.
vi /home/redis/sentinel.conf
port 26379
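# the trailing 1 is the quorum: one Sentinel's judgment is enough to flag the master as objectively down
# starting the failover still needs a majority of all Sentinels, so one surviving Sentinel out of two cannot fail over (found in my own testing)
# if mymaster does not respond within 3 s, it is considered down
# failover-timeout: a failover attempt that makes no progress within 10 s is cancelled; a retry against the same master waits twice that long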
sentinel monitor mymaster 192.168.71.214 6379 1
sentinel down-after-milliseconds mymaster 3000
sentinel failover-timeout mymaster 10000
daemonize yes
# working directory
dir "/home/redis/sentinel-work"
protected-mode no
logfile "/home/redis/sentinellog/sentinel.log"
# password of the Redis master
sentinel auth-pass mymaster 123456
# Generated by CONFIG REWRITE
-- Check and adjust
sentinel monitor mymaster 192.168.71.214 6379 1
dir "/home/redis/sentinel-work"
logfile "/home/redis/sentinellog/sentinel.log"
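The quorum in `sentinel monitor` only decides when the master is flagged as objectively down. Actually starting a failover also needs the vote of a majority of all Sentinels known for that master. A small sketch of that arithmetic applied to this layout (2 Sentinels, both in data center A, quorum 1); the function names are illustrative:

```python
def majority(total_sentinels):
    """Smallest number of Sentinels that forms a strict majority."""
    return total_sentinels // 2 + 1

def can_failover(total_sentinels, reachable_sentinels, quorum):
    """True if enough Sentinels are alive to flag the master down AND authorize a failover."""
    needed = max(quorum, majority(total_sentinels))
    return reachable_sentinels >= needed

# Layout in this guide: 2 Sentinels, quorum 1.
print(can_failover(2, 2, 1))  # True: both Sentinels alive
print(can_failover(2, 1, 1))  # False: the Sentinel on the Master's host died with it
print(can_failover(2, 0, 1))  # False: data center A is gone entirely
```

With only two Sentinels, losing the host that runs the Master together with its Sentinel leaves one vote out of two, which is not a majority.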
Startup and checks
【Start the cluster and watch the logs】
Run on every node
cd /home/redis/src/
./redis-server /home/redis/redis.conf
tail -f /home/redis/redis.log
Run on the Sentinel nodes only
cd /home/redis/src/
./redis-sentinel /home/redis/sentinel.conf
tail -f /home/redis/sentinellog/sentinel.log
【Master check】
cd /home/redis/src/
./redis-cli -h 192.168.70.214 -p 6379 -a 123456
192.168.70.214:6379> info Replication
# Replication
role:master
connected_slaves:3
slave0:ip=192.168.71.213,port=6379,state=online,offset=1107595,lag=1
slave1:ip=192.168.70.214,port=6379,state=online,offset=1107742,lag=0
slave2:ip=192.168.70.215,port=6379,state=online,offset=1107889,lag=0
master_repl_offset:1107889
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:59314
repl_backlog_histlen:1048576
192.168.70.214:6379> set test zgy
OK
192.168.70.214:6379> get test
"zgy"
192.168.70.214:6379>
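The slaveN lines in the output above carry each replica's replication offset, which can be compared against master_repl_offset to see how far behind each one is. A minimal parsing sketch; `replica_lag_bytes` is an illustrative name, and the sample is the output shown above:

```python
import re

SLAVE_LINE = re.compile(
    r"^slave\d+:ip=([^,]+),port=(\d+),state=(\w+),offset=(\d+),lag=(\d+)$"
)

def replica_lag_bytes(info_text):
    """Map 'ip:port' of each replica to the bytes it is behind master_repl_offset."""
    master_offset = None
    replicas = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("master_repl_offset:"):
            master_offset = int(line.split(":", 1)[1])
            continue
        m = SLAVE_LINE.match(line)
        if m:
            ip, port, _state, offset, _lag = m.groups()
            replicas["{}:{}".format(ip, port)] = int(offset)
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {addr: master_offset - off for addr, off in replicas.items()}

sample = """\
role:master
connected_slaves:3
slave0:ip=192.168.71.213,port=6379,state=online,offset=1107595,lag=1
slave1:ip=192.168.70.214,port=6379,state=online,offset=1107742,lag=0
slave2:ip=192.168.70.215,port=6379,state=online,offset=1107889,lag=0
master_repl_offset:1107889
"""
print(replica_lag_bytes(sample))
```

For the sample, the three replicas are 294, 147 and 0 bytes behind the master.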
【Slave check, read-only】
192.168.71.214:6379> get test
"zgy"
192.168.71.214:6379> set test zgy2
(error) READONLY You can't write against a read only slave.
192.168.71.214:6379> info Replication
# Replication
role:slave
master_host:192.168.70.214
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:42385
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
192.168.71.214:6379>
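The fields that matter in the Slave output above are role, master_link_status and slave_read_only. A minimal sketch that turns INFO replication text into a short list of problems; `parse_info` and `slave_problems` are illustrative helpers, not Redis APIs:

```python
def parse_info(info_text):
    """Parse 'key:value' lines of INFO output into a dict, skipping blanks and '# Section' headers."""
    fields = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields

def slave_problems(info_text):
    """Return human-readable problems; an empty list means the Slave looks healthy."""
    f = parse_info(info_text)
    problems = []
    if f.get("role") != "slave":
        problems.append("role is {!r}, expected 'slave'".format(f.get("role")))
    if f.get("master_link_status") != "up":
        problems.append("master link is {!r}".format(f.get("master_link_status")))
    if f.get("slave_read_only") != "1":
        problems.append("slave is writable")
    return problems

healthy = "# Replication\nrole:slave\nmaster_link_status:up\nslave_read_only:1\n"
link_down = "# Replication\nrole:slave\nmaster_link_status:down\nslave_read_only:1\n"
print(slave_problems(healthy))
print(slave_problems(link_down))
```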
Network-outage and power-outage tests
Network outage
Simulated by enabling the firewall.
service iptables status
--service iptables start
-- Firewall configuration on the 2 nodes in the 70 subnet
cat /etc/sysconfig/iptables
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
# block the 2 nodes in data center A
-I INPUT -s 192.168.71.213 -j DROP
-I INPUT -s 192.168.71.214 -j DROP
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
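The two DROP rules in the file above are what isolates the 70-subnet nodes from data center A. A small sketch that renders the same rule lines from a list of peer addresses, so the identical block can be produced for each node; `drop_rules` is an illustrative helper, and the rule text simply mirrors the file above:

```python
import ipaddress

def drop_rules(peer_ips):
    """Render one '-I INPUT -s <ip> -j DROP' line per peer, validating each address first."""
    rules = []
    for ip in peer_ips:
        addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        rules.append("-I INPUT -s {} -j DROP".format(addr))
    return rules

for rule in drop_rules(["192.168.71.213", "192.168.71.214"]):
    print(rule)
```

Validating the address first catches a mistyped IP before it ends up in the firewall file.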
Cutting the network
Data center B, before and after the outage
-- Before
192.168.71.214:6379> info Replication
# Replication
role:master
connected_slaves:3
slave0:ip=192.168.71.213,port=6379,state=online,offset=12825868,lag=1
slave1:ip=192.168.70.214,port=6379,state=online,offset=12825868,lag=1
slave2:ip=192.168.70.215,port=6379,state=online,offset=12826015,lag=0
master_repl_offset:12826162
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11777587
repl_backlog_histlen:1048576
192.168.71.214:6379>
-- After
-- The 2 nodes in the 70 subnet are clearly no longer visible
192.168.71.214:6379> info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.71.213,port=6379,state=online,offset=12909588,lag=1
master_repl_offset:12909588
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11861013
repl_backlog_histlen:1048576
192.168.71.214:6379>
The Master can still serve clients.
Data center A, before and after the outage
Before
192.168.71.214:6379> info Replication
# Replication
role:master
connected_slaves:3
slave0:ip=192.168.71.213,port=6379,state=online,offset=12942691,lag=1
slave1:ip=192.168.70.214,port=6379,state=online,offset=12942691,lag=1
slave2:ip=192.168.70.215,port=6379,state=online,offset=12942838,lag=0
master_repl_offset:12942838
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11894263
repl_backlog_histlen:1048576
After: 2 Masters appear??
192.168.71.214:6379> info Replication
# Replication
role:master
connected_slaves:0
master_repl_offset:12957363
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11908788
repl_backlog_histlen:1048576
192.168.71.214:6379>
192.168.71.213:6379> info replication
# Replication
role:master
connected_slaves:0
master_repl_offset:12943881
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
192.168.71.213:6379>
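Two nodes both reporting role:master, as in the output above, is a split-brain state: each one accepts writes on its own. A minimal sketch that flags this from INFO replication snapshots collected from several nodes; `find_masters` and the snapshot dictionary are illustrative assumptions:

```python
def role_of(info_text):
    """Return the value of the 'role:' line in INFO replication output, or None."""
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("role:"):
            return line.split(":", 1)[1]
    return None

def find_masters(snapshots):
    """Given {node_address: info_text}, return the sorted addresses reporting role:master."""
    return sorted(addr for addr, text in snapshots.items() if role_of(text) == "master")

snapshots = {
    "192.168.71.214:6379": "# Replication\nrole:master\nconnected_slaves:0\n",
    "192.168.71.213:6379": "# Replication\nrole:master\nconnected_slaves:0\n",
}
masters = find_masters(snapshots)
if len(masters) > 1:
    print("split brain, masters:", ", ".join(masters))
```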
Power outage
Simulated by killing the redis processes.
ps -ef|grep redis
Before the power outage
192.168.71.213:6379> info replication
# Replication
role:master
connected_slaves:3
slave0:ip=192.168.70.215,port=6379,state=online,offset=13091227,lag=0
slave1:ip=192.168.70.214,port=6379,state=online,offset=13091227,lag=0
slave2:ip=192.168.71.214,port=6379,state=online,offset=13091080,lag=1
master_repl_offset:13091227
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:13087442
repl_backlog_histlen:3786
192.168.71.214:6379> info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.71.213,port=6379,state=online,offset=13096642,lag=1
master_repl_offset:13096642
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:13092272
repl_backlog_histlen:4371
192.168.71.214:6379>
After the power outage
192.168.70.214:6379> info Replication
# Replication
role:slave
master_host:192.168.71.214
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:13159324
master_link_down_since_seconds:18
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
192.168.70.215:6379> info Replication
# Replication
role:slave
master_host:192.168.71.214
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:13159324
master_link_down_since_seconds:28
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
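Both nodes in the 70 subnet are left as Slaves with their master link down, so they cannot serve normally. With both Sentinels in data center A, nothing is left to run a failover.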
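Before promoting a node by hand, it can help to compare the slave_repl_offset values in the two outputs above: the replica with the largest offset has received the most data from the old Master. A minimal sketch of that choice; `pick_promotion_candidate` is a hypothetical helper, and picking by offset alone is a simplifying assumption:

```python
def pick_promotion_candidate(offsets):
    """Given {node_address: slave_repl_offset}, return the address with the largest offset.

    Ties are broken by address so that the choice is deterministic.
    """
    if not offsets:
        raise ValueError("no replicas given")
    # max() keeps the first maximum it meets, so iterate in sorted address order.
    return max(sorted(offsets), key=lambda addr: offsets[addr])

offsets = {
    "192.168.70.214:6379": 13159324,
    "192.168.70.215:6379": 13159324,
}
print(pick_promotion_candidate(offsets))
```

Here both replicas are equally up to date, so either one is a valid choice.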
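At this point one node's configuration has to be changed by hand so that service can resume.
First kill the redis processes. On one node (70.215 in this example), change the slaveof parameter to point at the other node; on that other node (70.214), delete the slaveof line so that it starts as Master. Finally restart both nodes, and service is available again.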
vi /home/redis/redis.conf
slaveof 192.168.70.214 6379
./redis-cli -h 192.168.70.214 -p 6379 -a 123456
192.168.70.214:6379> info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.70.215,port=6379,state=online,offset=15,lag=1
master_repl_offset:15
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:14
192.168.70.214:6379>
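The manual promotion above boils down to two config changes: drop the slaveof line on the node that becomes the new Master, and point that line at the new Master everywhere else. A minimal sketch that works on the config text only (it does not stop or restart anything); `set_master` is a hypothetical helper:

```python
def is_slaveof(line):
    """True for an active (uncommented) slaveof directive."""
    parts = line.split()
    return bool(parts) and parts[0].lower() == "slaveof"

def set_master(conf_text, self_ip, master_ip, port=6379):
    """Return conf_text with its slaveof directive adjusted for the given master.

    On the new Master itself any slaveof line is removed; on every other node
    existing slaveof lines are replaced by one line pointing at master_ip.
    """
    kept = [line for line in conf_text.splitlines() if not is_slaveof(line)]
    if self_ip != master_ip:
        kept.append("slaveof {} {}".format(master_ip, port))
    return "\n".join(kept) + "\n"

old = 'port 6379\nrequirepass "123456"\nslaveof 192.168.71.214 6379\n'
print(set_master(old, "192.168.70.214", "192.168.70.214"))  # new Master: no slaveof
print(set_master(old, "192.168.70.215", "192.168.70.214"))  # Slave of 70.214
```

The same function with 192.168.71.214 as the master address covers the restore step later in this guide.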
【Restore the initial state】
Change the parameters on every node except 71.214
vi /home/redis/redis.conf
slaveof 192.168.71.214 6379
vi /home/redis/sentinel.conf
sentinel monitor mymaster 192.168.71.214 6379 1
and delete the last few lines, i.e. the entries Sentinel appended below "# Generated by CONFIG REWRITE"
Data verification
Updates executed on the Master are replicated to the Slaves.
Notes
See the remarks after each step.