
A Look at Split-Brain in Keepalived


1. Scenario

Keepalived provides both load balancing and high availability (HA). This article covers the case where it provides HA for two Mycat nodes.



2. The key configuration is shown below. Both nodes run in non-preemptive master/backup mode.

! mycat01, 192.168.4.196

global_defs {
    # one router_id per keepalived.conf
    router_id mycat01
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eth0
    # must be the same on all nodes within one multicast group
    virtual_router_id 196
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Zf4aqy
    }

    virtual_ipaddress {
        192.168.4.200
    }
}


! mycat02, 192.168.4.195

global_defs {
    router_id mycat02
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 196
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Zf4aqy
    }

    virtual_ipaddress {
        192.168.4.200
    }
}



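3. How Keepalived provides HA

Keepalived implements HA through VRRP (Virtual Router Redundancy Protocol). Nodes communicate over IP multicast (default group address 224.0.0.18), and an election decides which VRRP router handles the routing task. In normal operation the master sends VRRP advertisements and the backups listen. If a backup receives no advertisement from the master for a set period (by default about three advertisement intervals), it starts the takeover procedure and assumes the master's resources. There can be several backups; they compete by priority.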
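
To make the takeover timing concrete, here is a minimal sketch of the timer a VRRPv2 backup uses, following RFC 3768: Master_Down_Interval = 3 x Advertisement_Interval + Skew_Time, where Skew_Time = (256 - Priority) / 256 seconds. The function name is illustrative and not part of Keepalived; the sample values are taken from the configuration in section 2.

```python
def master_down_interval(advert_int: float, priority: int) -> float:
    """Seconds a VRRPv2 backup waits without advertisements before taking over (RFC 3768)."""
    # Priority 255 is reserved for the address owner, which is never a backup.
    if not 1 <= priority <= 254:
        raise ValueError("a backup's priority must be in the range 1-254")
    skew_time = (256 - priority) / 256  # higher priority -> shorter wait
    return 3 * advert_int + skew_time


# advert_int 1 and priority 200, as in the configuration in section 2
print(master_down_interval(1, 200))  # 3.21875
```

With these values a backup that stops hearing the master promotes itself after a little over three seconds, whether the master has actually failed or the advertisements are merely being lost on the way.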



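4. When Keepalived split-brain occurs

4.1 From the mechanism above, when a backup stops receiving advertisements, for example because the network between the two nodes is down, it takes over the master's resources and starts serving traffic. The visible symptom is that the virtual IP appears on the backup while the master still holds it as well.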


To reproduce the problem, with mycat01 as master and mycat02 as backup, drop multicast packets on mycat02.

# iptables -A INPUT -m pkttype --pkt-type multicast -j DROP


Then watch /var/log/messages: mycat02 has become a master as well.

# tail -f /var/log/messages
Mar 17 00:13:58 mycat02 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 17 00:14:00 mycat02 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 17 00:14:01 mycat02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Mar 17 00:14:01 mycat02 Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Mar 17 00:14:01 mycat02 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.4.200
Mar 17 00:14:01 mycat02 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.4.200 added
Mar 17 00:14:06 mycat02 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.4.200


Capturing packets on mycat01 shows that mycat02 is now also sending advertisements to the multicast address.

# tcpdump -nnei eth0 | grep -i 'vrid 196'
00:13:58.735738 fa:16:3e:f4:18:40 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 54: 192.168.4.196 > 224.0.0.18: VRRPv2, Advertisement, vrid 196, prio 202, authtype simple, intvl 1s, length 20
00:13:59.736268 fa:16:3e:f4:18:40 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 54: 192.168.4.196 > 224.0.0.18: VRRPv2, Advertisement, vrid 196, prio 202, authtype simple, intvl 1s, length 20
00:14:00.736800 fa:16:3e:f4:18:40 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 54: 192.168.4.196 > 224.0.0.18: VRRPv2, Advertisement, vrid 196, prio 202, authtype simple, intvl 1s, length 20
00:14:00.955581 fa:16:3e:5d:23:10 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 54: 192.168.4.195 > 224.0.0.18: VRRPv2, Advertisement, vrid 196, prio 202, authtype simple, intvl 1s, length 20
00:14:01.956340 fa:16:3e:f4:18:40 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 54: 192.168.4.196 > 224.0.0.18: VRRPv2, Advertisement, vrid 196, prio 202, authtype simple, intvl 1s, length 20
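
A capture like this can also be checked mechanically. The following is a rough sketch, not a Keepalived feature: it reads tcpdump's one-line VRRP summaries and reports any VRID advertised by more than one source IP. The function names and the regular expression are assumptions based on the output format shown above, and a brief overlap during a normal failover would be flagged too, so it is best applied to a capture taken in steady state.

```python
import re
from collections import defaultdict

# Source IP and VRID as they appear in tcpdump's VRRP advertisement summary.
ADVERT_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: VRRPv\d, Advertisement, vrid (\d+)"
)


def advertisers_by_vrid(lines):
    """Map each VRID to the set of source IPs seen advertising it."""
    seen = defaultdict(set)
    for line in lines:
        match = ADVERT_RE.search(line)
        if match:
            src_ip, vrid = match.groups()
            seen[int(vrid)].add(src_ip)
    return dict(seen)


def split_brain_vrids(lines):
    """Return {vrid: sorted source IPs} for VRIDs with more than one advertiser."""
    return {
        vrid: sorted(ips)
        for vrid, ips in advertisers_by_vrid(lines).items()
        if len(ips) > 1
    }


# Two lines abbreviated from the capture above.
capture = [
    "00:14:00.736800 ... 192.168.4.196 > 224.0.0.18: VRRPv2, Advertisement, vrid 196, prio 202",
    "00:14:00.955581 ... 192.168.4.195 > 224.0.0.18: VRRPv2, Advertisement, vrid 196, prio 202",
]
print(split_brain_vrids(capture))  # {196: ['192.168.4.195', '192.168.4.196']}
```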


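After iptables is stopped, mycat01 and mycat02 still each hold the virtual IP 192.168.4.200. This exposes a drawback of non-preemptive master/backup mode. Preemptive master/backup mode does not behave this way; its main configuration is shown below.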

! mycat01, 192.168.4.196

global_defs {
    router_id mycat01
}

vrrp_instance VI_1 {
    state MASTER
    virtual_router_id 196
    priority 200
}

! mycat02, 192.168.4.195

global_defs {
    router_id mycat02
}

vrrp_instance VI_1 {
    state BACKUP
    virtual_router_id 196
    priority 150
}


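4.2 Split-brain also occurs when nodes in the same multicast group are configured with different virtual_router_id values: each node ignores advertisements carrying a VRID other than its own, so neither ever sees a master and both take over. The details are not repeated here.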
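
Because a virtual_router_id mismatch is purely a configuration error, it can be caught before deployment. The sketch below compares the settings that VRRP peers are expected to share across two keepalived.conf texts. It is a hypothetical helper, not a Keepalived tool, and it assumes each file contains a single vrrp_instance block; the list of keys checked is also an assumption.

```python
import re

# Settings expected to be identical on every peer of one VRRP instance (assumed list).
SHARED_KEYS = ("virtual_router_id", "advert_int", "auth_pass")


def vrrp_settings(conf_text):
    """Extract the shared settings from the text of a keepalived.conf."""
    settings = {}
    for key in SHARED_KEYS:
        # Anchoring at line start skips lines commented out with '#' or '!'.
        match = re.search(rf"^\s*{key}\s+(\S+)", conf_text, re.MULTILINE)
        if match:
            settings[key] = match.group(1)
    return settings


def mismatched_keys(conf_a, conf_b):
    """Return the sorted names of shared settings that differ between two configs."""
    a, b = vrrp_settings(conf_a), vrrp_settings(conf_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


node_a = """
vrrp_instance VI_1 {
    virtual_router_id 196
    advert_int 1
}
"""
node_b = node_a.replace("virtual_router_id 196", "virtual_router_id 195")
print(mismatched_keys(node_a, node_b))  # ['virtual_router_id']
```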



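Finally, how can split-brain be avoided? As shown above, as long as a virtual IP is involved, the problem cannot be ruled out completely. Without a virtual IP there is no split-brain, but then how do the nodes serve clients? One answer is a service discovery mechanism such as Zookeeper or Consul, but that is a topic for another time.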


If you are interested, follow the subscription account "數據庫最佳實踐" (DBBestPractice).
