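Heartbeat+DRBD+MySQL High-Availability Solution
1. Overview
This solution uses Heartbeat, a two-node hot-standby tool, to keep the database service stable and continuously available, while DRBD keeps the data on the two nodes consistent. By default only one MySQL instance is active. When the primary MySQL server fails, the system automatically switches to the standby, which takes over the service. Once the primary has been repaired, the service can be switched back to it (this guide sets auto_failback off, so the switch back is done manually).
2. Pros and cons
Pros: high safety, stability, and availability, with automatic failover when a fault occurs.
Cons: only one server serves traffic at a time, so the cost is relatively high; the setup is hard to scale out; split brain can occur.
3. Software
Heartbeat
Official site: http://linux-ha.org/wiki/Main_Page
Heartbeat quickly moves resources (a VIP address and the services behind it) from a failed server to a healthy one. It is similar to keepalived: it provides failover, but it does not health-check the backend services.
DRBD
Official site: http://www.drbd.org/
DRBD (Distributed Replicated Block Device) is a software-based, shared-nothing storage replication solution that mirrors the content of block devices between servers. It mirrors a block device between two servers over the network in real time, either synchronously (a write completes only after both servers have written it) or asynchronously (a write completes once the local server has written it), which makes it the network equivalent of RAID 1. Because it works on block devices (disks, LVM logical volumes), below the file system, replication is faster than copying files with cp. The official MySQL manual has documented DRBD as one of its recommended high-availability options.
4. Topology
5. Suitable scenarios
This solution fits databases with moderate traffic, no rapid short-term traffic growth expected, and very high availability requirements.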
6. Test environment (firewall and SELinux are disabled on both hosts; in production, open the required ports yourself)
    Hostname      IP            OS                 DRBD disk  Heartbeat version
    db-server-01  192.168.0.10  CentOS 6.2 64-bit  /dev/sda5  3.0.4
    db-server-02  192.168.0.20  CentOS 6.2 64-bit  /dev/sda5  3.0.4
7. Software installation and environment configuration
(1) Install the DRBD build dependencies on both machines. Reboot afterwards: the install upgrades the kernel, and without a reboot the running kernel will not match the newly installed kernel-devel version. (If you know a way to avoid the reboot, please leave me a comment ^_^)
    yum install -y kernel kernel-devel kernel-headers flex
(2) Download and build DRBD (same steps on both machines)
    wget http://oss.linbit.com/drbd/8.4/drbd-8.4.2.tar.gz
    tar xf drbd-8.4.2.tar.gz
    cd drbd-8.4.2
    ./configure --prefix=/usr/local/drbd --with-km
    # If the drbd module fails to load later, the running kernel most likely does not match the newly installed one
    make KDIR=/usr/src/kernels/2.6.32-431.11.2.el6.x86_64/
    make install
    mkdir -p /usr/local/drbd/var/run/drbd
    cp /usr/local/drbd/etc/rc.d/init.d/drbd /etc/rc.d/init.d
    chmod 755 /etc/init.d/drbd
    cd drbd
    make clean
    make KDIR=/usr/src/kernels/2.6.32-431.11.2.el6.x86_64/
    cp drbd.ko /lib/modules/`uname -r`/kernel/lib/
    modprobe drbd
Check that the drbd module is loaded:
    [root@db-server-01 ~]# lsmod | grep drbd
    drbd                  314246  0
    libcrc32c               1246  1 drbd
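The kernel mismatch mentioned above can be checked before building. The following is a minimal sketch, assuming kernel-devel places its tree under /usr/src/kernels/<version> as in the KDIR path used above; kdir_for is a hypothetical helper name, not part of DRBD.

```shell
# Sketch: confirm that kernel headers exist for the kernel that is running now.
# kdir_for is a hypothetical helper, introduced only for this example.

kdir_for() {
    # Print the kernel tree that "make KDIR=..." should point to for version $1.
    printf '/usr/src/kernels/%s\n' "$1"
}

running="$(uname -r)"
kdir="$(kdir_for "$running")"

if [ -d "$kdir" ]; then
    echo "OK: build with: make KDIR=$kdir"
else
    echo "Missing $kdir: reboot into the new kernel or install the matching kernel-devel"
fi
```

If the second message appears, the module built against another kernel tree will probably not load into the running kernel.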
(3) DRBD configuration (partition /dev/sda with fdisk first)
    [root@db-server-01 ~]# df -HT
    Filesystem     Type   Size  Used Avail Use% Mounted on
    /dev/sda2      ext4    19G  2.6G   16G  15% /
    tmpfs          tmpfs  121M     0  121M   0% /dev/shm
    /dev/sda1      ext4   204M   52M  141M  27% /boot
    /dev/sda5      ext4    34G  185M   32G   1% /data
Both of my machines were already partitioned. They are virtual machines on my laptop, so I did not bother adding a disk: I simply unmounted /data and reformatted /dev/sda5, on both machines. If you have an empty disk you still need to partition it; for example, a 1 TB disk can be given a single partition.
    [root@db-server-01 ~]# umount /data/
    [root@db-server-01 ~]# mkfs.ext4 /dev/sda5
    mke2fs 1.41.12 (17-May-2010)
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    Stride=0 blocks, Stripe width=0 blocks
    2048000 inodes, 8185344 blocks
    409267 blocks (5.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=4294967296
    250 block groups
    32768 blocks per group, 32768 fragments per group
    8192 inodes per group
    Superblock backups stored on blocks:
            32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
            4096000, 7962624
    Writing inode tables: done
    Creating journal (32768 blocks): done
    Writing superblocks and filesystem accounting information: done
    This filesystem will be automatically checked every 28 mounts or
    180 days, whichever comes first. Use tune2fs -c or -i to override.
    [root@db-server-01 ~]# fdisk -l
    Disk /dev/sda: 53.7 GB, 53687091200 bytes
    255 heads, 63 sectors/track, 6527 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x000eb0ff
       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1          26      204800   83  Linux
    Partition 1 does not end on cylinder boundary.
    /dev/sda2              26        2321    18432000   83  Linux
    /dev/sda3            2321        2451     1048576   82  Linux swap / Solaris
    /dev/sda4            2451        6528    32742400    5  Extended
    /dev/sda5            2451        6528    32741376   83  Linux
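I also need to comment out one entry in /etc/fstab:
    #UUID=33958004-e8a7-4135-844f-707a5537e86a /data ext4 defaults 1 2
Otherwise the next reboot reports that /data cannot be mounted and the machine fails to boot.
Edit /etc/hosts, the same way on both servers:
    192.168.0.10 db-server-01
    192.168.0.20 db-server-02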
For DRBD itself, only /usr/local/drbd/etc/drbd.d/global_common.conf needs to be edited. After the changes it looks like this (identical on both servers):
    [root@db-server-01 ~]# cat /usr/local/drbd/etc/drbd.d/global_common.conf
    global {
        usage-count yes;
    }
    common {
        syncer { rate 30M; }            # resync rate; size it to your bandwidth
    }
    resource r0 {                       # define a resource named "r0"
        protocol C;                     # protocol C (synchronous): a write is acknowledged only after the peer has received and written it
        startup { }
        disk { on-io-error detach; }
        net { }
        on db-server-01 {               # one "on" section per node, named after its hostname
            device /dev/drbd0;          # the DRBD device /dev/drbd0 is backed by the physical partition /dev/sda5
            disk /dev/sda5;
            address 192.168.0.10:7888;  # listen address and port
            meta-disk internal;
        }
        on db-server-02 {
            device /dev/drbd0;
            disk /dev/sda5;
            address 192.168.0.20:7888;
            meta-disk internal;         # internal: the DRBD metadata is stored on the same backing device as the data
        }
    }
(4) Managing and maintaining DRBD:
Creating the DRBD resource
With DRBD configured, create the metadata for the configured resource with the commands below (same on both servers):
    [root@db-server-01 ~]# dd if=/dev/zero of=/dev/sda5 bs=1M count=100    # overwrite the start of the partition; without this, creating the resource reports an error
    100+0 records in
    100+0 records out
    104857600 bytes (105 MB) copied, 3.34339 s, 31.4 MB/s
    [root@db-server-01 ~]# drbdadm create-md r0
    Writing meta data...
    initializing activity log
    NOT initializing bitmap
    New drbd meta data block successfully created.
    success
(5) Starting DRBD and checking its status (start it on each of the two servers)
    [root@db-server-01 ~]# /etc/init.d/drbd start
    Starting DRBD resources: [
         create res: r0
       prepare disk: r0
        adjust disk: r0
         adjust net: r0
    ]
    .....
    [root@db-server-02 ~]# /etc/init.d/drbd start
    Starting DRBD resources: [
         create res: r0
       prepare disk: r0
        adjust disk: r0
         adjust net: r0
    ]
    .
Check the DRBD status:
    [root@db-server-01 ~]# /etc/init.d/drbd status
    drbd driver loaded OK; device status:
    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@db-server-01, 2014-04-18 21:15:57
    m:res  cs         ro                   ds                         p  mounted  fstype
    0:r0   Connected  Secondary/Secondary  Inconsistent/Inconsistent  C
Neither node is Primary yet. Make the current node (192.168.0.10) the Primary, then create a file system on the DRBD device and mount it:
    drbdadm -- --overwrite-data-of-peer primary all
    mkfs.ext4 /dev/drbd0
    mkdir /data
    mount /dev/drbd0 /data/
On the other server, create the same mount point /data:
    [root@db-server-02 ~]# mkdir /data
Check the DRBD status again (the initial sync is still running):
    [root@db-server-01 ~]# /etc/init.d/drbd status
    drbd driver loaded OK; device status:
    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@db-server-01, 2014-04-18 21:15:57
    m:res  cs          ro                 ds                     p  mounted  fstype
    ...    sync'ed:    13.7%              (27596/31972)M
    0:r0   SyncSource  Primary/Secondary  UpToDate/Inconsistent  C  /data    ext4
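The status lines above can also be read by a script, for example to wait until the initial sync has finished before going on. The following is a minimal sketch, assuming the column layout printed by /etc/init.d/drbd status in this article (resource, cs, ro, ds, ...); drbd_fields and drbd_in_sync are hypothetical helper names.

```shell
# Sketch: extract the connection state, roles and disk states from one
# resource line of "/etc/init.d/drbd status". Helper names are hypothetical.

drbd_fields() {
    # Split the line into words on purpose: $2=cs, $3=ro, $4=ds.
    set -- $1
    printf '%s %s %s\n' "$2" "$3" "$4"
}

drbd_in_sync() {
    # Succeeds only when the peers are connected and both disks are UpToDate.
    set -- $(drbd_fields "$1")
    [ "$1" = "Connected" ] && [ "$3" = "UpToDate/UpToDate" ]
}

# Sample lines copied from the status output shown in this article.
syncing='0:r0 SyncSource Primary/Secondary UpToDate/Inconsistent C /data ext4'
synced='0:r0 Connected Primary/Secondary UpToDate/UpToDate C /data ext4'

for line in "$syncing" "$synced"; do
    if drbd_in_sync "$line"; then
        echo "in sync:     $(drbd_fields "$line")"
    else
        echo "not in sync: $(drbd_fields "$line")"
    fi
done
```

On a live node the same three values are available one at a time from drbdadm cstate r0, drbdadm role r0 and drbdadm dstate r0.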
(6) Install MySQL. To keep things simple I install the precompiled binary package (install it on both servers with the same steps, except that the second server does not need to initialize the data directory).
Note: the mysql user must have the same uid and gid on both servers. Otherwise, after a switch the owner of the MySQL data directory will be wrong and MySQL will fail to start.
    [root@db-server-01 ~]# wget http://cdn.mysql.com/Downloads/MySQL-5.5/mysql-5.5.37-linux2.6-x86_64.tar.gz
    [root@db-server-01 ~]# tar xf mysql-5.5.37-linux2.6-x86_64.tar.gz -C /usr/local/
    [root@db-server-01 ~]# cd /usr/local/
    [root@db-server-01 local]# ln -s mysql-5.5.37-linux2.6-x86_64/ mysql
    [root@db-server-01 local]# groupadd mysql
    [root@db-server-01 local]# useradd -r -g mysql mysql
    [root@db-server-01 local]# cd mysql
    [root@db-server-01 mysql]# chown -R mysql .
    [root@db-server-01 mysql]# chgrp -R mysql .
    [root@db-server-01 mysql]# mkdir /data/mysql
    [root@db-server-01 mysql]# chown -R mysql:mysql /data/mysql/
    [root@db-server-01 mysql]# /usr/local/mysql/scripts/mysql_install_db --user=mysql --datadir=/data/mysql/ --basedir=/usr/local/mysql
    [root@db-server-01 mysql]# chown -R root .
    [root@db-server-01 mysql]# cp support-files/my-medium.cnf /etc/my.cnf
    [root@db-server-01 mysql]# cp support-files/mysql.server /etc/init.d/mysqld
    [root@db-server-01 mysql]# chmod 755 /etc/init.d/mysqld
    [root@db-server-01 mysql]# egrep 'datadir|basedir' /etc/my.cnf    # add these two settings to the MySQL config file on both servers
    datadir=/data/mysql
    basedir=/usr/local/mysql
(7) Switch the DRBD roles manually and check that the other server gets the data (automatic switching needs heartbeat, which is covered later):
    [root@db-server-01 ~]# ll /data/
    total 20
    drwx------ 2 root  root  16384 Apr 18 22:16 lost+found
    drwxr-xr-x 5 mysql mysql  4096 Apr 18 23:01 mysql
    [root@db-server-02 ~]# ll /data/
    total 0
    [root@db-server-01 ~]# /etc/init.d/drbd status
    drbd driver loaded OK; device status:
    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@db-server-01, 2014-04-18 21:15:57
    m:res  cs         ro                 ds                 p  mounted  fstype
    0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4
db-server-01 is currently the Primary, so the data is visible there. The other server shows nothing under /data because the Secondary does not mount the device. Now switch the roles manually.
To turn the Primary into the Secondary, unmount the file system first and then demote the node:
    [root@db-server-01 ~]# umount /data/
    [root@db-server-01 ~]# drbdadm secondary all
To turn the Secondary into the Primary, promote the node first and then mount the file system:
    [root@db-server-02 ~]# drbdadm primary all
    [root@db-server-02 ~]# mount /dev/drbd0 /data/
    [root@db-server-02 ~]# ll /data/
    total 20
    drwx------ 2 root  root  16384 Apr 18 22:16 lost+found
    drwxr-xr-x 5 mysql mysql  4096 Apr 18 23:01 mysql
    [root@db-server-02 ~]# /etc/init.d/drbd status
    drbd driver loaded OK; device status:
    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@db-server-02, 2014-04-18 21:22:55
    m:res  cs         ro                 ds                 p  mounted  fstype
    0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4
The node has been promoted to Primary, and the initialized MySQL data is present on it.
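The order of the manual switch matters: unmount before demoting, promote before mounting. The sketch below wraps the commands from this step in a dry-run helper so the sequence can be reviewed before it touches anything. The names run, demote and promote and the DRY_RUN variable are hypothetical, and the sketch names the resource r0 where the commands above use all.

```shell
# Sketch: the manual role switch as two functions with a dry-run guard.
# run, demote, promote and DRY_RUN are hypothetical names for this example.

DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

demote() {
    # On the current Primary: release the file system, then give up the role.
    run umount /data &&
    run drbdadm secondary r0
}

promote() {
    # On the peer, after the old Primary has been demoted.
    run drbdadm primary r0 &&
    run mount /dev/drbd0 /data
}

demote
promote
```

With DRY_RUN left at 1 the script only prints the four commands. In a real switch demote would run on the old Primary and promote on the other node, not both on one machine.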
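Recovering from a DRBD split brain
After a split brain, the data on the two DRBD disks has diverged. On the node you choose to become the Secondary (its changes will be thrown away), demote the resource and discard its data:
    drbdadm secondary r0
    drbdadm -- --discard-my-data connect r0
On the node that is to be the Primary, reconnect to the Secondary (this can be skipped if the node's connection state is already WFConnection):
    drbdadm connect r0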
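Before running the recovery commands it helps to confirm that a split brain really happened and to see which node still needs a connect. The following is a minimal sketch with hypothetical helper names; the sample log line is illustrative and the exact kernel message may differ between DRBD versions.

```shell
# Sketch: recognise a split brain and decide whether a node must reconnect.
# has_split_brain and must_reconnect are hypothetical helper names.

has_split_brain() {
    # Reads log text on stdin; succeeds if it contains DRBD's split-brain message.
    grep -q 'Split-Brain detected'
}

must_reconnect() {
    # $1: connection state, e.g. the output of "drbdadm cstate r0".
    # A StandAlone node has dropped the connection and needs "drbdadm connect".
    [ "$1" = "StandAlone" ]
}

# Illustrative sample, not taken from this article's logs.
sample_log='block drbd0: Split-Brain detected, dropping connection!'

if printf '%s\n' "$sample_log" | has_split_brain; then
    echo "split brain reported"
fi

for state in StandAlone WFConnection; do
    if must_reconnect "$state"; then
        echo "$state: run drbdadm connect r0"
    else
        echo "$state: already waiting for the peer"
    fi
done
```

On a real node the inputs would come from dmesg | has_split_brain and must_reconnect "$(drbdadm cstate r0)".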
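(8) Install Heartbeat (on both servers)
The EPEL repository has to be added because CentOS does not ship this package by default. Alternatively you can build it from source.
    rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
    yum install heartbeat -y
Create the DRBD resource script drbddisk (on both servers):
Note:
This is a big pitfall. Heartbeat installed with yum does not create the drbddisk script under /etc/ha.d/resource.d/. I guess the version is too new; this did not happen a couple of years ago. I could not find the file anywhere else on the system after the installation either. (As its copyright header shows, the script comes from LINBIT, the DRBD authors, not from Heartbeat.) I ran into the problem because the virtual IP could not be pinged after Heartbeat started. The log /var/log/ha-log contained the line "ERROR: Cannot locate resource script drbddisk", and /etc/ha.d/resource.d/ indeed had no drbddisk script. I finally found the code through Google and created the script, after which the test passed: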
    [root@db-server-01 ~]# chmod 755 /etc/ha.d/resource.d/drbddisk
    [root@db-server-01 ~]# cat /etc/ha.d/resource.d/drbddisk
    #!/bin/bash
    #
    # This script is intended to be used as resource script by heartbeat
    #
    # Copyright 2003-2008 LINBIT Information Technologies
    # Philipp Reisner, Lars Ellenberg
    #
    ###

    DEFAULTFILE="/etc/default/drbd"
    DRBDADM="/sbin/drbdadm"

    if [ -f $DEFAULTFILE ]; then
        . $DEFAULTFILE
    fi

    if [ "$#" -eq 2 ]; then
        RES="$1"
        CMD="$2"
    else
        RES="all"
        CMD="$1"
    fi

    ## EXIT CODES
    # since this is a "legacy heartbeat R1 resource agent" script,
    # exit codes actually do not matter that much as long as we conform to
    # http://wiki.linux-ha.org/HeartbeatResourceAgent
    # but it does not hurt to conform to lsb init-script exit codes,
    # where we can.
    # http://refspecs.linux-foundation.org/LSB_3.1.0/
    #LSB-Core-generic/LSB-Core-generic/iniscrptact.html
    ####

    drbd_set_role_from_proc_drbd()
    {
        local out
        if ! test -e /proc/drbd; then
            ROLE="Unconfigured"
            return
        fi

        dev=$( $DRBDADM sh-dev $RES )
        minor=${dev#/dev/drbd}
        if [[ $minor = *[!0-9]* ]] ; then
            # sh-minor is only supported since drbd 8.3.1
            minor=$( $DRBDADM sh-minor $RES )
        fi
        if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
            ROLE=Unknown
            return
        fi

        if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
            set -- $out
            ROLE=${5%/**}
            : ${ROLE:=Unconfigured} # if it does not show up
        else
            ROLE=Unknown
        fi
    }

    case "$CMD" in
        start)
            # try several times, in case heartbeat deadtime
            # was smaller than drbd ping time
            try=6
            while true; do
                $DRBDADM primary $RES && break
                let "--try" || exit 1 # LSB generic error
                sleep 1
            done
            ;;
        stop)
            # heartbeat (haresources mode) will retry failed stop
            # for a number of times in addition to this internal retry.
            try=3
            while true; do
                $DRBDADM secondary $RES && break
                # We used to lie here, and pretend success for anything != 11,
                # to avoid the reboot on failed stop recovery for "simple
                # config errors" and such. But that is incorrect.
                # Don't lie to your cluster manager.
                # And don't do config errors...
                let --try || exit 1 # LSB generic error
                sleep 1
            done
            ;;
        status)
            if [ "$RES" = "all" ]; then
                echo "A resource name is required for status inquiries."
                exit 10
            fi
            ST=$( $DRBDADM role $RES )
            ROLE=${ST%/**}
            case $ROLE in
                Primary|Secondary|Unconfigured)
                    # expected
                    ;;
                *)
                    # unexpected. whatever...
                    # If we are unsure about the state of a resource, we need to
                    # report it as possibly running, so heartbeat can, after failed
                    # stop, do a recovery by reboot.
                    # drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
                    # suddenly readonly. So we retry by parsing /proc/drbd.
                    drbd_set_role_from_proc_drbd
            esac
            case $ROLE in
                Primary)
                    echo "running (Primary)"
                    exit 0 # LSB status "service is OK"
                    ;;
                Secondary|Unconfigured)
                    echo "stopped ($ROLE)"
                    exit 3 # LSB status "service is not running"
                    ;;
                *)
                    # NOTE the "running" in below message.
                    # this is a "heartbeat" resource script,
                    # the exit code is _ignored_.
                    echo "cannot determine status, may be running ($ROLE)"
                    exit 4 # LSB status "service status is unknown"
                    ;;
            esac
            ;;
        *)
            echo "Usage: drbddisk [resource] {start|stop|status}"
            exit 1
            ;;
    esac

    exit 0
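(9) Heartbeat configuration
Heartbeat is configured through three files: authkeys, ha.cf, and haresources. Each is covered below.
authkeys (identical on both servers)
This file sets the authentication method. Three methods are supported: crc, md5, and sha1, in increasing order of security and resource cost. If heartbeat runs over a trusted network, such as a private network, crc is sufficient. The master and the backup use the same authkeys. Mine looks like this:
    [root@db-server-01 ~]# cat /etc/ha.d/authkeys
    auth 1
    1 crc
    [root@db-server-01 ~]# chmod 600 /etc/ha.d/authkeys
Note: the permissions of this file must be 600.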
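When the heartbeat link is not on a trusted private network, sha1 with a shared secret is the strongest of the three methods listed above. The following is a minimal sketch of generating such a file; make_authkeys and random_key are hypothetical helper names, and the layout assumed is the usual two lines, "auth <id>" followed by "<id> <method> <key>".

```shell
# Sketch: build an authkeys file that uses sha1 with a random shared secret.
# make_authkeys and random_key are hypothetical helper names.

make_authkeys() {
    # $1: the shared secret. Prints the two-line authkeys content.
    printf 'auth 1\n1 sha1 %s\n' "$1"
}

random_key() {
    # 40 hex characters derived from 64 random bytes.
    dd if=/dev/urandom bs=64 count=1 2>/dev/null | sha1sum | cut -d ' ' -f 1
}

make_authkeys "$(random_key)"
```

The output would be written to /etc/ha.d/authkeys with mode 600. Generate it once and copy the same file to the other node, since both nodes must share the same key.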
ha.cf (slightly different on the two machines). On the Primary (192.168.0.10):
    [root@db-server-01 ~]# cat /etc/ha.d/ha.cf
    logfile /var/log/ha-log      # name and location of the Heartbeat log
    logfacility local0
    keepalive 2                  # heartbeat (check) interval: 2 seconds
    deadtime 15                  # declare the peer dead after 15 seconds
    ucast eth1 192.168.0.20      # use unicast; the IP address is the peer's
    auto_failback off            # whether to switch back once the Primary recovers after a failover (off: switch back manually when needed)
    node db-server-01
    node db-server-02
On the Secondary (192.168.0.20):
    [root@db-server-02 ~]# cat /etc/ha.d/ha.cf
    logfile /var/log/ha-log      # name and location of the Heartbeat log
    logfacility local0
    keepalive 2                  # heartbeat (check) interval: 2 seconds
    deadtime 15                  # declare the peer dead after 15 seconds
    ucast eth1 192.168.0.10      # use unicast; the IP address is the peer's
    auto_failback off            # whether to switch back once the Primary recovers after a failover (enable only if you really want automatic failback)
    node db-server-01
    node db-server-02
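Heartbeat matches the node lines in ha.cf against the output of uname -n, so a hostname that differs from the configured name keeps the node out of the cluster. The following is a minimal sketch of that check; node_listed is a hypothetical helper name and HA_CF is a hypothetical variable for overriding the path.

```shell
# Sketch: verify that this host's name appears in a "node" line of ha.cf.
# node_listed and HA_CF are hypothetical names for this example.

node_listed() {
    # $1: path to ha.cf, $2: node name to look for.
    # Succeeds if some "node ..." line lists exactly that name.
    awk -v n="$2" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$1"
}

conf="${HA_CF:-/etc/ha.d/ha.cf}"
if [ -r "$conf" ]; then
    if node_listed "$conf" "$(uname -n)"; then
        echo "$(uname -n) is listed in $conf"
    else
        echo "$(uname -n) is NOT listed in $conf"
    fi
else
    echo "$conf not readable, nothing checked"
fi
```

Running it on each of the two servers covers both node lines.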
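haresources (identical on both machines):
    [root@db-server-01 ~]# cat /etc/ha.d/haresources
    db-server-01 IPaddr::192.168.0.88/24/eth1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 mysqld
Note: the scripts named in this file, such as IPaddr and Filesystem, live under /etc/ha.d/resource.d/. You can also put your own service scripts there (mysqld, for example) and add the script name to /etc/ha.d/haresources so that heartbeat starts the service. In this setup mysqld is found in /etc/init.d/, as the heartbeat log later in this article shows.
IPaddr::192.168.0.88/24/eth1: the IPaddr script configures the floating VIP
drbddisk::r0: the drbddisk script promotes the DRBD resource r0 to Primary on start and demotes it to Secondary on stop
Filesystem::/dev/drbd0::/data::ext4: the Filesystem script mounts and unmounts the disk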
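The order of the entries on the haresources line is significant: heartbeat starts the resources from left to right and stops them from right to left, which is why mysqld comes last and the VIP first. The sketch below prints both orders for the line used here; start_order and stop_order are hypothetical helper names.

```shell
# Sketch: show the start and stop order implied by one haresources line.
# start_order and stop_order are hypothetical helper names.

start_order() {
    # Drop the first word (the preferred node); the rest start left to right.
    set -- $1
    shift
    printf '%s\n' "$@"
}

stop_order() {
    # The same resources in reverse order.
    start_order "$1" | awk '{ a[NR] = $0 } END { for (i = NR; i >= 1; i--) print a[i] }'
}

line='db-server-01 IPaddr::192.168.0.88/24/eth1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 mysqld'

echo "start order:"
start_order "$line"
echo "stop order:"
stop_order "$line"
```

Reading the stop order explains the dependency chain: mysqld must be down before the file system can be unmounted, and the file system must be unmounted before DRBD can be demoted.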
(10) Managing heartbeat
With heartbeat configured, MySQL must be removed from the services started at boot. Heartbeat itself mounts the DRBD file system and starts MySQL on the primary. During a switch it stops MySQL and unmounts the file system on the primary, then mounts the file system and starts MySQL on the standby. So run the following on both servers:
    [root@db-server-01 ~]# chkconfig mysqld off
    [root@db-server-01 ~]# chkconfig heartbeat off
    [root@db-server-01 ~]# chkconfig drbd off
    [root@db-server-01 ~]# cat /etc/rc.local
    #!/bin/sh
    #
    # This script will be executed *after* all the other init scripts.
    # You can put your own initialization stuff in here if you don't
    # want to do the full Sys V style init stuff.
    touch /var/lock/subsys/local
    modprobe drbd    # the module must be loaded first, which is why the start commands are placed here
    /etc/init.d/drbd start
    /etc/init.d/heartbeat start
That completes the heartbeat+drbd+mysql high-availability setup. Next comes testing.
High-availability test
(1) Start the MySQL service on the first server (192.168.0.10).
    [root@db-server-01 ~]# /etc/init.d/mysqld start
    Starting MySQL.The server quit without updating PID file (/[FAILED]ql/db-server-01.pid).
    [root@db-server-01 ~]# ll /data/
    total 0
What happened? /data/ is empty, because this node was switched to Secondary in the earlier step:
    [root@db-server-01 ~]# /etc/init.d/drbd status
    drbd driver loaded OK; device status:
    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@db-server-01, 2014-04-18 21:15:57
    m:res  cs         ro                 ds                 p  mounted  fstype
    0:r0   Connected  Secondary/Primary  UpToDate/UpToDate  C
The roles have to be switched back manually before MySQL can start. First demote the current Primary:
    [root@db-server-02 ~]# umount /data/
    [root@db-server-02 ~]# drbdadm secondary all
Then promote the first server and mount the file system:
    [root@db-server-01 ~]# drbdadm primary all
    [root@db-server-01 ~]# mount /dev/drbd0 /data/
    [root@db-server-01 ~]# ll /data/
    total 20
    drwx------ 2 root  root  16384 Apr 18 22:16 lost+found
    drwxr-xr-x 5 mysql mysql  4096 Apr 18 23:01 mysql
    [root@db-server-01 ~]# /etc/init.d/drbd status
    drbd driver loaded OK; device status:
    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@db-server-01, 2014-04-18 21:15:57
    m:res  cs         ro                 ds                 p  mounted  fstype
    0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4
The roles are switched back, so MySQL can now be started:
    [root@db-server-01 ~]# /etc/init.d/mysqld start
    Starting MySQL.......                                      [  OK  ]
    [root@db-server-01 ~]# mysql
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 1
    Server version: 5.5.37-log MySQL Community Server (GPL)
    Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    mysql>
(2) Start heartbeat on both servers
    [root@db-server-01 ~]# /etc/init.d/heartbeat start
    Starting High-Availability services: INFO:  Resource is stopped
    Done.
    [root@db-server-02 ~]# /etc/init.d/heartbeat start
    Starting High-Availability services: INFO:  Resource is stopped
    Done.
    [root@db-server-01 ~]# ip addr | grep eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        inet 192.168.0.10/24 brd 192.168.0.255 scope global eth1
        inet 192.168.0.88/24 brd 192.168.0.255 scope global secondary eth1
The virtual IP 192.168.0.88 is now present, so the start succeeded. The heartbeat log shows what happened:
    [root@db-server-01 ~]# tail -n 20 /var/log/ha-log
    harc(default)[5598]: 2014/04/19_00:25:21 info: Running /etc/ha.d//rc.d/status status
    Apr 19 00:25:22 db-server-01 heartbeat: [5591]: info: Comm_now_up(): updating status to active
    Apr 19 00:25:22 db-server-01 heartbeat: [5591]: info: Local status now set to: 'active'
    Apr 19 00:25:22 db-server-01 heartbeat: [5591]: info: Status update for node db-server-02: status active
    harc(default)[5618]: 2014/04/19_00:25:22 info: Running /etc/ha.d//rc.d/status status
    Apr 19 00:25:33 db-server-01 heartbeat: [5591]: info: remote resource transition completed.
    Apr 19 00:25:33 db-server-01 heartbeat: [5591]: info: remote resource transition completed.
    Apr 19 00:25:33 db-server-01 heartbeat: [5591]: info: Initial resource acquisition complete (T_RESOURCES(us))
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.88)[5671]: 2014/04/19_00:25:33 INFO: Resource is stopped
    Apr 19 00:25:33 db-server-01 heartbeat: [5635]: info: Local Resource acquisition completed.
    harc(default)[5752]: 2014/04/19_00:25:33 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
    ip-request-resp(default)[5752]: 2014/04/19_00:25:33 received ip-request-resp IPaddr::192.168.0.88/24/eth1 OK yes
    ResourceManager(default)[5775]: 2014/04/19_00:25:33 info: Acquiring resource group: db-server-01 IPaddr::192.168.0.88/24/eth1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 mysqld
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.88)[5803]: 2014/04/19_00:25:33 INFO: Resource is stopped
    ResourceManager(default)[5775]: 2014/04/19_00:25:33 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.88/24/eth1 start
    IPaddr(IPaddr_192.168.0.88)[5926]: 2014/04/19_00:25:34 INFO: Adding inet address 192.168.0.88/24 with broadcast address 192.168.0.255 to device eth1
    IPaddr(IPaddr_192.168.0.88)[5926]: 2014/04/19_00:25:34 INFO: Bringing device eth1 up
    IPaddr(IPaddr_192.168.0.88)[5926]: 2014/04/19_00:25:34 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.0.88 eth1 192.168.0.88 auto not_used not_used
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.88)[5900]: 2014/04/19_00:25:34 INFO: Success
    /usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[6030]: 2014/04/19_00:25:34 INFO: Running OK
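Now for the exciting part: testing automatic failover. First look at the state of the two servers:
    [root@db-server-01 ~]# df -HT
    Filesystem     Type   Size  Used Avail Use% Mounted on
    /dev/sda2      ext4    19G  3.5G   15G  20% /
    tmpfs          tmpfs  121M     0  121M   0% /dev/shm
    /dev/sda1      ext4   204M   52M  141M  27% /boot
    /dev/drbd0     ext4    33G  216M   32G   1% /data
    [root@db-server-02 ~]# df -HT
    Filesystem     Type   Size  Used Avail Use% Mounted on
    /dev/sda2      ext4    19G  4.9G   13G  28% /
    tmpfs          tmpfs  121M     0  121M   0% /dev/shm
    /dev/sda1      ext4   204M   52M  141M  27% /boot
The DRBD device is mounted on the first server.
Test cases:
1. Stop mysqld on the master and see whether a switch happens (heartbeat does not check service availability, so this needs an additional script).
2. Stop heartbeat on the master and see whether the switch works.
3. Take down the master's network, or shut the master down, and see whether the switch works.
4. Start heartbeat on the master again and see whether the service switches back correctly.
5. Reboot the master and see whether the switch process goes through cleanly.
Note: checking a "switch" here means checking whether MySQL has been stopped, whether the file system has been unmounted, and so on.
I stop heartbeat on the master (192.168.0.10) to test automatic failover. Of the cases above, all except the first trigger a switch: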
    [root@db-server-01 ~]# /etc/init.d/heartbeat stop
    Stopping High-Availability services: Done.
    [root@db-server-01 ~]# df -HT
    Filesystem     Type   Size  Used Avail Use% Mounted on
    /dev/sda2      ext4    19G  3.5G   15G  20% /
    tmpfs          tmpfs  121M     0  121M   0% /dev/shm
    /dev/sda1      ext4   204M   52M  141M  27% /boot
    [root@db-server-01 ~]# /etc/init.d/drbd status
    drbd driver loaded OK; device status:
    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@db-server-01, 2014-04-18 21:15:57
    m:res  cs         ro                 ds                 p  mounted  fstype
    0:r0   Connected  Secondary/Primary  UpToDate/UpToDate  C
The switch has happened. Now look at the other machine:
    [root@db-server-02 ~]# df -HT
    Filesystem     Type   Size  Used Avail Use% Mounted on
    /dev/sda2      ext4    19G  4.9G   13G  28% /
    tmpfs          tmpfs  121M     0  121M   0% /dev/shm
    /dev/sda1      ext4   204M   52M  141M  27% /boot
    /dev/drbd0     ext4    33G  216M   32G   1% /data
    [root@db-server-02 ~]# netstat -nltp | grep 3306 | grep -v grep
    tcp        0      0 0.0.0.0:3306        0.0.0.0:*        LISTEN      5542/mysqld
The resources have moved over and MySQL, which was not running here before, has been started automatically.
    [root@db-server-02 ~]# ip addr | grep eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        inet 192.168.0.20/24 brd 192.168.0.255 scope global eth1
        inet 192.168.0.88/24 brd 192.168.0.255 scope global secondary eth1
    [root@db-server-02 ~]# mysql
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 1
    Server version: 5.5.37-log MySQL Community Server (GPL)
    Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    mysql>
Everything works. The log shows exactly what happened:
    [root@db-server-02 ~]# tail -n 10 /var/log/ha-log
    ResourceManager(default)[4768]: 2014/04/19_00:36:42 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext4 start
    Filesystem(Filesystem_/dev/drbd0)[5131]: 2014/04/19_00:36:42 INFO: Running start for /dev/drbd0 on /data
    /usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[5122]: 2014/04/19_00:36:42 INFO: Success
    ResourceManager(default)[4768]: 2014/04/19_00:36:43 info: Running /etc/init.d/mysqld start
    mach_down(default)[4741]: 2014/04/19_00:36:46 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
    mach_down(default)[4741]: 2014/04/19_00:36:46 info: mach_down takeover complete for node db-server-01.
    Apr 19 00:36:46 db-server-02 heartbeat: [4637]: info: mach_down takeover complete.
    Apr 19 00:36:58 db-server-02 heartbeat: [4637]: WARN: node db-server-01: is dead
    Apr 19 00:36:58 db-server-02 heartbeat: [4637]: info: Dead node db-server-01 gave up resources.
    Apr 19 00:36:58 db-server-02 heartbeat: [4637]: info: Link db-server-01:eth1 dead.
Heartbeat cannot fail over by itself when only the mysqld service dies, so a script has to do it. Here is a simple one: when mysqld becomes unavailable it triggers a failover by stopping heartbeat, and it sends a mail when it does so. The script runs on the primary server, that is, on the server where mysqld is currently running.
    [root@db-server-01 ~]# cat mysqlmon.sh
    #!/bin/bash
    # Stop heartbeat (forcing a failover) after three consecutive failed MySQL checks.
    trap 'echo PROGRAM INTERRUPTED; exit 1' INT
    username=root
    password=123456
    mail_to='admin@example.com'    # placeholder: the original address is not recoverable from the source
    n=0
    log='/var/log/mysqlmon.log'
    while true
    do
        if /usr/local/mysql/bin/mysql -u"${username}" -p"${password}" -e "use test" >/dev/null 2>&1
        then
            echo "$(date +"%Y-%m-%d %H:%M:%S") mysqld is alive!" >> "${log}"
            n=0
        else
            echo "$(date +"%Y-%m-%d %H:%M:%S") mysqld cannot be connected!" >> "${log}"
            n=$((n + 1))
            if [ "$n" -eq 3 ]
            then
                /etc/init.d/heartbeat stop
                echo "$(date +"%Y-%m-%d %H:%M:%S") mysqld switched to backup!" >> "${log}"
                echo "$(date +"%Y-%m-%d %H:%M:%S") mysqld switched to backup" | mutt -s "mysqld switched to backup" "${mail_to}"
                break
            fi
        fi
        sleep 10
    done
Run it in the background:
    [root@db-server-01 ~]# nohup bash mysqlmon.sh &
Stop the mysqld service and check whether the failover happens and the mail is sent:
    [root@db-server-01 ~]# /etc/init.d/mysqld stop
    Shutting down MySQL.                                       [  OK  ]
    [root@db-server-02 ~]# df -HT
    Filesystem     Type   Size  Used Avail Use% Mounted on
    /dev/sda2      ext4    19G  4.9G   13G  28% /
    tmpfs          tmpfs  121M     0  121M   0% /dev/shm
    /dev/sda1      ext4   204M   52M  141M  27% /boot
    /dev/drbd0     ext4    33G  216M   32G   1% /data
    [root@db-server-02 ~]# netstat -nltp | grep 3306
    tcp        0      0 0.0.0.0:3306        0.0.0.0:*        LISTEN      13771/mysqld
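The core of the monitoring script is a counter of consecutive failures. The sketch below isolates that logic with the check and the failover action passed in as arguments, so it can be tried with harmless commands first. watch_service and its parameter order are hypothetical, not part of heartbeat or MySQL.

```shell
# Sketch: run a check repeatedly and fire an action after N consecutive failures.
# watch_service is a hypothetical helper:
#   watch_service CHECK_CMD MAX_FAILS INTERVAL_SECONDS ON_FAIL_CMD

watch_service() {
    ws_check="$1"; ws_max="$2"; ws_interval="$3"; ws_on_fail="$4"
    ws_fails=0
    while :; do
        if $ws_check >/dev/null 2>&1; then
            ws_fails=0                      # any success resets the counter
        else
            ws_fails=$((ws_fails + 1))
            if [ "$ws_fails" -ge "$ws_max" ]; then
                $ws_on_fail                 # e.g. stop heartbeat to force a failover
                return 1
            fi
        fi
        sleep "$ws_interval"
    done
}

# Harmless demonstration: the check always fails, the action only prints.
watch_service false 3 0 'echo would stop heartbeat now' || echo "watcher exited"
```

On the primary the call could look like watch_service "/usr/local/mysql/bin/mysqladmin ping" 3 10 "/etc/init.d/heartbeat stop". Note that mysqladmin ping reports success whenever the server answers, even with an access-denied error, so it needs no password but also does not prove that queries succeed.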
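Summary:
The setup is not very complicated, but there were several pitfalls along the way, such as the heartbeat package from yum lacking the drbddisk script. The strengths of this solution are high safety, stability, and availability, with automatic failover on faults. The weaknesses are just as clear: only one server serves traffic, so the cost is relatively high; it is hard to scale out; and split brain can occur. When the MySQL service dies or becomes unavailable, no automatic failover takes place; that requires heartbeat's crm mode or an additional script (for example, a shell script that stops heartbeat on the master once it detects that MySQL there is unavailable, which moves the service to the backup). Monitoring is also very important; nagios or zabbix can be used for it.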