Using corosync + pacemaker + DRBD to eliminate the MFS master single point of failure
The previous article covered installing, configuring, and maintaining MFS, but MFS has a single point of failure at the master. This article shows how to remove that single point with corosync + pacemaker + DRBD. In the previous article, 192.168.5.72 served as the metalogger server; here it becomes the standby for the metadata server, and the metalogger role moves to 192.168.5.73, so 5.73 acts as both metalogger server and chunkserver.
Environment:
CentOS 7.5 x86_64
metaserver Master: 192.168.5.71
metaserver Slave: 192.168.5.72
VIP: 192.168.5.77
metalogger server: 192.168.5.73
Chunkservers: 192.168.5.73, 192.168.5.74, 192.168.5.75
# hosts file configuration (all nodes)
cat >> /etc/hosts << EOF
192.168.5.77 mfsmaster
192.168.5.71 mfs71
192.168.5.72 mfs72
192.168.5.73 mfs73
192.168.5.74 mfs74
192.168.5.75 mfs75
EOF
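# If you want to push the same hosts entries to every node in one go, a loop like the following works; this is just a convenience sketch and assumes root SSH access between the nodes:
for h in 192.168.5.71 192.168.5.72 192.168.5.73 192.168.5.74 192.168.5.75; do
    scp /etc/hosts root@$h:/etc/hosts
done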
# The installation and configuration of mfsmaster, mfschunkserver, metalogger, and the mfs client are not covered here.
1. Install DRBD (on both the primary and standby nodes)
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum install drbd84 kmod-drbd84 -y
# Partition the disk. On both servers, /dev/sdb1 is used as DRBD's network-mirrored partition (use a dedicated partition for DRBD). Note that formatting /dev/sdb1 itself is optional here: the filesystem is created later on /dev/drbd1, and the dd below wipes the start of the partition anyway.
fdisk /dev/sdb
mkfs.ext4 /dev/sdb1
# Configure DRBD
modprobe drbd
lsmod | grep drbd
vi /etc/drbd.d/global_common.conf
global {
    usage-count no;
}
common {
    protocol C;
    disk {
        on-io-error detach;
    }
    syncer {
        rate 100M;
    }
}
resource mfs {
    on mfs71 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 192.168.5.71:7899;
        meta-disk internal;
    }
    on mfs72 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 192.168.5.72:7899;
        meta-disk internal;
    }
}
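# Before continuing, it is worth checking that the resource definition parses on both nodes; drbdadm dump prints the parsed configuration and fails loudly on syntax errors:
drbdadm dump mfs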
# Initialize and start DRBD (on both nodes). The dd zeroes the start of /dev/sdb1 so that create-md is not blocked by the filesystem signature left by mkfs.
dd if=/dev/zero bs=1M count=128 of=/dev/sdb1
sync
drbdadm create-md mfs
service drbd start
chkconfig drbd on
[root@mfs71 mfs]# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@Build64R7, 2018-04-26 12:10:42
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:104853340
# Promote one node to primary (these steps run on the primary node only). One forced promotion is enough: "drbdadm primary --force" is the 8.4 syntax, and "--overwrite-data-of-peer" is the older equivalent.
[root@mfs71 ~]# drbdsetup /dev/drbd1 primary
[root@mfs71 ~]# drbdadm primary --force mfs
[root@mfs71 ~]# drbdadm -- --overwrite-data-of-peer primary mfs
# Check the sync status on the primary node
[root@mfs71 mfs]# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@Build64R7, 2018-04-26 12:10:42
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:129444 nr:0 dw:0 dr:131548 al:8 bm:0 lo:0 pe:2 ua:0 ap:0 ep:1 wo:f oos:104724828
    [>....................] sync'ed:  0.2% (102268/102392)M
    finish: 2:42:36 speed: 10,708 (10,708) K/sec
## Watch the sync progress in real time
[root@mfs71 ~]# watch -n1 'cat /proc/drbd'
Field notes:
cs: the connection state between the two nodes
ro: the roles of the two hosts
ds: the disk states; "UpToDate/UpToDate" means the two sides are fully synchronized
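# If you script the setup, you can block until the initial sync is done before creating the filesystem; a minimal sketch that polls the ds: field described above:
while ! grep -q 'ds:UpToDate/UpToDate' /proc/drbd; do
    sleep 10
done
echo "DRBD initial sync finished"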
## The filesystem can be mounted only on the primary node. The standby's DRBD device cannot be mounted, because it exists to receive data from the primary; DRBD itself performs those writes.
mkfs.ext4 /dev/drbd1
mkdir -p /data/mfs
mount /dev/drbd1 /data/mfs
chown -R mfs:mfs /data/mfs
# Verify the mount
[root@mfs71 mfs]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/drbd1       99G   61M   94G   1% /data/mfs
2. Metaserver configuration
# Move the metadata.mfs storage directory onto the DRBD mount
vi mfsmaster.cfg
DATA_PATH = /data/mfs
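# A quick grep confirms the change took effect. The path below assumes the /usr/local/mfs install prefix used throughout this article; adjust it if your mfsmaster.cfg lives elsewhere:
grep '^DATA_PATH' /usr/local/mfs/etc/mfs/mfsmaster.cfg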
# Try starting mfsmaster (on the primary node only)
cp /usr/local/mfs/var/mfs/* /data/mfs/
chown -R mfs:mfs /data/mfs
/usr/local/mfs/sbin/mfsmaster start
[root@mfs71 mfs]# ps -ef|grep mfs
root     25966     2  0 Nov02 ?      00:00:00 [drbd_w_mfs]
root     25969     2  0 Nov02 ?      00:00:00 [drbd_r_mfs]
root     25975     2  0 Nov02 ?      00:00:03 [drbd_a_mfs]
root     25976     2  0 Nov02 ?      00:00:00 [drbd_as_mfs]
mfs      31508     1  5 15:31  ?     00:00:00 /usr/local/mfs/sbin/mfsmaster start
root     31510 31452  0 15:31 pts/0  00:00:00 grep --color=auto mfs
[root@mfs71 mfs]# lsof -i:9420
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
mfsmaster 31508  mfs   9u  IPv4 160777      0t0  TCP *:9420 (LISTEN)
# Disable mfsmaster at boot (run on both primary and standby; from now on pacemaker starts mfsmaster)
systemctl disable moosefs-master.service
systemctl stop moosefs-master.service
# Release the DRBD device and unmount it
umount /data/mfs   # run this on the primary node only
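# To leave the node fully clean before handing control to pacemaker, a plausible sequence on the primary (a sketch, assuming the manually started test instance from the previous step is still running) is:
/usr/local/mfs/sbin/mfsmaster stop   # stop the manually started test instance
umount /data/mfs                     # release the filesystem (the step shown above)
drbdadm secondary mfs                # demote the resource so pacemaker can promote either node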
3. pacemaker + corosync installation and configuration
# Install pacemaker and corosync (on both nodes)
yum install pcs pacemaker corosync fence-agents-all -y
# Start pcsd and enable it at boot (both nodes)
systemctl start pcsd.service
systemctl enable pcsd.service
# Set a password for hacluster. This user is created when the packages are installed and is used by pcs for local authentication; give it the same password on every node.
echo "balala369" | passwd --stdin hacluster
# Authenticate the cluster nodes to each other (run on the primary node, mfs71)
[root@mfs71 ~]# pcs cluster auth 192.168.5.71 192.168.5.72
Username: hacluster
Password:
192.168.5.71: Authorized
192.168.5.72: Authorized
## Create the mfscluster cluster (run on the primary node, mfs71)
[root@mfs71 mfs]# pcs cluster setup --name mfscluster 192.168.5.71 192.168.5.72
Destroying cluster on nodes: 192.168.5.71, 192.168.5.72...
192.168.5.72: Stopping Cluster (pacemaker)...
192.168.5.71: Stopping Cluster (pacemaker)...
192.168.5.72: Successfully destroyed cluster
192.168.5.71: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to '192.168.5.71', '192.168.5.72'
192.168.5.71: successful distribution of the file 'pacemaker_remote authkey'
192.168.5.72: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
192.168.5.71: Succeeded
192.168.5.72: Succeeded

Synchronizing pcsd certificates on nodes 192.168.5.71, 192.168.5.72...
192.168.5.71: Success
192.168.5.72: Success
Restarting pcsd on the nodes in order to reload the certificates...
192.168.5.71: Success
192.168.5.72: Success
## Inspect the generated corosync configuration
[root@mfs71 mfs]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: mfscluster
    secauth: off
    transport: udpu
}
nodelist {
    node {
        ring0_addr: 192.168.5.71
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.5.72
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
# Enable the cluster at boot
[root@mfs71 mfs]# pcs cluster enable --all
192.168.5.71: Cluster Enabled
192.168.5.72: Cluster Enabled
[root@mfs71 ~]# systemctl start corosync.service
[root@mfs71 ~]# systemctl start pacemaker.service
[root@mfs71 ~]# systemctl enable corosync
[root@mfs71 ~]# systemctl enable pacemaker
[root@mfs72 ~]# systemctl start corosync.service
[root@mfs72 ~]# systemctl start pacemaker.service
[root@mfs72 ~]# systemctl enable corosync
[root@mfs72 ~]# systemctl enable pacemaker
# Check the cluster status (on both nodes)
[root@mfs71 mfs]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: mfs71 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
 Last updated: Mon Nov  5 16:09:30 2018
 Last change: Mon Nov  5 16:09:09 2018 by hacluster via crmd on mfs71
 2 nodes configured
 0 resources configured

PCSD Status:
  mfs72 (192.168.5.72): Online
  mfs71 (192.168.5.71): Online
[root@mfs72 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: mfs71 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
 Last updated: Mon Nov  5 16:09:53 2018
 Last change: Mon Nov  5 16:09:15 2018 by hacluster via crmd on mfs71
 2 nodes configured
 0 resources configured

PCSD Status:
  mfs72 (192.168.5.72): Online
  mfs71 (192.168.5.71): Online
## Check the ring status on each node
[root@mfs71 mfs]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
	id	= 192.168.5.71
	status	= ring 0 active with no faults
[root@mfs72 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
	id	= 192.168.5.72
	status	= ring 0 active with no faults
# Check the pacemaker processes
[root@mfs71 mfs]# ps axf |grep pacemaker
  473 pts/0    S+     0:00  |       \_ grep --color=auto pacemaker
  310 ?        Ss     0:00 /usr/sbin/pacemakerd -f
  311 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
  312 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
  313 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
  314 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
  315 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
  316 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
# Check the cluster membership
[root@mfs71 mfs]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.5.71)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.5.72)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
# Disable STONITH (no fencing device is configured here)
[root@mfs71 mfs]# pcs property set stonith-enabled=false
# When quorum cannot be reached, ignore it (needed for a two-node cluster)
[root@mfs71 mfs]# pcs property set no-quorum-policy=ignore
# Check that the configuration is valid
[root@mfs71 mfs]# crm_verify -L -V
# Since pacemaker 1.1.8, crm has been split into a separate project named crmsh. Installing pacemaker therefore no longer provides the crm command; to manage cluster resources with it, crmsh must be installed separately, and it depends on several packages such as pssh.
[root@mfs71 mfs]# wget -O /etc/yum.repos.d/network:ha-clustering:Stable.repo http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[root@mfs71 mfs]# yum -y install crmsh
[root@mfs72 ~]# wget -O /etc/yum.repos.d/network:ha-clustering:Stable.repo http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[root@mfs72 ~]# yum -y install crmsh
# If the yum installation fails, download the rpm packages and install them directly
cd /opt
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/crmsh-3.0.0-6.2.noarch.rpm
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/crmsh-scripts-3.0.0-6.2.noarch.rpm
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/pssh-2.3.1-7.3.noarch.rpm
wget http://mirror.yandex.ru/opensuse/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/python-parallax-1.0.1-29.1.noarch.rpm
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/python-pssh-2.3.1-7.3.noarch.rpm
yum -y install crmsh-3.0.0-6.2.noarch.rpm crmsh-scripts-3.0.0-6.2.noarch.rpm pssh-2.3.1-7.3.noarch.rpm python-parallax-1.0.1-29.1.noarch.rpm python-pssh-2.3.1-7.3.noarch.rpm
# Configure the resources with crm (on the primary node mfs71 only)
[root@mfs71 mfs]# crm
# List the services crm can drive through the systemd resource class; the list should include moosefs-master and drbd
crm(live)# ra
crm(live)ra# list systemd
# Add the DRBD resource
crm(live)# configure
# Note: drbd_resource=mfs must match the "resource mfs" defined in /etc/drbd.d/global_common.conf
crm(live)configure# primitive mfsdrbd ocf:linbit:drbd params drbd_resource=mfs op start timeout=240s op stop timeout=100s op monitor role=Master interval=20s timeout=30s op monitor role=Slave interval=30s timeout=30s
crm(live)configure# ms ms_mfsdrbd mfsdrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
## Add the filesystem resource
crm(live)configure# primitive drbdfs ocf:heartbeat:Filesystem params device=/dev/drbd1 directory=/data/mfs fstype=ext4 op monitor interval=30s timeout=40s op start timeout=60 op stop timeout=60 on-fail=restart
## Add the VIP resource (192.168.5.77 is the floating IP, named mfsvip; the cluster checks it every 20 seconds)
crm(live)configure# primitive mfsvip ocf:heartbeat:IPaddr params ip=192.168.5.77 op monitor interval=20 timeout=30 on-fail=restart
# Add the monitored mfsmaster service
crm(live)configure# primitive mfsserver systemd:moosefs-master op monitor interval=20s timeout=15s on-fail=restart
crm(live)configure# show
# Syntax check and commit
crm(live)configure# verify
crm(live)configure# commit
## Define the constraints (colocation and ordering)
# The filesystem resource must follow the DRBD master
crm(live)configure# colocation drbd_with_ms_mfsdrbd inf: drbdfs ms_mfsdrbd:Master
# drbdfs can start only on a node where the DRBD resource has been promoted to Master
crm(live)configure# order drbd_after_ms_mfsdrbd mandatory: ms_mfsdrbd:promote drbdfs:start
# The mfs service follows the filesystem resource
crm(live)configure# colocation mfsserver_with_drbdfs inf: mfsserver drbdfs
# The mfs service can start only after drbdfs has started
crm(live)configure# order mfsserver_after_drbdfs mandatory: drbdfs:start mfsserver:start
# The VIP follows the mfs service
crm(live)configure# colocation mfsvip_with_mfsserver inf: mfsvip mfsserver
# The VIP must be up before the mfs service starts
crm(live)configure# order mfsvip_before_mfsserver mandatory: mfsvip mfsserver
crm(live)configure# show
# Syntax check, commit, and quit
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# quit
# Check the cluster status
crm status
# Verify that the services on the primary node are running normally and that the DRBD device is mounted
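# A quick shell-level check answers both questions; a minimal sketch, run on the node that currently holds the resources:
crm status                      # ms_mfsdrbd, drbdfs, mfsvip and mfsserver should all be started on one node
df -h /data/mfs                 # the DRBD-backed filesystem should be mounted
ip addr | grep 192.168.5.77     # the VIP should be configured on an interface
lsof -i:9420                    # mfsmaster should be listening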
# On each chunkserver, edit mfschunkserver.cfg and set MASTER_HOST to the VIP
MASTER_HOST = 192.168.5.77
# On the metalogger, edit mfsmetalogger.cfg and set MASTER_HOST to the VIP (mfsmetalogger is enabled on 192.168.5.73)
MASTER_HOST = 192.168.5.77
# Mount from the client through the VIP (mounting by hostname also works)
/usr/local/mfs/bin/mfsmount /mnt/mfs -H 192.168.5.77
/usr/local/mfs/bin/mfsmount -m /mnt/mfsmeta/ -H 192.168.5.77
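# Because /etc/hosts maps mfsmaster to the VIP (see the hosts file at the top), the same mounts can also be written with the hostname:
/usr/local/mfs/bin/mfsmount /mnt/mfs -H mfsmaster
/usr/local/mfs/bin/mfsmount -m /mnt/mfsmeta/ -H mfsmaster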
4. Replacing mfscgiserv with apache
# Install apache
yum -y install httpd
# Create the apache authentication user (user blufly, password balala369)
htpasswd -bcm /etc/httpd/conf/htpasswd.users blufly balala369
# httpd.conf configuration
vi /etc/httpd/conf/httpd.conf
DocumentRoot "/usr/local/mfs/share/mfscgi" <Directory "/usr/local/mfs/share/mfscgi"> #確保cgi模組的載入 LoadModule cgi_module modules/mod_cgi.so AddHandler cgi-script .cgi Alias /cgi-bin/ "/usr/local/mfs/share/mfscgi/" <Directory "/usr/local/mfs/share/mfscgi"> AllowOverride None Options ExecCGI Order allow,deny Allow from all AuthName "Mfs access" AuthType Basic AuthUserFile /etc/httpd/conf/htpasswd.users Require valid-user </Directory>
# Start apache
systemctl start httpd.service
# Enable it at boot
systemctl enable httpd.service
# View the MFS status through apache; apache asks for the login credentials first
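# You can confirm the authentication and the CGI from the shell before opening a browser; a hedged example, assuming the default mfs.cgi script that MooseFS installs under share/mfscgi:
curl -u blufly:balala369 http://192.168.5.71/mfs.cgi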
# MFS cluster resource information (screenshot)
# Chunkserver information (screenshot)
# Client mount information (screenshot)
5. MFS failover
# Simulate an automatic primary/standby switchover
[root@mfs71 mfs]# crm
crm(live)# node standby
crm(live)# status
# Bring the old primary back online
crm(live)# node online
crm(live)# status
# The master role moves from mfs71 to mfs72
# Simulate a moosefs-master crash; the current master is mfs72
[root@mfs72 ~]# systemctl stop moosefs-master.service
# After moosefs-master is stopped by hand, the resources do not move: the node itself has not failed, so no failover happens. By default pacemaker does not monitor resources at all, so even when a resource dies it stays put as long as its node is healthy. To make resources move on failure, a monitor has to be defined.
# Although we did add a "monitor" op to the MFS resource definition, it did not take effect and the service is not pulled back up automatically, so as a stopgap we add a monitoring script.
cat /root/monitor_mfs.sh
#!/bin/bash
# Watchdog for the mfs master: restart it if it dies, and trigger a
# pacemaker failover if it cannot be restarted.
while true
do
    # Role of the local DRBD resource: Primary or Secondary
    drbdstatus=`cat /proc/drbd 2> /dev/null | grep ro | tail -n1 | awk -F':' '{print $4}' | awk -F'/' '{print $1}'`
    # Is moosefs-master running?
    mfsstatus=`/bin/systemctl status moosefs-master.service | grep active | grep -c running`
    if [ -z "$drbdstatus" ]; then
        sleep 10
        continue
    elif [ "$drbdstatus" == "Primary" ]; then
        # This node holds the DRBD Primary role
        if [ "$mfsstatus" -eq 0 ]; then
            # mfs is not running: try to start it
            systemctl start moosefs-master.service &> /dev/null
            newmfsstatus=`/bin/systemctl status moosefs-master.service | grep active | grep -c running`
            if [ "$newmfsstatus" -eq 0 ]; then
                # Still not running: stop pacemaker so the cluster
                # fails over to the standby node
                /bin/systemctl stop pacemaker.service &> /dev/null
            fi
        fi
    fi
    sleep 10
done
# Make the script executable and start it (on both nodes)
[root@mfs71 ~]# chmod +x /root/monitor_mfs.sh
[root@mfs71 ~]# nohup /root/monitor_mfs.sh &
# Start it automatically at boot
[[email protected] ~]# echo "nohup /root/monitor_mfs.sh &" >> /etc/rc.local [[email protected] ~]# echo "nohup /root/monitor_mfs.sh &" >> /etc/rc.local
# When monitor_mfs.sh detects a dead master, it first tries to restart the local moosefs-master service
# Finally, here is the resulting crmsh configuration; in configure mode, use edit to view and modify it
node 1: mfs71 \
    attributes standby=off
node 2: mfs72 \
    attributes standby=off
primitive drbdfs Filesystem \
    params device="/dev/drbd1" directory="/data/mfs" fstype=ext4 \
    op monitor interval=30s timeout=40s \
    op start timeout=60 interval=0 \
    op stop timeout=60 on-fail=restart interval=0
primitive mfsdrbd ocf:linbit:drbd \
    params drbd_resource=mfs \
    op start timeout=240s interval=0 \
    op stop timeout=100s interval=0 \
    op monitor role=Master interval=20s timeout=30s \
    op monitor role=Slave interval=30s timeout=30s
primitive mfsserver systemd:moosefs-master \
    op monitor interval=20s timeout=15s on-fail=restart
primitive mfsvip IPaddr \
    params ip=192.168.5.77 \
    op monitor interval=20 timeout=30 on-fail=restart
ms ms_mfsdrbd mfsdrbd \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
order drbd_after_ms_mfsdrbd Mandatory: ms_mfsdrbd:promote drbdfs:start
colocation drbd_with_ms_mfsdrbd inf: drbdfs ms_mfsdrbd:Master
order mfsserver_after_drbdfs Mandatory: drbdfs:start mfsserver:start
colocation mfsserver_with_drbdfs inf: mfsserver drbdfs
order mfsvip_before_mfsserver Mandatory: mfsvip mfsserver
colocation mfsvip_with_mfsserver inf: mfsvip mfsserver
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.18-11.el7_5.3-2b07d5c5a9 \
    cluster-infrastructure=corosync \
    cluster-name=mfscluster \
    stonith-enabled=false \
    no-quorum-policy=ignore