MySQL 5.6 GTID+MHA
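Introduction to MHA
MHA (Master High Availability Manager and Tools for MySQL) is a script-based management tool written in Perl by a MySQL expert in Japan. It applies only to MySQL Replication (two-tier) environments, and its purpose is to keep the master highly available. It is built on standard MySQL replication (asynchronous or semi-synchronous):
MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node);
MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or it can run on one of the slaves;
MHA Manager probes the nodes of the cluster. When it finds that the master has failed, it automatically promotes the slave holding the most recent data to be the new master and then points all other slaves at the new master. The whole failover is transparent to the application;
MHA Node runs on every MySQL server (master/slave/manager) and speeds up failover through scripts that parse and purge logs;
MHA recommends one master with two slaves, and it also supports one master with one slave.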
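MHA architecture
MHA is made up of MHA Manager and MHA Node:
MHA Manager:
Runs a set of tools. For example, masterha_manager monitors the MySQL master automatically and performs master failover, while other tools handle manual master failover, online master switching, connection checks and so on. One Manager can manage several master-slave clusters, and it only needs to be deployed on the management node.
Manager tools:
masterha_check_ssh : checks MHA's SSH configuration.
masterha_check_repl : checks MySQL replication.
masterha_manager : starts MHA.
masterha_check_status : checks the current running state of MHA.
masterha_master_monitor : monitors whether the master is down.
masterha_master_switch : controls failover (automatic or manual).
masterha_conf_host : adds or removes configured server entries.
MHA Node:
Saving binary logs: if the failed master is still reachable, its binary logs are copied;
Applying differential relay logs: differential relay logs are generated from the slave with the most recent data and then applied;
Purging relay logs: relay logs are deleted without stopping the SQL thread.
Node tools (normally triggered by MHA Manager's scripts, no manual operation needed):
save_binary_logs : saves and copies the master's binary logs.
apply_diff_relay_logs : identifies differential relay log events and applies them to the other slaves.
filter_mysqlbinlog : strips unnecessary ROLLBACK events (MHA no longer uses this tool).
purge_relay_logs : purges relay logs (without blocking the SQL thread).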
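How MHA works
When the master fails, MHA compares how far each slave's I/O thread has read into the master's binlog and picks the slave closest to the master as the latest slave. The other slaves generate differential relay logs by comparing themselves with the latest slave. MHA applies the binlog saved from the master to the latest slave and promotes it to master. Finally it applies the corresponding differential relay logs on the other slaves and starts their replication from the new master.
During master failover, MHA Node tries to reach the failed master over SSH. If the master is reachable (that is, the failure is not a hardware failure but something like a corrupted InnoDB data file), it saves the binary logs so that as little data as possible is lost. Using MHA together with semi-synchronous replication greatly reduces the risk of data loss.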
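The selection of the latest slave described above can be sketched in a few lines of shell. This is only an illustration, not MHA's own code: pick_latest_slave is a hypothetical helper, the input format (host, Master_Log_File, Read_Master_Log_Pos per line, as reported by SHOW SLAVE STATUS) is an assumption, and hosts 192.168.1.105 and 192.168.1.106 are made-up sample slaves.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MHA): pick the slave whose I/O thread
# has read furthest into the master's binlog.
# Input lines: "<host> <Master_Log_File> <Read_Master_Log_Pos>"
pick_latest_slave() {
  # Sort by binlog file name (zero-padded, so text order equals numeric order),
  # then numerically by position; the last line is the most advanced slave.
  LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

# Sample data: the two extra hosts are invented for the example.
printf '%s\n' \
  '192.168.1.104 bin.000014 231' \
  '192.168.1.105 bin.000014 120' \
  '192.168.1.106 bin.000013 900' | pick_latest_slave
```

With the sample data the helper prints 192.168.1.104, because it has read the highest position in the newest binlog file.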
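Advantages of MHA
1. Fast failover
In a master/slave replication cluster, as long as the slaves are not lagging, MHA can usually complete failover within seconds: it detects the master failure in 9-10 seconds, can optionally spend 7-10 seconds shutting down the master to avoid split brain, and applies the differential relay logs to the new master in a few seconds, so total downtime is usually 10-30 seconds. After the new master is recovered, MHA recovers the remaining slaves in parallel. Even with dozens of slaves, the master recovery time is not affected.
2. A master failure does not cause data inconsistency
When the current master fails, MHA automatically identifies the relay log differences between the slaves and applies them to all slaves. All slaves therefore stay in sync, as long as all of them are alive. Used together with semi-synchronous replication (the semi-sync plugin), MHA can (almost) guarantee that no data is lost.
3. No changes to the current MySQL setup
One of MHA's main design principles is to be as simple and easy to use as possible. MHA works in conventional master-slave replication environments on MySQL 5.0 and later. Unlike other high availability solutions, MHA does not require changes to the MySQL deployment. MHA works with both asynchronous and semi-synchronous replication.
Starting, stopping, upgrading, downgrading, installing or uninstalling MHA does not require changing (or starting/stopping) MySQL replication. To upgrade MHA to a new version there is no need to stop MySQL: replace MHA with the new version and restart MHA Manager.
MHA runs on native MySQL from version 5.0 onward. Some other MySQL high availability solutions require a specific version (for example MySQL Cluster, or MySQL with global transaction IDs), but applications are rarely migrated just for the sake of master high availability. In most cases an older MySQL application is already deployed, and nobody wants to spend much time migrating to a different storage engine or a newer cutting-edge release only to make the master highly available. MHA works on native MySQL 5.0/5.1/5.5, so no migration is needed.
4. No large number of extra servers
MHA consists of MHA Manager and MHA Node. MHA Node runs on the MySQL servers that need failover/recovery, so it requires no extra servers. MHA Manager runs on a dedicated server, so one server has to be added (two if the Manager itself must be highly available), but a single MHA Manager can monitor a large number (even a hundred or more) of separate masters, so the number of added servers stays small. MHA Manager can even run on one of the slaves. In short, adopting MHA does not add a large number of servers.
5. No performance penalty
MHA works with asynchronous or semi-synchronous MySQL replication. When monitoring the master, MHA only sends a ping packet every few seconds (3 seconds by default) and sends no heavy queries. Replication performs as fast as native MySQL replication.
6. Works with any storage engine
MHA works with any storage engine that MySQL replication works with and is not limited to InnoDB. It can be used even in legacy MyISAM environments that are hard to migrate.
7. MHA 0.56 supports MySQL GTID replication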
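MySQL GTID
The global transaction identifier (GTID) is a new replication feature in MySQL 5.6. The official definition of a global transaction ID is: GTID = source_id:transaction_id
In MySQL 5.6, every GTID represents one database transaction. source_id is the uuid (server_uuid) of the master that executed the transaction, and transaction_id is an auto-incrementing counter starting at 1 that marks the n-th transaction executed on that master. MySQL guarantees a 1:1 mapping between transactions and GTIDs. A GTID is globally unique.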
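To make the source_id:transaction_id layout concrete, here is a small shell sketch that splits a single GTID into its two parts. The helper names gtid_source_id and gtid_transaction_id are hypothetical, and the sample value reuses a server_uuid that appears in the failover log later in this article, with transaction number 8 chosen for illustration. Ranges such as 1-8 in a GTID set are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: split one GTID of the form source_id:transaction_id.
gtid_source_id()      { printf '%s\n' "${1%%:*}"; }   # text before the first ':'
gtid_transaction_id() { printf '%s\n' "${1##*:}"; }   # text after the last ':'

gtid='1683955a-6102-11e6-8b6f-080027ca1592:8'
echo "source_id=$(gtid_source_id "$gtid")"
echo "transaction_id=$(gtid_transaction_id "$gtid")"
```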
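Advantages of MySQL GTID
Traditional replication:
Suppose Server A goes down and traffic has to be switched to Server B. At the same time, Server C's replication source has to be changed to Server B. The syntax for changing the replication source is simple: CHANGE MASTER TO MASTER_HOST='***', MASTER_LOG_FILE='***', MASTER_LOG_POS=N. The hard part is that the same transaction sits in a different binlog file and at a different position on every machine, so finding the master_log_file and master_log_pos on Server B that correspond to the point where Server C stopped is difficult.
GTID replication:
With GTID in 5.6 this becomes simple. The GTID of a given transaction is identical on every node, so the GTID at which Server C stopped uniquely identifies the same point on Server B. And thanks to MASTER_AUTO_POSITION, the concrete GTID value does not even need to be known: the command change master to master_host='****',master_user='****',master_password='*****',master_auto_position=1 is enough to complete the failover.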
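As a sketch of the GTID-based repointing just described, the function below only builds the CHANGE MASTER TO statement as text; it does not connect to MySQL. build_change_master is a hypothetical helper name and the host, user and password values are placeholders. Quoting of special characters in the arguments is not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a GTID auto-position CHANGE MASTER TO statement.
# Arguments: new master host, replication user, replication password.
build_change_master() {
  local host=$1 user=$2 password=$3
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_AUTO_POSITION=1;\n" \
    "$host" "$user" "$password"
}

# Placeholder values for illustration only.
build_change_master 192.168.1.104 repl 'xxx'
```

The printed statement could then be fed to the mysql client on the slave that needs repointing, followed by START SLAVE.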
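Limitations of MySQL GTID
1. Temporary tables cannot be created or dropped inside a transaction;
2. Tables using a non-transactional storage engine cannot be updated inside a transaction;
3. The create table tablename as select statement cannot be used;
4. Running mysql_upgrade is restricted;
5. Some statements no longer work: sql_slave_skip_counter has no effect under GTID, so the gtid_next approach should be used instead.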
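The gtid_next approach mentioned in item 5 usually means injecting an empty transaction for the GTID that should be skipped. The sketch below only prints the SQL; skip_gtid_sql is a hypothetical helper, and the GTID passed in is whatever transaction the operator has decided to skip on the slave.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that skips one transaction under GTID
# replication by committing an empty transaction with that GTID.
skip_gtid_sql() {
  local gtid=$1
  printf '%s\n' \
    "STOP SLAVE;" \
    "SET GTID_NEXT='${gtid}';" \
    "BEGIN; COMMIT;" \
    "SET GTID_NEXT='AUTOMATIC';" \
    "START SLAVE;"
}

# Illustrative value: transaction 9 from one of the server UUIDs seen in the logs.
skip_gtid_sql '1683955a-6102-11e6-8b6f-080027ca1592:9'
```

Skipping a transaction leaves the slave's data different from the master's for that change, so this is a last resort rather than a routine fix.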
Enabling MySQL GTID
Add the following parameters under [mysqld]:
gtid-mode = ON
enforce-gtid-consistency = ON
When setting up master/slave replication, it is enough to run: change master to master_host='XX',master_user='repl',master_password='XX',master_auto_position=1; to establish the replication relationship.
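Before restarting mysqld it can help to confirm that both options really are in the option file. The function below is a minimal sketch: check_gtid_options is a hypothetical helper, it only accepts the value ON, and it does not check that the lines sit under [mysqld]. As far as I know MySQL 5.6 also needs log-bin and log-slave-updates for gtid-mode=ON, which this sketch does not verify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a my.cnf-style file sets gtid-mode and
# enforce-gtid-consistency to ON. Dashes and underscores are both accepted.
check_gtid_options() {
  local cnf=$1 opt
  for opt in 'gtid[-_]mode' 'enforce[-_]gtid[-_]consistency'; do
    if ! grep -Eiq "^[[:space:]]*${opt}[[:space:]]*=[[:space:]]*ON[[:space:]]*$" "$cnf"; then
      echo "missing or not ON: ${opt}" >&2
      return 1
    fi
  done
  echo "GTID options present in ${cnf}"
}
```

A typical call would be check_gtid_options /etc/my.cnf, where the path is an assumption about where the option file lives.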
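Installing and configuring MHA
1.1 MHA
MHA (Master High Availability) 0.56 supports GTID and the one-master-one-slave architecture. The example below uses one master and one slave plus a Manager node.
1.2 Topology
MHA is split into the Manager node and the Agent nodes. In this design the Manager node runs on 192.168.1.106 as a dedicated server that monitors and manages the state of the whole cluster, and 192.168.1.103 and 192.168.1.104 are the Agent nodes.
MHA provides a master VIP (192.168.1.108) that floats automatically with the master.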
IP (VIP) | MySQL database | MHA
192.168.1.106 | - | MHA Manager Node
192.168.1.103 (192.168.1.108) | Master | Agent Node
192.168.1.104 | Slave | Agent Node
Note: (1) Every server that runs MHA needs passwordless SSH trust configured for the root user.
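One common way to set up the SSH trust from the note is ssh-keygen followed by ssh-copy-id. The sketch below only prints the commands so they can be reviewed first; ssh_trust_commands is a hypothetical helper, and the printed commands would have to be run as root on each of the three servers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the commands that create a key pair
# and copy the public key to every host given as an argument.
ssh_trust_commands() {
  local host
  echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
  for host in "$@"; do
    echo "ssh-copy-id root@${host}"
  done
}

# The three servers from the topology table.
ssh_trust_commands 192.168.1.103 192.168.1.104 192.168.1.106
```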
1.3 Installing the MHA packages
1. Packages:
mha4mysql-manager-0.56-0.el6.noarch.rpm -- only needs to be installed on the Manager node
mha4mysql-node-0.56-0.el6.noarch.rpm
perl-Config-Tiny-2.12-1.el6.rfx.noarch.rpm
perl-Log-Dispatch-2.26-1.el6.rf.noarch.rpm
perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm
2. Install all packages (with yum, which needs a configured yum repository; mind the installation order):
yum install mha4mysql-node-0.56-0.el6.noarch.rpm
yum install perl-Config-Tiny-2.12-1.el6.rfx.noarch.rpm
yum install perl-Log-Dispatch-2.26-1.el6.rf.noarch.rpm
yum install perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm
yum install mha4mysql-manager-0.56-0.el6.noarch.rpm
1.4 Configuring MHA
Note: configuration is only needed on the Manager node; the Agent nodes need none. Here the configuration files are placed under the /etc/mha directory:
The MHA configuration file is app1.cnf. Edit app1.cnf and add the following:
[server default]
# MHA working directory
manager_workdir=/etc/mha
# MHA log file
manager_log=/etc/mha/manager.log
# Location of the MySQL binlogs
master_binlog_dir=/mysql/binlog/
# MySQL user and password that MHA uses to read relay logs and change replication
password=111111
user=root
# Run a health check every second
ping_interval=1
# MHA working directory on the remote servers
remote_workdir=/etc/mha
# MySQL replication user and password
repl_password=oavir61
repl_user=root
# User for the SSH trust
ssh_user=root
# Directory holding the MySQL executables
client_bindir=/mysql/app/bin
# Script that binds and moves the VIP
master_ip_failover_script=/etc/mha/master_ip_failover
[server1]
hostname=192.168.1.103
port=3306
master_binlog_dir=/mysql/binlog/
candidate_master=1
[server2]
hostname=192.168.1.104
port=3306
master_binlog_dir=/mysql/binlog/
candidate_master=1
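A quick way to double-check which servers a configuration file declares is to pull the hostname values out of its [serverN] sections. The awk-based sketch below is an illustration only; list_mha_hosts is a hypothetical helper and it assumes each setting sits on its own line in key=value form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every hostname= value found inside the
# [server1], [server2], ... sections of an MHA configuration file.
list_mha_hosts() {
  awk -F= '
    /^\[server[0-9]+\]/ { in_server = 1; next }   # entering a [serverN] block
    /^\[/               { in_server = 0 }         # any other section header
    in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
      gsub(/[ \t\r]/, "", $2)                     # trim whitespace from the value
      print $2
    }' "$1"
}
```

Running list_mha_hosts /etc/mha/app1.cnf against the file above should list 192.168.1.103 and 192.168.1.104, one per line.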
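The master_ip_failover script: MHA ships an official template, but it needs some changes of your own, such as how the VIP is bound and removed, and how the VIP is removed in the two cases where the master is still alive and where the master is down: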
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use Net::Ping;
use Switch;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port,
    $new_master_user,  $new_master_password
);

GetOptions(
    'command=s'             => \$command,
    'ssh_user=s'            => \$ssh_user,
    'orig_master_host=s'    => \$orig_master_host,
    'orig_master_ip=s'      => \$orig_master_ip,
    'orig_master_port=i'    => \$orig_master_port,
    'new_master_host=s'     => \$new_master_host,
    'new_master_ip=s'       => \$new_master_ip,
    'new_master_port=i'     => \$new_master_port,
    'new_master_user=s'     => \$new_master_user,
    'new_master_password=s' => \$new_master_password,
);

my $vip        = '192.168.1.108';    # Virtual IP
my $master_srv = '192.168.1.104';
my $timeout    = 5;
my $key        = "1";
my $gateway    = '192.168.1.1';
my $interface  = 'eth1';

# Bind the VIP as an interface alias and announce it with arping.
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
# Take the interface alias (and with it the VIP) down.
my $ssh_stop_vip = "/sbin/ifconfig $interface:$key down";

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master if the server is still UP: $orig_master_host \n";
            my $p = Net::Ping->new('icmp');
            &stop_vip() if $p->ping( $master_srv, $timeout );
            $p->close();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";

        # The status check also binds the VIP on the current master.
        #`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Verifying the MHA configuration
(1) Verify the SSH trust:
# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sun Aug 14 11:11:17 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Aug 14 11:11:17 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:11:17 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:11:17 2016 - [info] Starting SSH connection tests..
Sun Aug 14 11:11:18 2016 - [debug]
Sun Aug 14 11:11:17 2016 - [debug] Connecting via SSH from root@192.168.1.104(192.168.1.104:22) to root@192.168.1.103(192.168.1.103:22)..
Sun Aug 14 11:11:18 2016 - [debug] ok.
Sun Aug 14 11:11:18 2016 - [debug]
Sun Aug 14 11:11:18 2016 - [debug] Connecting via SSH from root@192.168.1.103(192.168.1.103:22) to root@192.168.1.104(192.168.1.104:22)..
Sun Aug 14 11:11:18 2016 - [debug] ok.
Sun Aug 14 11:11:18 2016 - [info] All SSH connection tests passed successfully.
(2) Verify the MySQL master-slave replication:
# masterha_check_repl --conf=/etc/mha/app1.cnf
Sun Aug 14 11:12:56 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Aug 14 11:12:56 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:12:56 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:12:56 2016 - [info] MHA::MasterMonitor version 0.56.
Sun Aug 14 11:12:56 2016 - [info] GTID failover mode = 1
Sun Aug 14 11:12:56 2016 - [info] Dead Servers:
Sun Aug 14 11:12:56 2016 - [info] Alive Servers:
Sun Aug 14 11:12:56 2016 - [info] 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:12:56 2016 - [info] 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:12:56 2016 - [info] Alive Slaves:
Sun Aug 14 11:12:56 2016 - [info] 192.168.1.104(192.168.1.104:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:12:56 2016 - [info] GTID ON
Sun Aug 14 11:12:56 2016 - [info] Replicating from 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:12:56 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:12:56 2016 - [info] Current Alive Master: 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:12:56 2016 - [info] Checking slave configurations..
Sun Aug 14 11:12:56 2016 - [info] read_only=1 is not set on slave 192.168.1.104(192.168.1.104:3306).
Sun Aug 14 11:12:56 2016 - [info] Checking replication filtering settings..
Sun Aug 14 11:12:56 2016 - [info] binlog_do_db= , binlog_ignore_db=
Sun Aug 14 11:12:56 2016 - [info] Replication filtering check ok.
Sun Aug 14 11:12:56 2016 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Aug 14 11:12:56 2016 - [info] Checking SSH publickey authentication settings on the current master..
Sun Aug 14 11:12:57 2016 - [info] HealthCheck: SSH to 192.168.1.103 is reachable.
Sun Aug 14 11:12:57 2016 - [info]
192.168.1.103(192.168.1.103:3306) (current master)
 +--192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:12:57 2016 - [info] Checking replication health on 192.168.1.104..
Sun Aug 14 11:12:57 2016 - [info] ok.
Sun Aug 14 11:12:57 2016 - [info] Checking master_ip_failover_script status:
Sun Aug 14 11:12:57 2016 - [info] /etc/mha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.103 --orig_master_ip=192.168.1.103 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 192.168.1.108;/sbin/arping -I eth1 -c 3 -s 192.168.1.108 192.168.1.1 >/dev/null 2>&1===
Checking the Status of the script.. OK
Sun Aug 14 11:13:00 2016 - [info] OK.
Sun Aug 14 11:13:00 2016 - [warning] shutdown_script is not defined.
Sun Aug 14 11:13:00 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
If the replication check succeeds, the VIP we need is bound on the master automatically at this point (the status command of the master_ip_failover script runs the VIP start command on the current master).
(3) If (1) and (2) both pass, MHA is configured correctly.
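Steps (1) to (3) can be chained so that the replication check only runs once the SSH check has passed. This is a minimal sketch: mha_preflight is a hypothetical wrapper, and it assumes both masterha_check_ssh and masterha_check_repl return a non-zero exit status on failure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run both MHA checks against one configuration file
# and stop at the first failure.
mha_preflight() {
  local conf=$1
  masterha_check_ssh --conf="$conf" || { echo "SSH check failed" >&2; return 1; }
  masterha_check_repl --conf="$conf" || { echo "replication check failed" >&2; return 1; }
  echo "MHA pre-flight checks passed for $conf"
}
```

On the Manager node this would be called as mha_preflight /etc/mha/app1.cnf.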
1.5 Starting MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf < /dev/null >/etc/mha/manager.log 2>&1 &
manager.log:
Sun Aug 14 11:47:07 2016 - [info] MHA::MasterMonitor version 0.56.
Sun Aug 14 11:47:07 2016 - [info] GTID failover mode = 1
Sun Aug 14 11:47:07 2016 - [info] Dead Servers:
Sun Aug 14 11:47:07 2016 - [info] Alive Servers:
Sun Aug 14 11:47:07 2016 - [info] 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:47:07 2016 - [info] 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:47:07 2016 - [info] Alive Slaves:
Sun Aug 14 11:47:07 2016 - [info] 192.168.1.104(192.168.1.104:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:47:07 2016 - [info] GTID ON
Sun Aug 14 11:47:07 2016 - [info] Replicating from 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:47:07 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:47:07 2016 - [info] Current Alive Master: 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:47:07 2016 - [info] Checking slave configurations..
Sun Aug 14 11:47:07 2016 - [info] read_only=1 is not set on slave 192.168.1.104(192.168.1.104:3306).
Sun Aug 14 11:47:07 2016 - [info] Checking replication filtering settings..
Sun Aug 14 11:47:07 2016 - [info] binlog_do_db= , binlog_ignore_db=
Sun Aug 14 11:47:07 2016 - [info] Replication filtering check ok.
Sun Aug 14 11:47:07 2016 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Aug 14 11:47:07 2016 - [info] Checking SSH publickey authentication settings on the current master..
Sun Aug 14 11:47:07 2016 - [info] HealthCheck: SSH to 192.168.1.103 is reachable.
Sun Aug 14 11:47:07 2016 - [info]
192.168.1.103(192.168.1.103:3306) (current master)
 +--192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:47:07 2016 - [info] Checking master_ip_failover_script status:
Sun Aug 14 11:47:07 2016 - [info] /etc/mha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.103 --orig_master_ip=192.168.1.103 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 192.168.1.108;/sbin/arping -I eth1 -c 3 -s 192.168.1.108 192.168.1.1 >/dev/null 2>&1===
Checking the Status of the script.. OK
Sun Aug 14 11:47:10 2016 - [info] OK.
Sun Aug 14 11:47:10 2016 - [warning] shutdown_script is not defined.
Sun Aug 14 11:47:10 2016 - [info] Set master ping interval 1 seconds.
Sun Aug 14 11:47:10 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sun Aug 14 11:47:10 2016 - [info] Starting ping health check on 192.168.1.103(192.168.1.103:3306)..
Sun Aug 14 11:47:10 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
As the log shows, MHA has started normally and the VIP 192.168.1.108 is bound on 192.168.1.103.
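Besides reading manager.log, the masterha_check_status tool from the Manager tool list can confirm that the manager is up. The wrapper below is a sketch under the assumption that masterha_check_status exits with 0 while the manager is running and healthy and with a non-zero status otherwise; mha_report_status is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: turn the exit status of masterha_check_status into
# a one-line summary for the given configuration file.
mha_report_status() {
  local conf=$1
  if masterha_check_status --conf="$conf"; then
    echo "manager for $conf is running"
  else
    echo "manager for $conf is not running (or not healthy)"
    return 1
  fi
}
```

Called as mha_report_status /etc/mha/app1.cnf, it prints the tool's own output followed by the summary line.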
1.6 Simulating failover:
1.6.1 Master database instance failure
1. Stop the MySQL instance on the master
On the master run service mysqld stop, then check the MHA manager log:
Sun Aug 14 11:51:14 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Sun Aug 14 11:51:14 2016 - [info] Executing SSH check script: exit 0
Sun Aug 14 11:51:14 2016 - [info] HealthCheck: SSH to 192.168.1.103 is reachable.
Sun Aug 14 11:51:15 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Aug 14 11:51:15 2016 - [warning] Connection failed 2 time(s)..
Sun Aug 14 11:51:16 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Aug 14 11:51:16 2016 - [warning] Connection failed 3 time(s)..
Sun Aug 14 11:51:17 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Aug 14 11:51:17 2016 - [warning] Connection failed 4 time(s)..
Sun Aug 14 11:51:17 2016 - [warning] Master is not reachable from health checker!
Sun Aug 14 11:51:17 2016 - [warning] Master 192.168.1.103(192.168.1.103:3306) is not reachable!
Sun Aug 14 11:51:17 2016 - [warning] SSH is reachable.
Sun Aug 14 11:51:17 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Sun Aug 14 11:51:17 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Aug 14 11:51:17 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:51:17 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:51:17 2016 - [info] GTID failover mode = 1
Sun Aug 14 11:51:17 2016 - [info] Dead Servers:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:51:17 2016 - [info] Alive Servers:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:51:17 2016 - [info] Alive Slaves:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.104(192.168.1.104:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:51:17 2016 - [info] GTID ON
Sun Aug 14 11:51:17 2016 - [info] Replicating from 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:51:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:51:17 2016 - [info] Checking slave configurations..
Sun Aug 14 11:51:17 2016 - [info] read_only=1 is not set on slave 192.168.1.104(192.168.1.104:3306).
Sun Aug 14 11:51:17 2016 - [info] Checking replication filtering settings..
Sun Aug 14 11:51:17 2016 - [info] Replication filtering check ok.
Sun Aug 14 11:51:17 2016 - [info] Master is down!
Sun Aug 14 11:51:17 2016 - [info] Terminating monitoring script.
Sun Aug 14 11:51:17 2016 - [info] Got exit code 20 (Master dead).
Sun Aug 14 11:51:17 2016 - [info] MHA::MasterFailover version 0.56.
Sun Aug 14 11:51:17 2016 - [info] Starting master failover.
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] * Phase 1: Configuration Check Phase..
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] GTID failover mode = 1
Sun Aug 14 11:51:17 2016 - [info] Dead Servers:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:51:17 2016 - [info] Checking master reachability via MySQL(double check)...
Sun Aug 14 11:51:17 2016 - [info] ok.
Sun Aug 14 11:51:17 2016 - [info] Alive Servers:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:51:17 2016 - [info] Alive Slaves:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.104(192.168.1.104:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:51:17 2016 - [info] GTID ON
Sun Aug 14 11:51:17 2016 - [info] Replicating from 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:51:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:51:17 2016 - [info] Starting GTID based failover.
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Aug 14 11:51:17 2016 - [info] Executing master IP deactivation script:
Sun Aug 14 11:51:17 2016 - [info] /etc/mha/master_ip_failover --orig_master_host=192.168.1.103 --orig_master_ip=192.168.1.103 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 192.168.1.108;/sbin/arping -I eth1 -c 3 -s 192.168.1.108 192.168.1.1 >/dev/null 2>&1===
Disabling the VIP on old master if the server is still UP: 192.168.1.103
Sun Aug 14 11:51:17 2016 - [info] done.
Sun Aug 14 11:51:17 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sun Aug 14 11:51:17 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] * Phase 3: Master Recovery Phase..
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] The latest binary log file/position on all slaves is bin.000014:231
Sun Aug 14 11:51:17 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.104(192.168.1.104:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:51:17 2016 - [info] GTID ON
Sun Aug 14 11:51:17 2016 - [info] Replicating from 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:51:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:51:17 2016 - [info] The oldest binary log file/position on all slaves is bin.000014:231
Sun Aug 14 11:51:17 2016 - [info] Oldest slaves:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.104(192.168.1.104:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:51:17 2016 - [info] GTID ON
Sun Aug 14 11:51:17 2016 - [info] Replicating from 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:51:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] * Phase 3.3: Determining New Master Phase..
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] Searching new master from slaves..
Sun Aug 14 11:51:17 2016 - [info] Candidate masters from the configuration file:
Sun Aug 14 11:51:17 2016 - [info] 192.168.1.104(192.168.1.104:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:51:17 2016 - [info] GTID ON
Sun Aug 14 11:51:17 2016 - [info] Replicating from 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:51:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:51:17 2016 - [info] Non-candidate masters:
Sun Aug 14 11:51:17 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sun Aug 14 11:51:17 2016 - [info] New master is 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:51:17 2016 - [info] Starting master failover..
Sun Aug 14 11:51:17 2016 - [info]
From:
192.168.1.103(192.168.1.103:3306) (current master)
 +--192.168.1.104(192.168.1.104:3306)
To:
192.168.1.104(192.168.1.104:3306) (new master)
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] * Phase 3.3: New Master Recovery Phase..
Sun Aug 14 11:51:17 2016 - [info]
Sun Aug 14 11:51:17 2016 - [info] Waiting all logs to be applied..
Sun Aug 14 11:51:17 2016 - [info] done.
Sun Aug 14 11:51:17 2016 - [info] Getting new master's binlog name and position..
Sun Aug 14 11:51:17 2016 - [info] bin.000015:231
Sun Aug 14 11:51:17 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.104', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sun Aug 14 11:51:17 2016 - [info] Master Recovery succeeded.
File:Pos:Exec_Gtid_Set: bin.000015, 231, 1683955a-6102-11e6-8b6f-080027ca1592:1-8, 1c3a7f53-6102-11e6-8b6f-08002722cc6d:1-10
Sun Aug 14 11:51:17 2016 - [info] Executing master IP activate script:
Sun Aug 14 11:51:17 2016 - [info] /etc/mha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.103 --orig_master_ip=192.168.1.103 --orig_master_port=3306 --new_master_host=192.168.1.104 --new_master_ip=192.168.1.104 --new_master_port=3306 --new_master_user='root' --new_master_password='111111'
IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 192.168.1.108;/sbin/arping -I eth1 -c 3 -s 192.168.1.108 192.168.1.1 >/dev/null 2>&1===
Enabling the VIP - 192.168.1.108 on the new master - 192.168.1.104
Sun Aug 14 11:51:20 2016 - [info] OK.
Sun Aug 14 11:51:20 2016 - [info] ** Finished master recovery successfully.
Sun Aug 14 11:51:20 2016 - [info] * Phase 3: Master Recovery Phase completed.
Sun Aug 14 11:51:20 2016 - [info]
Sun Aug 14 11:51:20 2016 - [info] * Phase 4: Slaves Recovery Phase..
Sun Aug 14 11:51:20 2016 - [info]
Sun Aug 14 11:51:20 2016 - [info]
Sun Aug 14 11:51:20 2016 - [info] * Phase 4.1: Starting Slaves in parallel..
Sun Aug 14 11:51:20 2016 - [info]
Sun Aug 14 11:51:20 2016 - [info] All new slave servers recovered successfully.
Sun Aug 14 11:51:20 2016 - [info]
Sun Aug 14 11:51:20 2016 - [info] * Phase 5: New master cleanup phase..
Sun Aug 14 11:51:20 2016 - [info]
Sun Aug 14 11:51:20 2016 - [info] Resetting slave info on the new master..
Sun Aug 14 11:51:20 2016 - [info] 192.168.1.104: Resetting slave info succeeded.
Sun Aug 14 11:51:20 2016 - [info] Master failover to 192.168.1.104(192.168.1.104:3306) completed successfully.
Sun Aug 14 11:51:20 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.1.103(192.168.1.103:3306) to 192.168.1.104(192.168.1.104:3306) succeeded
Master 192.168.1.103(192.168.1.103:3306) is down!
Check MHA Manager logs at lab1:/etc/mha/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.103(192.168.1.103:3306)
Selected 192.168.1.104(192.168.1.104:3306) as a new master.
192.168.1.104(192.168.1.104:3306): OK: Applying all logs succeeded.
192.168.1.104(192.168.1.104:3306): OK: Activated master IP address.
192.168.1.104(192.168.1.104:3306): Resetting slave info succeeded.
Master failover to 192.168.1.104(192.168.1.104:3306) completed successfully.
The log shows that the master role moved to the slave 192.168.1.104 and that the VIP was bound to the corresponding network interface on 192.168.1.104.
2. Master server goes down
(1) Re-establish replication between 192.168.1.103 and 192.168.1.104 (run on 192.168.1.103, which now becomes the slave):
change master to master_host='192.168.1.104',master_user='repl',master_password='111111',master_auto_position=1;
start slave;
(2) Restart MHA:
nohup masterha_manager --conf=/etc/mha/app1.cnf < /dev/null >/etc/mha/app1.log 2>&1 &
(3) Shut down the 192.168.1.104 server directly and watch the switchover:
shutdown -h now
The corresponding MHA manager log:
Sun Aug 14 11:59:09 2016 - [info] MHA::MasterMonitor version 0.56.
Sun Aug 14 11:59:09 2016 - [info] GTID failover mode = 1
Sun Aug 14 11:59:09 2016 - [info] Dead Servers:
Sun Aug 14 11:59:09 2016 - [info] Alive Servers:
Sun Aug 14 11:59:09 2016 - [info] 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:59:09 2016 - [info] 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:59:09 2016 - [info] Alive Slaves:
Sun Aug 14 11:59:09 2016 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:59:09 2016 - [info] GTID ON
Sun Aug 14 11:59:09 2016 - [info] Replicating from 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:59:09 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:59:09 2016 - [info] Current Alive Master: 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:59:09 2016 - [info] Checking slave configurations..
Sun Aug 14 11:59:09 2016 - [info] read_only=1 is not set on slave 192.168.1.103(192.168.1.103:3306).
Sun Aug 14 11:59:09 2016 - [info] Checking replication filtering settings..
Sun Aug 14 11:59:09 2016 - [info] binlog_do_db= , binlog_ignore_db=
Sun Aug 14 11:59:09 2016 - [info] Replication filtering check ok.
Sun Aug 14 11:59:09 2016 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Aug 14 11:59:09 2016 - [info] Checking SSH publickey authentication settings on the current master..
Sun Aug 14 11:59:09 2016 - [info] HealthCheck: SSH to 192.168.1.104 is reachable.
Sun Aug 14 11:59:09 2016 - [info]
192.168.1.104(192.168.1.104:3306) (current master)
 +--192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:59:09 2016 - [info] Checking master_ip_failover_script status:
Sun Aug 14 11:59:09 2016 - [info] /etc/mha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.104 --orig_master_ip=192.168.1.104 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 192.168.1.108;/sbin/arping -I eth1 -c 3 -s 192.168.1.108 192.168.1.1 >/dev/null 2>&1===
Checking the Status of the script.. OK
Sun Aug 14 11:59:12 2016 - [info] OK.
Sun Aug 14 11:59:12 2016 - [warning] shutdown_script is not defined.
Sun Aug 14 11:59:12 2016 - [info] Set master ping interval 1 seconds.
Sun Aug 14 11:59:12 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sun Aug 14 11:59:12 2016 - [info] Starting ping health check on 192.168.1.104(192.168.1.104:3306)..
Sun Aug 14 11:59:12 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Sun Aug 14 11:59:23 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Sun Aug 14 11:59:23 2016 - [info] Executing SSH check script: exit 0
Sun Aug 14 11:59:23 2016 - [warning] HealthCheck: SSH to 192.168.1.104 is NOT reachable.
Sun Aug 14 11:59:24 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Aug 14 11:59:24 2016 - [warning] Connection failed 2 time(s)..
Sun Aug 14 11:59:25 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Aug 14 11:59:25 2016 - [warning] Connection failed 3 time(s)..
Sun Aug 14 11:59:26 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Aug 14 11:59:26 2016 - [warning] Connection failed 4 time(s)..
Sun Aug 14 11:59:26 2016 - [warning] Master is not reachable from health checker!
Sun Aug 14 11:59:26 2016 - [warning] Master 192.168.1.104(192.168.1.104:3306) is not reachable!
Sun Aug 14 11:59:26 2016 - [warning] SSH is NOT reachable.
Sun Aug 14 11:59:26 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Sun Aug 14 11:59:26 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Aug 14 11:59:26 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:59:26 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sun Aug 14 11:59:27 2016 - [info] GTID failover mode = 1
Sun Aug 14 11:59:27 2016 - [info] Dead Servers:
Sun Aug 14 11:59:27 2016 - [info] 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:59:27 2016 - [info] Alive Servers:
Sun Aug 14 11:59:27 2016 - [info] 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:59:27 2016 - [info] Alive Slaves:
Sun Aug 14 11:59:27 2016 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:59:27 2016 - [info] GTID ON
Sun Aug 14 11:59:27 2016 - [info] Replicating from 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:59:27 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:59:27 2016 - [info] Checking slave configurations..
Sun Aug 14 11:59:27 2016 - [info] read_only=1 is not set on slave 192.168.1.103(192.168.1.103:3306).
Sun Aug 14 11:59:27 2016 - [info] Checking replication filtering settings..
Sun Aug 14 11:59:27 2016 - [info] Replication filtering check ok.
Sun Aug 14 11:59:27 2016 - [info] Master is down!
Sun Aug 14 11:59:27 2016 - [info] Terminating monitoring script.
Sun Aug 14 11:59:27 2016 - [info] Got exit code 20 (Master dead).
Sun Aug 14 11:59:27 2016 - [info] MHA::MasterFailover version 0.56.
Sun Aug 14 11:59:27 2016 - [info] Starting master failover.
Sun Aug 14 11:59:27 2016 - [info]
Sun Aug 14 11:59:27 2016 - [info] * Phase 1: Configuration Check Phase..
Sun Aug 14 11:59:27 2016 - [info]
Sun Aug 14 11:59:27 2016 - [info] GTID failover mode = 1
Sun Aug 14 11:59:27 2016 - [info] Dead Servers:
Sun Aug 14 11:59:27 2016 - [info] 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:59:27 2016 - [info] Checking master reachability via MySQL(double check)...
Sun Aug 14 11:59:28 2016 - [info] ok.
Sun Aug 14 11:59:28 2016 - [info] Alive Servers:
Sun Aug 14 11:59:28 2016 - [info] 192.168.1.103(192.168.1.103:3306)
Sun Aug 14 11:59:28 2016 - [info] Alive Slaves:
Sun Aug 14 11:59:28 2016 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.6.30-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Sun Aug 14 11:59:28 2016 - [info] GTID ON
Sun Aug 14 11:59:28 2016 - [info] Replicating from 192.168.1.104(192.168.1.104:3306)
Sun Aug 14 11:59:28 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 14 11:59:28 2016 - [info] Starting GTID based failover.
Sun Aug 14 11:59:28 2016 - [info]
Sun Aug 14 11:59:28 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Aug 14 11:59:28 2016 - [info]
Sun Aug 14 11:59:28 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Aug 14 11:59:28 2016 - [info]
Sun Aug 14 11:59:28 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Aug 14 11:59:28 2016 - [info] Executing mas