pt-heartbeat監控mysql主從複製延遲整理

對於MySQL資料庫主從複製延遲的監控，可以藉助percona的有力武器pt-heartbeat來實現。

pt-heartbeat的工作原理通過使用時間戳方式在主庫上更新特定表，然後在從庫上讀取被更新的時間戳然後與本地系統時間對比來得出其延遲。具體流程：
1）在主上建立一張heartbeat表，按照一定的時間頻率更新該表的欄位（把時間更新進去）。監控操作執行後，heartbeat表能促使主從同步！
2）連線到從庫上檢查複製的時間記錄，和從庫的當前系統時間進行比較，得出時間的差異。

使用方法（主從和從庫上都可以執行監控操作）：
pt-heartbeat [OPTIONS] [DSN] --update|--monitor|--check|--stop

注意：需要指定的引數至少有 --stop，--update，--monitor，--check。
其中--update，--monitor和--check是互斥的，--daemonize和--check也是互斥。
--ask-pass     隱式輸入MySQL密碼
--charset     字符集設定
--check      檢查從的延遲，檢查一次就退出，除非指定了--recurse會遞迴的檢查所有的從伺服器。
--check-read-only    如果從伺服器開啟了只讀模式，該工具會跳過任何插入。
--create-table    在主上建立心跳監控的表，如果該表不存在，可以自己手動建立，建議儲存引擎改成memory。通過更新該表知道主從延遲的差距。
CREATE TABLE heartbeat (
  ts                    varchar(26) NOT NULL,
  server_id             int unsigned NOT NULL PRIMARY KEY,
  file                  varchar(255) DEFAULT NULL,
  position              bigint unsigned DEFAULT NULL,
  relay_master_log_file varchar(255) DEFAULT NULL,
  exec_master_log_pos   bigint unsigned DEFAULT NULL
);
heratbeat   表一直在更改ts和position,而ts是我們檢查複製延遲的關鍵。
--daemonize   執行時，放入到後臺執行
--user=-u，   連線資料庫的帳號
--database=-D，    連線資料庫的名稱
--host=-h，     連線的資料庫地址
--password=-p，     連線資料庫的密碼
--port=-P，     連線資料庫的埠
--socket=-S，    連線資料庫的套接字檔案
--file 【--file=output.txt】   列印--monitor最新的記錄到指定的檔案，很好的防止滿螢幕都是資料的煩惱。
--frames 【--frames=1m,2m,3m】  在--monitor裡輸出的[]裡的記錄段，預設是1m,5m,15m。可以指定1個，如：--frames=1s，多個用逗號隔開。可用單位有秒（s）、分鐘（m）、小時（h）、天（d）。
--interval   檢查、更新的間隔時間。預設是見是1s。最小的單位是0.01s，最大精度為小數點後兩位，因此0.015將調整至0.02。
--log    開啟daemonized模式的所有日誌將會被列印到制定的檔案中。
--monitor    持續監控從的延遲情況。通過--interval指定的間隔時間，打印出從的延遲資訊，通過--file則可以把這些資訊列印到指定的檔案。
--master-server-id    指定主的server_id，若沒有指定則該工具會連到主上查詢其server_id。
--print-master-server-id    在--monitor和--check 模式下，指定該引數則打印出主的server_id。
--recurse    多級複製的檢查深度。模式M-S-S...不是最後的一個從都需要開啟log_slave_updates，這樣才能檢查到。
--recursion-method     指定複製檢查的方式,預設為processlist,hosts。
--update    更新主上的心跳錶。
--replace     使用--replace代替--update模式更新心跳錶裡的時間欄位，這樣的好處是不用管表裡是否有行。
--stop    停止執行該工具（--daemonize），在/tmp/目錄下建立一個“pt-heartbeat-sentinel” 檔案。後面想重新開啟則需要把該臨時檔案刪除，才能開啟（--daemonize）。
--table   指定心跳錶名，預設heartbeat。

例項說明：
master：192.168.1.101
slave：192.168.1.102
同步的庫：huanqiu、huanpc
主從庫都能使用root賬號、密碼123456登入

先操作針對huanqiu庫的檢查，其他同步的庫的檢查操作類似！

mysql> use huanqiu;                   
Database changed
 
mysql> CREATE TABLE heartbeat (            //主庫上的對應庫下建立heartbeat表，一般建立後從庫會同步這張表（不同步的話，就在從庫那邊手動也手動建立）
    ->   ts                    varchar(26) NOT NULL,
    ->   server_id             int unsigned NOT NULL PRIMARY KEY,
    ->   file                  varchar(255) DEFAULT NULL,
    ->   position              bigint unsigned DEFAULT NULL,
    ->   relay_master_log_file varchar(255) DEFAULT NULL,
    ->   exec_master_log_pos   bigint unsigned DEFAULT NULL
    -> );
Query OK, 0 rows affected (0.02 sec)

更新主庫上的heartbeat,--interval=1表示1秒鐘更新一次（注意這個啟動操作要在主庫伺服器上執行）
[[email protected] ~]# pt-heartbeat --user=root --ask-pass --host=192.168.1.101 --create-table -D huanqiu --interval=1 --update --replace --daemonize
Enter password:
[[email protected] ~]#
[[email protected] ~]# ps -ef|grep pt-heartbeat
root 15152 1 0 19:49 ? 00:00:00 perl /usr/bin/pt-heartbeat --user=root --ask-pass --host=192.168.1.101 --create-table -D huanqiu --interval=1 --update --replace --daemonize
root 15154 14170 0 19:49 pts/3 00:00:00 grep pt-heartbeat

在主庫執行監測同步延遲：
[[email protected] ~]# pt-heartbeat -D huanqiu --table=heartbeat --monitor --host=192.168.1.102 --user=root --password=123456
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
........
解釋：0表示從沒有延遲。 [ 0.00s, 0.00s, 0.00s ] 表示1m,5m,15m的平均值。可以通過--frames去設定。

或者加上--master-server-id引數（主庫my.cnf裡配置的server-id值）
[[email protected] ~]# pt-heartbeat -D huanqiu --table=heartbeat --monitor --host=192.168.1.102 --user=root --password=123456 --master-server-id=101
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
.........
也可以將主庫的server-id打印出來（--print-master-server-id）
[[email protected] ~]# pt-heartbeat -D huanqiu --table=heartbeat --monit --host=192.168.1.102 --user=root --password=123456 --print-master-server-id
0.00s [ 0.00s, 0.00s, 0.00s ] 101
0.00s [ 0.00s, 0.00s, 0.00s ] 101
0.00s [ 0.00s, 0.00s, 0.00s ] 101
0.00s [ 0.00s, 0.00s, 0.00s ] 101
.........
[[email protected] ~]# pt-heartbeat -D huanqiu --table=heartbeat --check --host=192.168.1.102 --user=root --password=123456 --print-master-server-id
0.00 101

上面的監測命令會一直在執行狀態中，可以使用--check監測一次就退出
注意：使用了--check，就不能使用--monit
--update，--monitor和--check是互斥的，--daemonize和--check也是互斥。

[[email protected] ~]# pt-heartbeat -D huanqiu --table=heartbeat --check --host=192.168.1.102 --user=root --password=123456
0.00
[[email protected] ~]#

注意：
如果想把這個輸出結果加入自動化監控，那麼可以使用如下命令使監控輸出寫到檔案，然後使用指令碼定期過濾檔案中的最大值作為預警即可：
注意--log選項必須在有--daemonize引數的時候才會列印到檔案中，且這個檔案的路徑最好在/tmp下，否則可能因為許可權問題無法建立
[[email protected] ~]# pt-heartbeat -D huanqiu --table=heartbeat --monitor --host=192.168.1.102 --user=root --password=123456 --log=/opt/master-slave.txt --daemonize
[[email protected] ~]# tail -f /opt/master-slave.txt //可以測試，在主庫上更新資料時，從庫上是否及時同步，如不同步，可以在這裡看到監控的延遲資料
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
.......

下面是編寫的主從同步延遲監控指令碼，就是定期過濾--log檔案中最大值（此指令碼執行的前提是：啟動更新主庫heartbeat命令以及帶上--log的同步延遲檢測命令）。如果發生延遲，傳送報警郵件。sendemail郵件傳送環境部署參考：
[[email protected] ~]# cat /root/check-slave-monit.sh

#!/bin/bash
cat /opt/master-slave.txt > /opt/master_slave.txt
echo > /opt/master-slave.txt
max_time=`cat /opt/master_slave.txt |grep -v '^$' |awk '{print $1}' |sort -k1nr |head -1`
NUM=$(echo "$max_time"|cut -d"s" -f1)
if [ $NUM == "0.00" ];then
   echo "Mysql主從資料一致"
else
   /usr/local/bin/sendEmail -f [email protected] -t [email protected] -s smtp.huanqiu.cn -u "Mysql主從同步延遲" -o message-content-type=html -o message-charset=utf8 -xu [email protected] -xp [email protected]$ -m "Mysql主從資料同步有延遲"
fi

[[email protected] ~]# chmod /root/check-slave-monit.sh
[[email protected] ~]# sh /root/check-slave-monit.sh
Mysql主從資料一致

結合crontab，每隔一分鐘檢查一次

[[email protected] ~]# crontab -e
#mysql主從同步延遲檢查
* * * * * /bin/bash -x /root/check-slave-monit.sh > /dev/null 2>&1

在從庫上執行監測同步延遲(也可以在命令後加上--master-server-id=101或--print-master-server-id，同上操作)
[[email protected] src]# pt-heartbeat -D huanqiu --table=heartbeat --monitor --user=root --password=123456
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
........
[[email protected] src]# pt-heartbeat -D huanqiu --table=heartbeat --user=root --password=123456 --check
0.00
[[email protected] src]# pt-heartbeat -D huanqiu --table=heartbeat --monitor --user=root --password=123456 --log=/opt/master-slave.txt --daemonize
[[email protected] src]# tail -f /opt/master-slave.txt
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]
0.00s [ 0.00s, 0.00s, 0.00s ]

如何關閉上面在主庫上執行的heartbeat更新程序呢？
方法一：可以用引數--stop去關閉
[[email protected] ~]# ps -ef|grep heartbeat
root 15152 1 0 19:49 ? 00:00:02 perl /usr/bin/pt-heartbeat --user=root --ask-pass --host=192.168.1.101 --create-table -D huanqiu --interval=1 --update --replace --daemonize
root 15310 1 0 19:59 ? 00:00:01 perl /usr/bin/pt-heartbeat -D huanqiu --table=heartbeat --monitor --host=192.168.1.102 --user=root --password=123456 --log=/opt/master-slave.txt --daemonize
root 15555 31932 0 20:13 pts/2 00:00:00 grep heartbeat
[[email protected] ~]# pt-heartbeat --stop
Successfully created file /tmp/pt-heartbeat-sentinel
[[email protected] ~]# ps -ef|grep heartbeat
root 15558 31932 0 20:14 pts/2 00:00:00 grep heartbeat
[[email protected] ~]#

這樣就把在主上開啟的程序殺掉了。
但是後續要繼續開啟後臺進行的話，記住一定要先把/tmp/pt-heartbeat-sentinel 檔案刪除，否則啟動不了

方法二：直接kill掉程序pid（推薦這種方法）
[[email protected] ~]# ps -ef|grep heartbeat
root 15152 1 0 19:49 ? 00:00:02 perl /usr/bin/pt-heartbeat --user=root --ask-pass --host=192.168.1.101 --create-table -D huanqiu --interval=1 --update --replace --daemonize
root 15310 1 0 19:59 ? 00:00:01 perl /usr/bin/pt-heartbeat -D huanqiu --table=heartbeat --monitor --host=192.168.1.102 --user=root --password=123456 --log=/opt/master-slave.txt --daemonize
root 15555 31932 0 20:13 pts/2 00:00:00 grep heartbeat
[[email protected] ~]# kill -9 15152
[[email protected] ~]# ps -ef|grep heartbeat
root 15558 31932 0 20:14 pts/2 00:00:00 grep heartbeat

最後總結：
通過pt-heartbeart工具可以很好的彌補預設主從延遲的問題，但需要搞清楚該工具的原理。
預設的Seconds_Behind_Master值是通過將伺服器當前的時間戳與二進位制日誌中的事件時間戳相對比得到的，所以只有在執行事件時才能報告延時。備庫複製執行緒沒有執行，也會報延遲null。
還有一種情況：大事務，一個事務更新資料長達一個小時，最後提交。這條更新將比它實際發生時間要晚一個小時才記錄到二進位制日誌中。當備庫執行這條語句時，會臨時地報告備庫延遲為一個小時，執行完後又很快變成0。

---------------------------------------percona-toolkit其他元件命令用法----------------------------------

下面這些工具最好不要直接在線上使用，應該作為上線輔助或故障後離線分析的工具，也可以做效能測試的時候配合著使用。

1）pt-online-schema-change
功能介紹：
功能為:在alter操作更改表結構的時候不用鎖定表，也就是說執行alter的時候不會阻塞寫和讀取操作，注意執行這個工具的時候必須做好備份，操作之前最好要充分了解它的原理。
工作原理是:建立一個和你要執行alter操作的表一樣的空表結構，執行表結構修改，然後從原表中copy原始資料到表結構修改後的表，當資料copy完成以後就會將原表移走，用新表代替原表，預設動作是將原表drop掉。在copy資料的過程中，任何在原表的更新操作都會更新到新表，因為這個工具在會在原表上建立觸發器，觸發器會將在原表上更新的內容更新到新表。如果表中已經定義了觸發器這個工具就不能工作了。

用法介紹：
pt-online-schema-change [OPTIONS] DSN
options可以自行檢視help（或加--help檢視有哪些選項），DNS為你要操作的資料庫和表。
有兩個引數需要注意一下：
--dry-run 這個引數不建立觸發器，不拷貝資料，也不會替換原表。只是建立和更改新表。
--execute 這個引數的作用和前面工作原理的介紹的一樣，會建立觸發器，來保證最新變更的資料會影響至新表。注意：如果不加這個引數，這個工具會在執行一些檢查後退出。這一舉措是為了讓使用這充分了解了這個工具的原理。

使用示例：
線上更改表的的引擎，這個尤其在整理innodb表的時候非常有用，如下huanqiu庫的haha表預設是Myisam儲存引擎，現需要線上修改成Innodb型別。

mysql> show create table huanqiu.haha;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                             |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| haha  | CREATE TABLE `haha` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

修改操作如下：

[[email protected] ~]# pt-online-schema-change --user=root --password=123456 --host=localhost --alter="ENGINE=InnoDB" D=huanqiu,t=haha --execute --check-replication-filters
Found 1 slaves:
  slave-server
Will check slave lag on:
  slave-server
Replication filters are set on these hosts:
  slave-server
    slave_skip_errors = ALL
    replicate_ignore_db = mysql
    replicate_do_db = huanqiu,huanpc
Please read the --check-replication-filters documentation to learn how to solve this problem. at /usr/bin/pt-online-schema-change line 8083.

如上命令就是在主庫上操作的，會提示它有從庫，需要新增引數--nocheck-replication-filters，即不檢查從庫。（注意：下面命令中可以將localhost換成主庫ip。另外：該命令只能針對某張表進行修改，因為它是針對alter操作的，而alter是針對表的操作命令。所以不能省略命令中"t=表名"的選項）

[[email protected] ~]# pt-online-schema-change --user=root --password=123456 --host=localhost --alter="ENGINE=InnoDB" D=huanqiu,t=haha --execute --nocheck-replication-filters       
Found 1 slaves:
  slave-server
.......
2017-01-16T10:36:33 Dropped old table `huanqiu`.`_haha_old` OK.
2017-01-16T10:36:33 Dropping triggers...
2017-01-16T10:36:33 Dropped triggers OK.
Successfully altered `huanqiu`.`haha`.

然後再次檢視huanqiu.haha表的儲存引擎，發現已是Innodb型別的了。

mysql> show create table huanqiu.haha;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                             |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| haha  | CREATE TABLE `haha` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

若是在從庫上，則可以直接執行（也可以將下面從庫ip替換成localhost）

[[email protected] ~]# pt-online-schema-change --user=root --password=123456 --host=192.168.1.102 --alter="ENGINE=InnoDB" D=huanqiu,t=haha --execute
No slaves found.  See --recursion-method if host slave-server has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
.......
2017-01-15T21:40:35 Swapped original and new tables OK.
2017-01-15T21:40:35 Dropping old table...
2017-01-15T21:40:35 Dropped old table `huanqiu`.`_haha_old` OK.
2017-01-15T21:40:35 Dropping triggers...
2017-01-15T21:40:35 Dropped triggers OK.
Successfully altered `huanqiu`.`haha`.

2）pt-duplicate-key-checker
功能介紹：
功能為從mysql表中找出重複的索引和外來鍵，這個工具會將重複的索引和外來鍵都列出來，並生成了刪除重複索引的語句，非常方便
用法介紹：
pt-duplicate-key-checker [OPTION...] [DSN]
包含比較多的選項，具體的可以通過命令pt-duplicate-key-checker --help來檢視具體支援那些選項，我這裡就不一一列舉了。DNS為資料庫或者表。
使用示例：
檢視huanqiu庫或huanqiu.haha表的重複索引和外來鍵使用情況使用，如下命令：

[[email protected] ~]# pt-duplicate-key-checker  --host=localhost --user=root --password=123456  --databases=huanqiu
# ########################################################################
# Summary of indexes                                                     
# ########################################################################
 
# Total Indexes  6

# [[email protected] ~]# pt-duplicate-key-checker  --host=localhost --user=root --password=123456  --databases=huanqiu --table=haha
# ########################################################################
# Summary of indexes                                                     
# ########################################################################
 
# Total Indexes  1

3）pt-slave-find
功能介紹：
查詢和列印mysql所有從伺服器複製層級關係
用法介紹：
pt-slave-find [OPTION...] MASTER-HOST
原理：連線mysql主伺服器並查詢其所有的從，然後打印出所有從伺服器的層級關係。
使用示例：
查詢主伺服器為192.168.1.101的mysql有所有從的層級關係（將下面的192.168.1.101改成localhost，就是查詢本機mysql的從關係）：

[[email protected] ~]# pt-slave-find --user=root --password=123456 --host=192.168.1.101
192.168.1.101
Version         5.6.33-log
Server ID       101
Uptime          5+02:59:42 (started 2017-01-11T10:44:14)
Replication     Is not a slave, has 1 slaves connected, is not read_only
Filters        
Binary logging  MIXED
Slave status   
Slave mode      STRICT
Auto-increment  increment 1, offset 1
InnoDB version  5.6.33
+- 192.168.1.102
   Version         5.6.34-log
   Server ID       102
   Uptime          4+22:22:18 (started 2017-01-11T15:21:38)
   Replication     Is a slave, has 0 slaves connected, is not read_only
   Filters         slave_skip_errors=ALL; replicate_ignore_db=mysql; replicate_do_db=huanqiu,huanpc
   Binary logging  MIXED
   Slave status    265831 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 1, offset 1
   InnoDB version  5.6.34

4）pt-show-grants
功能介紹：
規範化和列印mysql許可權，讓你在複製、比較mysql許可權以及進行版本控制的時候更有效率！
用法介紹：
pt-show-grants [OPTION...] [DSN]
選項自行用help檢視，DSN選項也請檢視help，選項區分大小寫。
使用示例：
檢視指定mysql的所有使用者許可權：

[[email protected] ~]# pt-show-grants --host='localhost' --user='root' --password='123456'
-- Grants dumped by pt-show-grants
-- Dumped from server Localhost via UNIX socket, MySQL 5.6.33-log at 2017-01-16 11:22:12
-- Grants for ''@'localhost'
GRANT USAGE ON *.* TO ''@'localhost';
-- Grants for 'data_check'@'%'
.......

檢視執行資料庫的許可權：

[[email protected] ~]# pt-show-grants --host='localhost' --user='root' --password='123456' --database='huanqiu'
-- Grants dumped by pt-show-grants
-- Dumped from server Localhost via UNIX socket, MySQL 5.6.33-log at 2017-01-16 11:23:16
-- Grants for ''@'localhost'
GRANT USAGE ON *.* TO ''@'localhost';
-- Grants for 'data_check'@'%'
GRANT SELECT ON *.* TO 'data_check'@'%' IDENTIFIED BY PASSWORD '*36B94ABF70E8D5E025CF9C059E66445CBB05B54F';
-- Grants for 'mksync'@'%'
GRANT ALL PRIVILEGES ON *.* TO 'mksync'@'%' IDENTIFIED BY PASSWORD '*B5E7409B1A22D47C6F1D8A693C6146CEB6570475';
........

檢視每個使用者許可權生成revoke收回許可權的語句：

[[email protected] ~]# pt-show-grants --host='localhost' --user='root' --password='123456'  --revoke
-- Grants dumped by pt-show-grants
-- Dumped from server Localhost via UNIX socket, MySQL 5.6.33-log at 2017-01-16 11:24:58
-- Revoke statements for ''@'localhost'
REVOKE USAGE ON *.* FROM ''@'localhost';
-- Grants for ''@'localhost'
..........

5）pt-upgrade
功能介紹：
這個工具用來檢查在新版本中執行的SQL是否與老版本一樣，返回相同的結果，最好的應用場景就是資料遷移的時候。這在升級伺服器的時候非常有用，可以先安裝並導資料到新的伺服器上，然後使用這個工具跑一下sql看看有什麼不同，可以找出不同版本之間的差異。
用法介紹：
pt-upgrade [OPTION...] DSN [DSN...] [FILE]
比較檔案中每一個查詢語句在每臺伺服器上執行的結果（主要是針對不同版本的執行結果）。（--help檢視選項）
使用示例：
檢視某個sql檔案在兩個伺服器的執行結果範例：

[[email protected] ~]# pt-upgrade h=192.168.1.101 h=192.168.1.102 --user=root --password=123456 /root/test.sql
 
#-----------------------------------------------------------------------
# Logs
#-----------------------------------------------------------------------
.......
host1:
  DSN:       h=192.168.1.101
  hostname:  master-server
  MySQL:     Source distribution 5.6.33
 
host2:
  DSN:       h=192.168.1.102
  hostname:  slave-server
  MySQL:     Source distribution 5.6.34
......
queries_read          1
queries_with_diffs    0
queries_with_errors   0

檢視慢查詢中的對應的查詢SQL在兩個伺服器的執行結果範例：

[[email protected] ~]# pt-upgrade h=192.168.1.101 h=192.168.1.102 --user=root --password=123456 /data/mysql/data/mysql-slow.log
.......

6）pt-index-usage
功能介紹：
這個工具主要是用來分析慢查詢的索引使用情況。從log檔案中讀取插敘語句，並用explain分析他們是如何利用索引。完成分析之後會生成一份關於索引沒有被查詢使用過的報告。
用法介紹：
pt-index-usage [OPTION...] [FILE...]
可以直接從慢查詢中獲取sql，FILE檔案中的sql格式必須和慢查詢中個是一致，如果不是一直需要用pt-query-digest轉換一下。也可以不生成報告直接儲存到資料庫中，具體的見後面的示例
注意：使用這個工具需要MySQL必須要有密碼，另外執行時可能報找不到/var/lib/mysql/mysql.sock的錯，簡單的從mysql啟動後的sock檔案做一個軟連結即可。
重點要說明的是pt-index-usage只能分析慢查詢日誌，所以如果想全面分析所有查詢的索引使用情況就得將slow_launch_time設定為0，因此請謹慎使用該工具，線上使用的話最好在凌晨進行分析，尤其分析大量日誌的時候是很耗CPU的。
整體來說這個工具是不推薦使用的，要想實現類似的分析可以考慮一些其他第三方的工具，比如：mysqlidxchx, userstat和check-unused-keys。網上比較推薦的是userstat，一個Google貢獻的patch。
使用示例：
從滿查詢中的sql檢視索引使用情況範例：

[[email protected] ~]# pt-index-usage --host=localhost --user=root --password=123456 /data/mysql/data/mysql-slow.log

將分析結果儲存到資料庫範例：

[[email protected] ~]# pt-index-usage --host=localhost --user=root --password=123456 /data/mysql/data/mysql-slow.log  --no-report --create-save-results-database

7）pt-visual-explain
功能介紹：
格式化explain出來的執行計劃按照tree方式輸出，方便閱讀。
用法介紹：
pt-visual-explain [OPTION...] [FILE...]
通過管道直接檢視explain輸出結果的範例：

mysql> select * from huanqiu.haha;
+----+-------------+
| id | name        |
+----+-------------+
|  1 | wangshibo   |
|  2 | wangshihuan |
|  3 | 王世博      |
| 10 | wangshiman  |
+----+-------------+
4 rows in set (0.00 sec)
 
[[email protected] ~]# mysql -uroot -p123456  -e "explain select * from huanqiu.haha" |pt-visual-explain
Warning: Using a password on the command line interface can be insecure.
Table scan
rows           4
+- Table
   table          haha
    
[[email protected] ~]# mysql -uroot -p123456  -e "explain select * from huanqiu.haha where id=3" |pt-visual-explain
Warning: Using a password on the command line interface can be insecure.
Bookmark lookup
+- Table
|  table          haha
|  possible_keys  PRIMARY
+- Constant index lookup
   key            haha->PRIMARY
   possible_keys  PRIMARY
   key_len        4
   ref            const
   rows           1

檢視包含查詢語句的test.sql檔案的範例：

[[email protected] ~]# pt-visual-explain --connect /root/test.sql --user=root --password=123456

8）pt-config-diff
功能介紹：
比較mysql配置檔案和伺服器引數
用法介紹：
pt-config-diff [OPTION...] CONFIG CONFIG [CONFIG...]
CONFIG可以是檔案也可以是資料來源名稱，最少必須指定兩個配置檔案源，就像unix下面的diff命令一樣，如果配置完全一樣就不會輸出任何東西。
使用示例：
範例1：檢視本地和遠端伺服器的配置檔案差異：

[[email protected] ~]# pt-config-diff h=localhost h=192.168.1.102 --user=root --password=123456
18 config differences
Variable                  master-server             slave-server
========================= ========================= =========================
binlog_checksum           NONE                      CRC32
general_log_file          /data/mysql/data/maste... /data/mysql/data/slave...
hostname                  master-server             slave-server
innodb_version            5.6.33                    5.6.34
log_bin_index             /data/mysql/data/maste... /data/mysql/data/slave...
log_slave_updates         OFF                       ON
relay_log_recovery        OFF                       ON
secure_file_priv                                    NULL
server_id                 101                       102
server_uuid               d8497104-d7a7-11e6-911... d8773e51-d7a7-11e6-911...
slave_net_timeout         3600                      5
slave_skip_errors         OFF                       ALL
sync_binlog               1                         0
sync_master_info          10000                     1
sync_relay_log            10000                     1
sync_relay_log_info       10000                     1
system_time_zone          CST                       EST
version                   5.6.33-log                5.6.34-log

範例2：比較本地配置檔案和遠端伺服器的差異：

[[email protected] ~]# pt-config-diff /usr/local/mysql/my.cnf h=localhost h=192.168.1.102 --user=root --password=123456
6 config differences
Variable                  /usr/local/mysql/my.cnf master-server      slave-ser
========================= ================= ================== ===============
binlog_checksum           none              NONE               CRC32
innodb_read_io_threads    1000              64                 64
innodb_write_io_threads   1000              64                 64
log_bin_index             master-bin.index  /data/mysql/dat... /data/mysql/...
server_id                 101               101                102
sync_binlog               1                 1                  0

9）pt-mysql-summary
功能介紹：
精細地對mysql的配置和sataus資訊進行彙總，彙總後你直接看一眼就能看明白。
工作原理：連線mysql後查詢出status和配置資訊儲存到臨時目錄中，然後用awk和其他的指令碼工具進行格式化。OPTIONS可以查閱官網的相關頁面。
用法介紹：
pt-mysql-summary [OPTIONS] [-- MYSQL OPTIONS]
使用示例：
彙總本地mysql伺服器的status和配置資訊：

[[email protected] ~]# pt-mysql-summary -- --user=root --password=123456 --host=localhost

10）pt-deadlock-logger
功能介紹：
提取和記錄mysql死鎖的相關資訊
用法介紹：
pt-deadlock-logger [OPTION...] SOURCE_DSN
收集和儲存mysql上最近的死鎖資訊，可以直接列印死鎖資訊和儲存死鎖資訊到資料庫中，死鎖資訊包括髮生死鎖的伺服器、最近發生死鎖的時間、死鎖執行緒id、死鎖的事務id、發生死鎖時事務執行了多長時間等等非常多的資訊。
使用示例：
檢視本地mysql的死鎖資訊

[[email protected] ~]# pt-deadlock-logger  --user=root --password=123456 h=localhost D=test,t=deadlocks
server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query
localhost 2017-01-11T11:00:33 188 0 0 root  192.168.1.101 huanpc checksums PRIMARY RECORD X w 1 REPLACE INTO `huanpc`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'huanpc', 'heihei', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `member`, `city`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `huanpc`.`heihei` /*checksum table*/
localhost 2017-01-11T11:00:33 198 0 0 root  192.168.1.101 huanpc checksums PRIMARY RECORD X w 0 REPLACE INTO `huanpc`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'huanpc', 'heihei', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `member`, `city`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `huanpc`.`heihei` /*checksum table*/

11）pt-mext
功能介紹：
並行檢視SHOW GLOBAL STATUS的多個樣本的資訊。
用法介紹：
pt-mext [OPTIONS] -- COMMAND
原理：pt-mext執行你指定的COMMAND，並每次讀取一行結果，把空行分割的內容儲存到一個一個的臨時檔案中，最後結合這些臨時檔案並行檢視結果。
使用示例：
每隔10s執行一次SHOW GLOBAL STATUS，並將結果合併到一起檢視

[[email protected] ~]# pt-mext  -- mysqladmin ext -uroot -p123456  -i10 -c3

12）pt-query-digest
功能介紹：
分析查詢執行日誌，併產生一個查詢報告，為MySQL、PostgreSQL、 memcached過濾、重放或者轉換語句。
pt-query-digest可以從普通MySQL日誌，慢查詢日誌以及二進位制日誌中分析查詢，甚至可以從SHOW PROCESSLIST和MySQL協議的tcpdump中進行分析，如果沒有指定檔案，它從標準輸入流（STDIN）中讀取資料。
用法介紹：
pt-query-digest [OPTION...] [FILE]
解析和分析mysql日誌檔案
使用示例：（建議：當log很大的時候最好還是將日誌檔案移到其他機器上進行分析，以免過多耗費本機效能）
範例1：分析本地的慢查詢日誌檔案（本例是慢查詢日誌，也可以是mysql的其他日誌）

[[email protected] ~]# pt-query-digest --user=root --password=123456 /data/mysql/data/mysql-slow.log
 
# 260ms user time, 30ms system time, 24.85M rss, 204.71M vsz
# Current date: Mon Jan 16 13:20:39 2017
# Hostname: master-server
# Files: /data/mysql/data/mysql-slow.log
# Overall: 18 total, 2 unique, 0 QPS, 0x concurrency _____________________
# Time range: all events occurred at 2017-01-11 11:00:33
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time           812s      2s     92s     45s     80s     27s     52s
# Lock time              0       0       0       0       0       0       0
# Rows sent              0       0       0       0       0       0       0
# Rows examine           0       0       0       0       0       0       0
# Query size         1.10k      62      63   62.56   62.76    0.50   62.76
 
# Profile
# Rank Query ID           Response time  Calls R/Call  V/M   Item
# ==== ================== ============== ===== ======= ===== =============
#    1 0x50B84661D4CCF34B 467.9075 57.6%    10 46.7907 16.48 CREATE DATABASE `huanqiu`
#    2 0x9CC34439A4FB17E3 344.2984 42.4%     8 43.0373 16.22 CREATE DATABASE `huanpc`
 
# Query 1: 0 QPS, 0x concurrency, ID 0x50B84661D4CCF34B at byte 2642 _____
# This item is included in the report because it matches --limit.
# Scores: V/M = 16.48
# Time range: all events occurred at 2017-01-11 11:00:33
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         55      10
# Exec time     57    468s      2s     92s     47s     80s     28s     52s
# Lock time      0       0       0       0       0       0       0       0
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0       0       0       0       0       0       0       0
# Query size    55     630      63      63      63      63       0      63
# String:
# Hosts
# Users        root
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s  #######
#  10s+  ################################################################
CREATE DATABASE IF NOT EXISTS `huanqiu` /* pt-table-checksum */\G
 
# Query 2: 0 QPS, 0x concurrency, ID 0x9CC34439A4FB17E3 at byte 3083 _____
# This item is included in the report because it matches --limit.
# Scores: V/M = 16.22
# Time range: all events occurred at 2017-01-11 11:00:33
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         44       8
# Exec time     42    344s      2s     82s     43s     80s     26s     56s
# Lock time      0       0       0       0       0       0       0       0
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0       0       0       0       0       0       0       0
# Query size    44     496      62      62      62      62       0      62
# String:
# Hosts
# Users        root
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s  #########
#  10s+  ################################################################
CREATE DATABASE IF NOT EXISTS `huanpc` /* pt-table-checksum */\G
 
整個輸出分為三大部分：
1）整體概要（Overall）
這個部分是一個大致的概要資訊(類似loadrunner給出的概要資訊)，通過它可以對當前MySQL的查詢效能做一個初步的評估，比如各個指標的最大值(max)，平均值(min)，95%分佈值，中位數(median)，標準偏差(stddev)。
這些指標有查詢的執行時間（Exec time），鎖佔用的時間（Lock time），MySQL執行器需要檢查的行數（Rows examine），最後返回給客戶端的行數（Rows sent），查詢的大小。
 
2）查詢的彙總資訊（Profile）
這個部分對所有“重要”的查詢(通常是比較慢的查詢)做了個一覽表。
每個查詢都有一個Query ID，這個ID通過Hash計算出來的。pt-query-digest是根據這個所謂的Fingerprint來group by的。
Rank整個分析中該“語句”的排名，一般也就是效能最常的。
Response time  “語句”的響應時間以及整體佔比情況。
Calls 該“語句”的執行次數。
R/Call 每次執行的平均響應時間。
V/M 響應時間的差異平均對比率。
在尾部有一行輸出，顯示了其他2個佔比較低而不值得單獨顯示的查詢的統計資料。
 
3）詳細資訊
這個部分會列出Profile表中每個查詢的詳細資訊：
包括Overall中有的資訊、查詢響應時間的分佈情況以及該查詢”入榜”的理由。
pt-query-digest還有很多複雜的操作，這裡就不一一介紹了。比如：從PROCESSLIST中查詢某個MySQL中最慢的查詢：

範例2：重新回顧滿查詢日誌，並將結果儲存到query_review中，注意query_review表的表結構必須先建好，表結構如下：

mysql> use test;
Database changed
 
mysql> CREATE TABLE query_review (
    ->    checksum     BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    ->    fingerprint  TEXT NOT NULL,
    ->    sample       TEXT NOT NULL,
    ->    first_seen   DATETIME,
    ->    last_seen    DATETIME,
    ->    reviewed_by  VARCHAR(20),
    ->    reviewed_on  DATETIME,
    ->    comments     TEXT
    -> );
Query OK, 0 rows affected (0.02 sec)
  
mysql> select * from query_review;
Empty set (0.00 sec)
  
[[email protected] ~]# pt-query-digest --user=root --password=123456 --review h=localhost,D=test,t=query_review /data/mysql/data/mysql-slow.log
 
mysql> select * from query_review;
+----------------------+------------------------------------------+-----------------------------------------------------------------+----------
| checksum | fingerprint | sample | first_see
+----------------------+------------------------------------------+-----------------------------------------------------------------+----------
| 5816476304744969035 | create database if not exists `huanqiu` | CREATE DATABASE IF NOT EXISTS `huanqiu` /* pt-table-checksum */ | 2017-01-1
| 11295947304747079651 | create database if not exists `huanpc` | CREATE DATABASE IF NOT EXISTS `huanpc` /* pt-table-checksum */ | 2017-01-1
+----------------------+------------------------------------------+-----------------------------------------------------------------+----------
2 rows in set (0.00 sec)

從tcpdump中分析：

[[email protected] ~]# tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
 
然後開啟另一個終端視窗：
[[email protected] ~]# pt-query-digest --type tcpdump mysql.tcp.txt
Pipeline process 3 (TcpdumpParser) caused an error: substr outside of string at /usr/bin/pt-query-digest line 3628, <> chunk 93.
Will retry pipeline process 2 (TcpdumpParser) 100 more times.
 
# 320ms user time, 20ms system time, 24.93M rss, 204.84M vsz
# Current date: Mon Jan 16 13:24:50 2017
# Hostname: master-server
# Files: mysql.tcp.txt
# Overall: 31 total, 4 unique, 4.43 QPS, 0.00x concurrency _______________
# Time range: 2017-01-16 13:24:43.000380 to 13:24:50.001205
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time           30ms    79us     5ms   967us     4ms     1ms   159us
# Rows affecte          14       0       2    0.45    1.96    0.82       0
# Query size         1.85k      17     200   61.16  192.76   72.25   17.65
.........

13）pt-slave-delay

pt-heartbeat監控mysql主從複製延遲整理

pt-heartbeat監控mysql主從複製延遲整理

pt-heartbeat監控mysql主從複製

mysql主從複製延遲問題的相關知識與解決方案

CentOS6.5搭建MySQL主從複製，讀寫分離(冷月宮主親自整理，最簡單明瞭)

MySQL的主從複製延遲問題

MySQL主從複製狀態監控指令碼

mysql主從同步延遲原因及解決方法

監控mysql主從同步狀態是否異常，如果異常，則發生短信或郵寄給管理員

zabbix監控mysql主從復制

監控mysql主從同步狀態

zabbix自定義監控mysql主從狀態，並做郵件告警

日常管理03-監控MYSQL主從延時3秒腳本;

zabbix企業級監控之監控MYSQL主從

zabbix監控mysql主從同步

python 監控mysql主從

MySQL主從複製(Replication for Backup)

MYSQL主從複製+讀寫分離

mysql主從複製讀寫分離之——proxysql應用

zabbix自定義監控mysql主從狀態，並做郵件告警（詳細解說）

MySQL主從複製，並行複製，半同步複製和組複製

pt-heartbeat監控mysql主從複製延遲整理

相關推薦