
04. Monitoring MySQL with Prometheus (Federation)

---

#### List

```text
CentOS7.3
alertmanager-0.19.0.linux-amd64.tar.gz
mysql-5.7.23-1.el7.x86_64.rpm-bundle.tar
mysqld_exporter-0.12.1.linux-amd64.tar.gz
prometheus-2.13.0.linux-amd64.tar.gz
```

| Node name         | IP                          | Software versions                                  | Description                                     |
| ----------------- | --------------------------- | -------------------------------------------------- | ----------------------------------------------- |
| prometheus-master | 172.19.0.55                 | alertmanager-0.19/prometheus-2.13.0                | Master node of the Prometheus federation        |
| prometheus-slave  | 172.19.0.56 / 192.168.50.10 | prometheus-2.13                                    | Slave node of the Prometheus federation (proxy) |
| prometheus_mysql  | 192.168.50.5                | mysqld_exporter-0.12.1/mysql-5.7.23.rpm-bundle.tar | MySQL test node monitored by Prometheus         |

#### Initialize the system environment

```bash
# 1. Initialization
init_security() {
    systemctl stop firewalld
    systemctl disable firewalld &>/dev/null
    setenforce 0
    sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
    sed -i '/^GSSAPIAu/ s/yes/no/' /etc/ssh/sshd_config
    sed -i '/^#UseDNS/ {s/^#//;s/yes/no/}' /etc/ssh/sshd_config
    systemctl enable sshd crond &> /dev/null
    rpm -e postfix --nodeps
    echo -e "\033[32m [Security config] ==> OK \033[0m"
}
init_security

init_yumsource() {
    if [ ! -d /etc/yum.repos.d/backup ]; then
        mkdir /etc/yum.repos.d/backup
    fi
    mv /etc/yum.repos.d/* /etc/yum.repos.d/backup 2>/dev/null
    if ! ping -c2 www.baidu.com &>/dev/null; then
        echo "No Internet access; cannot configure the yum repositories"
        exit 1
    fi
    curl -o /etc/yum.repos.d/163.repo http://mirrors.163.com/.help/CentOS7-Base-163.repo &>/dev/null
    curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo &>/dev/null
    yum clean all
    timedatectl set-timezone Asia/Shanghai
    echo "nameserver 114.114.114.114" > /etc/resolv.conf
    echo "nameserver 8.8.8.8" >> /etc/resolv.conf
    chattr +i /etc/resolv.conf
    yum -y install ntpdate
    ntpdate -b ntp1.aliyun.com   # time sync is important: clock skew between nodes causes errors (e.g. ZooKeeper failing to find hosts)
    echo -e "\033[32m [YUM Source] ==> OK \033[0m"
}
init_yumsource

# Configure hostname resolution
tail -2 /etc/hosts
172.19.0.55 prometheus-master
172.19.0.56 prometheus-slave
```

#### Deploy MySQL and MySQLD Exporter (on Prometheus_Mysql)

```bash
# To keep the test environment simple, Docker Compose defines and starts MySQL and MySQLD Exporter:
cat docker-compose.yml
version: '3'
services:
  mysql:
    image: daocloud.io/library/mysql:5.7
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=XUANji.20
      - MYSQL_DATABASE=database
  mysqlexporter:
    # pinned to the exporter release used in this article; v0.15.0 and later no longer read DATA_SOURCE_NAME
    image: prom/mysqld-exporter:v0.12.1
    ports:
      - "9104:9104"
    environment:
      - DATA_SOURCE_NAME=root:XUANji.20@(mysql:3306)/database

# The monitored target is defined through the DATA_SOURCE_NAME environment variable.
# Start the test MySQL instance and MySQLD Exporter with Docker Compose:
docker-compose up -d

# Once everything is up, log in to the MySQL container and run MySQL commands
# (cb2017 is the container ID in the author's environment; find yours with docker ps)
docker exec -it cb2017 mysql -uroot -pXUANji.20
mysql>
```

**If MySQL is not running in a container, download the mysqld_exporter tarball and install the MySQL service yourself.**

**Install MySQL**

```bash
init_mysql() {
    rpm -e mariadb-libs --nodeps
    rm -rf /var/lib/mysql
    rm -rf /etc/my.cnf
    tar xvf /root/mysql-5.7.23-1.el7.x86_64.rpm-bundle.tar -C /usr/local/
    cd /usr/local
    # install the packages first, then remove the extracted RPMs (a pipe would run rm concurrently)
    rpm -ivh mysql-community-server-5.7.23-1.el7.x86_64.rpm \
        mysql-community-client-5.7.23-1.el7.x86_64.rpm \
        mysql-community-common-5.7.23-1.el7.x86_64.rpm \
        mysql-community-libs-5.7.23-1.el7.x86_64.rpm && rm -rf mysql-community-*
}

changepass() {
    sed -i '/\[mysqld]/ a skip-grant-tables' /etc/my.cnf
    systemctl restart mysqld
    mysql <…
…/etc/ntp.conf << EOF
restrict default nomodify
server 127.127.1.0
fudge 127.127.1.0 stratum 10
EOF
    systemctl start ntpd && systemctl enable ntpd
    expect <<-EOF
spawn mysqladmin -uroot -p password "ZHOUjian.20"
expect {
"password" { send "\r" }
}
expect eof
EOF
    systemctl restart mysqld
}

main() {
    init_mysql
    changepass
}
main
```

**Install mysqld_exporter**

```bash
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
tar xvf mysqld_exporter-0.12.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
mv mysqld_exporter-0.12.1.linux-amd64/ /usr/local/mysqld_exporter
cd /usr/local/mysqld_exporter

# Set up mysqld_exporter with a config file (this needs an authorized MySQL user).
# mysqld_exporter connects to MySQL, so grant it privileges first:
mysql> grant replication client, process on *.* to prometheus@"localhost" identified by "ZHOUjian.20";
mysql> grant select on performance_schema.* to prometheus@"localhost";

cat .my.cnf
[client]
user=prometheus
password=ZHOUjian.20

nohup ./mysqld_exporter --config.my-cnf=.my.cnf &
```

**We can then open the exporter in a browser using the MySQL server's IP (port 9104):**

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001504029-1257217451.png)
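You can also read the same page from a script instead of a browser. The following is a minimal Python sketch using only the standard library, and it makes two assumptions that the article does not state. First, the exporter listens on its default port 9104. Second, the parser is deliberately simplified: it only handles plain `name{labels} value` sample lines with no trailing timestamps. A real client would use a proper exposition-format parser. The names `parse_samples`, `fetch_metrics` and `mysql_is_up` are hypothetical helpers.

```python
import urllib.request


def parse_samples(text):
    """Map each sample (metric name plus labels) to its float value.

    Simplified: skips comment lines and assumes no trailing timestamps.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # blank, "# HELP" and "# TYPE" lines
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(host, port=9104):
    """Download the raw text of http://<host>:<port>/metrics."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def mysql_is_up(samples):
    """mysql_up is 1 when the exporter could reach MySQL on its last scrape."""
    return samples.get("mysql_up") == 1.0


if __name__ == "__main__":
    # Offline example; on a real node use parse_samples(fetch_metrics("192.168.50.5"))
    page = (
        "# HELP mysql_up Whether the MySQL server is up.\n"
        "# TYPE mysql_up gauge\n"
        "mysql_up 1\n"
        'mysql_global_status_buffer_pool_pages{state="free"} 7675\n'
    )
    samples = parse_samples(page)
    print(mysql_is_up(samples))  # True
    print(samples['mysql_global_status_buffer_pool_pages{state="free"}'])  # 7675.0
```

The offline sample keeps the sketch runnable anywhere. On the MySQL node, `fetch_metrics` supplies the live page instead.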
> Check the mysql_up metric on /metrics to tell whether MySQLD Exporter has connected to the MySQL instance. A value of 1 means monitoring data is being collected normally:

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001529769-511818075.png)

> Edit the Prometheus configuration file /usr/local/prometheus/prometheus.yml and add a scrape job for the MySQLD Exporter instance.

```yaml
# Add under scrape_configs:
  - job_name: 'mysqld'
    static_configs:
      - targets: ['172.19.0.27:9104']

systemctl restart prometheus
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001537717-1767703109.png)
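Once the `mysqld` job is being scraped, you can run the same check against Prometheus itself through its HTTP API. This is a minimal sketch. It assumes Prometheus listens on its default port 9090 and uses the instant-query endpoint `/api/v1/query`, and `check_mysql_up`, `down_instances` and `build_query_url` are hypothetical helper names. One caveat applies: if the exporter itself cannot be scraped, the query returns no `mysql_up` series for that target at all. In that case `up{job="mysqld"} == 0` is the complementary check.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus, promql):
    """Instant-query URL for a PromQL expression (GET /api/v1/query?query=...)."""
    return f"{prometheus}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_instances(body):
    """Return the 'instance' label of every series whose value is not 1."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    down = []
    for series in payload["data"]["result"]:
        _ts, value = series["value"]  # instant-vector sample: [unix_time, "value"]
        if float(value) != 1.0:
            down.append(series["metric"].get("instance", "<no instance label>"))
    return down


def check_mysql_up(prometheus="http://localhost:9090"):
    """Query a live Prometheus server; returns exporters reporting mysql_up != 1."""
    url = build_query_url(prometheus, "mysql_up")
    with urllib.request.urlopen(url, timeout=5) as resp:
        return down_instances(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline example shaped like a /api/v1/query response
    sample = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"__name__": "mysql_up", "instance": "172.19.0.27:9104", "job": "mysqld"},
             "value": [1592500000.0, "1"]},
        ]},
    })
    print(down_instances(sample))  # []
```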
> To keep the database running stably, four kinds of performance and resource-utilization metrics usually deserve attention: query throughput, connections, buffer pool usage, and query execution performance.

##### Monitoring database throughput

> The core job of a database is to insert, delete, update, and query data. To measure how the server's throughput changes, MySQL keeps an internal counter named Questions that is incremented by 1 each time a client sends a statement. You can read Questions and other server status variables with:

```sql
show global status like "Questions";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Questions     | 545   |
+---------------+-------+
```

> In the samples returned by MySQLD Exporter, mysql_global_status_questions reflects the current value of the Questions counter.

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001551509-1752399566.png)

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001558282-1686262478.png)

> The following PromQL shows how the query rate of the MySQL instance changes. A sudden jump or drop in query volume often signals a serious problem, so watch this metric and set up matching alerting rules to learn about changes promptly:

```
rate(mysql_global_status_questions[2m])
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001611541-145961209.png)

> Read and write activity can also be monitored separately. Com_select in MySQL's global status gives the total number of SELECT statements the server has executed. Likewise, the totals of Com_insert, Com_update, and Com_delete measure write operations. For example, to see how many INSERT statements the MySQL instance has executed:

```sql
show global status like "Com_insert";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_insert    | 0     |
+---------------+-------+
1 row in set (0.01 sec)
```

> In the samples returned on MySQLD Exporter's /metrics, mysql_global_status_commands_total gives the execution count of each kind of command:

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001626801-1674711167.png)

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001638526-1443999278.png)

> The following PromQL shows how the write rate of the MySQL instance changes:
```
sum(rate(mysql_global_status_commands_total{command=~"insert|update|delete"}[2m])) without (command)
```

> To make the effect easier to see, create a database and a table in MySQL and keep inserting rows with a `for` loop:

```bash
# Open a shell in the MySQL container (cb2017 is the container ID in the author's environment)
docker exec -it cb2017 /bin/bash

# In the mysql client:
create database test;
use test;
create table student(id int, name varchar(50), sex enum('male','female'), age int);
insert into student values(1,'tom','male',12),(2,'jack','male',13),(3,'alice','male',14);

# Back in the shell, insert rows in a loop (use your instance's root password)
for i in {1..1000}; do
    mysql -uroot -pZHOUjian.20 -e "use test; insert into student values(1,'tom','male',12),(2,'jack','male',13),(3,'alice','male',14);"
done
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001658442-605630698.png)

##### Connections

> MySQL caps the number of client connections with the global max_connections setting. Once all available connections are used up, new client connections are refused. So when monitoring MySQL, keep a close eye on the server's connections. Check the current max_connections setting with:

```sql
show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+
1 row in set (0.00 sec)

# MySQL's default max_connections is 151. To change it temporarily (until the next restart), run:
set global max_connections = 200;

mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 200   |
+-----------------+-------+

# To make the change permanent, add the following to the MySQL configuration file my.cnf:
[mysqld]
max_connections = 200
```

> The Threads_connected, Aborted_connects, Connection_errors_max_connections, and Threads_running variables in global status describe the connections of the MySQL instance.
>
> For example, the following statement shows the current number of connections:

```sql
show global status like "Threads_connected";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 1     |
+-------------------+-------+
```

> When all available connections are taken and a client tries to connect, it gets a "Too many connections" error and Connection_errors_max_connections increases. To prevent this, monitor the number of available connections and make sure usage stays within the max_connections limit. If Aborted_connects keeps growing, clients are failing to connect to MySQL; Connection_errors_max_connections and Connection_errors_internal help find out why.
>
> The connection-related metrics are:

* `mysql_global_variables_max_connections`: maximum number of connections allowed;
* `mysql_global_status_threads_connected`: currently open connections;
* `mysql_global_status_threads_running`: threads that are not sleeping (actively running);
* `mysql_global_status_aborted_connects`: failed connection attempts;
* `mysql_global_status_connection_errors_total{error="max_connections"}`: errors caused by exceeding the maximum number of connections;
* `mysql_global_status_connection_errors_total{error="internal"}`: errors caused by internal server errors.

Query the number of remaining available connections with PromQL:

```
mysql_global_variables_max_connections - mysql_global_status_threads_connected
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001715992-718896162.png)

Query the number of failed (aborted) connection attempts of the MySQL instance with PromQL:

```
mysql_global_status_aborted_connects
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001725197-1307048292.png)

##### Monitoring buffer pool usage

> InnoDB, MySQL's default storage engine, caches table and index data in an area of memory called the buffer pool. If the buffer pool is too small for the workload, performance drops because many queries must read from disk, which keeps pushing disk I/O up. Watch buffer pool usage and enlarge the buffer pool when needed to keep the database performing well.
>
> Innodb_buffer_pool_pages_total is the total number of pages in the buffer pool. View it with:

```sql
show global status like "Innodb_buffer_pool_pages_total";
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| Innodb_buffer_pool_pages_total | 8191  |
+--------------------------------+-------+
```

MySQLD Exporter returns the number of pages in each state of the buffer pool through the following metric:
```text
# HELP mysql_global_status_buffer_pool_pages Innodb buffer pool pages by state.
# TYPE mysql_global_status_buffer_pool_pages gauge
mysql_global_status_buffer_pool_pages{state="data"} 516
mysql_global_status_buffer_pool_pages{state="dirty"} 0
mysql_global_status_buffer_pool_pages{state="free"} 7675
mysql_global_status_buffer_pool_pages{state="misc"} 0
```

Innodb_buffer_pool_read_requests counts the read requests served from the buffer pool. View it with:

```sql
show global status like "Innodb_buffer_pool_read_requests";
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| Innodb_buffer_pool_read_requests | 1481  |
+----------------------------------+-------+
```

> MySQLD Exporter reports Innodb_buffer_pool_read_requests through the following metric:

```text
# HELP mysql_global_status_innodb_buffer_pool_read_requests Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_buffer_pool_read_requests untyped
mysql_global_status_innodb_buffer_pool_read_requests 1481
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001803046-2046120575.png)

> When the buffer pool cannot satisfy a read, MySQL has to read the data from disk, and Innodb_buffer_pool_reads counts those reads. Reading from memory is usually much faster than reading from disk, so if Innodb_buffer_pool_reads starts to climb, the database may have a performance problem. View the number of Innodb_buffer_pool_reads with:

```sql
show global status like "Innodb_buffer_pool_reads";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Innodb_buffer_pool_reads | 408   |
+--------------------------+-------+
```

> In MySQLD Exporter, the number of Innodb_buffer_pool_reads is shown by the following metric:
![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001821485-1302045606.png)

> The following PromQL gives the buffer pool utilization of each MySQL instance. To judge whether the buffer pool is sized reasonably, combine it with the growth rate of Innodb_buffer_pool_reads:

```
(sum(mysql_global_status_buffer_pool_pages) by (instance) - sum(mysql_global_status_buffer_pool_pages{state="free"}) by (instance)) / sum(mysql_global_status_buffer_pool_pages) by (instance)
```

> You can also compute the per-second rate of disk read requests, averaged over a 2-minute window:

```
rate(mysql_global_status_innodb_buffer_pool_reads[2m])
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001837156-939042162.png)

##### Query performance

> MySQL also provides a Slow_queries counter, which is incremented whenever a query takes longer than long_query_time to execute. long_query_time defaults to 10 seconds. Check the current setting with:

```sql
show variables like 'long_query_time';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
```

> View the number of Slow_queries on the current MySQL instance with:

```sql
show global status like "slow_queries";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Slow_queries  | 0     |
+---------------+-------+
```

In the samples returned by MySQLD Exporter, the current value of Slow_queries is exposed as:

```text
# HELP mysql_global_status_slow_queries Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_slow_queries untyped
mysql_global_status_slow_queries 0
```

The growth rate of Slow_queries reflects the current performance of the MySQL server. Query it with the following PromQL:

```
rate(mysql_global_status_slow_queries[2m])
```

> Installing a response time plugin lets MySQL record statistics for query-time ranges. Once that feature is enabled, MySQLD Exporter collects the related data as well, which gives a finer-grained picture of how MySQL query response times are distributed. Interested readers can try it themselves.

#### Install the Prometheus service (on both Prometheus servers of the federation)

**Download the Prometheus binary package, then configure and start it**

```bash
wget https://github.com/prometheus/prometheus/releases/download/v2.13.0/prometheus-2.13.0.linux-amd64.tar.gz
tar xvf prometheus-2.13.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
ln -s prometheus-2.13.0.linux-amd64/ prometheus

# For security, run the Prometheus service as an unprivileged user.
# As a time-series database, Prometheus keeps its data under the application directory by default; move it to /data/prometheus.
useradd -s /sbin/nologin -M prometheus
mkdir -p /data/prometheus
# chown the real directory: chown -R does not follow the /usr/local/prometheus symlink
chown -R prometheus:prometheus /usr/local/prometheus-2.13.0.linux-amd64/
chown -R prometheus:prometheus /data/prometheus/

vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml \
    --storage.tsdb.path=/data/prometheus
Restart=on-failure

[Install]
WantedBy=multi-user.target

# This unit file defines the start command and stores data under /data/prometheus;
# otherwise Prometheus defaults to a data/ directory relative to its working directory.
systemctl daemon-reload
systemctl start prometheus
systemctl status prometheus
systemctl enable prometheus
```

#### Install the Alertmanager alerting component (on the federation's master Prometheus server)

> Alertmanager receives the alerts sent by Prometheus and delivers them through the channels defined in its configuration file. Centralizing alerts in Alertmanager allows finer-grained alert management.

**Installing and starting Alertmanager**

```bash
wget https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz
# Extract the package downloaded from the official site
tar xvf alertmanager-0.19.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/alertmanager-0.19.0.linux-amd64/
./alertmanager &>/dev/null &
# Now open http://172.19.0.51:9093/#/alerts to see the Alertmanager web UI
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001918898-700633000.png)

```yaml
# Next, edit alertmanager.yml to configure the mail account and its authorization code
cat alertmanager.yml
global:                     # global settings (alerting policy, channels, etc.)
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'ZHOUjian22'   # mailbox authorization code
  smtp_require_tls: false

route:                      # routing policy
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5m
  receiver: 'email'

receivers:                  # receivers
  - name: 'email'
    email_configs:
      - to: '[email protected]'   # destination mailbox
        send_resolved: true

inhibit_rules:              # inhibition: while alerts matching the source matchers fire, matching target alerts are muted
  - source_match:
      severity: 'critical'
```

Add Alertmanager as a systemd service, then restart it so the configuration takes effect:

```bash
cat /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/alertmanager-0.19.0.linux-amd64/alertmanager \
    --config.file=/usr/local/alertmanager-0.19.0.linux-amd64/alertmanager.yml \
    --storage.path=/usr/local/alertmanager-0.19.0.linux-amd64/data
Restart=on-failure

[Install]
WantedBy=multi-user.target

# Stop the copy started by hand above and let the prometheus user write the data directory
pkill alertmanager
chown -R prometheus:prometheus /usr/local/alertmanager-0.19.0.linux-amd64/
# Reload systemd after creating or changing a unit file, then (re)start the service
systemctl daemon-reload
systemctl restart alertmanager
```

Add the MySQL machine to one of the Prometheus nodes. That node acts as a proxy that feeds the federation's master Prometheus server.

##### Add the MySQL scrape jobs to the federation slave (proxy)

```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'mysqld'
    static_configs:
      - targets: ['172.19.0.56:9104']

  - job_name: 'mysql'
    static_configs:
      - targets: ['172.19.0.27:9104']

systemctl restart prometheus
```

![](https://img2020.cnblogs.com/blog/1871335/202006/1871335-20200619001906755-1468655132.png)

##### Configure the federation master to pull the metrics from the slave
```yaml
# vim /usr/local/prometheus/prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # Alertmanager runs on this master node

rule_files:
  - "first_rules.yml"

scrape_configs:
  - job_name: 'mysqld'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"mysql_up.*"}'
    static_configs:
      - targets:
          - 'prometheus-slave:9090'

  - job_name: 'mysql'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"mysql_up.*"}'
    static_configs:
      - targets:
          - 'prometheus-slave:9090'
```

##### Define a custom alerting rule, then take the machine down to trigger an alert

```yaml
# vim /usr/local/prometheus/first_rules.yml
groups:
  - name: mysql_up
    rules:
      - alert: mysql_up
        expr: up == 0
        for: 15s
        labels:
          severity: 1
          team: Prometheus_mysql
        annotations:
          summary: "{{$labels.instance}} instance has been down for more than 15 seconds"

systemctl restart prometheus alertmanager
```

![17](/Users/youmen/Documents/blog/z/監控/Prometheus/Prometheus4
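The `metrics_path: '/federate'` and `params` settings in the federation jobs amount to a plain HTTP GET against the slave, with one `match[]` query parameter per selector. The minimal sketch below builds the equivalent URL. The host name is the `prometheus-slave` entry from /etc/hosts, and `federate_url` is a hypothetical helper. You can also request the URL by hand to see exactly which series the master will ingest, for example with `curl -G --data-urlencode 'match[]={job="prometheus"}' http://prometheus-slave:9090/federate`.

```python
import urllib.parse


def federate_url(target, selectors, scheme="http"):
    """Build the GET URL behind metrics_path '/federate' plus params 'match[]'.

    Each selector becomes its own match[] query parameter; the selector text
    is percent-encoded, and decoding the query string gives it back unchanged.
    """
    query = urllib.parse.urlencode([("match[]", s) for s in selectors])
    return f"{scheme}://{target}/federate?{query}"


if __name__ == "__main__":
    url = federate_url(
        "prometheus-slave:9090",  # host name from /etc/hosts in this article
        ['{job="prometheus"}', '{__name__=~"mysql_up.*"}'],
    )
    print(url)
    # Round trip: the decoded match[] values are the original selectors
    print(urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)["match[]"])
```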