mysql galera 叢集常見問題處理

阿新 • • 發佈：2018-12-25

一、mysql HA叢集在斷網過久或者所有節點都down了之後的恢復有以下的方法：
解決方案1：
1、等三臺機器恢復網路通訊後，因為此時的mysql已經異常無法加入叢集，因此需要先保證所有的mysql都是down的，再上臺執行/usr/libexec/mysqld --wsrep-new-cluster --wsrep-cluster-address='gcomm://' & 這條命令，並進入mysql（只有一臺機器能夠成功執行，其他機器執行了過幾秒鐘都會異常退出這個程序，我們這裡把能夠成功執行的機器稱為master）
2、此時三臺只有一臺能夠成功進入mysql（即執行mysql這條命令），在非master上的兩臺上一臺一臺的執行systemctl start mysqld，必須等一臺成功了，另一臺才能執行。

3、在mysql中執行show status like "wsrep%";結果如下圖：

我們需要保證圖中的第一項為synced，以及第二項必須為三個mysql的ip

4、保證3的結果是想要的說明叢集已經恢復了，此時需要將master機器上面的/usr/libexec/mysqld --wsrep-new-cluster --wsrep-cluster-address='gcomm://'這個程序kill掉，然後再執行systemctl start mysqld即可

二、mysql HA叢集某個節點無故down了並且有一段時間處於down的情況通過以下方式恢復：

1、若日誌裡面出現以下日誌

160119 14:11:05 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (eb9f50c6-bc95-11e5-a735-9f48e437dc03): 1 (Operation not permitted)

解決方法：刪除/var/lib/mysql/grastate.dat 檔案（若還存在無法同步的情況則刪除galera.cache檔案）

2、若那個down了的節點出現以下日誌

（異常情況叢集掛了）[ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions

解決方法：
1、/usr/libexec/mysqld start --innodb_force_recovery=6
1. (SRV_FORCE_IGNORE_CORRUPT):忽略檢查到的corrupt頁。
2. (SRV_FORCE_NO_BACKGROUND):阻止主執行緒的執行，如主執行緒需要執行full purge操作，會導致crash。
3. (SRV_FORCE_NO_TRX_UNDO):不執行事務回滾操作。
4. (SRV_FORCE_NO_IBUF_MERGE):不執行插入緩衝的合併操作。
5. (SRV_FORCE_NO_UNDO_LOG_SCAN):不檢視重做日誌，InnoDB儲存引擎會將未提交的事務視為已提交。
6. (SRV_FORCE_NO_LOG_REDO):不執行前滾的操作。
如果配置後出現以下情況：
130507 14:14:01 InnoDB: Waiting for the background threads to start
130507 14:14:02 InnoDB: Waiting for the background threads to start
130507 14:14:03 InnoDB: Waiting for the background threads to start
130507 14:14:04 InnoDB: Waiting for the background threads to start
130507 14:14:05 InnoDB: Waiting for the background threads to start
130507 14:14:06 InnoDB: Waiting for the background threads to start
130507 14:14:07 InnoDB: Waiting for the background threads to start
130507 14:14:08 InnoDB: Waiting for the background threads to start
130507 14:14:09 InnoDB: Waiting for the background threads to start

需要在galera.cfg中新增這一下：
如果在設定 innodb_force_recovery >2 的同時innodb_purge_thread = 0
2、mysqld --tc-heuristic-recover=ROLLBACK
3、刪除/var/lib/mysql/ib_logfile*
4、當某個mysql節點掛了，並且存在三個mysql所在host有不同的網段，當mysql想重新加入需要一個sst的過程，sst時會需要知道叢集中某個節點的ip因此需要制定引數--wsrep-sst-receive-address否則可能出現同步的ip不在三臺機器所共有的網段
解決參考：
http://blog.itpub.net/22664653/viewspace-1441389/

三、一個mysql節點若down了一段時間。重新啟動的時候需要一些時間去同步資料，服務的啟動超時時間不夠，導致服務無法啟動，解決方法如下：
The correct way to adjust systemd settings so they don't get overwritten is to create a directory and file as such:
/etc/systemd/system/mariadb.service.d/timeout.conf
[Service]

TimeoutStartSec=12min

或者直接修改/usr/lib/systemd/system/mariadb.service
[Service]

TimeoutStartSec=12min
這裡的時間最少要大於90s，預設是90s之後執行 systemctl daemon-reload再重啟服務即可
四、日誌中出現類似如下錯誤：
160428 13:54:49 [ERROR] Slave SQL: Error 'Table 'manage_operations' already exists' on query. Default database: 'horizon'. Query: 'CREATE TABLE `manage_operations` (
`id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
`name` varchar(50) NOT NULL,
`type` varchar(20) NOT NULL,
`operation` varchar(20) NOT NULL,
`status` varchar(20) NOT NULL,
`time` date NOT NULL,
`operator` varchar(50) NOT NULL
) default charset=utf8', Error_code: 1050
160428 13:54:49 [Warning] WSREP: RBR event 1 Query apply warning: 1, 28585
160428 13:54:49 [Warning] WSREP: Ignoring error for TO isolated action: source: 752eecd1-0ce0-11e6-83fc-3e0502d0bdd2 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 24053 trx_id: -1 seqnos (l: 28668, g: 28585, s: 28584, d: 28584, ts: 80224119986850)
導致程序異常關閉，
此時可以通過執行mysqladmin flush-tables來重新整理表項，這個問題的原因是三個節點之間的表同步存在問題，重新整理一下表即可

五、日誌出現以下錯誤：
160520 10:48:23 [Note] WSREP: COMMIT failed, MDL released: 367194
160520 10:48:23 [Note] WSREP: cert failure, thd: 358780 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 358784 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: COMMIT failed, MDL released: 367188
160520 10:48:23 [Note] WSREP: cert failure, thd: 359683 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 358808 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 367191 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 367196 is_AC: 0, retry: 0 - 1 SQL: commit
160520 10:48:23 [Note] WSREP: cert failure, thd: 367194 is_AC: 0, retry: 0 - 1 SQL: commit

160520 10:48:23 [Note] WSREP: cert failure, thd: 367188 is_AC: 0, retry: 0 - 1 SQL: commit

8、日誌出現以下錯誤：

160820 3:13:41 [ERROR] Error in accept: Too many open files
160820 3:19:42 [ERROR] Error in accept: Too many open files
160827 3:16:24 [ERROR] Error in accept: Too many open files
160831 17:20:52 [ERROR] Error in accept: Too many open files
160831 19:54:29 [ERROR] Error in accept: Too many open files
160831 20:21:53 [ERROR] Error in accept: Too many open files
160901 11:25:57 [ERROR] Error in accept: Too many open files

解決方法

vim /usr/lib/systemd/system/mariadb.service

[Service]
LimitNOFILE=10000

預設的mysql的open_file_limits是1024將該項增大，並且修改vim /etc/my.cnf.d/server.cnf該檔案的open_files_limit值

systemctl daemon-reload

systemctl restart mysqld

檢視mysql的open_file_limits值是否調整成功

cat /proc/$pid/limit

其中$pid為mysql程序的pid看看值是否調整成功，並看看日誌是否還會出現上述錯誤

mysql galera 叢集常見問題處理

mysql galera 叢集常見問題處理

MySQL Galera 叢集安裝與配置

Percona Mysql Galera多讀寫叢集部署生產環境實記

MySQL叢集常見高可用方案（轉）

linux上 mysql主從報錯處理常見方式

MySQL的事務的處理

undo過大的常見處理方式

mq/mysql/redis/nginx常見服務&工具安裝

mysql安裝及常見使用

mysql 對時間的處理

MYSQL GALERA部署實錄

MySQL主從說明詳解、MySQL主從不同步處理方案

mysql 架構~MHA常見問題

mysql 運維常見操作

Mysql主從同步常見錯誤

MySQL的SQL預處理(Prepared)

VMware CentOS7 安裝 Mysql 及一些常見問題

Mariadb10 galera叢集配置

關於mysql使用between and處理時間區間右邊界問題

MySQL 8.0常見問題

mysql galera 叢集常見問題處理

相關推薦