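Recovering a MySQL Cluster Database After an Unexpected Power Loss
Note: this write-up is project-specific. Some commands may not apply to other environments and are for reference only.
On 2018-01-21, in project xxxx, the hyper-converged infrastructure lost power unexpectedly, and the database could no longer start.
Before doing anything else, back up the /var/lib/mysql directory!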
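The backup step can be sketched as a small shell helper. This is a minimal sketch that assumes mysqld is already stopped on the node; the function name `backup_datadir` and the destination path in the usage comment are illustrative assumptions, not part of the original incident.

```shell
# Hypothetical helper: copy a MySQL data directory to a timestamped backup.
# Assumes mysqld is NOT running, so the files do not change during the copy.
backup_datadir() {
    local src="$1" dest_root="$2"
    local dest
    dest="${dest_root}/mysql-backup-$(date +%Y%m%d-%H%M%S)"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest_root" || return 1
    # -a preserves ownership, permissions and timestamps
    cp -a "$src" "$dest" || return 1
    # Print the backup location so the caller can record it
    echo "$dest"
}

# Usage (paths are examples only):
#   backup_datadir /var/lib/mysql /root/backup
```

Keeping the copy outside /var/lib/mysql matters here, because later steps delete everything under that directory.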
Recovery attempts:
1) Without forced recovery
180121 20:00:37 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql//wsrep_recovery.7565d7' --pid-file='/var/lib/mysql//plhcs_controller_3-recover.pid'
2018-01-21 20:00:38 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
180121 20:00:38 mysqld_safe WSREP: Failed to recover position:
2018-01-21 20:00:38 338811 [Warning] Using unique option prefix myisam_recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2018-01-21 20:00:38 338811 [Note] Plugin 'FEDERATED' is disabled.
2018-01-21 20:00:38 7f64279b9740 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see
2) Forced recovery, level 1
2018-01-21 20:02:29 340095 [ERROR] InnoDB: Space id in fsp header 1316159744,but in the page header 33554432
InnoDB: Error: tablespace id is 2163 in the data dictionary
InnoDB: but in file ./cmon/cmon_job.ibd it is 18446744073709551615!
2018-01-21 20:02:29 7fa4c9b82700 InnoDB: Assertion failure in thread 140345735653120 in file fil0fil.cc line 796
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: about forcing recovery.
12:02:29 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=10000
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3981875 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x904aeb]
/usr/sbin/mysqld(handle_fatal_signal+0x491)[0x68dc71]
/lib64/libpthread.so.0(+0xf370)[0x7fa6db20c370]
/lib64/libc.so.6(gsignal+0x37)[0x7fa6da0131d7]
/lib64/libc.so.6(abort+0x148)[0x7fa6da0148c8]
/usr/sbin/mysqld[0xacfd89]
/usr/sbin/mysqld[0xacff9c]
/usr/sbin/mysqld[0xad7a6b]
/usr/sbin/mysqld[0xaa020b]
/usr/sbin/mysqld[0xa86e40]
/usr/sbin/mysqld[0xa6bae3]
/usr/sbin/mysqld[0xa10836]
/usr/sbin/mysqld[0xa0d700]
/usr/sbin/mysqld[0xa0e4ca]
/usr/sbin/mysqld[0xa0ee08]
/usr/sbin/mysqld[0x9dd365]
/usr/sbin/mysqld[0xa359ae]
/usr/sbin/mysqld[0xa27a2c]
/lib64/libpthread.so.0(+0x7dc5)[0x7fa6db204dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fa6da0d576d]
3) Forced recovery, level 2
2018-01-21 20:03:51 341253 [Note] WSREP: Service thread queue flushed.
2018-01-21 20:03:51 341253 [Note] WSREP: GCache history reset: old(bbdd25de-fe77-11e7-9e6f-2b7b75cdd72a:0) -> new(bbdd25de-fe77-11e7-9e6f-2b7b75cdd72a:3644)
2018-01-21 20:03:51 341253 [Note] WSREP: Synchronized with group, ready for connections
2018-01-21 20:03:51 341253 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
InnoDB: A new raw disk partition was initialized or
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that newraw is replaced
InnoDB: with raw, and innodb_force_... is removed.
InnoDB: A new raw disk partition was initialized or
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that newraw is replaced
InnoDB: with raw, and innodb_force_... is removed.
InnoDB: A new raw disk partition was initialized or
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that newraw is replaced
InnoDB: with raw, and innodb_force_... is removed.
InnoDB: A new raw disk partition was initialized or
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that newraw is replaced
InnoDB: with raw, and innodb_force_... is removed.
…… ……
InnoDB: Error: tablespace id is 2168 in the data dictionary
InnoDB: but in file ./cmon/backup.ibd it is 0!
2018-01-21 20:06:24 7f26f0255700 InnoDB: Assertion failure in thread 139805214463744 in file fil0fil.cc line 796
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: about forcing recovery.
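The Keystone service reported:
Operation not allowed when innodb_forced_recovery > 0
MySQL accepted logins and tables could be queried, but a dump was not possible. While innodb_force_recovery is greater than 0, all databases and tables are read-only.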
4) Forced recovery, level 3
The errors are the same as for forced recovery level 2.
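The "forced recovery" levels tried here presumably correspond to the InnoDB option innodb_force_recovery in the [mysqld] section of my.cnf, which is what the log messages above refer to. A minimal sketch of switching the level on and off follows, assuming the server reads drop-in files from an included directory. The function names and the drop-in path are hypothetical; if my.cnf has no `!includedir`, the option has to be edited in my.cnf directly.

```shell
# Hypothetical helper: write a drop-in config file that sets
# innodb_force_recovery to the given level (valid non-zero levels are 1-6).
# ASSUMPTION: my.cnf contains an "!includedir" for the directory used.
set_force_recovery() {
    local level="$1" conf="$2"
    case "$level" in
        [1-6]) ;;
        *) echo "level must be 1..6, got: $level" >&2; return 1 ;;
    esac
    printf '[mysqld]\ninnodb_force_recovery = %s\n' "$level" > "$conf"
}

# Remove the drop-in again before returning to normal operation.
clear_force_recovery() {
    rm -f "$1"
}

# Usage (the path is an example only):
#   set_force_recovery 4 /etc/my.cnf.d/force_recovery.cnf
#   ... start mysqld, take the dumps, stop mysqld ...
#   clear_force_recovery /etc/my.cnf.d/force_recovery.cnf
```

The MySQL documentation advises raising the level only as far as needed, since higher values carry a greater risk of further damage, which matches the step-by-step approach taken in this incident.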
5) Forced recovery, level 4
2018-01-21 20:10:43 350749 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."networkdhcpagentbindings"' in the cache. Attempting to load the tablespace with space id 385.
2018-01-21 20:10:47 350749 [Warning] InnoDB: Allocated tablespace 385, old maximum was 0
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."agents"' in the cache. Attempting to load the tablespace with space id 578.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."bgp_speaker_dragent_bindings"' in the cache. Attempting to load the tablespace with space id 576.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_hosting_devices"' in the cache. Attempting to load the tablespace with space id 474.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_router_mappings"' in the cache. Attempting to load the tablespace with space id 476.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."ha_router_agent_port_bindings"' in the cache. Attempting to load the tablespace with space id 392.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."poolloadbalanceragentbindings"' in the cache. Attempting to load the tablespace with space id 441.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."routerl3agentbindings"' in the cache. Attempting to load the tablespace with space id 390.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_LOCKS"' in the cache. Attempting to load the tablespace with space id 1400.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SCHEDULER_STATE"' in the cache. Attempting to load the tablespace with space id 1399.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_FIRED_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1398.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1391.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_BLOB_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1395.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_CRON_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1393.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPLE_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1392.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPROP_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1394.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_PAUSED_TRIGGER_GRPS"' in the cache. Attempting to load the tablespace with space id 1397.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_JOB_DETAILS"' in the cache. Attempting to load the tablespace with space id 1390.
2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"zabbix"."escalations"' in the cache. Attempting to load the tablespace with space id 1437.
Keystone reported: Can't lock file (errno: 165 - Table is read only)
Backing up the database:
/usr/bin/sh /opt/backup/shell/backupmysql.sh
Warning: Using a password on the command line interface can be insecure.
Error: Couldn't read status information for table backup ()
mysqldump: Couldn't execute 'show create table `backup`': Table 'cmon.backup' doesn't exist (1146)
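However, dumping the other databases explicitly works:
mysqldump -uroot -p`cat /etc/contrail/mysql.token` --databases aodh ceilometer cinder glance heat keystone mysql neutron nova nova_api zabbix > ./aaa.sql
This shows that the fault lies in the cmon tables.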
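To find out which schema is broken without guessing, the databases can be dumped one at a time. The following is a minimal sketch; the function name `dump_each_db` and the output directory are hypothetical, and it assumes the credentials come from an option file such as ~/.my.cnf rather than the command line (which also avoids the password warning shown above).

```shell
# Hypothetical helper: dump each database to its own file and report which
# ones fail, instead of aborting the whole backup on the first bad schema.
# ASSUMPTION: credentials are supplied by an option file (e.g. ~/.my.cnf).
dump_each_db() {
    local outdir="$1"; shift
    local db failed=""
    mkdir -p "$outdir" || return 1
    for db in "$@"; do
        if mysqldump --databases "$db" > "$outdir/$db.sql"; then
            echo "OK     $db"
        else
            echo "FAILED $db"
            rm -f "$outdir/$db.sql"   # do not keep a partial dump
            failed="$failed $db"
        fi
    done
    # Succeed only if every database was dumped
    [ -z "$failed" ]
}

# Usage (directory and database names are examples):
#   dump_each_db /root/dumps aodh ceilometer cinder cmon glance keystone
```

In a case like this one, the cmon line would be expected to show as FAILED while the other databases produce usable dump files.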
6) Forced recovery, level 5
The log output is identical to that of forced recovery level 4.
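Recovery steps:
Both controllers, A and B, showed the problems described above.
0) Back up the /var/lib/mysql directory on both A and B.
1) On A, start with forced recovery level 4. The database is read-only in this mode but can still be dumped. Dump the databases one at a time; on site, the cmon database could not be dumped. This step produces a dump file.
2) On B, delete everything under /var/lib/mysql, then run the following: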
-- mysql_install_db --user=mysql
-- mysqld_safe --wsrep-new-cluster &
-- sudo -E /usr/local/security/kolla_security_reset
-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY '${DB_ROOT_PASSWORD}' WITH GRANT OPTION;"
-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '${DB_ROOT_PASSWORD}' WITH GRANT OPTION;"
-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "CREATE USER 'haproxy'@'%' IDENTIFIED BY '';"
-- mysqladmin -uroot -p"${DB_ROOT_PASSWORD}" shutdown
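3) B's database is now rebuilt. Start the database and import the dump file produced in step 1. This is needed because the freshly initialized B has none of the user names and passwords for the individual databases.
4) Copy each database directory under /var/lib/mysql/ on B to A, then run chown -R mysql:mysql on them.
5) Remove the forced recovery setting on A and start it normally: /etc/init.d/mysql start --wsrep-new-cluster
6) Delete the cmon directory on A and restore cmon: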
-- mysql -u root -p`cat /etc/contrail/mysql.token` -e "CREATE SCHEMA IF NOT EXISTS cmon"
-- mysql -u root -p`cat /etc/contrail/mysql.token` < /usr/share/cmon/cmon_db.sql
-- mysql -u root -p`cat /etc/contrail/mysql.token` < /usr/share/cmon/cmon_data.sql
-- mysql -u root -p`cat /etc/contrail/mysql.token` -e "use cmon; insert into cluster(type) VALUES ('galera')"
7) Recovery is complete.
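After the final step it can be worth confirming that Galera itself considers the node usable. The sketch below reads three standard Galera status variables (wsrep_ready, wsrep_local_state_comment, wsrep_cluster_size). The function names are hypothetical, and credentials are again assumed to come from an option file.

```shell
# Hypothetical helper: print the value of one Galera status variable.
# ASSUMPTION: credentials are supplied by an option file (e.g. ~/.my.cnf).
wsrep_status() {
    # -N: no column names, -B: tab-separated batch output
    mysql -N -B -e "SHOW GLOBAL STATUS LIKE '$1'" | awk '{print $2}'
}

# Report whether the node looks healthy: wsrep_ready is ON and the local
# state is Synced. The cluster size is printed for information only.
check_galera() {
    local ready state size
    ready=$(wsrep_status wsrep_ready)
    state=$(wsrep_status wsrep_local_state_comment)
    size=$(wsrep_status wsrep_cluster_size)
    echo "ready=$ready state=$state cluster_size=$size"
    [ "$ready" = "ON" ] && [ "$state" = "Synced" ]
}

# Usage:
#   check_galera || echo "node is not ready yet"
```

Right after bootstrapping with --wsrep-new-cluster the cluster size is expected to be 1 until the other controller rejoins.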
Update, 2018-03-19:
After deleting /var/lib/mysql, proceed as follows:
1)mysql_install_db --user=mysql
2)mysqld_safe --wsrep-new-cluster &
3)sudo -E /usr/local/security/kolla_security_reset
4) Then run, one by one, the schema-creation and user-privilege statements listed above.
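Because mysqld_safe is started in the background, the statements that follow can fail if they run before the server accepts connections. A minimal sketch of a wait loop based on `mysqladmin ping` is shown below; the function name `wait_for_mysql` and the default retry values are assumptions for illustration.

```shell
# Hypothetical helper: poll "mysqladmin ping" until the server answers or
# the number of attempts runs out.
#   $1 = maximum attempts (default 30), $2 = seconds between attempts (default 1)
wait_for_mysql() {
    local tries="${1:-30}" delay="${2:-1}"
    local i=0
    while [ "$i" -lt "$tries" ]; do
        if mysqladmin ping >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "mysqld did not come up after $tries attempts" >&2
    return 1
}

# Usage:
#   mysqld_safe --wsrep-new-cluster &
#   wait_for_mysql 60 2 && sudo -E /usr/local/security/kolla_security_reset
```

`mysqladmin ping` reports success as soon as the server is reachable, even when the login itself is rejected, so the loop does not depend on the root password being set yet.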
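Summary:
The failure in this power-loss scenario was caused by an inconsistency between InnoDB and the cmon table files. The main recovery approach is forced recovery, which leaves the database read-only. In that mode MySQL reports out-of-sync errors, which can be ignored. Dump the databases, import them into an empty MySQL instance, and use that instance to generate fresh InnoDB files.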