
Recovering a MySQL cluster database after an abnormal power loss

Note: this is project-specific. Some commands may not apply to other environments; treat them as reference only.

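On 2018-01-21, in the xxxx project, the hyper-converged infrastructure lost power unexpectedly, and the database would not start afterwards.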

First of all, back up the /var/lib/mysql directory!
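A minimal sketch of such a backup, assuming mysqld is stopped and the target has enough free space. The function name backup_datadir and the destination path are illustrative and not taken from the incident itself.

```shell
#!/bin/sh
# Hypothetical helper: copy a data directory to a timestamped backup location.
# Assumes mysqld is not running, so the files do not change during the copy.
backup_datadir() {
    src=$1        # e.g. /var/lib/mysql
    dest_root=$2  # e.g. /root/mysql-backup (illustrative path)
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_root/mysql-$stamp"
    mkdir -p "$dest_root" || return 1
    # -a preserves ownership, permissions and timestamps
    cp -a "$src" "$dest" || return 1
    # Print the backup location so the caller can record it
    echo "$dest"
}
```

Usage: backup_datadir /var/lib/mysql /root/mysql-backup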

Recovery attempts:

1) Without forced recovery

180121 20:00:37 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql//wsrep_recovery.7565d7' --pid-file='/var/lib/mysql//plhcs_controller_3-recover.pid'

2018-01-21 20:00:38 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).

180121 20:00:38 mysqld_safe WSREP: Failed to recover position:  2018-01-21 20:00:38 338811 [Warning] Using unique option prefix myisam_recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead. 2018

-01-21 20:00:38 338811 [Note] Plugin 'FEDERATED' is disabled. 2018-01-21 20:00:38 7f64279b9740 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see 

http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html. 2018-01-21 20:00:38 338811 [Note] InnoDB: Using atomics to ref count buffer pool pages 2018-01-21 20:00:38 338811 [Note] InnoDB: The InnoDB memory heap is disabled 2018-01-21 20:00:38 338811 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-21 20:00:38 338811 [Note] InnoDB: Memory barrier is not used 2018-01-21 20:00:38 338811 [Note] InnoDB: Compressed tables use zlib 1.2.3 2018-01-21 20:00:38 338811 [Note] InnoDB: Using Linux native AIO 2018-01-21 20:00:38 338811 [Note] InnoDB: Using CPU crc32 instructions 2018-01-21 20:00:38 338811 [Note] InnoDB: Initializing buffer pool, size = 4.9G 2018-01-21 20:00:38 338811 [Note] InnoDB: Completed initialization of buffer pool 2018-01-21 20:00:38 338811 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-21 20:00:38 338811 [Note] InnoDB: Log scan progressed past the checkpoint lsn 1956901700 2018-01-21 20:00:38 338811 [Note] InnoDB: Database was not shutdown normally! 2018-01-21 20:00:38 338811 [Note] InnoDB: Starting crash recovery. 2018-01-21 20:00:38 338811 [Note] InnoDB: Reading tablespace information from the .ibd files... 2018-01-21 20:00:38 338811 [ERROR] InnoDB: space header page consists of zero bytes in tablespace ./cmon/cmon_stats.ibd (table cmon/cmon_stats)
 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:1024 Pages to analyze:64 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 1024, Possible space_id count:0 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:2048 Pages to analyze:56 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 2048, Possible space_id count:0 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:4096 Pages to analyze:28 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 4096, Possible space_id count:0 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:8192 Pages to analyze:14 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 8192, Possible space_id count:0 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:16384 Pages to analyze:7 2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 16384, Possible space_id count:0 2018-01-21 20:00:38 7f64279b9740 InnoDB: Operating system error number 2 in a file operation. InnoDB: The error means the system cannot find the path specified. InnoDB: If you are installing InnoDB, remember that you must create InnoDB: directories yourself, InnoDB does not create them. InnoDB: Error: could not open single-table tablespace file ./cmon/cmon_stats.ibd InnoDB: We do not continue the crash recovery, because the table may become InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it. InnoDB: To fix the problem and start mysqld: InnoDB: 1) If there is a permission problem in the file and mysqld cannot InnoDB: open the file, you should modify the permissions. InnoDB: 2) If the table is not needed, or you can restore it from a backup, InnoDB: then you can remove the .ibd file, and InnoDB will do a normal InnoDB: crash recovery and ignore that table. InnoDB: 3) If the file system or the disk is broken, and you cannot remove InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf InnoDB: and force InnoDB to continue crash recovery here.
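The last hint in the log above is to set innodb_force_recovery > 0 in my.cnf. The sketch below writes that setting to a separate option file. It assumes the server is configured to read a drop-in directory; the file path and the function name write_force_recovery are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a [mysqld] fragment that sets innodb_force_recovery.
# Accepted levels are 1..6; start at 1 and raise it only if mysqld still
# fails to start.
write_force_recovery() {
    level=$1
    conf=$2   # e.g. /etc/my.cnf.d/force-recovery.cnf (assumed path)
    case $level in
        [1-6]) ;;
        *) echo "level must be 1..6" >&2; return 1 ;;
    esac
    printf '[mysqld]\ninnodb_force_recovery = %s\n' "$level" > "$conf"
}
```

Usage: write_force_recovery 1 /etc/my.cnf.d/force-recovery.cnf, then restart mysqld. Remove the file again once the data has been dumped.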

2) Forced recovery 1 (innodb_force_recovery = 1)

2018-01-21 20:02:29 340095 [ERROR] InnoDB: Space id in fsp header 1316159744,but in the page header 33554432

InnoDB: Error: tablespace id is 2163 in the data dictionary

InnoDB: but in file ./cmon/cmon_job.ibd it is 18446744073709551615!

2018-01-21 20:02:29 7fa4c9b82700  InnoDB: Assertion failure in thread 140345735653120 in file fil0fil.cc line 796

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to http://bugs.mysql.com.

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: about forcing recovery.

12:02:29 UTC - mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

key_buffer_size=0

read_buffer_size=131072

max_used_connections=0

max_threads=10000

thread_count=2

connection_count=2

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3981875 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0 thread_stack 0x40000

/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x904aeb]

/usr/sbin/mysqld(handle_fatal_signal+0x491)[0x68dc71]

/lib64/libpthread.so.0(+0xf370)[0x7fa6db20c370]

/lib64/libc.so.6(gsignal+0x37)[0x7fa6da0131d7]

/lib64/libc.so.6(abort+0x148)[0x7fa6da0148c8]

/usr/sbin/mysqld[0xacfd89]

/usr/sbin/mysqld[0xacff9c]

/usr/sbin/mysqld[0xad7a6b]

/usr/sbin/mysqld[0xaa020b]

/usr/sbin/mysqld[0xa86e40]

/usr/sbin/mysqld[0xa6bae3]

/usr/sbin/mysqld[0xa10836]

/usr/sbin/mysqld[0xa0d700]

/usr/sbin/mysqld[0xa0e4ca]

/usr/sbin/mysqld[0xa0ee08]

/usr/sbin/mysqld[0x9dd365]

/usr/sbin/mysqld[0xa359ae]

/usr/sbin/mysqld[0xa27a2c]

/lib64/libpthread.so.0(+0x7dc5)[0x7fa6db204dc5]

/lib64/libc.so.6(clone+0x6d)[0x7fa6da0d576d]

3) Forced recovery 2 (innodb_force_recovery = 2)

2018-01-21 20:03:51 341253 [Note] WSREP: Service thread queue flushed.

2018-01-21 20:03:51 341253 [Note] WSREP: GCache history reset: old(bbdd25de-fe77-11e7-9e6f-2b7b75cdd72a:0) -> new(bbdd25de-fe77-11e7-9e6f-2b7b75cdd72a:3644)

2018-01-21 20:03:51 341253 [Note] WSREP: Synchronized with group, ready for connections

2018-01-21 20:03:51 341253 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

……

……

InnoDB: Error: tablespace id is 2168 in the data dictionary

InnoDB: but in file ./cmon/backup.ibd it is 0!

2018-01-21 20:06:24 7f26f0255700  InnoDB: Assertion failure in thread 139805214463744 in file fil0fil.cc line 796

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to http://bugs.mysql.com.

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: about forcing recovery.

The Keystone service reports:
Operation not allowed when innodb_forced_recovery > 0

MySQL accepts logins and tables can be queried, but a dump fails. With innodb_force_recovery > 0, all databases and tables are read-only.
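The assertion failures above each name a specific tablespace file (./cmon/cmon_job.ibd, ./cmon/backup.ibd). A small sketch for collecting every .ibd path that an error log mentions, so the affected schema stands out; the log file location is an assumption and the function name is hypothetical.

```shell
#!/bin/sh
# Print each distinct .ibd path mentioned in an error log, one per line.
# The pattern matches relative paths such as ./cmon/backup.ibd.
# Note: this lists files the log mentions, not a verified list of damaged files.
list_ibd_in_log() {
    grep -oE '\./[^ ]+\.ibd' "$1" | sort -u
}
```

Usage: list_ibd_in_log /var/log/mysqld.log (substitute the error log path your server actually uses).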

4) Forced recovery 3 (innodb_force_recovery = 3)

The errors are the same as with forced recovery 2.

5) Forced recovery 4 (innodb_force_recovery = 4)

2018-01-21 20:10:43 350749 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."networkdhcpagentbindings"' in the cache. Attempting to load the tablespace with space id 385.

2018-01-21 20:10:47 350749 [Warning] InnoDB: Allocated tablespace 385, old maximum was 0

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."agents"' in the cache. Attempting to load the tablespace with space id 578.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."bgp_speaker_dragent_bindings"' in the cache. Attempting to load the tablespace with space id 576.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_hosting_devices"' in the cache. Attempting to load the tablespace with space id 474.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_router_mappings"' in the cache. Attempting to load the tablespace with space id 476.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."ha_router_agent_port_bindings"' in the cache. Attempting to load the tablespace with space id 392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."poolloadbalanceragentbindings"' in the cache. Attempting to load the tablespace with space id 441.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."routerl3agentbindings"' in the cache. Attempting to load the tablespace with space id 390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_LOCKS"' in the cache. Attempting to load the tablespace with space id 1400.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SCHEDULER_STATE"' in the cache. Attempting to load the tablespace with space id 1399.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_FIRED_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1398.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1391.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_BLOB_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1395.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_CRON_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1393.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPLE_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPROP_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1394.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_PAUSED_TRIGGER_GRPS"' in the cache. Attempting to load the tablespace with space id 1397.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_JOB_DETAILS"' in the cache. Attempting to load the tablespace with space id 1390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"zabbix"."escalations"' in the cache. Attempting to load the tablespace with space id 1437.

Keystone reports: Can't lock file (errno: 165 - Table is read only)

Backing up the database:

/usr/bin/sh /opt/backup/shell/backupmysql.sh

Warning: Using a password on the command line interface can be insecure.

Error: Couldn't read status information for table backup ()

mysqldump: Couldn't execute 'show create table `backup`': Table 'cmon.backup' doesn't exist (1146)

However, the following works:
mysqldump -uroot -p`cat /etc/contrail/mysql.token` --databases aodh ceilometer cinder glance heat keystone mysql neutron nova nova_api zabbix > ./aaa.sql

This shows that the fault lies in the cmon tables.
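The comparison above (the full backup fails, a dump that leaves out cmon succeeds) can be generalized by dumping each database into its own file and noting which ones fail. This is a sketch under assumptions: DUMP_CMD and dump_each are names introduced here, and credentials are expected to come from an option file rather than the command line.

```shell
#!/bin/sh
# Dump each database to its own file and report the ones that fail.
# DUMP_CMD is an assumption of this sketch: it defaults to mysqldump and
# can be overridden. Credentials are expected to come from an option file
# such as ~/.my.cnf rather than from the command line.
DUMP_CMD=${DUMP_CMD:-mysqldump}

dump_each() {
    outdir=$1; shift
    failed=""
    mkdir -p "$outdir" || return 1
    for db in "$@"; do
        if $DUMP_CMD --databases "$db" > "$outdir/$db.sql"; then
            echo "ok: $db"
        else
            echo "FAILED: $db"
            failed="$failed $db"
        fi
    done
    # Non-zero exit status if any database could not be dumped
    [ -z "$failed" ]
}
```

Usage: dump_each /root/dumps aodh ceilometer cinder glance heat keystone mysql neutron nova nova_api zabbix cmon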

6) Forced recovery 5 (innodb_force_recovery = 5)

2018-01-21 20:10:43 350749 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."networkdhcpagentbindings"' in the cache. Attempting to load the tablespace with space id 385.

2018-01-21 20:10:47 350749 [Warning] InnoDB: Allocated tablespace 385, old maximum was 0

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."agents"' in the cache. Attempting to load the tablespace with space id 578.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."bgp_speaker_dragent_bindings"' in the cache. Attempting to load the tablespace with space id 576.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_hosting_devices"' in the cache. Attempting to load the tablespace with space id 474.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_router_mappings"' in the cache. Attempting to load the tablespace with space id 476.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."ha_router_agent_port_bindings"' in the cache. Attempting to load the tablespace with space id 392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."poolloadbalanceragentbindings"' in the cache. Attempting to load the tablespace with space id 441.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."routerl3agentbindings"' in the cache. Attempting to load the tablespace with space id 390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_LOCKS"' in the cache. Attempting to load the tablespace with space id 1400.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SCHEDULER_STATE"' in the cache. Attempting to load the tablespace with space id 1399.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_FIRED_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1398.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1391.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_BLOB_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1395.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_CRON_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1393.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPLE_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPROP_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1394.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_PAUSED_TRIGGER_GRPS"' in the cache. Attempting to load the tablespace with space id 1397.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_JOB_DETAILS"' in the cache. Attempting to load the tablespace with space id 1390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"zabbix"."escalations"' in the cache. Attempting to load the tablespace with space id 1437.

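Recovery steps:

Both controllers, A and B, showed the problems described above.

0) Back up the /var/lib/mysql directory on both A and B.

1) On A, use forced recovery 4 (innodb_force_recovery = 4). The database is read-only in this mode but can still be dumped. Dump the databases one by one; on site, the cmon database could not be dumped. This produces a dump file.

2) On B, delete all files under /var/lib/mysql, then run the following: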

--  mysql_install_db --user=mysql

--  mysqld_safe --wsrep-new-cluster &

--  sudo -E /usr/local/security/kolla_security_reset

-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY '${DB_ROOT_PASSWORD}' WITH GRANT OPTION;"

-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '${DB_ROOT_PASSWORD}' WITH GRANT OPTION;"

-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "CREATE USER 'haproxy'@'%' IDENTIFIED BY '';"

--  mysqladmin -uroot -p"${DB_ROOT_PASSWORD}" shutdown

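3) With B's database rebuilt, start it and import the dump file produced in step 1. The import is needed because the freshly initialized B does not have the user names and passwords for the individual databases.

4) Copy each database directory under /var/lib/mysql/ on B to A, then run chown -R mysql:mysql on the copied directories.

5) Remove the forced-recovery setting on A and start it normally: /etc/init.d/mysql start --wsrep-new-cluster

6) Delete the cmon directory on A and restore cmon: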

-- mysql -u root -p`cat /etc/contrail/mysql.token` -e "CREATE SCHEMA IF NOT EXISTS cmon"

-- mysql -u root -p`cat /etc/contrail/mysql.token` < /usr/share/cmon/cmon_db.sql

-- mysql -u root -p`cat /etc/contrail/mysql.token` < /usr/share/cmon/cmon_data.sql

-- mysql -u root -p`cat /etc/contrail/mysql.token` -e "use cmon; insert into cluster(type) VALUES ('galera')"

7) Recovery is complete.
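Step 4 depends on every copied file ending up owned by mysql. The first log above lists a permission problem as one reason mysqld cannot open a tablespace file, so it may be worth checking ownership before starting the server. A minimal sketch with a hypothetical function name:

```shell
#!/bin/sh
# List every file or directory under a data directory that is NOT owned
# by the expected user. Empty output means ownership is consistent.
list_wrong_owner() {
    dir=$1
    user=${2:-mysql}   # defaults to the mysql account
    find "$dir" ! -user "$user" -print
}
```

Usage: list_wrong_owner /var/lib/mysql mysql. Any path it prints still needs chown -R mysql:mysql.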

2018-03-19:
After deleting /var/lib/mysql, proceed as follows:
1) mysql_install_db --user=mysql

2) mysqld_safe --wsrep-new-cluster &

3) sudo -E /usr/local/security/kolla_security_reset
4) Then run each of the table-creation and user-privilege statements listed above.

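Summary:

The failure after this power loss was caused by an inconsistency between InnoDB and the cmon table files. The main recovery approach is forced recovery, which leaves the database read-only. MySQL reports out-of-sync errors in this mode; ignore them, dump the databases and tables, load them into an empty MySQL instance, and use that instance to generate fresh InnoDB files.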