Fixing slow MySQL imports
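This half-baked data scientist is wrangling data again. The data arrived as a 3.6 GB zip file, and unzipping it produced, holy cow, a 12 GB SQL file. Fine, time to wrestle with SQL data again. The first step is obviously to set up a database and import the data.
Anyone who has done SQL imports knows that with MySQL's default parameters the import is pretty slow, especially when there is a lot of data. This dataset turned out to hold about 10 million rows once everything was loaded, so it was no surprise that the import would be painfully slow. The database needs some tuning.
There are two things to tune. The first is innodb_flush_log_at_trx_commit. The official manual explains its values as follows: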
Controls the balance between strict ACID compliance for commit operations and higher performance that is possible when commit-related I/O operations are rearranged and done in batches. You can achieve better performance by changing the default value but then you can lose up to a second of transactions in a crash. The default value of 1 is required for full ACID compliance. With this value, the contents of the InnoDB log buffer are written out to the log file at each transaction commit and the log file is flushed to disk. With a value of 0, the contents of the InnoDB log buffer are written to the log file approximately once per second and the log file is flushed to disk. No writes from the log buffer to the log file are performed at transaction commit. Once-per-second flushing is not guaranteed to happen every second due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions with any mysqld process crash. With a value of 2, the contents of the InnoDB log buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions in an operating system crash or a power outage. InnoDB log flushing frequency is controlled by innodb_flush_log_at_timeout, which allows you to set log flushing frequency to N seconds (where N is 1 ... 2700, with a default value of 1). However, any mysqld process crash can erase up to N seconds of transactions. DDL changes and other internal InnoDB activities flush the InnoDB log independent of the innodb_flush_log_at_trx_commit setting. InnoDB crash recovery works regardless of the innodb_flush_log_at_trx_commit setting. 
Transactions are either applied entirely or erased entirely. For durability and consistency in a replication setup that uses InnoDB with transactions: If binary logging is enabled, set sync_binlog=1. Always set innodb_flush_log_at_trx_commit=1. Caution Many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. In this case, the durability of transactions is not guaranteed even with the setting 1, and in the worst case, a power outage can corrupt InnoDB data. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the operation safer. You can also try to disable the caching of disk writes in hardware caches.
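In other words:
- 1: the default and the slowest. Every transaction commit writes the log and flushes it to disk. This is the safest mode.
- 0: the fastest. The log is written and flushed to disk about once per second, with no guarantee. A transaction commit does not trigger a log write. This is quite unsafe: if mysqld crashes, up to the last second of transactions is lost.
- 2: a middle ground. A transaction commit writes the log, but the flush to disk still happens about once per second, with no guarantee. In this mode, even if mysqld crashes, the data still reaches the disk as long as the operating system keeps running.
The manual also mentions that on some disk systems even a flush cannot guarantee the data was really written. I have run into this myself: I copied files to a (mechanical) hard disk, the machine died, and after the reboot less than half of the data was still there. I looked into it and found that the data had only been written to the disk's cache, not to the disk itself.
This parameter can be set in my.ini, but I only need it temporarily, and locally I run MySQL in docker, where editing the config file is a hassle. So I just set it from the mysql command line.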
mysql> set GLOBAL innodb_flush_log_at_trx_commit = 0;
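Since the relaxed setting is only wanted for the duration of the import, it helps to put it back afterwards. Below is a rough sketch of that idea, not a tested production script: `run_mysql`, `flush_sql` and `import_relaxed` are helper names I made up, the host and user mirror the example in this post, and `-p` with no value makes the client prompt for the password each time.

```shell
# Sketch only: run_mysql, flush_sql and import_relaxed are hypothetical helpers.
# Host and user are assumptions; adjust them for your own setup.

run_mysql() {
  # -p with no value makes the client prompt for the password.
  mysql -h127.0.0.1 -uroot -p "$@"
}

flush_sql() {
  # Print the statement that sets innodb_flush_log_at_trx_commit to $1.
  case "$1" in
    0|1|2) printf 'SET GLOBAL innodb_flush_log_at_trx_commit = %s;\n' "$1" ;;
    *) echo "flush_sql: expected 0, 1 or 2, got '$1'" >&2; return 1 ;;
  esac
}

import_relaxed() {
  # $1 = database name, $2 = path to the .sql dump
  run_mysql -e "$(flush_sql 0)" || return 1
  run_mysql "$1" < "$2"
  rc=$?
  # Put the default (1) back even if the import failed.
  run_mysql -e "$(flush_sql 1)"
  return "$rc"
}

# Usage (placeholders): import_relaxed data_base_name your_sql_script.sql
```

The restore step assumes the server was running with the default value of 1 before the import; if yours was not, pass a different value to `flush_sql`.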
The second thing to tune is a pair of options passed to the client when importing the SQL:
net_buffer_length
Each client thread is associated with a connection buffer and result buffer. Both begin with a size given by net_buffer_length but are dynamically enlarged up to max_allowed_packet bytes as needed. The result buffer shrinks to net_buffer_length after each SQL statement. This variable should not normally be changed, but if you have very little memory, you can set it to the expected length of statements sent by clients. If statements exceed this length, the connection buffer is automatically enlarged. The maximum value to which net_buffer_length can be set is 1MB.
max_allowed_packet
The maximum size of one packet or any generated/intermediate string, or any parameter sent by the mysql_stmt_send_long_data() C API function. The default is 4MB.
The packet message buffer is initialized to net_buffer_length bytes, but can grow up to max_allowed_packet bytes when needed. This value by default is small, to catch large (possibly incorrect) packets.
You must increase this value if you are using large BLOB columns or long strings. It should be as big as the largest BLOB you want to use. The protocol limit for max_allowed_packet is 1GB. The value should be a multiple of 1024; nonmultiples are rounded down to the nearest multiple.
When you change the message buffer size by changing the value of the max_allowed_packet variable, you should also change the buffer size on the client side if your client program permits it. The default max_allowed_packet value built in to the client library is 1GB, but individual client programs might override this. For example, mysql and mysqldump have defaults of 16MB and 24MB, respectively. They also enable you to change the client-side value by setting max_allowed_packet on the command line or in an option file.
The session value of this variable is read only. The client can receive up to as many bytes as the session value. However, the server will not send to the client more bytes than the current global max_allowed_packet value. (The global value could be less than the session value if the global value is changed after the client connects.)
Note that you need to check the server-side settings first: the client-side values must not be larger than the server-side ones.
mysql> show variables like 'max_allowed_packet';
mysql> show variables like 'net_buffer_length';
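The same check can be done from the shell before starting a long import. This is a minimal sketch with made-up helper names (`server_var`, `fits_server_limit`); the host and user are assumptions, and it relies on the client's `-N` (no header row), `-B` (tab-separated output) and `-e` options.

```shell
# Sketch only: server_var and fits_server_limit are hypothetical helpers.

server_var() {
  # Print the value of one server variable.
  # -N drops the header row, -B prints tab-separated columns,
  # so the value is the second field of the single output line.
  mysql -h127.0.0.1 -uroot -p -N -B -e "SHOW VARIABLES LIKE '$1'" | cut -f2
}

fits_server_limit() {
  # $1 = value you plan to pass to the client
  # $2 = value reported by the server
  # Succeeds when the client value does not exceed the server value.
  [ "$1" -le "$2" ]
}

# Usage (needs a reachable server):
#   fits_server_limit 16777216 "$(server_var max_allowed_packet)" \
#     || echo "client value is larger than the server's"
```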
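In fact, I was using the mariadb docker image, where these two values are already set very large. The official docs also say that the defaults of the mysql command-line client are large enough. Still, in my tests the import ran a bit faster with the options written out explicitly, and I don't know why.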
mysql -h127.0.0.1 -uroot -proot123 data_base_name --max_allowed_packet=16777216 --net_buffer_length=16384 < your_sql_script.sql
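The two numbers in that command are just 16 MiB and 16 KiB written out in bytes. Here is a small sketch that spells out the arithmetic and wraps the import in a function; `import_options` and `import_dump` are names I invented, and I use `-p` (prompt for the password) rather than putting the password on the command line.

```shell
# Sketch only: import_options and import_dump are hypothetical helpers.

MAX_ALLOWED_PACKET=$((16 * 1024 * 1024))   # 16 MiB = 16777216 bytes
NET_BUFFER_LENGTH=$((16 * 1024))           # 16 KiB = 16384 bytes

import_options() {
  # Print the two client options on one line.
  printf '%s %s\n' \
    "--max_allowed_packet=$MAX_ALLOWED_PACKET" \
    "--net_buffer_length=$NET_BUFFER_LENGTH"
}

import_dump() {
  # $1 = database name, $2 = path to the .sql dump
  # $(import_options) is left unquoted on purpose so it splits into two options.
  mysql -h127.0.0.1 -uroot -p $(import_options) "$1" < "$2"
}

# Usage (placeholders): import_dump data_base_name your_sql_script.sql
```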
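Although that made things a lot faster, the import still took several hours to finish. This dataset is mostly text, and I don't know whether that is the reason or whether there are other settings I'm not aware of.
By the way, I later moved the data into mongo for convenience. The data took up quite a bit more space, but the load finished within an hour, even though it was also single-threaded and included a fair amount of data processing along the way.
This half-baked data scientist still has more data to wrangle...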
(* ̄︿ ̄)