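MySQL 5.7 VARCHAR Type Experiment
How many characters can a MySQL VARCHAR column hold at most? I vaguely remembered that the limit is 65,535 characters. Is that really true?
In theory, the number of characters a string type can hold depends on both the character set chosen and the storage length limit: the fewer bytes per character the encoding needs, the higher the character limit.
OK! Let's start with latin1, a single-byte character set and therefore the smallest encoding: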
[testdb]> create table t5(name varchar(65535)) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
[testdb]> create table t5(name varchar(65534)) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
[testdb]> create table t5(name varchar(65533)) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
[testdb]> create table t5(name varchar(65532)) charset=latin1;
Query OK, 0 rows affected (0.01 sec)
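After a few attempts we find that we are hitting the row size limit. The result looks clear-cut: with latin1, a VARCHAR can hold at most 65,532 characters. Is that really so?
The answer is NO!
That conclusion does not hold up under scrutiny. According to the documentation, when a VARCHAR value may require more than 255 bytes, a 2-byte prefix is used to record the number of bytes the stored string occupies.
(2 bytes are 16 bits, and 2^16-1 = 65535, which explains the 65,535-byte limit from another angle.)
From the MySQL 5.7 official documentation: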
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535.
The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
See Section C.10.4, “Limits on Table Column Count and Row Size”.
In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data.
The length prefix indicates the number of bytes in the value.
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
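The prefix rule quoted above can be restated as a small calculation. This is a minimal Python sketch, not MySQL code; the function name `varchar_length_prefix_bytes` and its parameters are made up for illustration.

```python
def varchar_length_prefix_bytes(max_chars: int, max_bytes_per_char: int) -> int:
    """Size of the VARCHAR length prefix, following the rule quoted above."""
    # Largest number of bytes a value in this column may require.
    max_value_bytes = max_chars * max_bytes_per_char
    # One length byte up to 255 bytes, two length bytes beyond that.
    return 1 if max_value_bytes <= 255 else 2


print(varchar_length_prefix_bytes(255, 1))    # latin1 VARCHAR(255)
print(varchar_length_prefix_bytes(64, 4))     # utf8mb4 VARCHAR(64) may need 256 bytes
print(varchar_length_prefix_bytes(65532, 1))  # the latin1 column created above
```

Note that the threshold is in bytes, not characters, so a multi-byte character set reaches the 2-byte prefix at a much shorter declared length.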
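So 65535 - 2 = 65533, yet create table t5(name varchar(65533)) charset=latin1 still failed. Why?
Because we overlooked the NULL flag in the row format. The name column is nullable, so the row needs a NULL flag for it, and even with just this one column the flag occupies one byte of the row (the NULL flag is not covered further here).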
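A rough way to picture the cost of the NULL flag, as a Python sketch. It assumes one flag bit per nullable column, rounded up to whole bytes. The manual's row-size section states that rule for MyISAM; applying it unchanged here is an assumption that happens to match the one-byte cost observed in this experiment. The function name is hypothetical.

```python
def null_flag_bytes(nullable_columns: int) -> int:
    """Bytes reserved for NULL flags: one bit per nullable column, rounded up.

    Assumption for illustration; see the hedge in the text above.
    """
    return (nullable_columns + 7) // 8


print(null_flag_bytes(0))  # every column NOT NULL: no flag byte
print(null_flag_bytes(1))  # the single nullable name column
print(null_flag_bytes(9))  # nine nullable columns no longer fit in one byte
```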
Declaring the name column NOT NULL removes the NULL flag. Let's test again:
root@localhost 17:00: [testdb]> create table t6(name varchar(65534) not null) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
root@localhost 17:00: [testdb]> create table t6(name varchar(65533) not null) charset=latin1;
Query OK, 0 rows affected (0.01 sec)
OK! The test matches the theory!
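The latin1 results so far can be replayed with simple byte accounting. The sketch below is plain Python arithmetic under the assumptions used in the text (data bytes plus length prefix plus one NULL flag byte for a nullable column, compared against 65,535); it does not talk to MySQL, and the helper name is invented.

```python
MAX_ROW_BYTES = 65535  # row size limit quoted in the error message


def single_varchar_row_bytes(chars: int, bytes_per_char: int, nullable: bool) -> int:
    """Worst-case row size for a table with one VARCHAR column."""
    data = chars * bytes_per_char
    prefix = 1 if data <= 255 else 2   # length prefix
    null_flag = 1 if nullable else 0   # NULL flag byte for a nullable column
    return data + prefix + null_flag


# Replay the latin1 attempts (1 byte per character), nullable and NOT NULL.
for chars in (65535, 65534, 65533, 65532):
    for nullable in (True, False):
        size = single_varchar_row_bytes(chars, 1, nullable)
        verdict = "OK" if size <= MAX_ROW_BYTES else "ERROR 1118"
        kind = "NULL" if nullable else "NOT NULL"
        print(f"varchar({chars}) {kind}: {size} bytes -> {verdict}")
```

Under these assumptions the nullable column first fits at 65532 and the NOT NULL column at 65533, which agrees with the statements that succeeded.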
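So how many characters can be stored under utf8mb4?
First, let's look at the character set settings of the test environment. The MySQL version is 5.7.22, and the default database character set is utf8mb4.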
[testdb]> show variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | utf8mb4                                                        |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | utf8mb4                                                        |
| character_set_system     | utf8                                                           |
| character_sets_dir       | /opt/mysql/mysql-5.7.22-linux-glibc2.12-x86_64/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.00 sec)
Create a table with a column length of 65535:
[testdb]> create table t3(name varchar(65535) primary key);
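ERROR 1074 (42000): Column length too big for column 'name' (max = 16383); use BLOB or TEXT instead
According to the error message, the maximum column length is 16383. Why 16383 and not some other value?
First, we are again bound by the 65,535-byte row size limit. Here is how the official documentation describes row size: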
Row Size Limits
The maximum row size for a given table is determined by several factors:
The internal representation of a MySQL table has a maximum row size limit of 65,535 bytes, even if the storage engine is capable of supporting larger rows.
BLOB and TEXT columns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row.
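In other words, MySQL itself caps the row size at 65,535 bytes even when the storage engine could handle larger rows,
and BLOB and TEXT columns count for only 9 to 12 bytes of the row because the rest of their content is stored separately.
Second, no character set was specified when creating the table, so it inherits the database default, utf8mb4.
In utf8mb4 a character takes at most 4 bytes (for example emoji and other characters outside the Basic Multilingual Plane; common Chinese characters take 3 bytes).
So to guarantee that the stored string never exceeds 65,535 bytes, the declared length cannot be greater than floor(65535/4) = 16383.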
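The division above generalizes to other character sets. The sketch below is illustrative Python; the table of maximum bytes per character covers only the three character sets used in this article, and the function name is invented. Only the utf8mb4 value is confirmed by the error message above; the other two simply come out of the same formula.

```python
MAX_COLUMN_BYTES = 65535

# Maximum bytes per character for the character sets used in this article.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_declared_length(charset: str) -> int:
    """Largest VARCHAR(n) whose worst-case data fits in 65,535 bytes."""
    return MAX_COLUMN_BYTES // MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    print(charset, max_declared_length(charset))
```

This cap looks only at the column's data bytes. The length prefix and the NULL flag are charged against the row size separately, so the usable maximum can be lower.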
But creating the table again with a length of 16383 still fails. Why?
[testdb]> create table t3(name varchar(16383) primary key);
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
Look closely at the message! This time it is no longer Column length too big but Specified key was too long.
Here is the relevant passage from the official documentation:
Both DYNAMIC and COMPRESSED row formats support index key prefixes up to 3072 bytes.
This feature is controlled by the innodb_large_prefix configuration option, which is enabled by default.
See the innodb_large_prefix option description for more information.
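So with the DYNAMIC and COMPRESSED row formats, an index key can be at most 3072 bytes by default.
Our name column is the primary key, which InnoDB uses as the clustered index, and the whole column value is the index key, so the key length inevitably exceeds the limit.
The documentation also tells us that this feature is controlled by the innodb_large_prefix variable.
Checking the test environment, the default row format is indeed dynamic: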
[testdb]> show variables like '%format%';
+---------------------------+-------------------+
| Variable_name             | Value             |
+---------------------------+-------------------+
| binlog_format             | ROW               |
| date_format               | %Y-%m-%d          |
| datetime_format           | %Y-%m-%d %H:%i:%s |
| default_week_format       | 0                 |
| innodb_default_row_format | dynamic           |
| innodb_file_format        | Barracuda         |
| innodb_file_format_check  | ON                |
| innodb_file_format_max    | Barracuda         |
| time_format               | %H:%i:%s          |
+---------------------------+-------------------+
Dividing 3072 bytes by the 4-byte maximum character length of utf8mb4, the length limit on a primary key column should be 768. Testing:
[testdb]> create table t4(name varchar(769) primary key) charset=utf8mb4;
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
[testdb]> create table t4(name varchar(768) primary key) charset=utf8mb4;
Query OK, 0 rows affected (0.01 sec)
As expected, the table with a length of 769 fails and the one with 768 succeeds.
Now, setting the index length limit aside, test again:
[testdb]> create table t41(name varchar(16383) not null) charset=utf8mb4;
Query OK, 0 rows affected (0.02 sec)
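Table created!
Based on the theory and experiments above:
In the utf8 character set a character takes at most 3 bytes, Chinese characters for example. So if name is the primary key, its length cannot exceed 3072/3 = 1024: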
[testdb]> create table t3(name varchar(1025) primary key) charset=utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
[testdb]> create table t3(name varchar(1024) primary key) charset=utf8;
Query OK, 0 rows affected (0.01 sec)
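Both primary key experiments follow the same arithmetic: the 3072-byte key limit divided by the worst-case bytes per character. A minimal Python sketch, assuming the whole column is indexed as in the tests above (the names are illustrative):

```python
MAX_KEY_BYTES = 3072  # limit reported in ERROR 1071


def max_indexed_chars(max_bytes_per_char: int) -> int:
    """Longest VARCHAR(n) that can be indexed in full under the key limit."""
    return MAX_KEY_BYTES // max_bytes_per_char


def key_fits(chars: int, max_bytes_per_char: int) -> bool:
    """True if a full-column key on VARCHAR(chars) stays within the limit."""
    return chars * max_bytes_per_char <= MAX_KEY_BYTES


print("utf8mb4:", max_indexed_chars(4), key_fits(768, 4), key_fits(769, 4))
print("utf8:   ", max_indexed_chars(3), key_fits(1024, 3), key_fits(1025, 3))
```

The check uses the declared length times the maximum character width, not the actual data, which is why a column can be rejected at table creation time before any row exists.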
In the utf8 character set without an index, the same theory gives 65535/3 = 21845. Let's verify:
[testdb]> create table t32(name varchar(21845) not null ) charset=utf8;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
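The CREATE TABLE statement still fails? Because "VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data."
21845 characters can take up to 21845 * 3 = 65535 bytes, and the 2-byte length prefix pushes the row past 65,535 bytes, so the table cannot be created.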
[testdb]> create table t32(name varchar(21844) not null ) charset=utf8;
Query OK, 0 rows affected (0.01 sec)
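Table created!
Conclusion (for a table whose only column is a NOT NULL VARCHAR with no index on it):
With the latin1 character set, a VARCHAR column can hold at most 65533 characters;
With the utf8 character set, a VARCHAR column can hold at most 21844 characters;
With the utf8mb4 character set, a VARCHAR column can hold at most 16383 characters;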
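The three figures can be derived from one formula. The following Python sketch combines the two limits seen in the experiments, the per-column cap and the row size limit, under the same assumptions as the text: a single-column table, a 2-byte length prefix, and one extra byte if the column is nullable. The function and its parameters are invented for illustration, not a MySQL API.

```python
MAX_BYTES = 65535  # applies both to the column data and to the whole row


def max_varchar_chars(max_bytes_per_char: int, nullable: bool = False) -> int:
    """Largest VARCHAR(n) for a table containing only this one column."""
    # Cap on the column itself (the "Column length too big" error).
    column_cap = MAX_BYTES // max_bytes_per_char
    # Cap from the row size: 2-byte length prefix plus optional NULL flag byte.
    overhead = 2 + (1 if nullable else 0)
    row_cap = (MAX_BYTES - overhead) // max_bytes_per_char
    return min(column_cap, row_cap)


for name, width in (("latin1", 1), ("utf8", 3), ("utf8mb4", 4)):
    print(name, max_varchar_chars(width), max_varchar_chars(width, nullable=True))
```

Only latin1 loses a character when the column is nullable. For utf8 and utf8mb4 the extra byte disappears in the integer division.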
That wraps up this small experiment on the limits on VARCHAR character length, row size, and index key length!
Corrections are welcome!