
MySQL Basics: Character Set and Collation

**A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set.**

- `Character Set`: a set of characters together with their encodings, i.e. a character set (this article also often uses the term `charset`);
- `Collation`: a set of rules used to compare or sort the characters within a character set.

The operating system used here is `macOS Catalina`, and the MySQL version is `8.0.13 MySQL Community Server - GPL`.

# Overview of MySQL Charsets and Collations

- The MySQL server supports many character sets.
- Every character set has at least one collation.
- Most character sets have several collations.
- Every character set has a default collation.
- Two different character sets never share a collation.
- MySQL lets you specify the character set at the server, database, table, or column level.

### Collation Suffixes

| Suffix | Meaning |
|----|----|
| `_ai` | Accent-insensitive |
| `_as` | Accent-sensitive |
| `_ci` | Case-insensitive |
| `_cs` | Case-sensitive |
| `_ks` | Kana-sensitive |
| `_bin` | Binary |

- For non-binary collations, accent sensitivity can be left implicit. If a collation name contains neither `_ai` nor `_as`, then `_ci` implies `_ai` and `_cs` implies `_as`. For example, `latin1_general_ci` is explicitly case-insensitive and implicitly accent-insensitive.
- The `0900` in `utf8mb4_0900_ai_ci`: collation names for Unicode character sets may include the version of the UCA (Unicode Collation Algorithm) they are based on, and `0900` denotes UCA 9.0.0. UCA-based collations with no version number in the name, such as `utf8mb4_unicode_ci`, use UCA 4.0.0.

# Charset and Collation of the MySQL Server

### Viewing the Charsets Supported by the MySQL Server

There are several ways to list the character sets supported by the current MySQL server:

```sql
show character set;                               -- method 1
show charset;                                     -- method 2
show char set;                                    -- method 3
select * from information_schema.character_sets;  -- method 4
```

**Viewing a specific character set** (chiefly its default collation and `MAXLEN`, the maximum number of bytes per character):

```sql
show character set like 'utf%';  -- method 1
show charset like 'utf%';        -- method 2
show char set like 'utf%';       -- method 3
select * from information_schema.character_sets
where CHARACTER_SET_NAME like 'utf%';  -- method 4
```

### Viewing the Collations Supported by the MySQL Server

```sql
SHOW COLLATION WHERE Charset = 'utf8mb4';
```

or

```sql
select * from INFORMATION_SCHEMA.COLLATIONS where CHARACTER_SET_NAME = 'utf8mb4';
```

### Viewing the Current Charset and Collation of the MySQL Server

```sql
show variables like 'character_set_server';
```

```sql
show variables like 'collation_server';
```

Or use a single statement:

```sql
select @@character_set_server, @@collation_server;
```

### Default Charset and Collation of the MySQL Server

The defaults are documented in the official MySQL reference manual:

- [<=5.7
doc](https://dev.mysql.com/doc/refman/5.7/en/charset.html): in MySQL Server 5.7 and earlier, the default charset and collation are `latin1` and `latin1_swedish_ci`.
- [8.x doc](https://dev.mysql.com/doc/refman/8.0/en/charset.html): in MySQL Server 8.x (the version used here), the default charset and collation are `utf8mb4` and `utf8mb4_0900_ai_ci`.

### Changing the Default Charset and Collation of the MySQL Server

The compiled-in defaults can only be changed by rebuilding MySQL from source:

```bash
cmake . -DDEFAULT_CHARSET=latin1
```

or

```bash
cmake . -DDEFAULT_CHARSET=latin1 \
  -DDEFAULT_COLLATION=latin1_german1_ci
```

### Specifying the Charset and Collation of the MySQL Server

You can also set the server's charset and collation when starting MySQL Server. On MySQL 8.0, the following three commands are equivalent:

```bash
mysqld  # the default charset is utf8mb4, and utf8mb4's default collation is utf8mb4_0900_ai_ci
```

or

```bash
mysqld --character-set-server=utf8mb4
```

or

```bash
mysqld --character-set-server=utf8mb4 \
  --collation-server=utf8mb4_0900_ai_ci
```

# Database Charset and Collation

If a database is created without specifying a `character set` and `collation`, it uses the server's `character_set_server` and `collation_server`.

### Viewing a Database's Character Set and Collation

```sql
USE db_name;
SELECT @@character_set_database, @@collation_database;
```

To avoid changing the current default database, query `INFORMATION_SCHEMA` instead:

```sql
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = 'db_name';
```

### Specifying or Changing a Database's Character Set and Collation

At creation time:

```sql
CREATE DATABASE db_name CHARACTER SET latin1 COLLATE latin1_swedish_ci;
```

To change an existing database:

```sql
ALTER DATABASE db_name CHARACTER SET latin1 COLLATE latin1_swedish_ci;
```

# Table Charset and Collation

If no **table-level** charset and collation are specified when a table is created, the table uses the database's charset and collation.

### Viewing a Table's Charset and Collation

```sql
SELECT t.TABLE_SCHEMA, t.TABLE_NAME, ccsa.*
FROM information_schema.`TABLES` t,
     information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` ccsa
WHERE ccsa.COLLATION_NAME = t.TABLE_COLLATION
  AND t.TABLE_SCHEMA = 'db_name'
  AND t.TABLE_NAME = 'table_name';
```

### Specifying or Changing a Table's Character Set and Collation

```sql
CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]
```

# Column Charset and Collation

If no **column-level** charset and collation are specified when a table is created, the column uses the table's charset and collation.

### Viewing a Column's Charset and Collation

```sql
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.`COLUMNS`
WHERE TABLE_SCHEMA = 'db_name' AND TABLE_NAME = 'table_name';
```

### Specifying or Changing a Column's Character Set and Collation

```sql
CREATE TABLE t1
(
    col1 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
) CHARACTER SET latin1 COLLATE latin1_bin;

ALTER TABLE t1 MODIFY
    col1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_swedish_ci;
```

# Connection Character Sets and Collations

Before a client can interact with the MySQL server, it first establishes a connection. The client sends SQL statements (queries, inserts, and so on) to the server over that connection. The server returns the corresponding results, either the statement's output or an error message, over the same connection.

### How the Client and Server Set the Character Set When Connecting

1. When connecting, the client indicates the character set it wants to use. Strictly speaking, it sends that charset's default collation.
2. The MySQL server determines the corresponding charset from that collation.
3. The server then sets the session variables `character_set_client`, `character_set_results`, and `character_set_connection` to that charset, and sets `collation_connection` to the charset's default collation. In effect, the server performs the equivalent of a `SET NAMES` statement.

### Related Session Variables

- `character_set_server` and `collation_server`: the character set and collation of the MySQL server;
- `character_set_database` and `collation_database`: the character set and collation of the default (current) database;
- `character_set_client`: the character set the server assumes for SQL statements sent by the client;
- `character_set_connection`: the server converts statements sent by the client from `character_set_client` to `character_set_connection`;
- `collation_connection`: this matters for comparisons between string literals. For comparisons against column values it does not matter, because columns have their own collation, and that collation takes precedence;
- `character_set_results`: the character set of the results the server returns to the client (column values, result metadata such as column names, and error messages).

To view the connection-related session variables:

```sql
SHOW SESSION VARIABLES LIKE 'character\_set\_%';
SHOW SESSION VARIABLES LIKE 'collation\_%';
```

### Setting the Character Set and Collation

```sql
SET NAMES {'charset_name' [COLLATE 'collation_name'] | DEFAULT}
```

`SET NAMES` sets three session variables (session system
variables) to the specified charset:

- `character_set_client`
- `character_set_connection`
- `character_set_results`

Setting `character_set_connection` also sets `collation_connection`. It becomes the collation given with `COLLATE` if one is present, and otherwise the charset's default collation.

```sql
SET {CHARACTER SET | CHARSET} {'charset_name' | DEFAULT}
```

`SET CHARACTER SET` sets `character_set_client` and `character_set_results` to the specified charset. It sets `character_set_connection` and `collation_connection` to the values of `character_set_database` and `collation_database`.

# References

1. [What is Collation and Character Set in MySQL?](https://www.geeksforgeeks.org/what-is-collation-and-character-set-in-mysql/)
2. [Character Sets, Collations, Unicode](https://dev.mysql.com/doc/refman/8.0/en/charset.html)
3. [What does character set and collation mean exactly?](https://stackoverflow.com/questions/341273/what-does-character-set-and-collation-mean-exactly)
4. [MySQL Character Set](https://www.mysqltutorial.org/mysql-character-set/)
5. [MySQL Collation](https://www.mysqltutorial.org/mysql-collation/)
6. [Connection Character Sets and Collations](https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html)

Original post: [MySQL Basics: Character Set and Collation](https://zhuchengliang.com/db/mysql-character-set-and-colla
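You can observe the suffix rules from the Collation Suffixes table directly. The query below is a minimal sketch. It assumes a MySQL 8.0 server (where `utf8mb4_0900_ai_ci`, `utf8mb4_0900_as_cs` and `utf8mb4_bin` exist) and a client that sends UTF-8. `COLLATE` binds more tightly than `=`, so each clause applies to the right-hand literal. Because the collation is explicit, it decides how that comparison is made. The column aliases are made up for illustration.

```sql
-- Sketch: the same comparisons under different utf8mb4 collations (MySQL 8.0 assumed).
SET NAMES utf8mb4;  -- literals sent by this client are treated as utf8mb4

SELECT
    'a' = 'A' COLLATE utf8mb4_0900_ai_ci AS ai_ci_case,    -- _ci: case-insensitive
    'e' = 'é' COLLATE utf8mb4_0900_ai_ci AS ai_ci_accent,  -- _ai: accent-insensitive
    'a' = 'A' COLLATE utf8mb4_0900_as_cs AS as_cs_case,    -- _cs: case-sensitive
    'e' = 'é' COLLATE utf8mb4_0900_as_cs AS as_cs_accent,  -- _as: accent-sensitive
    'a' = 'A' COLLATE utf8mb4_bin        AS bin_case;      -- _bin: compares code points
```

Given the suffix meanings, the expected row is `1, 1, 0, 0, 0`.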
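The fallback chain described in the database, table and column sections can be checked with throwaway objects. A column falls back to its table, the table to its database, and the database to the server. This is a minimal sketch. It assumes a MySQL 8.0 server and privileges to create and drop a database. `demo_cs` and `t_inherit` are hypothetical names chosen for illustration.

```sql
-- Sketch: column -> table -> database fallback (hypothetical names demo_cs, t_inherit).
CREATE DATABASE demo_cs CHARACTER SET latin1 COLLATE latin1_swedish_ci;
USE demo_cs;

-- No table-level clause, so the table inherits latin1 / latin1_swedish_ci from the database.
CREATE TABLE t_inherit (
    c_default VARCHAR(10),                                           -- no column clause: table default
    c_utf8    VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin  -- column-level override
);

SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'demo_cs' AND TABLE_NAME = 't_inherit'
ORDER BY ORDINAL_POSITION;

DROP DATABASE demo_cs;  -- clean up
```

`c_default` should report `latin1` / `latin1_swedish_ci`, taken from the database through the table. `c_utf8` should keep its explicit `utf8mb4` / `utf8mb4_bin`.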
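The difference between `SET NAMES` and `SET CHARACTER SET` is easiest to see side by side. This is a minimal sketch, again assuming MySQL 8.0. The hypothetical database `demo_conn` is given a charset (`latin1`) that deliberately differs from the one requested for the connection, so the two statements produce visibly different results.

```sql
-- Sketch: SET NAMES vs SET CHARACTER SET (hypothetical database demo_conn).
CREATE DATABASE demo_conn CHARACTER SET latin1;  -- collation defaults to latin1_swedish_ci
USE demo_conn;

SET NAMES utf8mb4;
SELECT @@character_set_client, @@character_set_connection,
       @@character_set_results, @@collation_connection;
-- client, connection and results are all utf8mb4;
-- collation_connection is utf8mb4's default collation

SET CHARACTER SET utf8mb4;
SELECT @@character_set_client, @@character_set_connection,
       @@character_set_results, @@collation_connection;
-- client and results are still utf8mb4, but the connection now
-- follows the default database: latin1 / latin1_swedish_ci

DROP DATABASE demo_conn;  -- clean up
```

This is why `SET NAMES` is usually the statement to use when a client wants statements and literals to be interpreted in a single, known charset.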