MySQL Basics: Character Set and Collation
阿新 • Published: 2021-03-11
**A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set.**
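- ```Character Set```: a set of characters and their encodings. (This article also uses the short form ```charset``` in many places.)
- ```Collation```: a set of rules for comparing or sorting the characters within a character set.
The environment used here is ```macOS Catalina``` with MySQL version ```8.0.13 MySQL Community Server - GPL```.
# MySQL Charset and Collation Overview
- The MySQL server supports multiple character sets.
- Every character set has at least one collation.
- Most character sets have several collations.
- Every character set has one default collation.
- Two different character sets cannot share the same collation.
- MySQL lets you specify the character set at the server, database, table, or column level.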
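
The relationships listed above can be checked against the server's own metadata. The query below is a minimal sketch using the ```information_schema.COLLATIONS``` table; the counts it returns depend on the server version and build.

```sql
-- One row per charset: how many collations it has and which one is its default.
SELECT
  CHARACTER_SET_NAME,
  COUNT(*) AS collation_count,
  MAX(CASE WHEN IS_DEFAULT = 'Yes' THEN COLLATION_NAME END) AS default_collation
FROM information_schema.COLLATIONS
GROUP BY CHARACTER_SET_NAME
ORDER BY CHARACTER_SET_NAME;
```

Every row should report a ```collation_count``` of at least 1 and exactly one ```default_collation```.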
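### Collation Suffixes
|Suffix|Meaning|
|----|----|
|_ai|Accent-insensitive|
|_as|Accent-sensitive|
|_ci|Case-insensitive|
|_cs|Case-sensitive|
|_ks|Kana-sensitive|
|_bin|Binary|
- For nonbinary collations, ```_ai``` or ```_as``` need not be stated explicitly: when the name contains neither, ```_ci``` implies ```_ai``` and ```_cs``` implies ```_as```. For example, ```latin1_general_ci``` is explicitly case-insensitive and implicitly accent-insensitive, so it behaves as if it were named ```latin1_general_ai_ci```.
- The ```0900``` in ```utf8mb4_0900_ai_ci```: collation names for Unicode character sets include the version of the UCA (Unicode Collation Algorithm) they are based on, and ```0900``` is that version number (UCA 9.0.0). A name without a version number means the collation is based on UCA 4.0.0, as with ```utf8mb4_unicode_ci```.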
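
As an illustration of what the suffixes mean in practice, the sketch below compares the same strings under three ```utf8mb4``` collations. It assumes a MySQL 8.0 server where ```utf8mb4_0900_ai_ci```, ```utf8mb4_0900_as_cs```, and ```utf8mb4_bin``` are available; the column aliases are made up for this example.

```sql
-- Introducers (_utf8mb4) make the literals independent of the session charset.
-- X'C3A1' is the UTF-8 encoding of 'á'.
SELECT
  _utf8mb4'a' COLLATE utf8mb4_0900_ai_ci = _utf8mb4'A'      AS ai_ci_case,    -- 1
  _utf8mb4'a' COLLATE utf8mb4_0900_ai_ci = _utf8mb4 X'C3A1' AS ai_ci_accent,  -- 1
  _utf8mb4'a' COLLATE utf8mb4_0900_as_cs = _utf8mb4'A'      AS as_cs_case,    -- 0
  _utf8mb4'a' COLLATE utf8mb4_0900_as_cs = _utf8mb4 X'C3A1' AS as_cs_accent,  -- 0
  _utf8mb4'a' COLLATE utf8mb4_bin        = _utf8mb4'A'      AS bin_case;      -- 0
```

The accent- and case-insensitive collation treats all three strings as equal, while the sensitive and binary collations distinguish them.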
# MySQL Server Charset and Collation
### Viewing the Charsets Supported by MySQL Server
There are several ways to list the character sets supported by the current MySQL Server:
```sql
show character set; -- method 1
show charset; -- method 2
show char set; -- method 3
select * from information_schema.character_sets; -- method 4
```
**Viewing information for specific character sets** (mainly the default collation and MAXLEN):
```sql
show character set like 'utf%'; -- method 1
show charset like 'utf%'; -- method 2
show char set like 'utf%'; -- method 3
select * from information_schema.character_sets
where CHARACTER_SET_NAME like 'utf%'; -- method 4
```
### Viewing the Collations Supported by MySQL Server
```sql
SHOW COLLATION WHERE Charset = 'utf8mb4';
```
Or:
```sql
select * from INFORMATION_SCHEMA.COLLATIONS where CHARACTER_SET_NAME='utf8mb4';
```
### Viewing the Current Charset and Collation of MySQL Server
```sql
show variables like 'character_set_server';
```
```sql
show variables like 'collation_server';
```
Or with a single statement:
```sql
select @@character_set_server, @@collation_server;
```
### Default Charset and Collation of MySQL Server
The defaults are listed in the official MySQL documentation:
- [<=5.7 doc](https://dev.mysql.com/doc/refman/5.7/en/charset.html): in MySQL Server 5.7 and earlier, the default charset and collation are ```latin1``` and ```latin1_swedish_ci```.
- [8.x doc](https://dev.mysql.com/doc/refman/8.0/en/charset.html): in MySQL Server 8.x (the version used here), the default charset and collation are ```utf8mb4``` and ```utf8mb4_0900_ai_ci```.
### Changing the Default Charset and Collation of MySQL Server
Changing the compiled-in defaults requires rebuilding MySQL from source:
```bash
cmake . -DDEFAULT_CHARSET=latin1
```
Or:
```bash
cmake . -DDEFAULT_CHARSET=latin1 \
-DDEFAULT_COLLATION=latin1_german1_ci
```
### Specifying the Charset and Collation of MySQL Server
The server charset and collation can be specified when MySQL Server is started. With the 8.0 defaults, the following three invocations are equivalent:
```bash
mysqld # the default charset is utf8mb4, whose default collation is utf8mb4_0900_ai_ci
```
Or:
```bash
mysqld --character-set-server=utf8mb4
```
Or:
```bash
mysqld --character-set-server=utf8mb4 \
--collation-server=utf8mb4_0900_ai_ci
```
# Database Charset and Collation
If no ```character set``` and ```collation``` are specified when a database is created, the database takes the MySQL Server's ```character set``` and ```collation```.
### Viewing a Database's Character Set and Collation
```sql
USE db_name;
SELECT @@character_set_database, @@collation_database;
```
To avoid changing the current (default) database, query the metadata instead:
```sql
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'db_name';
```
### Specifying or Changing a Database's Character Set and Collation
Specify them when creating the database:
```sql
CREATE DATABASE db_name CHARACTER SET latin1 COLLATE latin1_swedish_ci;
```
Change them afterwards:
```sql
ALTER DATABASE db_name CHARACTER SET latin1 COLLATE latin1_swedish_ci;
```
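
The inheritance rule and the effect of ```ALTER DATABASE``` can be observed together. The following is a minimal sketch; ```demo_db``` is a hypothetical database name used only for illustration, and the statements assume an account allowed to create and drop databases.

```sql
-- demo_db is a hypothetical name; no CHARACTER SET / COLLATE is given here.
CREATE DATABASE demo_db;

-- Expected to match @@character_set_server and @@collation_server.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = 'demo_db';

ALTER DATABASE demo_db CHARACTER SET latin1 COLLATE latin1_swedish_ci;

-- Now reports latin1 / latin1_swedish_ci.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = 'demo_db';

DROP DATABASE demo_db;  -- clean up
```

Note that ```ALTER DATABASE``` only changes the defaults applied to tables created afterwards; tables that already exist keep their own charset and collation.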
# Table Charset and Collation
If no **table-level** charset and collation are specified when a table is created, the table uses the database's charset and collation by default.
### Viewing a Table's Charset and Collation
```sql
SELECT
  t.TABLE_SCHEMA,
  t.TABLE_NAME,
  ccsa.*
FROM information_schema.`TABLES` t
JOIN information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` ccsa
  ON ccsa.COLLATION_NAME = t.TABLE_COLLATION
WHERE t.TABLE_SCHEMA = 'db_name'  -- single quotes: "..." is an identifier under ANSI_QUOTES
  AND t.TABLE_NAME = 'table_name';
```
### Specifying or Changing a Table's Character Set and Collation
```sql
CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]
```
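
As a concrete instance of the syntax above, the sketch below creates a table that inherits from its database and then changes the table-level setting. ```demo_db``` and ```demo_t``` are hypothetical names, and ```CONVERT TO``` is a related ```ALTER TABLE``` clause that is not covered elsewhere in this article.

```sql
-- demo_db and demo_t are hypothetical names used only for this sketch.
CREATE DATABASE demo_db CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
CREATE TABLE demo_db.demo_t (name VARCHAR(20));  -- inherits from demo_db

SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.`TABLES`
WHERE TABLE_SCHEMA = 'demo_db' AND TABLE_NAME = 'demo_t';

-- Changes only the table default, which applies to columns added later;
-- the existing column `name` stays utf8mb4.
ALTER TABLE demo_db.demo_t DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

-- CONVERT TO additionally converts the existing character columns.
ALTER TABLE demo_db.demo_t CONVERT TO CHARACTER SET latin1 COLLATE latin1_swedish_ci;

DROP DATABASE demo_db;  -- clean up
```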
# Column Charset and Collation
If a column definition does not specify a charset and collation, the column uses the table's charset and collation by default.
### Viewing a Column's Charset and Collation
```sql
SELECT
  COLUMN_NAME,
  CHARACTER_SET_NAME,  -- NULL for non-string columns
  COLLATION_NAME
FROM information_schema.`COLUMNS`
WHERE TABLE_SCHEMA = 'db_name'
  AND TABLE_NAME = 'table_name';
```
### Specifying or Changing a Column's Character Set and Collation
```sql
CREATE TABLE t1
(
    -- in MySQL 8.0, utf8 is an alias for utf8mb3
    col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci
) CHARACTER SET latin1 COLLATE latin1_bin;

ALTER TABLE t1 MODIFY
    col1 VARCHAR(5)
      CHARACTER SET latin1
      COLLATE latin1_swedish_ci;
```
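
To see column-level inheritance next to an explicit override, a small sketch follows. ```demo_db``` and ```demo_t``` are hypothetical names; ```SHOW FULL COLUMNS``` is used here as a compact alternative to querying ```information_schema```.

```sql
-- demo_db and demo_t are hypothetical names used only for this sketch.
CREATE DATABASE demo_db;
CREATE TABLE demo_db.demo_t (
  a VARCHAR(10),                                           -- inherits latin1 / latin1_bin from the table
  b VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin  -- explicit column-level setting
) CHARACTER SET latin1 COLLATE latin1_bin;

-- The Collation column of the output shows each column's effective collation.
SHOW FULL COLUMNS FROM demo_db.demo_t;

DROP DATABASE demo_db;  -- clean up
```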
# Connection Character Sets and Collations
Before a client and MySQL Server can interact, they first establish a connection.
The client sends SQL statements (queries, inserts, and so on) to MySQL Server over the connection; MySQL Server returns the corresponding results (statement results or error messages) to the client over the same connection.
### Establishing a Connection and Setting the Character Set
1. When connecting, the client specifies a collation (the default collation of the charset it wants to use);
2. MySQL Server determines the charset that this collation belongs to;
3. MySQL Server then sets the session variables ```character_set_client```, ```character_set_results```, and ```character_set_connection``` to that charset, and sets ```collation_connection``` to the charset's default collation.
### Related Session Variables
- ```character_set_server``` and ```collation_server```: the character set and collation of MySQL Server;
- ```character_set_database``` and ```collation_database```: the character set and collation of the default database;
- ```character_set_client```: the charset in which MySQL Server assumes the client sends its SQL statements;
- ```character_set_connection```: the server converts statements sent by the client from ```character_set_client``` to ```character_set_connection```;
- ```collation_connection```: important for comparisons of string literals, which have no column collation of their own;
- ```character_set_results```: the charset in which the server returns results to the client (column values, result metadata such as column names, and error messages).
To view the connection-related session variables:
```sql
SHOW SESSION VARIABLES LIKE 'character\_set\_%';
SHOW SESSION VARIABLES LIKE 'collation\_%';
```
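
The role of ```collation_connection``` in literal comparisons can be illustrated with a short sketch. It assumes the ```utf8mb4``` collations of MySQL 8.0 are available; the final statement resets the session value to the global value, which may differ from what the session started with.

```sql
SET collation_connection = 'utf8mb4_0900_ai_ci';
SELECT 'a' = 'A';   -- 1: literals are compared case-insensitively

SET collation_connection = 'utf8mb4_bin';
SELECT 'a' = 'A';   -- 0: literals are compared by binary code point value

SET collation_connection = DEFAULT;  -- back to the global value
```

Comparisons that involve a column are not affected in the same way, because a column's own collation takes precedence over that of a literal.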
### Setting the Character Set and Collation
```sql
SET NAMES {'charset_name'
[COLLATE 'collation_name'] | DEFAULT}
```
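```SET NAMES``` sets the following three session system variables to the specified charset (and ```collation_connection``` to the specified collation, or to the charset's default collation if ```COLLATE``` is omitted):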
- character_set_client
- character_set_connection
- character_set_results
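
Put differently, a ```SET NAMES``` statement is shorthand for three separate assignments. The sketch below uses ```utf8mb4``` as an example value and then reads the variables back.

```sql
SET NAMES 'utf8mb4';

-- Equivalent to the three assignments below:
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;

-- Read back the resulting session values.
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results,
       @@collation_connection;
```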
```sql
SET {CHARACTER SET | CHARSET}
{'charset_name' | DEFAULT}
```
```SET CHARACTER SET``` sets ```character_set_client``` and ```character_set_results``` to the specified charset,
and sets ```character_set_connection``` to the value of ```character_set_database```.
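
The difference from ```SET NAMES``` is easiest to see when the statement is written out as individual assignments. The sketch below uses ```latin1``` as an example value.

```sql
SET CHARACTER SET 'latin1';

-- Equivalent to the three assignments below; setting collation_connection
-- also sets character_set_connection to the matching charset.
SET character_set_client = latin1;
SET character_set_results = latin1;
SET collation_connection = @@collation_database;

-- character_set_connection now follows the default database, not latin1
-- (unless the database itself uses latin1).
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results,
       @@collation_connection;
```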
Original article: [MySQL基礎知識:Character Set和Collation](https://zhuchengliang.com/db/mysql-character-set-and-colla