
Python unicode, utf-8, and gbk: an encoding and decoding demo

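encode(): encodes a unicode string into a byte string in the given encoding
decode(): decodes a byte string in the given encoding into a unicode string
repr(): returns a printable string that represents an object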
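As a minimal sketch of the three functions above, the round trip can be written as follows. This sketch assumes Python 3 (the session below uses Python 2.7), where `str` is always unicode and `bytes` holds the encoded form; the variable names are illustrative only.

```python
# Python 3 sketch: str is unicode text, bytes is its encoded form.
text = '中文'

# encode(): unicode text -> bytes in a chosen encoding
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

# decode(): bytes -> unicode text, using the encoding the bytes were written in
round_trip = utf8_bytes.decode('utf-8')

# repr() shows a printable form; ascii() additionally escapes non-ASCII characters
print(repr(utf8_bytes))
print(repr(gbk_bytes))
print(ascii(text))
```

The same text yields different byte sequences under UTF-8 and GBK, so the decoder must be told which encoding produced the bytes.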
[oracle@10-248-57-246 ~]$ locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"  # the system locale is set to UTF-8
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=
[oracle@10-248-57-246 ~]$ python
Python 2.7.10 (default, Aug 24 2020, 16:42:49) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a='中文'
>>> a
'\xe4\xb8\xad\xe6\x96\x87' # UTF-8 encoded bytes
>>> import chardet
>>> print chardet.detect(a)
{'confidence': 0.7525, 'language': '', 'encoding': 'utf-8'}
>>> s=a.encode('utf-8')  # a is already a byte string; Python 2 first decodes it implicitly as ASCII, which fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> s=a.decode('utf-8')
>>> a
'\xe4\xb8\xad\xe6\x96\x87'
>>> s
u'\u4e2d\u6587' # unicode string (code points)
>>> repr(a)
"'\\xe4\\xb8\\xad\\xe6\\x96\\x87'"
>>> repr(s)
"u'\\u4e2d\\u6587'"
>>> k=s.encode('gbk')
>>> k
'\xd6\xd0\xce\xc4' # GBK encoded bytes
>>> print chardet.detect(k)
{'confidence': 0.682639754276994, 'language': 'Russian', 'encoding': 'KOI8-R'} # the bytes are actually GBK; chardet misdetects this short sample
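The session above converts UTF-8 bytes to GBK by decoding to unicode first, and it also shows that automatic detection can mislabel a short GBK sample. One possible way to handle both points is sketched below. It assumes Python 3 and assumes the set of plausible encodings is known in advance; `decode_with_candidates` is a hypothetical helper written for this illustration, not part of chardet or the standard library.

```python
# Python 3 sketch. decode_with_candidates is a hypothetical helper.
def decode_with_candidates(data, candidates=('utf-8', 'gbk')):
    """Try each candidate encoding in order; return (text, encoding_name)."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding, try the next
    raise ValueError('none of the candidate encodings matched')

# Transcoding: always go through unicode text, never bytes -> bytes directly.
utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'
gbk_bytes = utf8_bytes.decode('utf-8').encode('gbk')

print(decode_with_candidates(utf8_bytes))
print(decode_with_candidates(gbk_bytes))
```

Putting UTF-8 first in the candidate list is a deliberate choice in this sketch: UTF-8 is strict, so bytes written in another encoding usually fail to decode as UTF-8, whereas a permissive encoding tried first could accept almost anything.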