Python unicode, utf-8, and gbk: an encoding and decoding demonstration
阿新 • Published: 2021-01-08
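encode(): encodes a unicode string into a byte string in the given encoding
decode(): decodes a byte string in the given encoding into a unicode string
repr(): returns a printable string that represents an object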
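As a minimal sketch of these three calls, the snippet below runs the same round trip under Python 3. This is an assumption: the session in this post uses Python 2.7, where `str` holds bytes, whereas in Python 3 `str` holds Unicode text and `bytes` holds encoded data. The variable names are illustrative only.

```python
# Python 3 sketch (assumption: the original post uses Python 2.7).
text = '\u4e2d\u6587'                  # the Unicode text "中文"

utf8_bytes = text.encode('utf-8')      # str -> bytes
restored = utf8_bytes.decode('utf-8')  # bytes -> str

print(repr(utf8_bytes))   # printable form of the encoded bytes
print(repr(restored))     # printable form of the decoded text
print(restored == text)   # the round trip is lossless
```

The byte values match the UTF-8 bytes shown in the Python 2 session; only the types and the `b'...'` prefix differ.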
[oracle@10-248-57-246 ~]$ locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"    # the system locale is set to UTF-8
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=
[oracle@10-248-57-246 ~]$ python
Python 2.7.10 (default, Aug 24 2020, 16:42:49)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a='中文'
>>> a
'\xe4\xb8\xad\xe6\x96\x87'    # UTF-8 encoded bytes
>>> import chardet
>>> print chardet.detect(a)
{'confidence': 0.7525, 'language': '', 'encoding': 'utf-8'}
>>> s=a.encode('utf-8')    # fails: a is a byte string, so Python 2 first decodes it implicitly as ASCII
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> s=a.decode('utf-8')
>>> a
'\xe4\xb8\xad\xe6\x96\x87'
>>> s
u'\u4e2d\u6587'    # unicode object (code points)
>>> repr(a)
"'\\xe4\\xb8\\xad\\xe6\\x96\\x87'"
>>> repr(s)
"u'\\u4e2d\\u6587'"
>>> k=s.encode('gbk')
>>> k
'\xd6\xd0\xce\xc4'    # GBK encoded bytes
>>> print chardet.detect(k)
{'confidence': 0.682639754276994, 'language': 'Russian', 'encoding': 'KOI8-R'}    # the bytes are GBK, but chardet misdetects this short sample as KOI8-R
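The session converts UTF-8 bytes to GBK by going through unicode: decode first, then encode. A hedged Python 3 sketch of that path follows. The helper name `transcode` is hypothetical and not from the original post. Because chardet misdetected the short GBK sample above, the sketch passes the known encodings explicitly rather than guessing them.

```python
# Python 3 sketch; `transcode` is a hypothetical helper, not part of any library.
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from encoding `src` to encoding `dst` via Unicode text."""
    return data.decode(src).encode(dst)

utf8_bytes = b'\xe4\xb8\xad\xe6\x96\x87'           # "中文" in UTF-8
gbk_bytes = transcode(utf8_bytes, 'utf-8', 'gbk')  # same text in GBK
print(gbk_bytes)

# Decoding with the wrong codec fails instead of silently producing text.
try:
    gbk_bytes.decode('utf-8')
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False
print(decoded_as_utf8)
```

In Python 3 the `a.encode('utf-8')` mistake from the session cannot produce the same `UnicodeDecodeError`, since `bytes` objects have no `encode` method and the implicit ASCII decode no longer exists.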