Python unicode transcoding problems explained

阿新 • Published: 2018-12-21

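1. Converting between unicode and ordinary str strings

Python represents strings internally as unicode, so unicode normally serves as the intermediate form when converting between encodings: first decode the string from its original encoding into unicode, then encode the unicode into the target encoding.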

unicodestring = u"Hello world"

"decode": decoding

Converts an ordinary Python string into Unicode

str->unicode: unicode(b, "utf-8") or b.decode("utf-8")

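# utf8string and asciistring are byte strings (str) produced by the encode calls shown further down.
# Despite the variable names, both results here are unicode objects.
plainstring1 = unicode(utf8string, "utf-8")
plainstring2 = unicode(asciistring, "ascii")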

"encode": encoding

Converts Unicode into an ordinary Python string

unicode->str: a.encode("utf-8")

utf8string = unicodestring.encode("utf-8")  
asciistring = unicodestring.encode("ascii")

So before transcoding, always work out which encoding the str is in, decode it to unicode, and only then encode it into the target encoding.
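The decode-then-encode rule above can be sketched as a small helper. This is a minimal illustration rather than code from the original article: the function name transcode and the gb2312 sample bytes are assumptions chosen for the example, and the snippet avoids Python 2-only syntax so that it also runs on Python 3.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode a byte string, using unicode as the intermediate form.

    Hypothetical helper for illustration; not part of any library.
    """
    text = data.decode(source_encoding)   # bytes in the source encoding -> unicode
    return text.encode(target_encoding)   # unicode -> bytes in the target encoding


# The two characters U+4E2D U+6587 as gb2312 bytes (sample input).
gb_bytes = b"\xd6\xd0\xce\xc4"
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")
print(repr(utf8_bytes))
```

If the source encoding is guessed wrong, decode either raises UnicodeDecodeError or silently produces the wrong characters, which is why the encoding has to be known first.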

A str literal in source code has the same encoding as the source file itself.

Taking "utf-8" as an example

unicode->str: a.encode("utf-8")

str->unicode: unicode(b, "utf-8") or b.decode("utf-8")

The code is as follows:

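# -*- coding: utf-8 -*-
# The coding line above is required in Python 2 because the source contains non-ASCII characters.
a = u'你好'
b = a.encode("utf-8")     # unicode -> str (UTF-8 bytes)
c = unicode(b, "utf-8")   # str -> unicode
print a, type(a)
print b, type(b)
print c, type(c)
print c + u"hahaha", type(c + u"hahaha")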

Output:
你好 <type 'unicode'>
你好 <type 'str'>
你好 <type 'unicode'>
你好hahaha <type 'unicode'>

s = u'\u4eba\u751f\u82e6\u77ed\uff0cpy\u662f\u5cb8'
print s

Output: 人生苦短，py是岸

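2. Determining a string's encoding

isinstance(s, unicode)  # checks whether s is unicode

Calling encode on a str that is not unicode can raise an error: Python 2 first decodes the str implicitly with the default encoding (ascii), so any non-ASCII byte triggers a UnicodeDecodeError.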
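A type check like the one above is often wrapped in a helper that decodes only when it has to. The sketch below is an assumption-laden illustration: to_unicode is a hypothetical name, utf-8 is an assumed default, and the check uses bytes (an alias of str on Python 2.6+) so the same code runs on Python 3. The last part reproduces explicitly the ascii decode that Python 2 performs implicitly inside str.encode.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as unicode text, decoding only if it is a byte string.

    Hypothetical helper; the utf-8 default is an assumption.
    """
    if isinstance(value, bytes):      # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value                      # already unicode, leave untouched


utf8_bytes = b"\xe4\xb8\xad\xe6\x96\x87"
text = to_unicode(utf8_bytes)   # decoded from UTF-8
same = to_unicode(text)         # passed through unchanged

# The implicit step behind the error: decoding non-ASCII bytes as ascii fails.
try:
    utf8_bytes.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

print(implicit_decode_failed)
```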

3. Getting the system default encoding

#!/usr/bin/env python
#coding=utf-8
import sys
print sys.getdefaultencoding()

On English Windows XP, this program prints: ascii
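The default encoding is only one of several encodings that affect output. As a rough sketch (not from the original article), the snippet below also queries the encoding of the output stream and the locale's preferred encoding. The variable names are illustrative, and the values differ between systems and Python versions, so no particular output is assumed.

```python
import locale
import sys

# Encoding used for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()

# Encoding of the output stream; may be None when output is piped or replaced.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding the operating system locale prefers for text data.
locale_encoding = locale.getpreferredencoding()

print("default: %s" % default_encoding)
print("stdout:  %s" % stdout_encoding)
print("locale:  %s" % locale_encoding)
```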

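4. Garbled console output

In some IDEs, printed strings always come out garbled or even raise errors. The cause is that the IDE's output console cannot display the string's encoding; the program itself is not at fault.

For example, running the following code in UliPad: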

s=u"中文"
print s

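This raises: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). The reason is that UliPad's console output window on English Windows XP writes output as ascii (the default encoding on an English system is ascii), while the string in the code above is Unicode, so the output step fails.

Changing the last line to print s.encode('gb2312') prints the two characters "中文" correctly. Changing it to print s.encode('utf8') prints \xe4\xb8\xad\xe6\x96\x87, which is what the console output window shows when it writes a utf8-encoded string as ascii.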
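The behaviour described above can be reproduced without an IDE by encoding the same string by hand. This is a minimal sketch, not the author's code: the variable names are illustrative, the string is written with \u escapes so no source-encoding declaration is needed, and the backslashreplace fallback is one possible choice among the standard error handlers.

```python
s = u"\u4e2d\u6587"  # the two characters from the UliPad example

# An ascii-only console forces this encode, which fails for non-ASCII text.
try:
    s.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

gb_bytes = s.encode("gb2312")   # bytes a gb2312 console can display
utf8_bytes = s.encode("utf-8")  # bytes a utf-8 console can display

# Fallback that never raises: unencodable characters become \uXXXX escapes.
fallback = s.encode("ascii", "backslashreplace")

print(ascii_ok)
print(repr(gb_bytes))
print(repr(utf8_bytes))
print(repr(fallback))
```

Encoding explicitly to whatever the console actually expects is what makes the print succeed; the fallback trades readability for a guarantee that output never raises.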

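unicode(str, 'gb2312') and str.decode('gb2312') are equivalent: both convert a gb2312-encoded str into unicode.

str.__class__ shows whether a string is a str or a unicode object; it does not reveal which encoding the bytes of a str are in.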
