Python encoding problems: fixing the Python 3 error UnicodeEncodeError: 'gbk' codec can't encode character '\xXX' in position XX

阿新 • Published: 2018-11-21

I fetched some bytes from the web and tried to print them, but got the following error:

UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 8530: illegal multibyte sequence

The code:

import urllib.request
res=urllib.request.urlopen('http://www.baidu.com')
htmlBytes=res.read()
print(htmlBytes.decode('utf-8'))

The error message is confusing: the code decodes with 'utf-8', so why does the error complain about 'gbk'?

What's more, the HTML of the Baidu home page contains the following line:

<meta http-equiv="content-type" content="text/html;charset=utf-8">

This shows that the page really is encoded in utf-8, so why is there an error?

In Python 3, there are a few basics about encoding worth knowing:

1. A character is a Unicode character, and a string (str) is a sequence of Unicode characters.

You can test this with the following code:

print('a'=='\u0061')

The result is True, which shows that the two are equivalent.

2. Converting str to bytes is called encode, and converting bytes to str is called decode. The code above, for example, decodes the fetched bytes into a Unicode string.
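As a small illustration of the two directions, the sketch below encodes one str with two different codecs and decodes it back. The sample text and the variable names are chosen here for illustration only; the bytes are printed rather than the text so the example does not depend on the console encoding.

```python
# str -> bytes is "encode"; bytes -> str is "decode".
text = "\u4e2d\u6587"                 # the two-character string meaning "Chinese"

utf8_bytes = text.encode("utf-8")     # same str, UTF-8 bytes
gbk_bytes = text.encode("gbk")        # same str, GBK bytes

print(utf8_bytes)                     # b'\xe4\xb8\xad\xe6\x96\x87'
print(gbk_bytes)                      # b'\xd6\xd0\xce\xc4'

# Decoding with the matching codec gives back the original str.
print(utf8_bytes.decode("utf-8") == text)   # True
print(gbk_bytes.decode("gbk") == text)      # True
```

The same str yields different bytes under different codecs, which is why the codec used for output matters as much as the one used for decoding.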

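Following the error message, I looked at where \xbb appears in the byte stream and found the special character », stored as \xc2\xbb. I suspected that this was the character that could not be decoded.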

I tested it with the following code:

print(b'\xc2\xbb'.decode('utf-8'))

Sure enough, it raised an error: UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 0: illegal multibyte sequence

I looked up a UTF-8 table online and confirmed that the special character » really is c2bb in UTF-8 and '\u00bb' in Unicode. So why couldn't it be decoded?
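One way to check whether decoding is really at fault is to decode the bytes without printing the character itself. The sketch below does that, using ascii() so that only ASCII reaches the console; the variable names are illustrative.

```python
char = "\u00bb"                        # the character in question

# The values from the encoding table can be checked directly.
print(hex(ord(char)))                  # 0xbb
print(char.encode("utf-8").hex())      # c2bb

# Decoding on its own raises nothing; ascii() escapes the result for display.
decoded = b"\xc2\xbb".decode("utf-8")
print(ascii(decoded))                  # '\xbb'
print(decoded == char)                 # True
```

If this runs cleanly, the decode step is fine and the failure has to come from somewhere later in the line.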

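Looking closely at the error message, it says that 'gbk' cannot encode, whereas my code was decoding with utf-8. The two do not match at all, which finally made me suspect that the print function was the one failing. So I immediately ran the following test: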

print('\u00bb')

It raised an error: UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 0: illegal multibyte sequence

So the limitation lies in print() itself: it cannot print every Unicode character.
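The failure can be reproduced without print() at all, which suggests that the underlying step is encoding the string for output. The following is a minimal sketch; can_encode is a hypothetical helper written for this illustration, and the value of sys.stdout.encoding depends on your environment.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# The encoding print() has to use for standard output (environment dependent).
print(getattr(sys.stdout, "encoding", None))

print(can_encode("\u00bb", "gbk"))     # False: gbk has no mapping for this character
print(can_encode("\u00bb", "utf-8"))   # True
```

When the first line reports a gbk-family encoding, any character that gbk cannot represent will make print() fail in the same way.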

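Once I knew the cause, I googled for a fix. The limitation of print() is really the limitation of the default encoding Python uses for standard output. My system is Windows 7, where that default is not 'utf-8', so changing it to 'utf-8' should be enough: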

import io
import sys
import urllib.request
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8') # change the default encoding of standard output
res=urllib.request.urlopen('http://www.baidu.com')
htmlBytes=res.read()
print(htmlBytes.decode('utf-8'))
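On Python 3.7 and later, io.TextIOWrapper also has a reconfigure() method that changes the encoding in place, which may be an alternative to replacing sys.stdout. The sketch below is one possible wrapper, not the method used in this post; force_utf8 is a hypothetical name, and it falls back to the re-wrapping approach above when reconfigure() is unavailable.

```python
import io
import sys

def force_utf8(stream):
    """Return a text stream that writes UTF-8 to the same underlying buffer."""
    if hasattr(stream, "reconfigure"):        # io.TextIOWrapper, Python 3.7+
        stream.reconfigure(encoding="utf-8")
        return stream
    # Fallback: wrap the underlying binary buffer, as in the script above.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8")

# Demonstration on an in-memory stream that starts out as gbk.
raw = io.BytesIO()
out = force_utf8(io.TextIOWrapper(raw, encoding="gbk"))
out.write("\u00bb")
out.flush()
print(raw.getvalue())                         # b'\xc2\xbb'

# Typical use at the top of a script (left commented out here):
# sys.stdout = force_utf8(sys.stdout)
```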

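After this change the error was gone, but there was a lot of garbled text (English displayed correctly, Chinese did not)! After more fiddling I found that the console was the problem: running the script in cmd produced garbled text, while running it in IDLE worked fine.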

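From this I infer that cmd does not handle utf8 well, whereas IDLE does. When running in IDLE you do not even need to change the default encoding of standard output, because it is already utf8. If you must run the script in cmd, change the encoding; after I switched to 'gb18030', for example, the output displayed correctly: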

sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030') # change the default encoding of standard output
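If the console encoding has to stay as gbk, another option that may be worth considering is the errors parameter of io.TextIOWrapper, which controls what happens to characters the codec cannot represent. This is a sketch under that assumption and not something tested in this post; make_console_writer is a hypothetical helper, and an in-memory buffer stands in for sys.stdout.buffer.

```python
import io

def make_console_writer(buffer, encoding="gbk"):
    """Wrap a binary buffer so unencodable characters are escaped, not fatal."""
    return io.TextIOWrapper(buffer, encoding=encoding, errors="backslashreplace")

raw = io.BytesIO()                     # stands in for sys.stdout.buffer
out = make_console_writer(raw)
out.write("a\u00bbb")                  # the middle character is not in gbk
out.flush()
print(raw.getvalue())                  # b'a\\xbbb'
```

The unencodable character is written as the escape sequence \xbb instead of raising UnicodeEncodeError, at the cost of not displaying the real character.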

Finally, here are the names of some common encodings related to Chinese. Assign each one to encoding to see the different effects:

Encoding name	Use
utf8	All languages
gbk	Simplified Chinese
gb2312	Simplified Chinese
gb18030	Simplified Chinese
big5	Traditional Chinese
big5hkscs	Traditional Chinese
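To compare the encodings in the table without touching the console, one could try encoding a sample string with each of them. The sketch below does this for the character from this post; encodable_with is a hypothetical helper, and the results for other characters will differ.

```python
ENCODINGS = ["utf8", "gbk", "gb2312", "gb18030", "big5", "big5hkscs"]

def encodable_with(text):
    """Map each encoding name to whether it can encode all of text."""
    result = {}
    for name in ENCODINGS:
        try:
            text.encode(name)
            result[name] = True
        except UnicodeEncodeError:
            result[name] = False
    return result

report = encodable_with("\u00bb")
for name, ok in report.items():
    print(name, ok)
```

For this character, utf8 and gb18030 succeed while gbk fails, which matches the behaviour seen above in cmd.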


