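How to fix the error that occurs when reading a web page with BeautifulSoup
阿新 • Published: 2018-11-01
I had just started learning BeautifulSoup and hit an error when trying to parse a web page's content after reading it. Here is the code I ran: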
#!/usr/bin/python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
from urllib2 import urlopen
WebSite='http://www.weather.com.cn/weather/101010100.shtml'
soup = BeautifulSoup(WebSite,"html.parser")#"html.parser",,from_encoding="utf-8"
print soup.prettify()
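I wanted to display the content of the given web page, but running the program produced the following warning instead of the page's HTML: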
/usr/lib/python2.7/dist-packages/bs4/__init__.py:282: UserWarning: "http://www.weather.com.cn/weather/101010100.shtml" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
' that document to Beautiful Soup.' % decoded_markup
http://www.weather.com.cn/weather/101010100.shtml
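I eventually found the answer on Stack Overflow: https://stackoverflow.com/questions/24768858/beautifulsoup-responses-with-error
The problem is caused by this statement in the program: soup = BeautifulSoup(WebSite,"html.parser"). BeautifulSoup treats the string it receives as the markup to parse and does not download anything, so here it parsed the URL text itself. The statement should be: soup = BeautifulSoup(urlopen(WebSite),"html.parser"), which fetches the page first and passes the response to BeautifulSoup.
The complete corrected code is as follows: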
#!/usr/bin/python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
from urllib2 import urlopen
WebSite='http://www.weather.com.cn/weather/101010100.shtml'
soup = BeautifulSoup(urlopen(WebSite),"html.parser")#"html.parser",,from_encoding="utf-8"
print soup.prettify()
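The code above is Python 2 (urllib2 and the print statement). For readers on Python 3, the same idea of fetching the document first and then handing it to BeautifulSoup might look roughly like the sketch below. It assumes Python 3, where urllib.request.urlopen takes the place of urllib2.urlopen. The helper names fetch_html and make_soup, and the opener and timeout parameters, are made up for this illustration and are not part of the original post or of the bs4 API.

```python
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def fetch_html(url, opener=urlopen, timeout=10):
    """Download the document behind `url` and return its raw bytes.

    `opener` is a hypothetical hook so the fetch step can be swapped out
    (for example in tests); by default it is urllib's urlopen.
    """
    response = opener(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()


def make_soup(url, opener=urlopen):
    """Fetch `url` first, then pass the markup (not the URL) to BeautifulSoup."""
    # Imported here so fetch_html can be used even where bs4 is not installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(fetch_html(url, opener=opener), "html.parser")


# Example usage (performs a real network request, so it is left commented out):
# soup = make_soup("http://www.weather.com.cn/weather/101010100.shtml")
# print(soup.prettify())
```

Passing the raw bytes rather than a decoded string lets BeautifulSoup work out the page encoding on its own. Keeping the download in a separate function also makes it clear that BeautifulSoup only ever sees the document, never the URL.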