Error when parsing HTML with HTMLParser: No module named 'htmlentitydefs'
阿新 · Published: 2019-02-28
On Python 3.6, parsing HTML with HTMLParser fails with
No module named 'htmlentitydefs' or No module named 'markupbase'
First, the code:
```python
# Python 2 import path: this is the line that fails on Python 3
from HTMLParser import HTMLParser
import urllib.request


class myhtml(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.flag = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)  # keep the URL, not the word "href"


if __name__ == "__main__":
    parser = myhtml()
    myurl = "https://www.cnblogs.com/pinpin"
    html = urllib.request.urlopen(myurl)
    html_connect = html.read()
    html_connect = bytes.decode(html_connect)  # bytes -> str (UTF-8)
    parser.feed(html_connect)
    print(parser.links)
```
The error:
ModuleNotFoundError: No module named 'htmlentitydefs'
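In short, this is an import error. Normally you would just install the missing module and import it, but this library could not be installed, so I kept searching.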
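A Baidu search suggested that htmlentitydefs was dropped in Python 3. More precisely, Python 3 renamed it to html.entities. It also renamed markupbase to _markupbase and the HTMLParser module to html.parser, so any import that uses the Python 2 names fails.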
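To see these renames on your own interpreter, a small check with importlib.util.find_spec can look up each Python 2 name next to its Python 3 counterpart. This is a minimal sketch. The RENAMED table is a hand-written illustration covering only the three modules mentioned in this post. On a stock Python 3 the old names should report found=False, unless some third-party package with the same name happens to be installed.

```python
import importlib.util

# Illustrative table: Python 2 module name -> Python 3 location
RENAMED = {
    "HTMLParser": "html.parser",
    "htmlentitydefs": "html.entities",
    "markupbase": "_markupbase",  # private helper that html.parser builds on
}

for old, new in RENAMED.items():
    # find_spec returns None when no module with that name can be located
    old_found = importlib.util.find_spec(old) is not None
    new_found = importlib.util.find_spec(new) is not None
    print(f"{old} (Python 2): found={old_found} -> {new} (Python 3): found={new_found}")
```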
So what now? After some digging, I found a very simple fix.
The fix was inspired by:
http://stackoverflow.max-everyday.com/2018/06/python3-importerror-no-module-named-htmlparser/
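```python
from HTMLParser import HTMLParser   # Python 2 import path
from html.parser import HTMLParser  # Python 3: always import it this way
```

After switching to the Python 3 import, the problem was solved.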
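Putting it together, here is a minimal Python 3 sketch of the link collector from the top of this post, using the html.parser import. The class name LinkParser and the sample HTML string are illustrative and do not come from the original. The string stands in for the urllib.request.urlopen download so the example runs offline.

```python
from html.parser import HTMLParser  # Python 3 location of HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; attrs is a list of (name, value)
        if tag == "a":
            for name, value in attrs:
                # skip a bare <a href> with no value (value is None)
                if name == "href" and value is not None:
                    self.links.append(value)


if __name__ == "__main__":
    # Static HTML replaces the network fetch so this sketch runs offline
    sample = ('<p><a href="https://www.cnblogs.com/pinpin">blog</a> '
              '<A HREF="/about">about</A> <a>no link</a></p>')
    parser = LinkParser()
    parser.feed(sample)
    parser.close()
    print(parser.links)  # ['https://www.cnblogs.com/pinpin', '/about']
```

To parse a real page instead, feed the parser the decoded result of urllib.request.urlopen(...).read(), as the original script does.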