Parsing HTML with HTMLParser fails with: No module named 'htmlentitydefs'

On Python 3.6, parsing HTML with HTMLParser fails with
No module named 'htmlentitydefs' or No module named 'markupbase'

Here is the code first:

from HTMLParser import HTMLParser  # Python 2 module name; this is the import that fails on Python 3
import urllib.request


class myhtml(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.flag = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect the href value of every <a> tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


if __name__ == "__main__":
    parser = myhtml()
    myurl = "https://www.cnblogs.com/pinpin"
    html = urllib.request.urlopen(myurl)
    html_connect = html.read()
    html_connect = bytes.decode(html_connect)  # bytes -> str, UTF-8 by default
    parser.feed(html_connect)
    print(parser.links)

The error is:

ModuleNotFoundError: No module named 'htmlentitydefs'

Put simply, this is an import error. If a module is missing, just install it, right? But this one cannot be installed, so I kept digging.

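What a Baidu search turned up: 'htmlentitydefs' no longer exists under that name in Python 3 (it was renamed to html.entities, and markupbase became the private _markupbase).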
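As a small illustration of that rename, here is a sketch that assumes Python 3. The entity tables that Python 2 exposed through htmlentitydefs can be imported from html.entities instead; the lookups below are only examples and are not part of the original script.

```python
# Python 3 home of the tables that Python 2 exposed as htmlentitydefs.
from html.entities import codepoint2name, name2codepoint

# Map an entity name to its Unicode code point and back again.
amp_codepoint = name2codepoint["amp"]
amp_name = codepoint2name[amp_codepoint]

print(amp_codepoint, amp_name)
```

Code written for Python 2 that does `import htmlentitydefs` will fail on Python 3 for exactly this reason, which is why an old HTMLParser module cannot be made to work by installing something extra.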

So what now? After some effort, I found a very simple fix.

The idea came from:

http://stackoverflow.max-everyday.com/2018/06/python3-importerror-no-module-named-htmlparser/

from HTMLParser import HTMLParser  # this works on Python 2

from html.parser import HTMLParser  # use this on Python 3; with this import the problem went away
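Putting the fix together, a minimal sketch of the link collector with the Python 3 import might look like this. The class name LinkCollector and the sample_html string are hypothetical stand-ins so the example runs without a network request; in the real script the text would come from urllib.request.urlopen(...).read().decode().

```python
from html.parser import HTMLParser  # Python 3 import path


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value is not None:
                self.links.append(value)


# Hypothetical sample markup, used here instead of downloading a page.
sample_html = (
    '<p><a href="https://www.cnblogs.com/pinpin">blog</a> '
    '<a name="top">anchor only</a> '
    '<a href="/about">about</a></p>'
)

parser = LinkCollector()
parser.feed(sample_html)
parser.close()
print(parser.links)
```

Note that the handler stores value (the URL) rather than name, since name is always the literal string "href" at that point.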
