[Python] Basic Usage of BeautifulSoup
阿新 • Published: 2021-07-02
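BeautifulSoup is one of the basic libraries for parsing web pages. Its simple usage is as follows:
This example works on the div tag whose class is "nav-list site-nav fl", as shown in the figure above.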
from bs4 import BeautifulSoup

div = '<div class="nav-list site-nav fl"><ul><li class="site"><a class="pin-logo" href="//www.qidian.com" data-eid="qd_A43"><span class="third-remove"></span></a><div class="dropdown third-remove"><a href="//www.qdmm.com" target="_blank" data-eid="qd_A44">起點女生網</a><a href="http://chuangshi.qq.com" target="_blank" data-eid="qd_A45">創世中文網</a><a href="http://yunqi.qq.com" target="_blank" data-eid="qd_A46">雲起書院</a></div></li><li><a href="//www.qidian.com/xuanhuan" target="_blank" data-eid="qd_A47">玄幻</a></li><li><a href="//www.qidian.com/dushi" target="_blank" data-eid="qd_A48">都市</a></li><li><a href="//www.qidian.com/xianxia" target="_blank" data-eid="qd_A49">仙俠</a></li><li><a href="//www.qidian.com/kehuan" target="_blank" data-eid="qd_A50">科幻</a></li><li><a href="//www.qidian.com/youxi" target="_blank" data-eid="qd_A56">遊戲</a></li><li><a href="//www.qidian.com/lishi" target="_blank" data-eid="qd_A52">歷史</a></li><li><a href="//www.qidian.com/rank" target="_blank" data-eid="qd_A53">排行</a></li><li class="more"><a href="javascript:" id="top-nav-more" target="_blank" data-eid="qd_A54">更多<span></span></a><div class="dropdown"><a href="//www.qidian.com/all" target="_blank" data-eid="qd_A169">全部作品</a><a href="//www.qidian.com/2cy" target="_blank" data-eid="qd_A55">輕小說</a><a href="//www.qidian.com/qihuan" target="_blank" data-eid="qd_A51">奇幻</a><a href="//www.qidian.com/wuxia" target="_blank" data-eid="qd_A57">武俠</a><a href="//www.qidian.com/lingyi" target="_blank" data-eid="qd_A58">懸疑</a><a href="//www.qidian.com/junshi" target="_blank" data-eid="qd_A59">軍事</a><a href="//www.qidian.com/xianshi" target="_blank" data-eid="qd_A60">現實</a><a href="//www.qidian.com/tiyu" target="_blank" data-eid="qd_A61">體育</a><a href="//www.qidian.com/duanpian" target="_blank" data-eid="qd_A196">短篇</a></div></li></ul></div>'

# Parse the snippet with the lxml parser (requires the lxml package to be installed)
soup = BeautifulSoup(div, "lxml")
print(soup.ul.find("li", {"class": "site"}))  # the first li tag
The output is the complete first li tag (class "site"), including its nested a tag and the dropdown div.
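The same lookup can be written in a few equivalent ways, which may help when reading other people's scraping code. The sketch below is only an illustration: the markup is a shortened, hypothetical excerpt of the navigation bar, and it uses Python's built-in "html.parser" instead of "lxml" so that it runs without extra packages. It compares the attribute-dict form used in this post with the class_ keyword and a CSS selector, and shows that class_ matches a single class name even when the tag carries several.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical excerpt of the navigation markup from this post.
html = (
    '<ul>'
    '<li class="site"><a class="pin-logo" href="//www.qidian.com"></a>'
    '<div class="dropdown third-remove">'
    '<a href="//www.qdmm.com">起點女生網</a></div></li>'
    '<li><a href="//www.qidian.com/xuanhuan">玄幻</a></li>'
    '</ul>'
)

# Assumption: "html.parser" is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")

# Three ways to reach the li tag whose class is "site".
by_find = soup.ul.find("li", {"class": "site"})   # form used in this post
by_class_kw = soup.ul.find("li", class_="site")   # keyword form
by_selector = soup.select_one("ul > li.site")     # CSS selector form

print(by_find == by_class_kw == by_selector)

# class_ matches any one of a tag's classes, so "dropdown" alone
# is enough to find the div with class "dropdown third-remove".
dropdown = soup.find("div", class_="dropdown")
print(dropdown.a["href"])
```

Which form to prefer is mostly a matter of taste; the CSS selector form tends to stay shorter once the path into the document gets deep.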
print(soup.ul.find("li", {"class": "site"}).text)  # text content of the tag
# Output: 起點女生網創世中文網雲起書院

print(soup.ul.find("li", {"class": "site"}).a)  # the first a tag inside the first li tag
# Output: <a class="pin-logo" data-eid="qd_A43" href="//www.qidian.com"><span class="third-remove"></span></a>

print(soup.ul.find("li", {"class": "site"}).a['href'])  # the href of that a tag
# Output: //www.qidian.com

print(soup.ul.find("li", {"class": "site"}).find("div", {"class": "dropdown third-remove"}).a)
# Output: <a data-eid="qd_A44" href="//www.qdmm.com" target="_blank">起點女生網</a>

print(soup.ul.find("li", {"class": "site"}).find("div", {"class": "dropdown third-remove"}).a.text)  # text of the a tag
# Output: 起點女生網

childs = soup.ul.children  # the direct children of ul
for child in childs:
    print(child.text)  # print the text of each child in turn

print(soup.ul.find_all("li"))  # all li tags under ul, returned as a list
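Output: the loop prints the text of each li tag in turn, and the final call prints a list of all nine li tags.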
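A common next step after find_all is to collect each link's text together with a usable URL. The following is a minimal sketch under stated assumptions, not part of the original post: the markup is a shortened, hypothetical excerpt of the navigation bar, BASE_URL is an assumed page address, and "html.parser" replaces "lxml" so the example has no extra dependency. It resolves protocol-relative hrefs such as //www.qidian.com/xuanhuan with urllib.parse.urljoin and skips the javascript: placeholder link.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Shortened, hypothetical excerpt of the navigation markup from this post.
html = (
    '<div class="nav-list site-nav fl"><ul>'
    '<li class="site"><a class="pin-logo" href="//www.qidian.com"></a></li>'
    '<li><a href="//www.qidian.com/xuanhuan">玄幻</a></li>'
    '<li><a href="//www.qidian.com/dushi">都市</a></li>'
    '<li class="more"><a href="javascript:" id="top-nav-more">更多</a></li>'
    '</ul></div>'
)

# Assumption: the address of the page the snippet was taken from.
BASE_URL = "https://www.qidian.com/"

# Assumption: "html.parser" is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")

links = []
# href=True keeps only the a tags that actually have an href attribute.
for a in soup.ul.find_all("a", href=True):
    href = a["href"]
    if href.startswith("javascript:"):
        continue  # not a real link, only a click handler placeholder
    # urljoin fills in the scheme for protocol-relative URLs ("//host/path").
    links.append((a.get_text(strip=True), urljoin(BASE_URL, href)))

for text, url in links:
    print(text, url)
```

The logo link has no text, so its first element is an empty string; filtering such entries out is left to the caller.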