Multi-page web scraping in Python with BeautifulSoup
阿新 • Published: 2019-02-09
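This script uses BeautifulSoup to scrape a site page by page: it builds the URL of each page, parses the page, and appends the extracted titles to a text file: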
Source code:
from bs4 import BeautifulSoup
from urllib.request import urlopen

# "熱門標題.txt" means "popular titles"; mode "a" appends on every run
with open("熱門標題.txt", "a", encoding="utf-8") as f:
    for i in range(2):
        # Build the URL of page i
        url = "http://www.ltaaa.com/wtfy-{}".format(i) + ".html"
        html = urlopen(url).read()
        soup = BeautifulSoup(html, "html.parser")
        titles = soup.select("div[class='dtop'] a")  # CSS selector
        for title in titles:
            print(title.get_text(), title.get('href'))  # tag text, tag attribute
            f.write("標題:{}\n".format(title.get_text()))  # "標題" means "title"
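The selector step can be tried without touching the network. What follows is a minimal sketch, assuming a made-up HTML fragment (SAMPLE_HTML) that imitates the structure the script targets, with links inside a div whose class is dtop. The helper name extract_titles is hypothetical, and the real markup on the site may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the structure the selector expects;
# the real markup on the site may differ.
SAMPLE_HTML = """
<div class="dtop"><a href="/wtfy/1.html">First title</a></div>
<div class="other"><a href="/skip.html">Not a headline</a></div>
<div class="dtop"><a href="/wtfy/2.html">Second title</a></div>
"""


def extract_titles(html):
    """Return (text, href) pairs for links inside a div whose class attribute is 'dtop'."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select("div[class='dtop'] a")
    return [(a.get_text(strip=True), a.get("href")) for a in links]


for text, href in extract_titles(SAMPLE_HTML):
    print(text, href)
```

Keeping the parsing in its own function makes it possible to check the selector against a saved copy of a page before running the full loop. Note that the attribute form div[class='dtop'] matches only when the whole class attribute equals dtop, while div.dtop would also match a div that carries additional classes.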