
Scraping the News from the Campus News Homepage


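import requests
from bs4 import BeautifulSoup
from datetime import datetime

# Fetch the campus news list page.
url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Print the link, title, and date of the first news item, then stop.
for news in soup.select('li'):
    if len(news.select('.news-list-title')) > 0:
        title = news.select('.news-list-title')[0].text
        time = news.select('.news-list-info')[0].contents[0].text
        a = news.select('a')[0].attrs['href']
        print(a, title, time)
        break

# Fetch the detail page of that item.
res1 = requests.get(a)
res1.encoding = 'utf-8'
soup1 = BeautifulSoup(res1.text, 'html.parser')
sp1 = soup1.select('#content')[0].text     # article body
info = soup1.select('.show-info')[0].text  # metadata line
print(info)

# The Chinese labels below must match the page text exactly
# (character variant and colon style included).
# lstrip() removes a set of characters, not a prefix; [1:20] then skips
# one character and keeps the 19-character timestamp.
dt = info.lstrip('發布時間:')[1:20]
print(dt)

# Source: the text after the label, up to the next whitespace.
ly = info.find('來源:')
if ly != -1:
    s = info[ly:].split()[0][len('來源:'):]
    print(s)

# Photographer: extracted the same way.
ly = info.find('攝影:')
if ly != -1:
    s = info[ly:].split()[0][len('攝影:'):]
    print(s)

# Convert the timestamp string to a datetime, and format the current time.
da = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
now = datetime.now()
print(now.strftime('%Y-%m-%d %H:%M:%S'))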
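The field extraction above depends on exact slice offsets, so it can be easier to check offline against a fixed string. The following is only a minimal sketch of an alternative using regular expressions. The function name `parse_show_info`, the dictionary keys, and the sample text `SAMPLE_INFO` are all hypothetical and are not taken from the original page, and the real `.show-info` text may be laid out differently.

```python
import re
from datetime import datetime

# Hypothetical sample of the '.show-info' text; the real page may differ.
SAMPLE_INFO = '發布時間:2018-04-01 11:57:00 來源:ExampleOffice 攝影:ExamplePhotographer'


def parse_show_info(info):
    """Return a dict with the publish time and any labelled fields found."""
    result = {}

    # Timestamp in the form YYYY-MM-DD HH:MM:SS, wherever it appears.
    m = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info)
    if m:
        result['published'] = datetime.strptime(m.group(), '%Y-%m-%d %H:%M:%S')

    # Each label is followed by a colon (ASCII or full-width) and a value
    # that runs up to the next whitespace character.
    for key, label in (('source', '來源'), ('photographer', '攝影')):
        m = re.search(label + r'[::](\S+)', info)
        if m:
            result[key] = m.group(1)

    return result


fields = parse_show_info(SAMPLE_INFO)
print(fields)
```

Searching for the timestamp pattern directly avoids relying on how many characters precede it, and a missing label simply leaves its key out of the result instead of raising an error.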

