Directed Crawler for Chinese University Rankings

阿新 • Published: 2018-01-13


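Functionality:
Input: the URL of a university ranking page
Output: the ranking printed to the screen (rank, university name, total score)
Technical approach: requests + bs4
Directed crawler: fetches only the given URL and does not follow links to other pages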

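Program structure:
Step 1: fetch the ranking page from the web
getHTMLText()
Step 2: extract the information in the page into a suitable data structure
fillUnivList()
Step 3: use that data structure to display and print the results
printUnivList()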
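Step 2 is the easiest part to check offline, without depending on the live page. The sketch below is a stdlib-only stand-in for the bs4 approach used in this post (it swaps in `html.parser.HTMLParser`) and runs against a small hypothetical table. Like `fillUnivList()`, it assumes each university is one `<tr>` inside `<tbody>`, with rank, name and total score in the 1st, 2nd and 4th `<td>`. The names `TbodyRowParser` and `fill_univ_list_stdlib` and the placeholder rows are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical placeholder markup shaped like the ranking table
# (columns: rank, name, province, total score). Not real ranking data.
SAMPLE_HTML = """
<table>
  <thead><tr><th>排名</th><th>學校名稱</th><th>省市</th><th>總分</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>甲大學</td><td>某市</td><td>90.0</td></tr>
    <tr><td>2</td><td>乙大學</td><td>某市</td><td>80.0</td></tr>
  </tbody>
</table>
"""


class TbodyRowParser(HTMLParser):
    """Collects the text of each <td>, grouped by <tr>, but only inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per table row
        self._in_tbody = False
        self._in_td = False
        self._row = None        # row currently being filled, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_tbody = True
        elif self._in_tbody and tag == "tr":
            self._row = []
        elif self._in_tbody and tag == "td" and self._row is not None:
            self._in_td = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tbody":
            self._in_tbody = False
        elif tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Whitespace between tags is ignored because it falls outside any <td>.
        if self._in_td:
            self._row[-1] += data.strip()


def fill_univ_list_stdlib(u_list, html):
    """Hypothetical stdlib-only counterpart of fillUnivList()."""
    parser = TbodyRowParser()
    parser.feed(html)
    parser.close()
    for tds in parser.rows:
        if len(tds) >= 4:       # skip malformed rows instead of raising IndexError
            u_list.append([tds[0], tds[1], tds[3]])


uinfo = []
fill_univ_list_stdlib(uinfo, SAMPLE_HTML)
print(uinfo)  # [['1', '甲大學', '90.0'], ['2', '乙大學', '80.0']]
```

Keeping each university as a small `[rank, name, score]` list is what lets step 3 print the table without touching the HTML again.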

import requests
from bs4 import BeautifulSoup
import bs4
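def getHTMLText(url):
    # Step 1: download the page; return "" if the request fails
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()                  # raise an HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding      # use the charset guessed from the body so Chinese text decodes correctly
        return r.text
    except requests.RequestException:         # network errors, timeouts, bad status codes
        return ""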
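
def fillUnivList(uList, html):
    # Step 2: append [rank, name, total score] for every row in the table body
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:                         # empty or unexpected page: nothing to extract
        return
    for tr in tbody.children:                 # iterate over the child nodes of <tbody>
        if isinstance(tr, bs4.element.Tag):   # keep <tr> tags, skip text nodes such as newlines
            tds = tr('td')                    # shorthand for tr.find_all('td'); returns a list
            uList.append([tds[0].string, tds[1].string, tds[3].string])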
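
def printUnivList(uList, num):
    # Step 3: print the first num entries as an aligned table.
    # {3} supplies the fill character for the name column; chr(12288) is the
    # full-width (ideographic) space, which keeps Chinese school names aligned.
    tplt = "{0:^10}\t{1:{3}^8}\t{2:10}"
    print(tplt.format("排名", "學校名稱", "總分", chr(12288)))  # rank, school name, total score
    for u in uList[:num]:                     # slicing avoids an IndexError if fewer than num rows were parsed
        print(tplt.format(u[0], u[1], u[2], chr(12288)))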

def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2016.html'
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 10)


if __name__ == "__main__":
    main()
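The line `r.encoding = r.apparent_encoding` in `getHTMLText()` matters for Chinese pages: when a server sends `Content-Type: text/html` without a `charset` parameter, requests falls back to ISO-8859-1 when building `r.text`, whereas `apparent_encoding` guesses the charset from the body bytes. The stdlib-only sketch below reproduces the effect on a short string, assuming the page body is UTF-8; it does not call requests itself.

```python
# Assumption: the page body is UTF-8, but the response header declares no charset,
# so requests would decode it as ISO-8859-1 (Latin-1) unless r.encoding is changed.
raw = "排名 學校名稱 總分".encode("utf-8")   # the bytes as they arrive over the wire

wrong = raw.decode("iso-8859-1")  # roughly what r.text gives with the fallback encoding
right = raw.decode("utf-8")       # what r.text gives once the encoding is corrected

print(repr(wrong))  # garbled Latin-1 characters
print(right)        # 排名 學校名稱 總分
```

Because Latin-1 maps every possible byte, the wrong decode never raises an error; it silently produces garbled text, which is why the encoding has to be set before `r.text` is read.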

Output:

[Figure: screenshot of the program's console output]
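The aligned columns in the output rely on two details of `str.format`: a nested field such as `{3}` inside a format spec supplies the fill character, and `chr(12288)` is U+3000 IDEOGRAPHIC SPACE, a full-width space. Padding the school-name column with it instead of ASCII spaces keeps that column at a consistent display width, assuming a terminal font that renders CJK characters and U+3000 at double width. The sketch reuses the template from `printUnivList()`; the rows are placeholders, not ranking data.

```python
FULL_WIDTH_SPACE = chr(12288)  # U+3000 IDEOGRAPHIC SPACE

# Same template as printUnivList(): field {3} is the fill character for field {1}.
tplt = "{0:^10}\t{1:{3}^8}\t{2:10}"

# Placeholder rows for illustration only.
rows = [["1", "甲大學", "90.0"], ["2", "乙丙丁大學", "80.0"]]

lines = [tplt.format("排名", "學校名稱", "總分", FULL_WIDTH_SPACE)]
for rank, name, score in rows:
    lines.append(tplt.format(rank, name, score, FULL_WIDTH_SPACE))

print("\n".join(lines))
```

Every name field comes out exactly 8 characters long, padded with full-width spaces, so names of different lengths still occupy the same visual width.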
