Web Scraping for Beginners: Chinese University Rankings

阿新 • Published: 2018-12-22

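I wrote this scraper as a beginner, so it only implements a few features.

P.S. It requires the BeautifulSoup (beautifulsoup4) and requests libraries.

Installation: pip install beautifulsoup4 requests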

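import requests
from bs4 import BeautifulSoup
import bs4


def getHTmLText(url):
    # Fetch the page; return an empty string if the request fails.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding  # guess the encoding from the content
        return r.text
    except requests.RequestException:
        return ""


def fillUnivList(ulist, html):
    # Append [rank, name, total score] for each row of the ranking table.
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:  # empty page or unexpected layout
        return
    for tr in tbody.children:
        # .children also yields text nodes between rows; keep only tags.
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')
            if len(tds) > 3:
                ulist.append([tds[0].string, tds[1].string, tds[3].string])


def printUnivList(ulist, num):
    # {3} is the fill character for the name column: chr(12288).
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    # Header: rank, university name, total score.
    print(tplt.format("排名", "學校名稱", "總分", chr(12288)))
    for u in ulist[:num]:
        # str() guards against cells whose .string is None.
        print(tplt.format(str(u[0]), str(u[1]), str(u[2]), chr(12288)))


def main():
    uninfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2018.html'
    html = getHTmLText(url)
    fillUnivList(uninfo, html)
    printUnivList(uninfo, 100)


if __name__ == "__main__":
    main()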
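
The parsing step can be tried without network access. The sketch below runs the same tbody traversal on a small inline table. The HTML snippet, its values, and the helper name parse_rows are made up for illustration; the only thing assumed about the real page is the row layout the scraper relies on (rank in column 0, name in column 1, total score in column 3).

```python
from bs4 import BeautifulSoup
import bs4

# Hypothetical sample markup, not taken from the real site.
SAMPLE_HTML = """
<table><tbody>
<tr><td>1</td><td>University A</td><td>Province X</td><td>95.0</td></tr>
<tr><td>2</td><td>University B</td><td>Province Y</td><td>90.5</td></tr>
</tbody></table>
"""


def parse_rows(html):
    """Return [rank, name, score] for every <tr> inside the first <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find("tbody").children:
        # .children also yields the newline strings between rows,
        # so keep only real tags.
        if isinstance(tr, bs4.element.Tag):
            tds = tr("td")
            rows.append([tds[0].string, tds[1].string, tds[3].string])
    return rows


rows = parse_rows(SAMPLE_HTML)
print(rows)
```

Without the isinstance check, the loop would also visit the newline strings between the rows, and calling them like a tag would fail.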

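When printing, chr(12288) (the full-width space, U+3000) is used as the fill character so that the Chinese text lines up, which looks neater.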
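
A small sketch of why the fill character matters: by default str.format pads with the ASCII space, which is narrower than a Chinese character in most monospaced fonts, so padded columns drift. Passing chr(12288) as the fill pads the cell with full-width characters instead. The helper name center_cjk and the sample string are illustrative, not part of the original script.

```python
FULLWIDTH_SPACE = chr(12288)  # U+3000, ideographic (full-width) space


def center_cjk(text, width):
    """Center text in `width` characters, padding with full-width spaces."""
    # {1} supplies the fill character, {2} the field width.
    return "{0:{1}^{2}}".format(text, FULLWIDTH_SPACE, width)


default_pad = "{0:^10}".format("清華大學")  # padded with ASCII spaces
cjk_pad = center_cjk("清華大學", 10)        # padded with full-width spaces

print(repr(default_pad))
print(repr(cjk_pad))
```

Both strings are 10 characters long; they differ only in the padding character, which is what keeps the columns visually aligned in a terminal.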
