A simple Python web scraper: Python for beginners

阿新 • Published: 2019-02-08

Scraping course titles and descriptions from the imooc (慕課網) homepage with Python

[Figure: screenshot of the scraped results]
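Approach:
Fetch the page content and store it in html ->
use a regular expression to grab every course-block div and store them in everydiv ->
from each course block, extract the title and description and append them to the list classinfo ->
write the list to the file info.txt ->
finally, check the scraped content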
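The five steps above can be tried offline before touching the live site. The sketch below runs steps 2 to 5 against a small hand-written HTML string that stands in for the fetched page. The markup (a div with class moco-course-wrap containing an h3 and a p) mirrors the patterns the script uses, but it is an assumption: the real imooc.com homepage may look different today. The helper names parse_courses, save_courses and load_courses are hypothetical.

```python
import os
import re
import tempfile

# Hypothetical sample markup standing in for the fetched page (step 1).
SAMPLE_HTML = """
<div class="moco-course-wrap"><h3>Python Basics</h3><p>Learn the syntax.</p></div>
<div class="moco-course-wrap"><h3>Web Scraping</h3><p>Fetch and parse pages.</p></div>
"""


def parse_courses(html):
    """Steps 2 and 3: find each course block, then pull out title and description."""
    courses = []
    for block in re.findall(r'(<div class="moco-course-wrap".*?</div>)', html, re.S):
        title = re.search(r'<h3>(.*?)</h3>', block, re.S)
        content = re.search(r'<p>(.*?)</p>', block, re.S)
        if title and content:  # skip blocks that are missing either field
            courses.append({'title': title.group(1), 'content': content.group(1)})
    return courses


def save_courses(courses, path):
    """Step 4: write the same 'title:' / 'content:' layout as the script.

    Uses mode 'w' (overwrite) rather than the script's 'a' (append),
    so repeated runs of this sketch do not pile up duplicate entries.
    """
    with open(path, 'w', encoding='utf-8') as f:
        for each in courses:
            f.write('title:' + each['title'] + '\n')
            f.write('content:' + each['content'] + '\n\n')


def load_courses(path):
    """Step 5: read the file back so the results can be checked."""
    courses = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            if line.startswith('title:'):
                courses.append({'title': line[len('title:'):].rstrip('\n')})
            elif line.startswith('content:'):
                courses[-1]['content'] = line[len('content:'):].rstrip('\n')
    return courses


path = os.path.join(tempfile.mkdtemp(), 'info.txt')
courses = parse_courses(SAMPLE_HTML)
save_courses(courses, path)
print(load_courses(path) == courses)  # True
```

Reading the file back and comparing it with what was parsed is one simple way to do the final "check the scraped content" step without eyeballing info.txt.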

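Key points:
1. The re module (regular expressions) provides regular-expression matching operations and suits text parsing, complex string analysis, and information extraction.
2. Requests is built on urllib3 and is much more convenient than the standard library's urllib. It automatically decodes the response body to Unicode (response.text) and keeps the response content in memory, so you can read it multiple times.
3. The sys module gives access to variables and functions used or maintained by the Python interpreter, letting a program work with its runtime configuration and with the environment outside it. Python 2 scripts often call reload(sys) followed by sys.setdefaultencoding("utf-8") to change the default string encoding; that hack does not exist in Python 3, where str is already Unicode.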
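To see how the re calls in the script behave, the sketch below runs the same course-block pattern on a small made-up snippet. It shows why re.S is passed (so that "." also matches newlines) and a pitfall of the lazy ".*?": it stops at the first closing div, so a nested div inside a course block truncates the match. The snippet's markup is an assumption, not the real imooc.com HTML.

```python
import re

# Made-up markup: the title sits inside a nested <div>.
block = ('<div class="moco-course-wrap">\n'
         '  <div class="title"><h3>Python Basics</h3></div>\n'
         '  <p>Learn the syntax.</p>\n'
         '</div>')

pattern = r'(<div class="moco-course-wrap".*?</div>)'

# Without re.S, "." does not match "\n", so the pattern cannot span lines.
print(re.findall(pattern, block))  # []

# With re.S, "." also matches "\n"; the lazy ".*?" stops at the FIRST "</div>",
# which here closes the nested title div, so the <p> is cut off.
match = re.findall(pattern, block, re.S)[0]
print('<h3>' in match, '<p>' in match)  # True False

# A greedy ".*" runs to the LAST "</div>" instead; that keeps the <p> here, but on
# a full page it would swallow every following course block into one match.
greedy = re.findall(r'(<div class="moco-course-wrap".*</div>)', block, re.S)[0]
print('<p>' in greedy)  # True
```

Neither quantifier understands HTML nesting, which is why the lazy pattern only works when each course block contains no inner div before its title and description.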

Python source code, ready to copy and run:

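# -*- coding: utf-8 -*-
# Python 3 version. The Python 2 reload(sys) / sys.setdefaultencoding() hack
# is gone: it does not exist in Python 3, where str is already Unicode.
import re

import requests


class func:
    def __init__(self):
        print('Starting to scrape...')

    # getsource fetches the page's HTML source
    def getsource(self, url):
        html = requests.get(url, timeout=10)
        html.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        # print(html.text)  # uncomment to check that the page was fetched
        return html.text

    # geteverydiv grabs the HTML of every course block
    def geteverydiv(self, source):
        everydiv = re.findall('(<div class="moco-course-wrap".*?</div>)', source, re.S)
        return everydiv

    # getinfo extracts the course title and description from one course block
    def getinfo(self, eachclass):
        title = re.search('<h3>(.*?)</h3>', eachclass, re.S)
        content = re.search('<p>(.*?)</p>', eachclass, re.S)
        # re.search returns None when nothing matches; return None for such
        # blocks instead of crashing with AttributeError on .group(1)
        if title is None or content is None:
            return None
        info = {'title': title.group(1), 'content': content.group(1)}
        # print(info)  # uncomment to check what was extracted
        return info

    # saveinfo appends the results to info.txt
    def saveinfo(self, classinfo):
        with open('info.txt', 'a', encoding='utf-8') as f:
            for each in classinfo:
                f.write('title:' + each['title'] + '\n')
                f.write('content:' + each['content'] + '\n\n')
        print('write file finished')


# main program
if __name__ == '__main__':
    classinfo = []
    url = 'http://www.imooc.com/'
    testspider = func()
    print('Processing page: ' + url)
    html = testspider.getsource(url)
    everydiv = testspider.geteverydiv(html)
    for each in everydiv:
        info = testspider.getinfo(each)
        if info is not None:  # skip blocks without a title or description
            classinfo.append(info)
    testspider.saveinfo(classinfo)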
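Because the lazy regex in geteverydiv stops at the first closing div, a sturdier option is to swap in a real HTML parser. The sketch below uses the standard library's html.parser.HTMLParser (a different technique from the regex approach in the script) and tracks div nesting depth, so a nested div does not end a course block early. The class name CourseParser and the sample markup are hypothetical, and whether imooc.com still uses the moco-course-wrap class is an assumption.

```python
from html.parser import HTMLParser


class CourseParser(HTMLParser):
    """Collect <h3>/<p> text inside <div class="moco-course-wrap"> blocks.

    Tracks <div> nesting depth, so inner divs inside a course block
    do not end the block early (unlike the lazy regex).
    """

    def __init__(self):
        super().__init__()
        self.depth = 0       # div depth inside the current course block (0 = outside)
        self.field = None    # 'title' or 'content' while inside <h3>/<p>
        self.courses = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            if self.depth:
                self.depth += 1  # an inner div inside a course block
            elif 'moco-course-wrap' in (dict(attrs).get('class') or '').split():
                self.depth = 1   # a new course block starts
                self.courses.append({'title': '', 'content': ''})
        elif self.depth and tag in ('h3', 'p'):
            self.field = 'title' if tag == 'h3' else 'content'

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1
        elif tag in ('h3', 'p'):
            self.field = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.depth and self.field:
            self.courses[-1][self.field] += data


parser = CourseParser()
parser.feed('<div class="moco-course-wrap"><div class="title"><h3>Python Basics</h3></div>'
            '<p>Learn the syntax.</p></div>')
print(parser.courses)  # [{'title': 'Python Basics', 'content': 'Learn the syntax.'}]
```

In the script, parser.feed(html) followed by reading parser.courses could stand in for geteverydiv plus getinfo; the rest of the pipeline (saveinfo) would stay the same.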

If you have any comments or suggestions, feel free to leave a message... ^.^
