
Case study: a crawler using XPath

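Now let's use XPath to build a simple crawler. We will try to crawl every thread in a given Tieba forum and download the images posted on each floor (reply) of those threads to local disk.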

# tieba_xpath.py


#!/usr/bin/env python
# -*- coding:utf-8 -*-

import os
import urllib
import urllib2
from lxml import etree

class Spider:
    def __init__(self):
        self.tiebaName = raw_input("Enter the name of the Tieba forum to visit: ")
        self.beginPage = int(raw_input("Enter the start page: "))
        self.endPage = int(raw_input("Enter the end page: "))

        self.url = 'http://tieba.baidu.com/f'
        self.ua_header = {"User-Agent" : "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

        # Image number (despite its name, userName is a counter for images)
        self.userName = 1

    def tiebaSpider(self):
        for page in range(self.beginPage, self.endPage + 1):
            pn = (page - 1) * 50 # offset of the first thread on this page (50 per page)
            word = {'pn' : pn, 'kw': self.tiebaName}

            word = urllib.urlencode(word) # convert to a URL-encoded query string
            myUrl = self.url + "?" + word

            # Example: http://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&pn=50
            # Call the page-handling method loadPage
            # and get the links of all threads on the page
            links = self.loadPage(myUrl)

    # Fetch the page content
    def loadPage(self, url):
        req = urllib2.Request(url, headers = self.ua_header)
        html = urllib2.urlopen(req).read()

        # Parse the raw HTML into an element tree
        selector=etree.HTML(html)

        # Grab the trailing part of every thread URL on the current page, i.e. the thread ID,
        # for example the "p/4884069807" in http://tieba.baidu.com/p/4884069807
        # (the remainder of this expression, and of the listing, is missing from the source)
        links = selector.xpath('//div[@class="threadlist_lz clearfix"]...
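
The URL-building step in tiebaSpider can be tried on its own, without touching the network. The sketch below is a Python 3 rendering (the listing above is Python 2), so it assumes urllib.parse.urlencode in place of urllib.urlencode. The helper name build_list_url and the constant names are invented for illustration; only the base URL, the kw and pn parameters, and the factor of 50 come from the listing.

```python
from urllib.parse import urlencode

BASE_URL = "http://tieba.baidu.com/f"
THREADS_PER_PAGE = 50  # hypothetical constant name; the value 50 is taken from the listing


def build_list_url(tieba_name, page):
    """Return the list-page URL for a 1-based page number (hypothetical helper)."""
    pn = (page - 1) * THREADS_PER_PAGE  # offset of the first thread on that page
    # urlencode percent-encodes non-ASCII text as UTF-8 by default
    return BASE_URL + "?" + urlencode({"kw": tieba_name, "pn": pn})


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_list_url("美女", page))
```

For page 2 this yields the same URL as the example in the listing's comment, with kw percent-encoded and pn=50.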

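To see what the XPath step in loadPage is doing, here is a small offline sketch under several assumptions. The markup is an invented, well-formed fragment; only the class name threadlist_lz clearfix comes from the listing. The div/a nesting is a guess, since the expression in the listing is cut off. The sketch also swaps lxml for the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so that it runs without extra packages. The function name extract_thread_links is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; real Tieba pages are far larger and are not
# guaranteed to be well-formed XML, which is why the listing uses lxml.
SAMPLE = """
<html><body>
  <div class="threadlist_lz clearfix">
    <div><a href="/p/4884069807">first thread</a></div>
  </div>
  <div class="threadlist_lz clearfix">
    <div><a href="/p/1234567890">second thread</a></div>
  </div>
  <div class="other"><div><a href="/ignored">not a thread</a></div></div>
</body></html>
"""


def extract_thread_links(markup):
    """Return href values of links inside thread-title divs (assumed structure)."""
    root = ET.fromstring(markup.strip())
    # Match divs by their exact class attribute, then step down to div/a.
    anchors = root.findall(".//div[@class='threadlist_lz clearfix']/div/a")
    # ElementTree cannot select attributes in the path itself, so read href here.
    return [a.get("href") for a in anchors]


if __name__ == "__main__":
    for link in extract_thread_links(SAMPLE):
        print("http://tieba.baidu.com" + link)
```

With lxml, the attribute can be selected directly in the XPath expression by ending it in /@href, in which case xpath() returns the attribute strings themselves rather than elements.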