Python Crawler Example (12): A Python Selenium Crawler

阿新 • Published: 2018-02-11



# coding:utf-8
# The author's own helper module; it presumably provides replace()
# (and r, used by the commented-out proxy code below).
from common.contest import *

import os
import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver


def spider():
    url = "http://www.salamoyua.com/es/subasta.aspx?origen=subastas&subasta=79"

    chromedriver = 'C:/Users/xuchunlin/AppData/Local/Google/Chrome/Application/chromedriver.exe'
    chrome_options = webdriver.ChromeOptions()

    # Use a proxy
    # proxies = r.get('4')
    # chrome_options.add_argument('--proxy-server=http://' + proxies)

    os.environ["webdriver.chrome.driver"] = chromedriver
    # Selenium 3 style call; Selenium 4 takes options=... and a Service object instead.
    driver = webdriver.Chrome(chromedriver, chrome_options=chrome_options)

    for page in range(1, 100):
        print("Crawling page " + str(page))

        if page == 1:
            # Request the URL
            driver.get(url)
            result = driver.page_source
        else:
            try:
                # Scroll the page to the bottom
                js = "var q=document.documentElement.scrollTop=10000"
                driver.execute_script(js)
                # Click the "next page" link (Selenium 4 uses find_element(By.ID, ...))
                driver.find_element_by_id('ctl00_phContenidos_lbSiguiente').click()

                # Give the next page time to load, then grab its source
                time.sleep(3)
                result = driver.page_source
            except Exception:
                result = ""

        soup = BeautifulSoup(result, 'html.parser')
        result_div = soup.find_all('figure', attrs={"class": "Lotes fade"})
        # print(len(result_div))
        for figure in result_div:
            result_replace = replace(figure)
            print(result_replace)

            # Lot detail URL.
            # The text being replaced was lost in the source; as written, replace('', '') is a no-op.
            item_url = re.findall('<figure class="Lotes fade"><a href="(.*?)" id=', result_replace)[0]
            item_url = "http://www.salamoyua.com/es/" + item_url.replace('', '')

            # Lot image URL (same note about replace('', '') applies)
            item_imgurl = re.findall('<img id=".*?" src="..(.*?)" style="border-width:0px', result_replace)[0]
            item_imgurl = "http://www.salamoyua.com" + item_imgurl.replace('', '')

            # Hammer price; only present for lots that have been sold
            if "Remate" not in result_replace:
                sold_price = ""
            else:
                sold_price = re.findall('<p><strong>Remate:(.*?)</strong></p></figcaption>', result_replace)[0]
                sold_price = sold_price.replace(' ', '')

            # Lot number; sold and unsold lots use different markup
            try:
                item_lotnum = re.findall('title="Lote vendido"><span id=".*?">(.*?)</span>', result_replace)[0]
                item_lotnum = item_lotnum.replace('Lote', '').replace(' ', '')
            except IndexError:
                item_lotnum = re.findall('<span id=".*?">(.*?)</span></header>', result_replace)[0]
                item_lotnum = item_lotnum.replace('Lote', '').replace(' ', '')

            print(item_url)
            print(item_lotnum)
            print(item_imgurl)
            print(sold_price)

    # Close the browser once all pages have been processed
    driver.quit()


spider()
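
The field extraction inside the loop can be tried offline, without launching a browser. The following is a minimal sketch (Python 3, standard library only) that applies regular expressions close to the ones above to a single lot's HTML and returns the four printed fields as a dictionary. The names `BASE_URL`, `first_match`, `parse_lot` and `SAMPLE` are hypothetical, and the sample markup is only a guess inferred from the patterns in the spider, not a copy of the real page. It also swaps the string concatenation for `urljoin`, which resolves the `..` in the image path.

```python
import re
from urllib.parse import urljoin

# Assumed base for relative links, taken from the URL prefix used in the spider.
BASE_URL = "http://www.salamoyua.com/es/"


def first_match(pattern, text):
    """Return the first captured group of pattern in text, or "" if there is none."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""


def parse_lot(html):
    """Extract URL, image URL, lot number and hammer price from one lot's HTML."""
    href = first_match(r'<a href="(.*?)" id=', html)
    img_src = first_match(r'<img id=".*?" src="(.*?)"', html)

    # Sold lots and unsold lots use different markup, so try both patterns.
    lot = first_match(r'title="Lote vendido"><span id=".*?">(.*?)</span>', html)
    if not lot:
        lot = first_match(r'<span id=".*?">(.*?)</span></header>', html)

    price = first_match(r'<strong>Remate:(.*?)</strong>', html)

    return {
        "item_url": urljoin(BASE_URL, href) if href else "",
        "item_imgurl": urljoin(BASE_URL, img_src) if img_src else "",
        "item_lotnum": lot.replace("Lote", "").replace(" ", ""),
        "sold_price": price.replace(" ", ""),
    }


# Hypothetical markup for one lot, shaped to fit the patterns above.
SAMPLE = (
    '<figure class="Lotes fade"><a href="lote.aspx?id=1" id="a1">'
    '<img id="i1" src="../img/1.jpg" style="border-width:0px;"></a>'
    '<header><span id="s1">Lote 12</span></header>'
    '<figcaption><p><strong>Remate: 300 EUR</strong></p></figcaption></figure>'
)

print(parse_lot(SAMPLE))
```

Returning empty strings for missing fields, instead of indexing `[0]` on an empty `findall` result, means one lot with unexpected markup does not stop the whole run.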
