Web Crawling with Selenium: Using Headless Browsers

阿新 • Published: 2018-12-20

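selenium + phantomjs

What is selenium? It is an automated testing tool for browsers: you write code that operates a browser, and the browser carries out the work automatically.

How selenium drives Google Chrome

Install selenium: pip install selenium

Selenium does not operate Chrome directly. It operates the Chrome driver (chromedriver), and the driver in turn drives the browser, so you need a chromedriver build that matches your installed Chrome version. Chrome driver downloads (the first bucket stops at version 114; drivers for Chrome 115 and later are published through Google's Chrome for Testing downloads):
http://chromedriver.storage.googleapis.com/index.html
http://npm.taobao.org/mirrors/chromedriver/
http://blog.csdn.net/huilan_same/article/details/51896672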
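Picking a chromedriver that matches the installed Chrome is the most common stumbling block. Newer ChromeDriver releases use the same version numbers as Chrome (the old 2.x releases instead each covered a range of Chrome versions), so the practical check is that the major versions agree. The sketch below assumes you already have both version strings, for example from chrome://version and from running chromedriver --version; the helper names major_version and driver_matches_browser are hypothetical, not part of selenium.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text such as 'ChromeDriver 120.0.6099.109 (...)'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def driver_matches_browser(browser_version: str, driver_version: str) -> bool:
    """True when the browser and driver share the same major version.

    Only meaningful for new-style driver versions (not the old 2.x scheme).
    """
    return major_version(browser_version) == major_version(driver_version)


if __name__ == "__main__":
    print(driver_matches_browser("120.0.6099.110", "ChromeDriver 120.0.6099.109 (abc)"))  # True
    print(driver_matches_browser("121.0.6167.85", "ChromeDriver 120.0.6099.109"))         # False
```

Comparing only the major version is deliberate: patch and build numbers of Chrome and its driver rarely line up exactly, while a major-version mismatch is what makes the driver refuse to start a session.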

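Headless Chrome

PhantomJS is a headless (windowless) browser, but its development was suspended in 2018 and newer Selenium releases no longer support it. Chrome offers the same thing through its own headless mode:

from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

Driving Firefox with selenium

Download the Firefox driver (geckodriver): https://github.com/mozilla/geckodriver/releases
Version mapping: https://blog.csdn.net/yinshuilan/article/details/79730239

from selenium import webdriver
firefox_options = webdriver.FirefoxOptions()
# Older examples call firefox_options.set_headless(), which newer Selenium releases have removed.
# --disable-gpu is a Chrome switch, so it is not passed to Firefox.
firefox_options.add_argument('-headless')

Here I scrape JD (jd.com); the code is as follows: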

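import json
import time
from urllib.parse import urljoin

from lxml import etree
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Local path to chromedriver; adjust it for your machine. This version targets Selenium 4:
# Selenium 3 used webdriver.Chrome(executable_path=..., chrome_options=...) and
# find_element_by_xpath(), which newer releases have removed.
CHROMEDRIVER_PATH = r'F:\chromedriver.exe'
START_URL = "https://list.jd.com/list.html?cat=670%2C671%2C672&go=0"


def make_driver():
    """Start a headless Chrome instance."""
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
    driver.implicitly_wait(10)  # wait up to 10 s for elements that JavaScript renders late
    return driver


def first(values, label=''):
    """Return the first XPath result with a leading label removed, or '' if there is none."""
    if not values:
        return ''
    text = values[0].strip()
    # Remove the label as a prefix; str.strip(label) would treat it as a set of characters
    return text[len(label):] if text.startswith(label) else text


# Scrape every product linked from the current list page
def save_content(driver, detail_driver, fp):
    tree1 = etree.HTML(driver.page_source)
    a_href_list = tree1.xpath('//ul[@class="gl-warp clearfix"]/li//div[@class="p-img"]/a/@href')
    for a_href in a_href_list:
        # The hrefs are protocol-relative ("//item.jd.com/..."); resolve them against the list page
        detail_driver.get(urljoin(driver.current_url, a_href))
        tree2 = etree.HTML(detail_driver.page_source)
        try:
            # .text returns an element's visible text; get_attribute() returns an attribute value
            C_name = detail_driver.find_element(By.XPATH, './/div[@class="sku-name"]').text  # product summary
            C_price = detail_driver.find_element(By.XPATH, './/div[@class="dd"]/span/span[2]').text  # price
            C_style = detail_driver.find_element(By.XPATH, './/div[@id="store-prompt"]/strong').text  # stock status
            C_image = detail_driver.find_element(By.XPATH, './/div[@id="preview"]//img').get_attribute('src')  # image
        except NoSuchElementException:
            print('Skipping %s: expected elements not found' % a_href)
            continue
        C_brand = first(tree2.xpath('//ul[@id="parameter-brand"]/li/@title'))  # brand
        C_styleid = first(tree2.xpath('//ul[@class="parameter2 p-parameter-list"]/li[2]/text()'), '商品編號：')  # item number
        C_origin = first(tree2.xpath('//ul[@class="parameter2 p-parameter-list"]/li[4]/text()'), '商品產地：')  # place of origin
        item = {
            '商品圖片': C_image,       # image
            '商品資訊概括': C_name,    # summary
            '商品品牌': C_brand,       # brand
            '商品價格': C_price,       # price
            '商品編號': C_styleid,     # item number
            '商品產地': C_origin,      # place of origin
            '商品狀態': C_style,       # stock status
        }
        # One JSON object per line; ensure_ascii=False keeps the Chinese text readable
        fp.write(json.dumps(item, ensure_ascii=False) + '\n')
        print('Downloading %s' % C_name)


def run():
    driver = make_driver()         # browses the list pages
    detail_driver = make_driver()  # opens product pages; reused instead of starting one per list page
    try:
        # 1. Request the first list page
        driver.get(START_URL)
        time.sleep(3)  # after each request, wait three seconds for the page to finish loading
        with open("jd.txt", "w", encoding='utf8') as fp:
            # 2. Save the products on the first page
            save_content(driver, detail_driver, fp)
            # 3. Click "next page" as long as an enabled "pn-next" button exists.
            #    find_elements() returns [] instead of raising when nothing matches.
            #    Assumption: JD marks the button on the last page with a "disabled" class.
            while True:
                next_buttons = driver.find_elements(By.CLASS_NAME, "pn-next")
                if not next_buttons or 'disabled' in (next_buttons[0].get_attribute('class') or ''):
                    break
                next_buttons[0].click()
                time.sleep(3)
                # 4. Save the products on the new page
                save_content(driver, detail_driver, fp)
        # Reaching this point means there is no next page; the with block has closed the file
    finally:
        driver.quit()
        detail_driver.quit()


if __name__ == "__main__":
    run()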
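Two fields need cleaning before they are saved: the product number and origin arrive with a label such as '商品編號：' in front, and the product links are protocol-relative. A common slip is str.strip('商品產地：'), which treats its argument as a set of characters and removes them from both ends, so a value ending in one of those characters gets truncated. The sketch below contrasts that with prefix removal and resolves links with urljoin; the helper names remove_label and absolute_url and the sample values are hypothetical.

```python
from urllib.parse import urljoin


def remove_label(text: str, label: str) -> str:
    """Drop a leading label such as '商品編號：' if present, keeping the rest intact."""
    text = text.strip()
    return text[len(label):] if text.startswith(label) else text


def absolute_url(href: str, page_url: str = "https://list.jd.com/list.html") -> str:
    """Resolve relative or protocol-relative hrefs ('//item.jd.com/...') against the list page."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    raw = "商品產地：內地"                    # hypothetical scraped value
    print(raw.strip("商品產地："))          # 內   (strip also removed the trailing 地)
    print(remove_label(raw, "商品產地："))  # 內地
    print(absolute_url("//item.jd.com/12345.html"))  # https://item.jd.com/12345.html
```

urljoin copies the scheme from the page URL when the link starts with "//", and leaves already-absolute links unchanged, so it is safer than prefixing a fixed 'http:'.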
