
Headless Chrome usage example (Python): opening Baidu

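Google Chrome has supported headless mode for some time now, and it is gradually replacing PhantomJS as the favorite tool of crawler developers.

The sample code below is provided for reference.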

1. References

2. Environment

  • MacOS == 10.12.6 (16G29)
  • Chrome == 61.0.3163.100 (Official Build) (64-bit)
  • selenium == 3.6.0
  • Python == 2.7.14
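
To compare a local setup against the versions listed above, a small check like the following may help. It is only a sketch: `describe_environment` is a hypothetical helper name, and it reports `None` for selenium when the package is not installed.

```python
import platform


def describe_environment():
    """Collect the interpreter, OS and selenium versions for comparison."""
    info = {
        'python': platform.python_version(),
        'os': platform.platform(),
    }
    try:
        # selenium exposes its version string as selenium.__version__
        import selenium
        info['selenium'] = selenium.__version__
    except ImportError:
        info['selenium'] = None
    return info


if __name__ == '__main__':
    for name, version in sorted(describe_environment().items()):
        print('%s == %s' % (name, version))
```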

3. Steps

3.1 Start chromedriver

$ chromedriver 
Starting ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2) on port 9515
Only local connections are allowed.
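
Before running a script, it can be useful to confirm that chromedriver is really accepting connections on the port it reported. The following is a minimal sketch using only the standard library. `is_listening` is a hypothetical helper name, and port 9515 is taken from the log above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == '__main__':
    # chromedriver only allows local connections, so probe the loopback address
    print(is_listening('127.0.0.1', 9515))
```

A successful TCP connection only shows that something is listening on the port. It does not prove that the listener is chromedriver.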

3.2 Code

#!/usr/bin/env python
# -*- coding:utf-8 -*-

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType


options = webdriver.ChromeOptions()

# tell selenium to use the dev channel version of chrome
# NOTE: only do this if you have a good reason to
# options.binary_location = '/usr/bin/google-chrome-unstable'  # path to google Chrome bin

options.add_argument('headless')

# set the window size (Chrome expects width,height)
options.add_argument('window-size=1200,600')

# with proxy
proxy_url = 'ip:port'
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': proxy_url,
    'sslProxy': proxy_url  # the proxy server's CA certificate must be trusted
})
desired_capabilities = options.to_capabilities()
proxy.add_to_capabilities(desired_capabilities)

# initialize the driver
# driver = webdriver.Chrome(chrome_options=options)
driver = webdriver.Chrome(chrome_options=options, desired_capabilities=desired_capabilities)

driver.get('https://www.baidu.com')

# wait up to 10 seconds for the elements to become available
driver.implicitly_wait(10)
driver.get_screenshot_as_file('baidu_index.png')

# use css selectors to grab the search inputs
text = driver.find_element_by_css_selector('#kw')
search = driver.find_element_by_css_selector('#su')

text.send_keys('headless chrome')
driver.get_screenshot_as_file('baidu_main-page.png')

# search
search.click()
driver.get_screenshot_as_file('search-result.png')

results = driver.find_elements_by_xpath('//div[@class="result c-container "]')
for result in results:
    res = result.find_element_by_css_selector('a')
    title = res.text
    link = res.get_attribute('href')
    print('Title: %s \nLink: %s\n' % (title, link))

# shut down the browser and the chromedriver process
driver.quit()
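
The script above targets the selenium 3.6.0 API listed in the environment section. Recent Selenium 4 releases dropped the `find_element_by_*` helpers, so readers on a newer stack may find the following sketch a useful starting point. It is not the original author's code. The helper names `build_chrome_args`, `format_result` and `search_baidu` are hypothetical. Passing the proxy through Chrome's `--proxy-server` switch stands in for the `Proxy` capability used above. The `div.c-container` selector assumes Baidu's result markup is unchanged.

```python
def build_chrome_args(width=1200, height=600, proxy_url=None):
    """Return the Chrome command-line switches for a headless session."""
    args = ['--headless', '--window-size=%d,%d' % (width, height)]
    if proxy_url:
        # proxy_url is a placeholder such as 'ip:port'
        args.append('--proxy-server=%s' % proxy_url)
    return args


def format_result(title, link):
    """Format one search hit the same way the original script prints it."""
    return 'Title: %s \nLink: %s\n' % (title, link)


def search_baidu(keyword, proxy_url=None):
    """Search Baidu in headless Chrome and return (title, link) pairs."""
    # Imported lazily so the helpers above work without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(proxy_url=proxy_url):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get('https://www.baidu.com')
        # Explicit wait: block for up to 10 seconds until the search box exists.
        wait = WebDriverWait(driver, 10)
        box = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '#kw')))
        box.send_keys(keyword)
        driver.find_element(By.CSS_SELECTOR, '#su').click()

        # Assumption: result blocks still carry the "c-container" class.
        wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'div.c-container')))
        pairs = []
        for block in driver.find_elements(By.CSS_SELECTOR, 'div.c-container'):
            anchor = block.find_element(By.CSS_SELECTOR, 'a')
            pairs.append((anchor.text, anchor.get_attribute('href')))
        return pairs
    finally:
        # Always release the browser, even if a lookup fails.
        driver.quit()
```

Calling `search_baidu('headless chrome')` and passing each pair to `format_result` should print lines in the same `Title:` / `Link:` layout as the original script, provided Chrome and a matching chromedriver are installed. The explicit wait replaces `implicitly_wait` so that each wait is tied to a specific element.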

Output:

Title: Headless Chrome入門 - 簡書 
Link: http://www.baidu.com/link?url=VxjEiEVtl5fZX-AhWqc-AuoRP9Xy_uXIG1cqs43UbiSacUTqH0j7lDYsnYUpOXrC

Title: 技能樹升級——Chrome Headless模式 - 全棧客棧 - SegmentFault 
Link: http://www.baidu.com/link?url=CDylpWK8vIuZ8p60MUi_3KlThi-zxPw3bSr5AGPg2QsmTfoathDvfZGnEV2IZejOjw0cF5N4o0exxX1cqf9R-q

Title: 使用Headless Chrome 進行頁面渲染 - 知乎專欄 
Link: http://www.baidu.com/link?url=IyI0z_PmzMzH6mrw0-YndTwp7WiKmhVF-_ZuXMuPnfyF2MEaBB0BCit0BXpcrfsX

Title: 初探Headless Chrome - WEB前端 - 伯樂線上 
Link: http://www.baidu.com/link?url=sw2qqcurzmwTu9n0orvk_LKIvMmiaWlCxlPtvuyOgsKzzxaV3Car6zbRRdpZumDX

Title: 初探Headless Chrome - 知乎專欄 
Link: http://www.baidu.com/link?url=6nOyOVHD5AoBjugMoJTxDXhw5EBSYpF9fQMQfbu8WgCf0E_Wbalq6Hbj-KqBGwgm

Title: 通過Headless Chrome執行Selenium指令碼 - CSDN部落格 
Link: http://www.baidu.com/link?url=WSKRO7xRvGfbRIUKKnULwE0FeYNvyjLnEtiHWj108kxsQ7MUd1zPNXLph7WSkYXkiRLh8B3DBYSW8GNdI8wGBq

Title: Web自動化之Headless Chrome開發工具庫-圖靈社群 
Link: http://www.baidu.com/link?url=jZletPMcLn7z_liopLphjzknRWshmbsrCUr0K25MY7pbk5smOObJahHbvUrHz_2qnZdEUzcEm8IK0QriythwZa

Title: 在headless模式下執行selenium - 曾經的自己 - SegmentFault 
Link: http://www.baidu.com/link?url=jbe9GNh-2nDbd1KiMkh64EwQD6JvBXdQ_ndtkl-z_Hy2mn8GGnftg2BDnMn3x1rUMwkdwkwuo7dqMZMnVAtHGq

Title: linux 安裝 Headless Chrome - bambooleaf - CSDN部落格 
Link: http://www.baidu.com/link?url=jruVom6bFUCrLluHA4aN8ITgq3HlBlR3rYNYC36VlqIBjuFRocIewfKVvw6pleX3v1l2joOaO3-f9NxrVGjUdq