Scraping 51job Recruitment Data with Selenium

阿新 • Published: 2018-12-18

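Here Selenium is combined with lxml to collect, from the 51job recruitment site, the company name, salary range, job requirements, and job title of automation testing positions in the Xi'an area. The approach is as follows:

Open www.51job.com, type the search keyword into the search box (the code below uses '自動化測試工程師', "automation test engineer"), and click the search button.
Get the source of the first results page (the list page, which shows the postings of the individual companies).
Parse that source with lxml and extract the URL of each posting's detail page.
Open each detail page in turn, get its source, and parse it with lxml to obtain the company name, job title, job requirements, and salary range.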

That is the approach; the implementation follows. The versions used here are Selenium 3.13, Chrome 68, and Python 3.6.

First, import the required libraries:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from lxml import etree
import requests
import time as t
import re
import csv

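Next, define the Job class with its constructor, plus a method that opens www.51job.com, runs the keyword search, takes the source of the first results page, parses it, and extracts the detail-page link of every posting: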

class Job(object):
    '''Use Selenium with a web scraper to collect salaries and requirements of automation test engineer jobs on 51job (Xi'an area)'''
    def __init__(self):
        self.info=[]
        self.driver=webdriver.Chrome()
        self.url='http://www.51job.com'

    def parse_list_page(self):
        self.driver.implicitly_wait(30)
        self.driver.maximize_window()
        self.driver.get(self.url)
        # Type the search keyword
        self.driver.find_element_by_id('kwdselectid').send_keys('自動化測試工程師')
        # Click the search button
        self.driver.find_element_by_xpath('/html/body/div[3]/div/div[1]/div/button').click()
        source=self.driver.page_source
        # Parse the source of the search results page
        html=etree.HTML(source)
        # Collect the detail-page URL of every posting
        links=html.xpath('//div[@class="dw_table"]/div[@class="el"]//span[not(@class="t2")]/a/@href')
        for link in links:
            t.sleep(3)
            # Open and parse the detail page
            self.request_detail_page(link)
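
To see what the link XPath selects without starting a browser, it can be tried on a small hand-written fragment. This is only a sketch: the HTML below and the helper name extract_detail_links are made up for illustration, and the real 51job markup may differ. The point is that span[not(@class="t2")] skips the span assumed here to hold the company link, so only the posting links are returned.

```python
from lxml import etree

# Hypothetical list-page fragment (an assumption, not copied from 51job).
SAMPLE_LIST_HTML = '''
<div class="dw_table">
  <div class="el">
    <p class="t1"><span><a href="https://example.com/job/1.html">Test Engineer</a></span></p>
    <span class="t2"><a href="https://example.com/company/a.html">Company A</a></span>
    <span class="t3">Xian</span>
  </div>
  <div class="el">
    <p class="t1"><span><a href="https://example.com/job/2.html">Automation Tester</a></span></p>
    <span class="t2"><a href="https://example.com/company/b.html">Company B</a></span>
    <span class="t3">Xian</span>
  </div>
</div>
'''

def extract_detail_links(source):
    """Return the href of every link that is not inside a span with class t2."""
    html = etree.HTML(source)
    return html.xpath(
        '//div[@class="dw_table"]/div[@class="el"]'
        '//span[not(@class="t2")]/a/@href'
    )

links = extract_detail_links(SAMPLE_LIST_HTML)
print(links)
# ['https://example.com/job/1.html', 'https://example.com/job/2.html']
```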

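Next, write the request_detail_page method, which loads the detail page of each posting. This involves working with multiple windows (not explained here; if in doubt, see the author's other articles on Selenium). The method opens each detail link in a new window, reads that page's source, closes the window, and switches back to the list page: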

    def request_detail_page(self,url):
        '''Open a detail page from the list page'''
        # Open the detail URL in a new window
        self.driver.execute_script("window.open('%s')"%url)
        # Switch to the detail page
        self.driver.switch_to.window(self.driver.window_handles[1])
        # Get the source of the detail page
        source=self.driver.page_source
        # Close the detail page once its source has been read
        self.driver.close()
        # Switch back to the list page
        self.driver.switch_to.window(self.driver.window_handles[0])
        self.parse_detail_page(source)

Then comes the parse_detail_page method. It takes the source of a detail page, parses it, extracts the company name, job title, job requirements, and salary range, and puts them into a dictionary:

    def parse_detail_page(self,source):
        '''Parse the data on a job detail page'''
        html=etree.HTML(source)
        # Job description text
        position=html.xpath('//div[@class="bmsg job_msg inbox"]/p/text()')
        position=''.join(position).strip()
        # Header block with the company's basic information
        infos=html.xpath('//div[@class="cn"]')
        for info in infos:
            # Company name (a leading // searches the whole document, not only this block)
            companyName=info.xpath('//p[@class="cname"]/a/@title')[0].strip()
            # Job title
            title=info.xpath('./h1/text()')[0].strip()
            # Salary range
            salary=info.xpath('./strong/text()')[0]
            jobInfo={
                '公司名稱':companyName,  # company name
                '招聘職位':title,        # job title
                '薪資範圍':salary,       # salary range
                '職位資訊':position      # job description
            }
            print(jobInfo)
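
One detail worth knowing when calling xpath on an element rather than on the whole tree: in lxml, a path that starts with // is evaluated from the document root, while .// stays inside the current element. The sketch below shows the difference on a made-up page with two header blocks (the HTML is an assumption for illustration; a real detail page normally has a single div with class cn, in which case both forms return the same value).

```python
from lxml import etree

# Hypothetical page with two header blocks, used only to show the difference.
SAMPLE = '''
<html><body>
  <div class="cn"><h1>Job A</h1><p class="cname"><a title="Company A">A</a></p></div>
  <div class="cn"><h1>Job B</h1><p class="cname"><a title="Company B">B</a></p></div>
</body></html>
'''

html = etree.HTML(SAMPLE)
blocks = html.xpath('//div[@class="cn"]')

# "//..." ignores the context element and searches the whole document,
# so the first match is always the first company on the page.
absolute = [b.xpath('//p[@class="cname"]/a/@title')[0] for b in blocks]

# ".//..." searches only below the context element.
relative = [b.xpath('.//p[@class="cname"]/a/@title')[0] for b in blocks]

print(absolute)  # ['Company A', 'Company A']
print(relative)  # ['Company A', 'Company B']
```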

Finally, add a run method that simply calls parse_list_page. The complete source:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from lxml import etree
import requests
import time as t
import re
import csv

class Job(object):
    '''Use Selenium with a web scraper to collect salaries and requirements of automation test engineer jobs on 51job (Xi'an area)'''
    def __init__(self):
        self.info=[]
        self.driver=webdriver.Chrome()
        self.url='http://www.51job.com'


    def run(self):
        self.parse_list_page()
        self.driver.quit()

    def parse_list_page(self):
        self.driver.implicitly_wait(30)
        self.driver.maximize_window()
        self.driver.get(self.url)
        # Type the search keyword
        self.driver.find_element_by_id('kwdselectid').send_keys('自動化測試工程師')
        # Click the search button
        self.driver.find_element_by_xpath('/html/body/div[3]/div/div[1]/div/button').click()
        source=self.driver.page_source
        # Parse the source of the search results page
        html=etree.HTML(source)
        # Collect the detail-page URL of every posting
        links=html.xpath('//div[@class="dw_table"]/div[@class="el"]//span[not(@class="t2")]/a/@href')
        for link in links:
            t.sleep(3)
            # Open and parse the detail page
            self.request_detail_page(link)

    def request_detail_page(self,url):
        '''Open a detail page from the list page'''
        # Open the detail URL in a new window
        self.driver.execute_script("window.open('%s')"%url)
        # Switch to the detail page
        self.driver.switch_to.window(self.driver.window_handles[1])
        # Get the source of the detail page
        source=self.driver.page_source
        # Close the detail page once its source has been read
        self.driver.close()
        # Switch back to the list page
        self.driver.switch_to.window(self.driver.window_handles[0])
        self.parse_detail_page(source)

    def parse_detail_page(self,source):
        '''Parse the data on a job detail page'''
        html=etree.HTML(source)
        # Job description text
        position=html.xpath('//div[@class="bmsg job_msg inbox"]/p/text()')
        position=''.join(position).strip()
        # Header block with the company's basic information
        infos=html.xpath('//div[@class="cn"]')
        for info in infos:
            # Company name (a leading // searches the whole document, not only this block)
            companyName=info.xpath('//p[@class="cname"]/a/@title')[0].strip()
            # Job title
            title=info.xpath('./h1/text()')[0].strip()
            # Salary range
            salary=info.xpath('./strong/text()')[0]
            jobInfo={
                '公司名稱':companyName,  # company name
                '招聘職位':title,        # job title
                '薪資範圍':salary,       # salary range
                '職位資訊':position      # job description
            }
            print(jobInfo)

if __name__ == '__main__':
    job=Job()
    job.run()
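
The script imports csv and creates self.info but only prints each record. If the records were collected in a list instead, they could be saved along the following lines. This is a sketch, not part of the original script: the helper name write_jobs and the sample record are hypothetical, and an in-memory buffer stands in for a real file so the example runs on its own.

```python
import csv
import io

# Column names match the dictionary keys used in parse_detail_page.
FIELDS = ['公司名稱', '招聘職位', '薪資範圍', '職位資訊']

def write_jobs(rows, fileobj):
    """Write a list of job dictionaries to fileobj as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical record, for illustration only.
sample_rows = [
    {
        '公司名稱': 'Company A',
        '招聘職位': 'Test Engineer',
        '薪資範圍': '1-1.5万/月',
        '職位資訊': 'Python, Selenium',
    },
]

buffer = io.StringIO()
write_jobs(sample_rows, buffer)
print(buffer.getvalue())
```

For a real file, something like open('jobs.csv', 'w', newline='', encoding='utf-8-sig') could replace the buffer; the file name is arbitrary, and the utf-8-sig encoding is a common way to make Excel display the Chinese text correctly.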

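Running the complete script collects the data described at the start. The output is not shown here because there is simply too much of it. The salary ranges could later be used for data analysis.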
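
As a starting point for that kind of analysis, the salary strings first have to be turned into numbers. The sketch below assumes the salaries look like "1-1.5万/月", "6-8千/月", or "10-20万/年" (a range, a unit of 万 = 10,000 or 千 = 1,000, and a period of month or year). That format is an assumption; anything else is returned as None rather than guessed at. The function name parse_salary is made up for this example.

```python
import re

# Assumed format: "<low>-<high><unit>/<period>", e.g. "1-1.5万/月".
SALARY_RE = re.compile(r'^(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)(万|萬|千)/(月|年)$')
UNIT = {'万': 10000, '萬': 10000, '千': 1000}
MONTHS = {'月': 1, '年': 12}

def parse_salary(text):
    """Return (low, high) in yuan per month, or None if the format is not recognized."""
    match = SALARY_RE.match(text.strip())
    if match is None:
        return None
    low, high, unit, period = match.groups()
    # Convert to yuan, then spread yearly figures over 12 months.
    factor = UNIT[unit] / MONTHS[period]
    return (round(float(low) * factor), round(float(high) * factor))

print(parse_salary('1-1.5万/月'))   # (10000, 15000)
print(parse_salary('6-8千/月'))     # (6000, 8000)
print(parse_salary('10-20万/年'))   # (8333, 16667)
print(parse_salary('面议'))         # None
```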
