拉勾網 selenium 模擬爬取

阿新 • • 發佈：2019-01-10

#encoding: utf-8

from selenium import webdriver
from selenium.webdriver.support.ui import Select,WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from lxml import etree
import time
import re
import csv

class LagouSpider(object 
):
    # chromedriver的絕對路徑
    driver_path ='/Users/mac126/chromedriver'
    def __init__(self):
        # 初始化一個driver，並且指定chromedriver的路徑
        self.driver = webdriver.Chrome(executable_path=self.driver_path)
        self.positions = []
        self.fp = open('lagou.csv','a',encoding='utf-8',newline='' 
)
        self.writer = csv.DictWriter(self.fp,['title','salary','city','work_years','education',"company_website",'desc','acquire','origin_url'])
        self.writer.writeheader()

    def run(self):
        #執行
        url = 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput= 
'#路徑
        self.driver.get(url)#獲取路徑
        while True:#死迴圈
            WebDriverWait(driver=self.driver,timeout=10).until(
                EC.presence_of_element_located((By.XPATH,"//span[contains(@class,'pager_next')]"))
            )
            resource = self.driver.page_source
            self.parse_list_page(resource)
            next_btn = self.driver.find_element_by_xpath("//span[contains(@class,'pager_next')]")
            if "pager_next_disabled" in next_btn.get_attribute('class'):
                break
            next_btn.click()
            time.sleep(1)

    def parse_list_page(self,resource):
        html = etree.HTML(resource)
        links = html.xpath("//a[@class='position_link']/@href")
        for link in links:
            self.parse_detail_page(link)
            time.sleep(1)

    def parse_detail_page(self,url):
     
        self.driver.execute_script("window.open('"+url+"')")
        self.driver.switch_to.window(self.driver.window_handles[1])
        WebDriverWait(self.driver,timeout=10).until(
            EC.presence_of_element_located((By.XPATH,"//dd[@class='job_bt']"))
        )
        resource = self.driver.page_source
        html = etree.HTML(resource)
        title = html.xpath("//span[@class='name']/text()")[0]
        company = html.xpath("//h2[@class='fl']/text()")[0].strip()
        job_request_span = html.xpath("//dd[@class='job_request']//span")
        salary = job_request_span[0].xpath(".//text()")[0]
        salary = salary.strip()
        city = job_request_span[1].xpath(".//text()")[0]
        city = re.sub(r"[/\s]","",city)
        work_years = job_request_span[2].xpath(".//text()")[0]
        work_years = re.sub(r"[/\s]","",work_years)
        education = job_request_span[3].xpath(".//text()")[0]
        education = re.sub(r"[/\s]","",education)
        company_website = html.xpath("//ul[@class='c_feature']/li[last()]/a/@href")[0]
        position_desc = "".join(html.xpath("//dd[@class='job_bt']/div//text()"))
        position = {
            'title': title,
            'city': city,
            'salary': salary,
            'company': company,
            'company_website': company_website,
            'education': education,
            'work_years': work_years,
            'desc': position_desc,
            'origin_url': url
        }
        self.driver.close()
        self.driver.switch_to.window(self.driver.window_handles[0])
        self.write_position(position)

    def write_position(self,position):
        if len(self.positions) >= 100:
            self.writer.writerows(self.positions)
            self.positions.clear()
        self.positions.append(position)
        print(position)

def main():
    spider = LagouSpider()
    spider.run()

if __name__ == '__main__':
    main()

拉勾網 selenium 模擬爬取

#encoding: utf-8 from selenium import webdriver from selenium.webdriver.support.ui import Select,WebDriverWait from selenium.webdriver.common.by import

pyspider爬蟲框架之拉勾網招聘資訊爬取

需求遍歷所有職位目錄點選職位分類，進入之後按照地區抓取，職位名稱，釋出時間，薪酬，工作年限要求，學歷要求，招聘公司，所屬行業，所處輪次進入職位詳情頁，抓取HR聊天意願（用時），簡歷處理，活躍時段。程式碼程式碼有詳細的註解，就不一步一步講解了，

scrapy由淺入深(三) selenium模擬爬取ajax動態頁面(智聯招聘)

爬取智聯招聘的網址：https://sou.zhaopin.com/?pageSize=60&jl=489&kw=python&kt=3 上一篇部落格爬取了前程無憂的職位招聘資訊，總體來說前程無憂的網站資訊並不難爬取，前程無憂的網站並

python爬取拉勾網之selenium

重點程式碼解釋： 1.呼叫lxml的etree實現xpath方法呼叫，xpath相對正則比較簡單，可以不在使用Beauitfulsoup定位 from lxml import etree 2.介面的可視話與否，對於你的執行資源只能用減少 opt=webdri

Python 爬蟲-模擬登入知乎-爬取拉勾網職位資訊

用Python寫爬蟲是很方便的,最近看了xlzd.me的文章，他的文章寫的很到位，提供了很好的思路。因為他的文章部分程式碼省略了。下面是基於他的文章的三個程式碼片段: 基於Python3,Python2的話需要修改下input輸入函式和print的用法。爬取豆瓣電影top250 爬取拉勾網職位資訊模擬

拉勾網爬取全國python職位並數據分析薪資，工作經驗，學歷等信息

add with color palette 谷歌瀏覽器 tor item imp 文件中首先前往拉勾網“爬蟲”職位相關頁面確定網頁的加載方式是JavaScript加載通過谷歌瀏覽器開發者工具分析和尋找網頁的真實請求，確定真實數據在position.Ajax開頭的鏈

用python爬取拉勾網招聘資訊並以CSV檔案儲存

爬取拉勾網招聘資訊 1、在網頁原始碼中搜索資訊，並沒有搜到，判斷網頁資訊使用Ajax來實現的 2、檢視網頁中所需的資料資訊，返回的是JSON資料； 3、條件為北京+資料分析師的公司一共40087家，而實際拉勾網展示的資料只有 15條/頁 * 30頁 = 450條，所以需要判斷

HttpClient爬取拉勾網招聘資訊

1.匯入jar包 <dependency> <groupId>org.apache.httpcomponents</groupId> <artifactId>htt

python爬取拉勾網網際網路大資料職業情況

爬取拉勾網資訊資料處理製圖所需知識只有一點點（畢竟是個小白）： requests基礎部分 json pyecharts wordcloud 接下來開始敲程式碼了，程式碼分成了3個部分：爬取、製圖、生成詞雲爬取部分：首先要說明的是，拉勾網有反爬

Python爬蟲：爬取拉勾網資料分析崗位資料

1 JSON介紹 JSON（JavaScript Object Notation）已經成為通過HTTP請求在Web瀏覽器和其他應用程式之間傳送資料的標準格式之一。比CSV格式更加靈活。Json資料格式，非常接近於有效的Pyhton程式碼，其特點是：JSON物件所

Python爬取拉勾網招聘資訊存入資料庫

先抓包分析我們想要獲取的資料，很明顯都是動態資料，所以直接到Network下的XHR裡去找，這裡我們找到具體資料後，就要去尋分析求地址與請求資訊了。還有需要提交的表單資訊分析完畢之後，我們就可以開始寫我們的爬蟲專案了。一.編寫Itemitem編寫比較簡單# 拉鉤職位資訊 cl

Python爬取拉勾網招聘資訊

此程式碼執行建議Python3，省卻中文編碼的麻煩遇到的幾個問題：（1）拉鉤網的資料是通過js的ajax動態生成，所以不能直接爬取，而是通過post’http://www.lagou.com/jobs/positionAjax.json?needAddt

【圖文詳解】scrapy爬蟲與動態頁面——爬取拉勾網職位資訊（1）

5-14更新注意：目前拉勾網換了json結構，之前是content - result 現在改成了content- positionResult - result,所以大家寫程式碼的時候要特別注意加上

python爬取拉勾網資料儲存到mysql資料庫

環境:python3 相關包:requests , json , pymysql 思路:1.通過chrome F12找到拉鉤請求介面,分析request的各項引數 2.模擬瀏覽器請求拉鉤介面 3.預設返回的json不是標準格式 ,

Python爬取拉勾網招聘資訊並可視化分析

需求: 1:獲取指定崗位的招聘資訊 2:對公司地區,公司待遇,學歷情況,工作經驗進行簡單分析並可視化展示視覺化分析: 公司地區:柱狀圖,地圖公司待遇:雲圖公司-學歷情況:餅圖公司工作經

Python scrapy 爬取拉勾網招聘資訊

週末折騰了好久，終於成功把拉鉤網的招聘資訊爬取下來了。現在總結一下！環境： windows 8.1 + python 3.5.0 首先使用 scrapy 建立一個專案： E:\mypy> scrapy startproject lgjob 建立後目錄結構：

scrapy爬取拉勾網python職位+Mysql+視覺化

在進行爬取目標網站中為遇到一個問題，爬取5頁資料之後會出錯，設定了每一次請求的隨機超時間10-20->time.sleep(random.randint(10, 20))，同樣會被拉勾網禁止請求資料，可能被輕度判定為爬取，所以可以設定每一次的隨機超時間為20-30秒，就可以解決這個問題。

python3 利用requests爬取拉勾網資料

學習python，瞭解了一點爬蟲的知識，成功的對拉勾網的招聘資訊進行了爬取，將爬取心得記錄下來，和大家一起學習進步。準備工作： python3 requests pandas 谷歌瀏覽器(或者火狐瀏覽器、qq瀏覽器)

【爬蟲相關】爬蟲爬取拉勾網的安卓招聘資訊

我爬取了30頁拉勾上安卓的招聘資料告訴你安卓崗位究竟要一個什麼樣的人我知道沒圖你們是不會看的如圖：以上是抓取了30頁拉勾上關於招聘安卓相關的內容然後根據詞頻製作出詞雲圖出現最多的詞是開發經驗整體流程總共分為2步 1.爬蟲爬取相關的招聘資訊 2.根

Python 爬蟲入門-爬取拉勾網實戰

這幾天學習了 python 爬蟲的入門知識，也遇到很多坑，開個貼記錄一下基本原理 Python 爬蟲基本要具備以下功能：（參考此回答）向伺服器傳送請求，伺服器響應你的請求。（你可能需要了解：網頁的基本知識）從抓取到的網頁中提取出需要

拉勾網 selenium 模擬 爬取

相關推薦

拉勾網 selenium 模擬爬取