session和selenium+PhantomJS模擬登陸v2ex

阿新 • • 發佈：2018-12-15

session方式登入

def parse_url(self, url):
    response = requests.get(url=url, headers=self.headers)
    return response.content


start = time.time()
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'accept': 'image/webp,image/apng,image/*,*/*;q=0.8', 'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9', 'referer': 'https://www.v2ex.com/signin'}
url = r'https://www.v2ex.com/signin'

session = requests.session()
resp_content = session.get(url, headers=headers).content
resp_html = html.fromstring(resp_content)

names = resp_html.xpath('//*[@id="Main"]/div[2]/div[2]/form/table[1]//input[1]//@name')

once_url = resp_html.xpath(r'//*[@id="Main"]/div[2]/div[2]/form/table[1]//tr//@style')[0]
once_value = re.match(".*once=([0-9]*)(.*)", once_url).group(1)
verifyUrl = "https://www.v2ex.com/_captcha?once=" + once_value

resp = requests.get(verifyUrl, headers=headers, cookies=session.cookies.get_dict())
verifyBytes = resp.content

verify_start = time.time()
dama = indetifyCode(verifyBytes)
verify_end = time.time()
print("識別驗證碼耗時：", verify_end - verify_start)

encode_verify = str(dama, encoding="utf-8")
data = {
    names[0]: 'xxx',
    names[1]: 'xxx',
    names[2]: encode_verify,
    'once': once_value,
    'next': r'/'
}

login_url = 'https://www.v2ex.com/signin'
final_resp = session.post(url=login_url, data=data)
print(final_resp.status_code)

end = time.time()
print("總耗時: ", end - start)

v2ex中，由於賬號密碼欄的名稱是變化的，所以只有先載入得到html後再通過xpath進行匹配，匹配成功後獲取對應post請求時所需填充的資料，並且由於其中的驗證碼是和cookie繫結通過once引數進行獲取的，所以獲取驗證碼一定要加上cookie引數。 2. selenium

browser = webdriver.Chrome("D:/ChromeDownload/chromedriver_win32/chromedriver.exe")
url = r'https://www.v2ex.com/signin'
browser.get(url=url)

username = browser.find_element_by_xpath(r'//input[@placeholder="使用者名稱或電子郵箱地址"]')
username.send_keys('xxx')

pwd = browser.find_element_by_xpath(r'//tbody[1]/tr[2]/td[2]/input')
pwd.send_keys('xxx')

# 驗證碼
style = browser.find_element_by_xpath(r'//tbody[1]/tr[3]/td[2]/div[1]').get_attribute("style")
once = re.match(".*once=([0-9]*)\"\);(.*)", style).group(1)
verifyUrl = "https://www.v2ex.com/_captcha?once=" + once

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'accept': 'image/webp,image/apng,image/*,*/*;q=0.8', 'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9', 'referer': 'https://www.v2ex.com/signin'}
cookies = {i["name"]: i["value"] for i in browser.get_cookies()}
resp = requests.get(verifyUrl, headers=headers, cookies=cookies)
verifyBytes = resp.content
print(resp.status_code)
dama = indetifyCode(verifyBytes)
verify = browser.find_element_by_xpath(r'//input[@placeholder="請輸入上圖中的驗證碼"]')
#這裡不能直接用verify.send_keys(dama),型別不匹配會報錯
#入參要求是str，這種獲取的是byte
#報錯資訊：TypeError: sequence item 0: expected str instance, int found
verify.send_keys(str(dama, encoding="utf-8"))

browser.find_element_by_xpath(r'//input[@value="登入"]').click()

以上是完整的程式碼，註釋中記錄了部分自己遇到的坑，用的是yundama平臺的驗證碼識別，其中的api需要改一部分，官網給的打碼api是要傳入參路徑呼叫的是YDM_DecodeByByPath，而我們這裡可以直接通過請求獲取到二維碼圖片的二進位制流，無需儲存到本地，直接把其中呼叫的api改成YDM_DecodeByByBytes。通過以上程式碼測試，發現確實如果直接用selenium+PhantomJS耗時會比session耗時多很多，測試過程（出去驗證碼識別階段）session登入整體流程耗時2秒左右，而selenium則需要接近7秒。

session和selenium+PhantomJS模擬登陸v2ex

session和selenium+PhantomJS模擬登陸v2ex

9、使用selenium + phantomjs 模擬瀏覽器登錄網站

使用selenium進行模擬登陸豆瓣

我是如何實現用selenium進行模擬登陸的

Python配置Selenium+Chrome環境和Selenium+PhantomJS環境

selenium+python模擬登陸163郵箱

python全棧系列之---cookie模擬登陸和模擬session原理

關於模擬登陸的小結-抓包、cookie、session和token

爬蟲知識點（ajax非同步載入，JavaScript 動態重新整理，phantomjs + selenium模擬登陸）

編寫一個模擬註冊用戶和驗證用戶登陸的程序

scrapy使用PhantomJS和selenium爬取數據

selenium模擬登陸淘寶網並且將‘衣服’相關資訊下載儲存在mysql資料庫

selenium自動化模擬訪問、谷歌及無屏phantomJs

python3 使用selenium模擬登陸天眼查抓取資料

ThinkPHP5.0用Session和Cookie實現登陸/記住登陸/防止重複登陸/防止未登陸的非法操作

爬取攝圖網裡的音樂和視訊攝圖網模擬登陸

python下，phantomjs和selenium的安裝

python 爬蟲：利用selenium抓取cookie進行模擬登陸

使用selenium模擬登陸oschina

selenium抓取cookie進行模擬登陸

session和selenium+PhantomJS模擬登陸v2ex

相關推薦