
Scraping Web Pages with Splash in Python

First, start Splash:

sudo docker run -p 8050:8050 scrapinghub/splash

Python code (.py):

import requests
from urllib.parse import quote
from requests import ConnectionError
lua = '''
function main(splash)
    splash:go("https://www.baidu.com")
    input = splash:select("#kw")
    input:send_text("Python")
    submit = splash:select("#su")
    submit:mouse_click()
    splash:wait(3)
    return splash:jpeg()
end
'''
# URL-encode the Lua script and append it to the execute endpoint URL
url = "http://localhost:8050/execute?lua_source=" + quote(lua)
try:
    # Send the request to Splash
    response = requests.get(url)
    print(response.status_code)
    # Write the returned image bytes to a file
    with open('baidu.jpg', 'wb') as f:
        f.write(response.content)
except ConnectionError as e:
    print(e)

Here, lua is a script written in the Lua language, and execute in the URL is one of Splash's HTTP API endpoints.
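The URL-building step can be isolated and checked without a running Splash instance. The sketch below is only an illustration: build_execute_url is a hypothetical helper, not part of Splash or requests. It assumes Splash is listening on localhost:8050 as in the docker command above, and it relies on Splash exposing extra query arguments to the script through splash.args. It encodes the script, then decodes the query string again to confirm the Lua source survives the round trip.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs


def build_execute_url(lua_source, base_url="http://localhost:8050", **extra_args):
    """Hypothetical helper: build a GET URL for Splash's execute endpoint."""
    params = {"lua_source": lua_source}
    # Any extra arguments are sent as additional query parameters;
    # the Lua script can read them from splash.args.
    params.update(extra_args)
    # quote_via=quote percent-encodes spaces as %20, like quote() in the script above
    return base_url + "/execute?" + urlencode(params, quote_via=quote)


# The target page is passed as an argument instead of being hard-coded in Lua
lua = '''
function main(splash)
    splash:go(splash.args.url)
    splash:wait(3)
    return splash:jpeg()
end
'''

url = build_execute_url(lua, url="https://www.baidu.com")

# Decode the query string to confirm the script was encoded without loss
decoded = parse_qs(urlsplit(url).query)
print(decoded["lua_source"][0] == lua)
```

The resulting url can be passed to requests.get(url) exactly as in the script above. Passing the page address as an argument keeps the Lua script reusable for other pages.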

Result: the JPEG screenshot returned by Splash is saved as baidu.jpg.