Scraping Web Pages with Splash in Python
阿新 • Published: 2019-01-29
First, start Splash:
sudo docker run -p 8050:8050 scrapinghub/splash
The .py code:
import requests
from urllib.parse import quote
from requests import ConnectionError

lua = '''
function main(splash)
    splash:go("https://www.baidu.com")
    input = splash:select("#kw")
    input:send_text("Python")
    submit = splash:select("#su")
    submit:mouse_click()
    splash:wait(3)
    return splash:jpeg()
end
'''

# URL-encode the Lua script and append it to the endpoint URL
url = "http://localhost:8050/execute?lua_source=" + quote(lua)
try:
    # Send the request to Splash
    response = requests.get(url)
    print(response.status_code)
    # Write the returned image data to a file
    with open('baidu.jpg', 'wb') as f:
        f.write(response.content)
except ConnectionError as e:
    print(e)
Here, lua is a script written in the Lua language, and execute in the URL is one of the endpoints of Splash's HTTP API.
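
As a variation, here is a minimal sketch of how the search term might be passed in as a request argument instead of being hard-coded in the Lua script. It relies on Splash exposing extra query parameters to the script through splash.args. The names LUA_TEMPLATE, build_execute_url and looks_like_jpeg are hypothetical helpers introduced only for this illustration, and the base address assumes the same local Docker container as above. The JPEG check simply looks at the first three bytes of the response, which is a rough guard against saving a JSON error body as baidu.jpg.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical variant of the script above: the keyword is read from
# splash.args rather than being written into the Lua source.
LUA_TEMPLATE = """
function main(splash)
    splash:go("https://www.baidu.com")
    local input = splash:select("#kw")
    input:send_text(splash.args.keyword)
    local submit = splash:select("#su")
    submit:mouse_click()
    splash:wait(3)
    return splash:jpeg()
end
"""


def build_execute_url(lua_source, base="http://localhost:8050", **script_args):
    """Build a URL for Splash's execute endpoint.

    lua_source and any extra keyword arguments are URL-encoded into the
    query string; the extra ones are what the script sees in splash.args.
    """
    params = {"lua_source": lua_source, **script_args}
    return base + "/execute?" + urlencode(params)


def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG signature FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"


url = build_execute_url(LUA_TEMPLATE, keyword="Python")

# Decode the query string again to confirm the argument survived encoding.
print(parse_qs(urlsplit(url).query)["keyword"])
```

With this in place, the earlier request could become requests.get(url), followed by writing response.content to disk only when looks_like_jpeg(response.content) is true.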