Fixing the 403 Error When Scraping Maoyan Movies with requests
阿新 • Published: 2019-01-10
The original code is as follows:
import requests
from requests.exceptions import RequestException


def one_page_code(url):
    """Return the page source of url, or None if the request fails."""
    try:
        # No headers are passed here, so requests sends its defaults.
        page = requests.get(url)
        if page.status_code == 200:
            return page.text
        print("Failed\nStatus code: %d" % page.status_code)
    except RequestException:
        print("Exception")


def main():
    url = 'http://maoyan.com'
    print(one_page_code(url))


if __name__ == '__main__':
    main()
This code prints the page source correctly when requesting Baidu, Taobao, or Douban, but requesting Maoyan returns a 403 error.
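It turns out the request left out something important: the request headers. Without an explicit User-Agent, requests identifies itself as python-requests/<version>, and Maoyan appears to refuse requests that do not look like they come from a browser.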
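To see what is actually being sent, the request can be prepared without sending it. The sketch below is only an illustration: `effective_headers` is a hypothetical helper name, and the shortened browser User-Agent string is a placeholder. It relies on `requests.Session.prepare_request`, which merges the session's default headers with any headers passed in.

```python
import requests


def effective_headers(url, extra_headers=None):
    """Return the headers requests would send for a GET to url.

    The request is only prepared, never sent, so no network access is needed.
    effective_headers is a hypothetical helper name, not part of requests.
    """
    session = requests.Session()
    request = requests.Request('GET', url, headers=extra_headers)
    prepared = session.prepare_request(request)
    return dict(prepared.headers)


# Default headers: the User-Agent has the form python-requests/<version>.
default = effective_headers('http://maoyan.com')
print(default['User-Agent'])

# A User-Agent passed explicitly replaces the default one.
browser_like = effective_headers(
    'http://maoyan.com',
    {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'},
)
print(browser_like['User-Agent'])
```

The first print shows the default identifier that gives the script away; the second shows that the custom value takes its place.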
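Open the browser's developer tools (Inspect Element), copy the request headers shown in the Network tab, and add them to the request function: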
import requests
from requests.exceptions import RequestException


def one_page_code(url):
    """Return the page source of url, or None if the request fails."""
    try:
        # User-Agent copied from the browser's Network tab.
        header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'}
        page = requests.get(url, headers=header)
        if page.status_code == 200:
            return page.text
        print("Failed\nStatus code: %d" % page.status_code)
    except RequestException:
        print("Exception")


def main():
    url = 'http://maoyan.com/board/4'
    print(one_page_code(url))


if __name__ == '__main__':
    main()
With the header in place, the page source is retrieved normally.
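If the User-Agent alone is not enough for some site, more of the headers from the Network tab can be copied over. Typing them into a dict by hand is tedious, so here is a minimal sketch of a converter. `parse_raw_headers` is a hypothetical helper, and the `Accept` and `Accept-Language` values are made-up examples, not headers taken from the original post.

```python
def parse_raw_headers(raw):
    """Convert 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(':'):
            continue
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(':')
        if not sep:
            continue  # not a 'Name: value' line
        headers[name.strip()] = value.strip()
    return headers


# Example input; only the User-Agent comes from the article.
raw = """
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Accept: text/html,application/xhtml+xml
Accept-Language: zh-CN,zh;q=0.9
"""
headers = parse_raw_headers(raw)
print(headers['Accept-Language'])  # zh-CN,zh;q=0.9
```

The resulting dict can be passed straight to `requests.get(url, headers=headers)`, the same way the single-entry `header` dict is used above.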