Scrapy Crawler -- Writing a Downloader Middleware for a Random User-Agent
阿新 • Published: 2018-11-16
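Implementation steps:
1. In middlewares.py, create a new downloader middleware class;
2. Define a process_request method (the callback invoked when the engine sends a request object to the downloader) that sets a random User-Agent;
3. In settings.py, enable the new downloader middleware.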
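Step 3 could look roughly like the sketch below. The package name `myproject` is a placeholder (it is not given in this post) and should be replaced with the actual project package. The priority 543 is an arbitrary choice; it is the value Scrapy's project template uses in its commented-out example.

```python
# settings.py
# Sketch only: 'myproject' is a hypothetical package name, replace it with your own.
DOWNLOADER_MIDDLEWARES = {
    # The number sets the middleware's position in the chain;
    # process_request is called in increasing order of this value.
    'myproject.middlewares.RandomUserAgentDownloaderMiddleware': 543,
}
```

With this value the middleware should run after Scrapy's built-in UserAgentMiddleware (priority 500 in the default settings, as far as I know), so assigning the header directly replaces whatever default was set earlier.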
The middleware code that implements the random User-Agent is as follows:
# middlewares.py
import random


class RandomUserAgentDownloaderMiddleware(object):
    """Downloader middleware that sets a random User-Agent."""

    def process_request(self, request, spider):
        # Random parts of a Chrome version number: <major>.0.<build>.<patch>
        first_num = random.randint(55, 62)
        third_num = random.randint(0, 3200)
        fourth_num = random.randint(0, 140)
        # Platform tokens to choose from
        os_type = [
            '(Windows NT 6.1; WOW64)', '(Windows NT 10.0; WOW64)',
            '(X11; Linux x86_64)', '(Macintosh; Intel Mac OS X 10_12_6)'
        ]
        chrome_version = 'Chrome/{}.0.{}.{}'.format(first_num, third_num, fourth_num)
        user_agent = ' '.join(['Mozilla/5.0', random.choice(os_type), 'AppleWebKit/537.36',
                               '(KHTML, like Gecko)', chrome_version, 'Safari/537.36'])
        # Give every request a randomly generated User-Agent
        request.headers['User-Agent'] = user_agent
        return None  # Returning None lets Scrapy continue processing the request
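To check the generated strings without starting a crawl, the User-Agent logic can be pulled into a standalone function. The sketch below mirrors the middleware's logic. The names `build_random_user_agent`, `OS_TYPES`, and `UA_PATTERN` are introduced here only for illustration and are not part of the original code or of Scrapy.

```python
import random
import re

# Same platform tokens as in the middleware above.
OS_TYPES = [
    '(Windows NT 6.1; WOW64)',
    '(Windows NT 10.0; WOW64)',
    '(X11; Linux x86_64)',
    '(Macintosh; Intel Mac OS X 10_12_6)',
]

# Hypothetical pattern describing the shape of the generated string.
UA_PATTERN = re.compile(
    r'^Mozilla/5\.0 \(.+\) AppleWebKit/537\.36 \(KHTML, like Gecko\) '
    r'Chrome/\d+\.0\.\d+\.\d+ Safari/537\.36$'
)


def build_random_user_agent(rng=random):
    """Build a Chrome-style User-Agent the same way the middleware does."""
    chrome_version = 'Chrome/{}.0.{}.{}'.format(
        rng.randint(55, 62),    # major version
        rng.randint(0, 3200),   # build number
        rng.randint(0, 140),    # patch number
    )
    return ' '.join([
        'Mozilla/5.0', rng.choice(OS_TYPES), 'AppleWebKit/537.36',
        '(KHTML, like Gecko)', chrome_version, 'Safari/537.36',
    ])


if __name__ == '__main__':
    # Print a few samples and confirm each one has the expected shape.
    for _ in range(3):
        ua = build_random_user_agent()
        print(ua, bool(UA_PATTERN.match(ua)))
```

Inside a running project, the header actually sent can also be inspected in a spider callback through `response.request.headers`.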