Fixing yield scrapy.Request(next_url, callback=...) not following the next page in the Python Scrapy framework
阿新 • Published: 2019-02-19
Incorrect code:

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['https://www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']
Correct code:

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']
The problem here is the allowed_domains setting: it must contain bare domain names, not URLs. If an entry includes a scheme or path, Scrapy's off-site filtering does not recognize the next-page request as belonging to an allowed domain and silently drops it, so the callback is never invoked and the spider never moves past the first page.
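For completeness, a minimal sketch of what the paginating callback could look like once allowed_domains is fixed; the 'li.next a' selector is only a placeholder for whatever the real listing page uses:

import scrapy

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['www.xx.com']            # bare domain, no scheme or path
    start_urls = ['https://www.xx.com/ask/highlight/']

    def parse(self, response):
        # ... extract items from the current listing page here ...

        # hypothetical selector for the next-page link; adjust to the real markup
        next_url = response.css('li.next a::attr(href)').get()
        if next_url:
            # urljoin resolves a relative href against the current page URL
            yield scrapy.Request(response.urljoin(next_url), callback=self.parse)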
Another situation can also make yield scrapy.Request() appear to do nothing:
Scrapy's built-in duplicate filter has already seen the URL and drops the request before it is scheduled.
Solution:
yield scrapy.Request(next_url, callback=self.parse, dont_filter=True)

Note that the keyword argument is callback, not call_back; scrapy.Request has no call_back parameter, and passing one raises a TypeError.
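dont_filter=True exempts that single request from the scheduler's duplicate filter (and from the off-site check), so use it only where a revisit is genuinely intended; otherwise a spider can end up re-crawling the same pages indefinitely. If you want to confirm the filter is the culprit before disabling it, Scrapy's DUPEFILTER_DEBUG setting makes the dropped requests visible in the log, roughly like this:

# settings.py
# Log every request dropped by the duplicate filter instead of only the first,
# so a silently discarded next-page URL shows up in the crawl log.
DUPEFILTER_DEBUG = True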