Fixing pagination that fails with yield scrapy.Request(next_url, callback=...) in the Python Scrapy framework

Incorrect code:


class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['https://www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']

Correct code:

class XXSpider(scrapy.Spider):
    name = 'xxspider'
    allowed_domains = ['www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']

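The problem here is the allowed_domains setting: it expects a list of domain names, not a list of URLs. When an entry such as 'https://www.xx.com' is given, the host of next_url does not match any allowed domain, so the follow-up request is treated as offsite and dropped, and the spider never reaches the next page.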
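One way to avoid this mistake is to derive the domain from the start URL instead of typing it by hand. The following is a minimal sketch using only the standard library; the helper name to_allowed_domain is made up for illustration and is not part of Scrapy.

```python
from urllib.parse import urlparse


def to_allowed_domain(value: str) -> str:
    """Return the bare host name for a URL or a plain host string.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    # urlparse only fills in the host when a scheme or a leading "//" is present,
    # so prefix plain host strings such as "www.xx.com" with "//".
    parsed = urlparse(value if "//" in value else "//" + value)
    return parsed.hostname or ""


start_urls = ['https://www.xx.com/ask/highlight/']
# Build allowed_domains from the start URLs, dropping scheme and path.
allowed_domains = sorted({to_allowed_domain(u) for u in start_urls})
print(allowed_domains)
```

The resulting list can be assigned to allowed_domains in the spider class, so the scheme and path never end up in it.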

There is another situation that can also make yield scrapy.Request() appear to do nothing:

    Scrapy's duplicate filter dropped the URL because it had already been requested (dont_filter defaults to False)

Solution:

yield scrapy.Request(next_url, callback=self.parse, dont_filter=True)
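To see why dont_filter=True changes the outcome, here is a toy model of a duplicate filter in plain Python. It is a simplified stand-in under stated assumptions: the class SeenFilter and its method should_schedule are hypothetical, and it compares raw URL strings, which is not how Scrapy's own filter is implemented.

```python
class SeenFilter:
    """Toy duplicate filter (hypothetical; not Scrapy's actual class)."""

    def __init__(self):
        # URLs that have already been accepted for scheduling.
        self.seen = set()

    def should_schedule(self, url: str, dont_filter: bool = False) -> bool:
        # A request flagged dont_filter bypasses the duplicate check entirely.
        if dont_filter:
            return True
        # Otherwise a URL that was seen before is silently dropped.
        if url in self.seen:
            return False
        self.seen.add(url)
        return True


f = SeenFilter()
url = "https://www.xx.com/ask/highlight/"
first = f.should_schedule(url)                      # new URL, accepted
repeat = f.should_schedule(url)                     # duplicate, dropped
forced = f.should_schedule(url, dont_filter=True)   # duplicate, but forced through
print(first, repeat, forced)  # True False True
```

Because dont_filter=True skips the check every time, a next_url that keeps pointing to the same page can loop indefinitely, so it is worth confirming that the URL really changes between pages.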