
Custom Scrapy extensions

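1. Create an extension file and define a class in it. The class needs a from_crawler class method, which is how Scrapy hands it the crawler object: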

from scrapy import signals


class MyExtend:

    def __init__(self, crawler):
        self.crawler = crawler
        # attach a handler to the hook (signal)
        crawler.signals.connect(self.start, signals.engine_started)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the extension instance
        return cls(crawler)

    def start(self):
        # custom action
        print('signals.engine_started')
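
To see the call sequence without launching a crawl, the flow can be imitated with toy stand-ins. This is only a sketch: FakeSignalManager, FakeCrawler and the local engine_started sentinel are hypothetical names made up here, not Scrapy classes, and they mimic just the two steps the extension relies on (connecting a handler and firing a signal).

```python
# Toy stand-ins, NOT Scrapy's real classes (assumption for illustration only).
engine_started = object()  # a signal is just a unique sentinel object


class FakeSignalManager:
    def __init__(self):
        self._receivers = {}  # signal -> list of handlers

    def connect(self, receiver, signal):
        # remember which handler belongs to which signal
        self._receivers.setdefault(signal, []).append(receiver)

    def send(self, signal):
        # call every handler registered for this signal
        return [receiver() for receiver in self._receivers.get(signal, [])]


class FakeCrawler:
    def __init__(self):
        self.signals = FakeSignalManager()


class MyExtend:
    def __init__(self, crawler):
        self.crawler = crawler
        self.started = False
        crawler.signals.connect(self.start, engine_started)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def start(self):
        self.started = True
        print('signals.engine_started')


crawler = FakeCrawler()
ext = MyExtend.from_crawler(crawler)   # roughly what happens when the extension is loaded
crawler.signals.send(engine_started)   # roughly what happens when the engine starts
```

The point of the sketch is the order of events: the handler is registered once, at construction time, and runs later when the signal is sent.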

2. Enable it in settings (the key is the class's import path, the value is its load order):

EXTENSIONS = {
    'day96.extensions.MyExtend': 300,
}
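
As a small variation on the setting above, the same dictionary can also switch an extension off by mapping it to None. The sketch below assumes the project layout from this post (day96/extensions.py) and uses the built-in telnet console only as an example of something to disable.

```python
# settings.py (sketch)
EXTENSIONS = {
    # enable the custom extension; 300 is its order value
    'day96.extensions.MyExtend': 300,
    # a value of None disables an extension, here a built-in one
    'scrapy.extensions.telnet.TelnetConsole': None,
}
```

Extensions rarely depend on one another, so the exact order number usually matters little.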

 

3. Signals you can hook into (as defined in scrapy.signals):

# when the engine starts running
engine_started = object()
# when the engine stops running
engine_stopped = object()

spider_opened = object()
spider_idle = object()
spider_closed = object()
spider_error = object()
request_scheduled = object()
request_dropped = object()
response_received = object()
response_downloaded = object()
# when an Item is scraped (yielded by the spider and passed through the pipelines)
item_scraped = object()
# when an Item is dropped
item_dropped = object()