
Notes: scrapy-signal


 

1.      scrapy signal

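1.1.    Signal mechanism

Scrapy's signal mechanism is implemented mainly by three modules:

signals.py: defines the signals

signalmanager.py: manages them (connecting, disconnecting, sending)

utils/signal.py: does the actual dispatch work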

 

Scrapy ships with a set of built-in signals, defined in signals.py:

engine_started = object()

engine_stopped = object()

spider_opened = object()

spider_idle = object()

spider_closed = object()

spider_error = object()

request_scheduled = object()

request_dropped = object()

response_received = object()

response_downloaded = object()

item_scraped = object()

item_dropped = object()

 

# for backwards compatibility

stats_spider_opened = spider_opened

stats_spider_closing = spider_closed

stats_spider_closed = spider_closed

 

item_passed = item_scraped

 

request_received = request_scheduled

 

Scrapy defines these signals and sends them at the relevant moments. Here is one such case:

yield self.signals.send_catch_log_deferred(signal=signals.engine_started)

For the meaning of each signal and when it is sent, see the documentation: https://docs.scrapy.org/en/latest/topics/signals.html
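To make the mechanism concrete, here is a minimal sketch using only the standard library. It is not Scrapy's code: MiniSignalManager, on_spider_opened and the "demo" spider value are hypothetical names used for illustration. It only models the idea that a signal is a plain sentinel object and that sending it calls every connected receiver, logging errors instead of raising them.

```python
import logging

logger = logging.getLogger(__name__)

# Signals are plain sentinel objects, distinguished by identity.
engine_started = object()
spider_opened = object()


class MiniSignalManager:
    """Hypothetical, simplified stand-in for a signal dispatcher."""

    def __init__(self):
        # Maps each signal object to the receivers connected to it.
        self._receivers = {}

    def connect(self, receiver, signal):
        self._receivers.setdefault(signal, []).append(receiver)

    def send_catch_log(self, signal, **kwargs):
        """Call every receiver; log exceptions instead of propagating them."""
        responses = []
        for receiver in self._receivers.get(signal, []):
            try:
                result = receiver(**kwargs)
            except Exception as exc:
                logger.error("Error caught on signal handler: %r",
                             receiver, exc_info=True)
                result = exc  # simplification: keep the exception itself
            responses.append((receiver, result))
        return responses


def on_spider_opened(spider):
    return f"opened {spider}"


manager = MiniSignalManager()
manager.connect(on_spider_opened, signal=spider_opened)
results = manager.send_catch_log(signal=spider_opened, spider="demo")
```

Because each signal is just an object(), defining a custom signal in this model is a one-line assignment, and a signal with no receivers simply produces an empty result list.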

 

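1.2.    Using scrapy signals

Scrapy already defines the commonly used signals; developers can connect handlers to them in an extension class, a spider, or a pipeline.

Below is an example of using a signal in an extension class, spider_open_s.py: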

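# coding: utf-8
import logging

from scrapy import signals

logger = logging.getLogger(__name__)


class spider_open(object):
    """Extension that logs a message and pauses when a spider is opened."""

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        # Run spider_open_log whenever the spider_opened signal is sent.
        crawler.signals.connect(ext.spider_open_log, signal=signals.spider_opened)
        return ext

    def spider_open_log(self, spider):
        logger.info('spider is opened!')
        # Blocks the crawl until the user presses Enter.
        input('input a number to go on:')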

 

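It is very simple: to show a prompt or perform some action once the spider is opened, connect the spider_opened signal to the handler function inside the extension class. Scrapy sends spider_opened when it opens the spider, and the connected function then runs.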
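An extension only runs if it is enabled in the project settings. A sketch of the settings.py entry follows; the dotted path is an assumption (it presumes spider_open_s.py sits in a package named myproject), so adjust it to the real project layout.

```python
# settings.py (fragment)
# Assumption: spider_open_s.py lives in a package called "myproject".
# The key is the import path of the extension class; the value is its order.
EXTENSIONS = {
    "myproject.spider_open_s.spider_open": 500,
}
```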

 

1.3.    Signals in depth

Under the hood, Scrapy's signal handling uses the PyDispatcher library (the pydispatch module):

from pydispatch import dispatcher

For finer-grained control over signals, Scrapy also exposes an interface: signals are handled through the SignalManager class:

class scrapy.signalmanager.SignalManager(sender=_Anonymous)

Common methods

  1. connect(receiver, signal, **kwargs)

Connect a receiver function to a signal.

The signal can be any object, although Scrapy comes with some predefined signals that are documented in the Signals section.

Parameters:

receiver (callable) – the function to be connected

signal (object) – the signal to connect to

  2. disconnect(receiver, signal, **kwargs)

Disconnect a receiver function from a signal. This has the opposite effect of the connect() method, and the arguments are the same.

  3. disconnect_all(signal, **kwargs)

Disconnect all receivers from the given signal.

Parameters:

signal (object) – the signal to disconnect from

  4. send_catch_log(signal, **kwargs)

Send a signal, catch exceptions and log them.

The keyword arguments are passed to the signal handlers (connected through the connect() method).

  5. send_catch_log_deferred(signal, **kwargs)

Like send_catch_log() but supports returning Deferreds from signal handlers.

Returns a Deferred that fires once the Deferreds of all signal handlers have fired.

The keyword arguments are passed to the signal handlers (connected through the connect() method).
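The difference between send_catch_log() and send_catch_log_deferred() can be illustrated with a rough analogy. Scrapy itself uses Twisted Deferreds; the sketch below swaps in the standard library's asyncio instead, so it is a model of the idea and not Scrapy's API. send_catch_log_async, sync_handler and async_handler are hypothetical names.

```python
import asyncio
import inspect
import logging

logger = logging.getLogger(__name__)


async def send_catch_log_async(receivers, **kwargs):
    """Call each receiver, await any awaitable result, and log failures."""

    async def call(receiver):
        try:
            result = receiver(**kwargs)
            if inspect.isawaitable(result):
                # Handler returned asynchronous work: wait for it to finish.
                result = await result
        except Exception as exc:
            logger.error("Error caught on signal handler: %r",
                         receiver, exc_info=True)
            result = exc
        return receiver, result

    # Completes only after every handler has finished.
    return await asyncio.gather(*(call(r) for r in receivers))


def sync_handler(item):
    return f"sync saw {item}"


async def async_handler(item):
    await asyncio.sleep(0)  # stands in for real asynchronous work
    return f"async saw {item}"


results = asyncio.run(
    send_catch_log_async([sync_handler, async_handler], item="book")
)
```

As in the description above, the caller gets its result only after all handlers have completed, whether they finished immediately or returned asynchronous work.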