Notes: Scrapy signals
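1. Scrapy signals
1.1. The signal mechanism
Scrapy's signal mechanism is implemented mainly by three modules:
- signals.py: defines the signal objects
- signalmanager.py: manages them (connect, disconnect, send)
- utils/signal.py: does the actual dispatching work
Scrapy ships with a set of built-in signals, defined in signals.py: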
engine_started = object()
engine_stopped = object()
spider_opened = object()
spider_idle = object()
spider_closed = object()
spider_error = object()
request_scheduled = object()
request_dropped = object()
response_received = object()
response_downloaded = object()
item_scraped = object()
item_dropped = object()
# for backwards compatibility
stats_spider_opened = spider_opened
stats_spider_closing = spider_closed
stats_spider_closed = spider_closed
item_passed = item_scraped
request_received = request_scheduled
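Each signal in the listing above is just a unique `object()` sentinel, so signals are told apart by identity, and the backward-compatibility names are plain aliases of the same objects. A minimal sketch of that idea, which re-creates a few of the definitions locally rather than importing Scrapy (the `handlers` registry is a hypothetical illustration, not part of Scrapy):

```python
# Re-create a few definitions from the listing above; nothing here imports Scrapy.
spider_opened = object()
spider_closed = object()
item_scraped = object()

# Backward-compatibility aliases point at the very same objects.
stats_spider_opened = spider_opened
item_passed = item_scraped

# Signals carry no data; they are distinguished purely by identity.
same_signal = stats_spider_opened is spider_opened
different_signals = spider_opened is spider_closed

# Hypothetical registry: plain objects are hashable, so they can key a dict,
# and looking up through an alias finds the same entry.
handlers = {spider_opened: ["on_open"], item_scraped: ["on_item"]}
alias_lookup = handlers[item_passed]

print(same_signal, different_signals, alias_lookup)
```

Because an alias is the same object, a handler connected to `item_scraped` is also reached when code refers to the signal by its old name `item_passed`.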
Scrapy defines these signals and sends them at the relevant moments. Here is one such case, where engine_started is sent:
yield self.signals.send_catch_log_deferred(signal=signals.engine_started)
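For the meaning of each signal and when it is sent, see the documentation: https://docs.scrapy.org/en/latest/topics/signals.html
1.2. Using Scrapy signals
Scrapy already defines the commonly used signals; developers can connect handlers to them in an extension class, a spider, or a pipeline.
Below is an example of using a signal in an extension class: spider_open_s.py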
#coding:utf-8
import logging
from scrapy import signals

logger = logging.getLogger(__name__)


class spider_open(object):
    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        crawler.signals.connect(ext.spider_open_log, signal=signals.spider_opened)
        return ext

    def spider_open_log(self, spider):
        logger.info('spider is opened!')
        input('input a number to go on:')
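The idea is simple: if you want a prompt or some other action once the spider has opened, connect the spider_opened signal to the handler function in the extension class. Scrapy sends spider_opened when the spider is opened for crawling, and the connected function then runs.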
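An extension only receives signals once Scrapy loads it, which is done through the EXTENSIONS setting. A sketch of the settings.py entry, assuming spider_open_s.py lives in a hypothetical package named `myproject` (the module path and the order value 500 are illustrative, not taken from the original notes):

```python
# settings.py (fragment)
# 'myproject' is a hypothetical package name; adjust the dotted path to
# wherever spider_open_s.py actually lives in your project.
EXTENSIONS = {
    "myproject.spider_open_s.spider_open": 500,
}
```

With this in place, Scrapy should call `spider_open.from_crawler(crawler)` at startup, which is where the handler gets connected.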
1.3. Signals in depth
Under the hood, Scrapy's signal handling is built on the PyDispatcher (pydispatch) module:
from pydispatch import dispatcher
If you need finer-grained control over signals, Scrapy exposes an interface for that too. Signals are operated through the SignalManager class:
class scrapy.signalmanager.SignalManager(sender=_Anonymous)
Common methods
- connect(receiver, signal, **kwargs)
Connect a receiver function to a signal.
The signal can be any object, although Scrapy comes with some predefined signals that are documented in the Signals section.
Parameters:
receiver (callable) – the function to be connected
signal (object) – the signal to connect to
- disconnect(receiver, signal, **kwargs)
Disconnect a receiver function from a signal. This has the opposite effect of the connect() method, and the arguments are the same.
- disconnect_all(signal, **kwargs)
Disconnect all receivers from the given signal.
Parameters:
signal (object) – the signal to disconnect from
- send_catch_log(signal, **kwargs)
Send a signal, catch exceptions and log them.
The keyword arguments are passed to the signal handlers (connected through the connect() method).
- send_catch_log_deferred(signal, **kwargs)
Like send_catch_log() but supports returning deferreds from signal handlers.
Returns a Deferred that gets fired once all signal handlers' deferreds have fired.
The keyword arguments are passed to the signal handlers (connected through the connect() method).
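The semantics of connect, disconnect, disconnect_all and send_catch_log listed above can be illustrated with a toy model. This is a simplified sketch written only for these notes, not Scrapy's implementation: `ToySignalManager` and the handlers `count` and `broken` are hypothetical names, the real manager delegates to PyDispatcher, and this toy records the raised exception object where a handler fails.

```python
import logging

logger = logging.getLogger(__name__)


class ToySignalManager:
    """Illustrative stand-in for the methods above; NOT Scrapy's code."""

    def __init__(self):
        self._receivers = {}  # signal object -> list of callables

    def connect(self, receiver, signal):
        self._receivers.setdefault(signal, []).append(receiver)

    def disconnect(self, receiver, signal):
        receivers = self._receivers.get(signal, [])
        if receiver in receivers:
            receivers.remove(receiver)

    def disconnect_all(self, signal):
        self._receivers.pop(signal, None)

    def send_catch_log(self, signal, **kwargs):
        """Call every receiver; log exceptions instead of propagating them."""
        results = []
        for receiver in list(self._receivers.get(signal, [])):
            try:
                result = receiver(**kwargs)
            except Exception as exc:
                logger.error("Error caught on signal handler: %r", receiver,
                             exc_info=True)
                result = exc  # assumption of this toy: keep the exception
            results.append((receiver, result))
        return results


item_scraped = object()  # a signal is any object


def count(item):
    return len(item)


def broken(item):
    raise ValueError("boom")


manager = ToySignalManager()
manager.connect(count, item_scraped)
manager.connect(broken, item_scraped)

# One failing handler does not stop the others from running.
results = manager.send_catch_log(item_scraped, item={"title": "x"})

manager.disconnect(broken, item_scraped)
after_disconnect = manager.send_catch_log(item_scraped, item={"a": 1, "b": 2})

manager.disconnect_all(item_scraped)
after_clear = manager.send_catch_log(item_scraped, item={})
```

The point of the "catch and log" design is isolation: a buggy handler is reported in the log, while the sender and the remaining handlers carry on.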