
scrapy: parallel


Limiting Parallelism
jcalderone
Concurrency can be a great way to speed things up, but what happens when you have too much concurrency? Overloading a system or a network can be detrimental to performance. Often there is a peak in performance at a particular level of concurrency. Executing a particular number of tasks in parallel will be easier than ever with Twisted 2.5 and Python 2.5:
from twisted.internet import defer, task

def parallel(iterable, count, callable, *args, **named):
    # A single Cooperator shares one generator of outstanding work among
    # `count` cooperative iterators, so at most `count` tasks run at once.
    coop = task.Cooperator()
    work = (callable(elem, *args, **named) for elem in iterable)
    return defer.DeferredList([coop.coiterate(work) for i in xrange(count)])
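As an aside (not part of the original post): Twisted's defer.DeferredSemaphore is another common way to cap concurrency, handing out one of `count` tokens per task instead of sharing a single generator among `count` coiterators. A minimal sketch, assuming the same call signature as parallel above (the parallel_with_semaphore name is just for illustration):

from twisted.internet import defer

def parallel_with_semaphore(iterable, count, callable, *args, **named):
    # run() returns a Deferred that acquires one of `count` tokens, calls the
    # callable, and releases the token once the callable's Deferred fires.
    sem = defer.DeferredSemaphore(count)
    work = [sem.run(callable, elem, *args, **named) for elem in iterable]
    return defer.DeferredList(work)

One difference worth noting: this version builds every Deferred up front, while the Cooperator version pulls tasks from the iterator lazily, which matters for very long task lists.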
Here's an example of using the parallel function above to save the contents of a bunch of URLs which are listed one per line in a text file, downloading at most fifty at a time:
from twisted.python import log
from twisted.internet import reactor
from twisted.web import client

def download((url, fileName)):
    return client.downloadPage(url, file(fileName, 'wb'))

urls = [(url, str(n)) for (n, url) in enumerate(file('urls.txt'))]
finished = parallel(urls, 50, download)
finished.addErrback(log.err)
finished.addCallback(lambda ign: reactor.stop())
reactor.run()
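The example above is Python 2 through and through: tuple parameters in the def, the file() builtin, and the since-deprecated client.downloadPage. A rough Python 3 port under a current Twisted might look like the sketch below, which swaps in Agent and readBody for downloadPage; treat it as an assumption-laden sketch rather than something from the original post.

from twisted.internet import defer, reactor, task
from twisted.python import log
from twisted.web.client import Agent, readBody

def parallel(iterable, count, callable, *args, **named):
    coop = task.Cooperator()
    work = (callable(elem, *args, **named) for elem in iterable)
    return defer.DeferredList([coop.coiterate(work) for _ in range(count)])

def save(body, fileName):
    with open(fileName, "wb") as f:
        f.write(body)

def download(job):
    url, fileName = job                      # Python 3 dropped tuple parameters
    d = Agent(reactor).request(b"GET", url.encode("ascii"))
    d.addCallback(readBody)                  # read the whole response body
    d.addCallback(save, fileName)
    return d

with open("urls.txt") as f:
    urls = [(line.strip(), str(n)) for n, line in enumerate(f)]

finished = parallel(urls, 50, download)
finished.addErrback(log.err)
finished.addCallback(lambda ign: reactor.stop())
reactor.run()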

[Edit: The original generator expression in this post was of the form ((yield foo()) for x in y). The yield here is completely superfluous, of course, so I have removed it.]

A quick experiment with the same pieces: consume part of the generator by hand, then hand what is left to a Cooperator.

from twisted.internet import defer, reactor, task

l = [3, 4, 5, 6]

def f(a):
    print a

work = (f(elem) for elem in l)
for i in range(3):
    work.next()   # prints 3, 4 and 5; one element is left in the generator

coop = task.Cooperator()
# work = (callable(elem, *args, **named) for elem in iterable)
d = [coop.coiterate(work) for _ in range(5)]
print d

[<Deferred at 0x1aa0c88 waiting on Deferred at 0x1aa0d50>, <Deferred at 0x1aa0dc8 waiting on Deferred at 0x1aa0e90>, <Deferred at 0x1aa0f30 waiting on Deferred at 0x1aa4030>, <Deferred at 0x1aa40d0 waiting on Deferred at 0x1aa4198>, <Deferred at 0x1aa4238 waiting on Deferred at 0x1aa4300>]
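Nothing has actually executed at this point: the Cooperator schedules its iteration steps through the reactor, so every Deferred in d is still pending, which is what the "waiting on Deferred" reprs above show. A small follow-up, assuming the variables from the snippet above, that lets the Cooperator drain the last generator item and then stops the reactor:

done = defer.DeferredList(d)
done.addCallback(lambda results: reactor.stop())
reactor.run()   # prints the remaining 6, then every coiterate() Deferred fires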
