parallel in scrapy
A-Xin · Published: 2018-04-04
- Limiting Parallelism

Concurrency can be a great way to speed things up, but what happens when you have too much of it? Overloading a system or a network can be detrimental to performance; there is often a performance peak at a particular level of concurrency. Executing a fixed number of tasks in parallel is easier than ever with Twisted 2.5 and Python 2.5:
from twisted.internet import defer, task

def parallel(iterable, count, callable, *args, **named):
    # One shared generator of outstanding calls: the `count`
    # coiterate() calls below all pull from it, so at most
    # `count` tasks are ever in flight at once.
    coop = task.Cooperator()
    work = (callable(elem, *args, **named) for elem in iterable)
    return defer.DeferredList([coop.coiterate(work) for i in xrange(count)])
from twisted.python import log
from twisted.internet import reactor
from twisted.web import client

def download((url, fileName)):
    return client.downloadPage(url, file(fileName, 'wb'))

# strip() removes the trailing newline each line read from the
# file carries, which would otherwise corrupt the URL.
urls = [(url.strip(), str(n)) for (n, url) in enumerate(file('urls.txt'))]

finished = parallel(urls, 50, download)
finished.addErrback(log.err)
finished.addCallback(lambda ign: reactor.stop())
reactor.run()
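To run the script, urls.txt just needs one URL per line; the response bodies are saved to files named 0, 1, 2, and so on in list order, with at most fifty downloads in flight at any moment. A sample input file (the URLs are placeholders):

    http://example.com/
    http://example.org/
    http://example.net/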
[Edit: The original generator expression in this post was of the form ((yield foo()) for x in y). The yield here is completely superfluous, of course, so I have removed it.]
Each call to work.next() runs f on the next element, so the shared generator can be stepped through by hand (the snippet trails off where the Cooperator would take over; a sketch of that step follows):

from twisted.internet import defer, reactor, task

l = [3, 4, 5, 6]

def f(a):
    print a

work = (f(elem) for elem in l)
for i in range(3):
    work.next()
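A minimal sketch of handing that same generator to a Cooperator, assuming f is changed to return a Deferred (the deferLater delay is an arbitrary stand-in for real asynchronous work):

import

from twisted.internet import defer, reactor, task

l = [3, 4, 5, 6]

def f(a):
    # Pretend asynchronous work: a Deferred that fires with `a`
    # after a short delay.
    return task.deferLater(reactor, 0.1, lambda: a)

coop = task.Cooperator()
work = (f(elem) for elem in l)

# Two workers share the generator, so at most two of the four
# tasks run concurrently; the DeferredList fires when the
# generator is exhausted and all work has completed.
d = defer.DeferredList([coop.coiterate(work) for i in range(2)])
d.addCallback(lambda ign: reactor.stop())
reactor.run()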
With a count of five, the list handed to DeferredList holds five Deferreds, one per worker, each waiting on the cooperative task it wraps:

[<Deferred at 0x1aa0c88 waiting on Deferred at 0x1aa0d50>,
 <Deferred at 0x1aa0dc8 waiting on Deferred at 0x1aa0e90>,
 <Deferred at 0x1aa0f30 waiting on Deferred at 0x1aa4030>,
 <Deferred at 0x1aa40d0 waiting on Deferred at 0x1aa4198>,
 <Deferred at 0x1aa4238 waiting on Deferred at 0x1aa4300>]
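The bounded-concurrency pattern is not specific to Twisted. For comparison, a rough asyncio equivalent for modern Python (not from the original post; show is a placeholder task):

import asyncio

async def parallel(iterable, count, func):
    # The same trick: `count` workers all pull from one shared
    # iterator, so at most `count` calls to func run at a time.
    it = iter(iterable)

    async def worker():
        for elem in it:
            await func(elem)

    await asyncio.gather(*(worker() for _ in range(count)))

async def main():
    async def show(n):
        await asyncio.sleep(0.1)  # stand-in for real I/O
        print(n)
    await parallel(range(10), 3, show)

asyncio.run(main())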