
python_scrapy_TypeError: 'LuboavSpider' object is not iterable: problem and solution

Problem description: while running a Scrapy crawl, the pipeline that processes results and saves them to the database raised TypeError: 'LuboavSpider' object is not iterable.
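The message itself comes from plain Python rather than Scrapy: calling dict() on an object that is not iterable raises exactly this TypeError. A minimal reproduction, using an empty stand-in class in place of the real spider:

class LuboavSpider:  # empty stand-in for the real spider class
    pass

dict(LuboavSpider())  # TypeError: 'LuboavSpider' object is not iterable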

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    # Maximum length kept for the "tittle" field ("tittle" is the field name
    # defined in items.py, so it is left as-is here).
    limit = 8

    def process_item(self, item, spider):
        if item:
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            return item
        # DropItem is an exception: it must be raised, not returned,
        # for Scrapy to actually discard the item.
        raise DropItem("item missing")


class MongoPipeline(object):
    def __init__(self, mongouri, mongodb):
        self.mongouri = mongouri
        self.mongodb = mongodb

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection settings from settings.py
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mongodb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, spider, item):
        # BUG: Scrapy calls process_item(item, spider) positionally, so with
        # the parameters in this order the name "item" actually receives the
        # spider object, and dict(item) raises the TypeError above.
        # (insert_one replaces the Collection.insert call deprecated in PyMongo 3.)
        self.db["luboav_scrapy"].insert_one(dict(item))
        return item

Cause analysis:

1. The pipeline file uses two classes to process each item handed over by the engine: one limits the length of a field, the other saves the data to MongoDB. Because the pipelines are chained, every process_item method must return the item (or raise DropItem) so that the result is passed on to the next pipeline component.

2. In MongoPipeline's process_item, the second parameter was spider and the third was item. Scrapy always calls process_item with the item first and the spider second, by position, so the parameter named spider actually received the item from the previous pipeline, while the parameter named item received the spider object; dict(item) then tried to iterate the LuboavSpider instance, which raised the error. See the sketch after this list.
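A minimal sketch of that calling convention (simplified; run_pipeline is a hypothetical stand-in, not Scrapy's real engine code):

# Hypothetical stand-in for the way Scrapy drives a pipeline component:
# the item is always the first positional argument, the spider the second.
def run_pipeline(pipeline, item, spider):
    return pipeline.process_item(item, spider)

# With the buggy signature process_item(self, spider, item), the name
# "spider" receives the item and the name "item" receives the spider,
# so dict(item) becomes dict(<LuboavSpider instance>) and fails.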

The corrected code:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    # Maximum length kept for the "tittle" field ("tittle" is the field name
    # defined in items.py, so it is left as-is here).
    limit = 8

    def process_item(self, item, spider):
        if item:
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            return item
        # DropItem is an exception: it must be raised, not returned,
        # for Scrapy to actually discard the item.
        raise DropItem("item missing")


class MongoPipeline(object):
    def __init__(self, mongouri, mongodb):
        self.mongouri = mongouri
        self.mongodb = mongodb

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection settings from settings.py
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mongodb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Parameters now match the order Scrapy uses: item first, spider second.
        self.db["luboav_scrapy"].insert_one(dict(item))
        return item
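For completeness, a minimal settings.py sketch wiring the two pipelines together; the module path demo.pipelines and the concrete values below are assumptions for illustration, not taken from the original project:

# settings.py (sketch; the module path and values below are assumed)
ITEM_PIPELINES = {
    'demo.pipelines.DemoPipeline': 300,   # runs first (lower number = earlier)
    'demo.pipelines.MongoPipeline': 400,  # receives whatever DemoPipeline returns
}
MONGOURI = 'mongodb://localhost:27017'  # read in MongoPipeline.from_crawler
MONGODB = 'luboav'                      # database name (assumed)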

Running the spider again now succeeds.