Python/Scrapy TypeError: 'LuboavSpider' object is not iterable (problem and solution)
阿新 • Published 2019-01-08
Problem description: while crawling with Scrapy, the item pipeline raised TypeError: 'LuboavSpider' object is not iterable when processing the results and saving them to the database.
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    limit = 8

    def process_item(self, item, spider):
        if item:
            # Truncate the "tittle" field (spelled this way in the project's Item)
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            return item
        else:
            # Note: DropItem should be raised, not returned (corrected below)
            return DropItem("item missing")


class MongoPipeling(object):
    def __init__(self, mongouri, mondb):
        self.mongouri = mongouri
        self.mongodb = mondb

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mondb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    # BUG: the two parameters are swapped; Scrapy passes the item first
    # and the spider second, so here the name `item` is actually bound
    # to the spider object
    def process_item(self, spider, item):
        self.db["luboav_scrapy"].insert(dict(item))  # dict(spider) -> TypeError
        return item
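The error message itself comes from dict(): passing it an object that is neither a mapping nor an iterable of key/value pairs raises exactly this TypeError. A standalone illustration (the class here is just a stand-in to reproduce the message):

class LuboavSpider:
    """Stand-in for the real spider class, only to reproduce the error."""
    pass

dict(LuboavSpider())  # TypeError: 'LuboavSpider' object is not iterable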
Cause analysis:
1. The pipeline file uses two classes to process each item the engine hands over: DemoPipeline caps the length of one field, and MongoPipeling saves the data to MongoDB. Because pipelines run as a chain, every process_item method must return the item so the result is passed on to the next pipeline (or back to the engine).
2. In MongoPipeling's process_item, the second parameter was declared as spider and the third as item, i.e. the two parameters were swapped. Scrapy calls process_item positionally, always passing the item first and the spider second, so the name item ended up bound to the spider object, and dict(item) then failed with TypeError: 'LuboavSpider' object is not iterable (see the sketch below).
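A minimal sketch (an illustration only, not Scrapy's actual source) of why the parameter names do not matter but their order does: the pipeline machinery invokes each process_item positionally, item first, spider second.

def run_pipeline_chain(pipelines, item, spider):
    """Illustrative only: roughly how a pipeline chain passes items along."""
    for pipeline in pipelines:
        # item first, spider second, regardless of how the method
        # named its parameters; swapping the names swaps the objects
        item = pipeline.process_item(item, spider)
    return item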
Corrected code:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    limit = 8

    def process_item(self, item, spider):
        if item:
            # Truncate the "tittle" field (spelled this way in the project's Item)
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            return item
        else:
            # DropItem is an exception and must be raised, not returned
            raise DropItem("item missing")


class MongoPipeling(object):
    def __init__(self, mongouri, mondb):
        self.mongouri = mongouri
        self.mongodb = mondb

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mondb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    # Fixed: item first, spider second, matching the order Scrapy uses
    def process_item(self, item, spider):
        # insert_one replaces the long-deprecated Collection.insert
        self.db["luboav_scrapy"].insert_one(dict(item))
        return item
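For completeness, the pipeline only works if the settings read in from_crawler exist and both classes are registered. A hypothetical settings.py fragment; the module path 'demo.pipelines' and the URI/database values are assumptions to be adjusted to your project:

# settings.py -- the keys must match those read via crawler.settings.get()
MONGOURI = 'mongodb://localhost:27017'  # assumed local MongoDB instance
MONGODB = 'luboav'                      # assumed database name

ITEM_PIPELINES = {
    # module path 'demo.pipelines' is assumed; use your project's own path
    'demo.pipelines.DemoPipeline': 300,
    'demo.pipelines.MongoPipeling': 400,
}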
Run the program again and it completes successfully.
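To double-check that the items actually landed in MongoDB, a quick hypothetical pymongo query (URI and database name assumed to match the settings above):

import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017')  # assumed URI
db = client['luboav']                                      # assumed database name
print(db['luboav_scrapy'].count_documents({}))             # number of stored items
print(db['luboav_scrapy'].find_one())                      # sample stored document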