Handling 404 responses in Scrapy
阿新 • Published: 2018-10-31
First approach: whitelist the error codes on the spider via handle_httpstatus_list, then check response.status in the callback and issue an alternative request.
from scrapy import Request, Spider


class MySpider(Spider):
    # Let these status codes through HttpErrorMiddleware so that
    # parse() receives the 404/500 responses instead of Scrapy
    # silently dropping them.
    handle_httpstatus_list = [404, 500]
    name = "my_crawler"
    start_urls = ["http://github.com/illegal_username"]

    def parse(self, response):
        if response.status in self.handle_httpstatus_list:
            # The original page failed; request an alternative URL instead.
            return Request(url="https://github.com/kennethreitz/", callback=self.after_404)

    def after_404(self, response):
        print(response.url)
Reposted from Stack Overflow:
http://stackoverflow.com/questions/16909106/scrapyin-a-request-fails-eg-404-500-how-to-ask-for-another-alternative-reque
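Scrapy's HttpErrorMiddleware also supports the same whitelist per request (the handle_httpstatus_list meta key) or project-wide (the HTTPERROR_ALLOWED_CODES setting), so the handling can stay local to a single request. A minimal sketch, assuming a placeholder spider name and URL:

from scrapy import Request, Spider


class PerRequestSpider(Spider):
    name = "per_request_404"  # illustrative name

    def start_requests(self):
        # Allow 404 for this request only, via the HttpErrorMiddleware
        # meta key; setting HTTPERROR_ALLOWED_CODES = [404] in
        # settings.py would do the same project-wide.
        yield Request(
            "http://www.example.com/maybe-missing.html",
            callback=self.parse,
            meta={"handle_httpstatus_list": [404]},
        )

    def parse(self, response):
        if response.status == 404:
            self.logger.warning("Got 404: %s", response.url)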
Second approach: let 404 responses through, count them in the crawl stats, and record the full list of failed URLs when the spider closes.
import scrapy
from scrapy import signals


class MySpider(scrapy.Spider):
    handle_httpstatus_list = [404]
    name = "myspider"
    allowed_domains = ["example.com"]
    start_urls = [
        'http://www.example.com/thisurlexists.html',
        'http://www.example.com/thisurldoesnotexist.html',
        'http://www.example.com/neitherdoesthisone.html'
    ]

    def __init__(self, category=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.failed_urls = []

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # scrapy.xlib.pydispatch was removed; connect signals through
        # the crawler instead of the old dispatcher.connect() call.
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.handle_spider_closed, signal=signals.spider_closed)
        return spider

    def parse(self, response):
        if response.status == 404:
            self.crawler.stats.inc_value('failed_url_count')
            self.failed_urls.append(response.url)

    def handle_spider_closed(self, reason):
        # Store every failed URL in the crawl stats when the spider closes.
        self.crawler.stats.set_value('failed_urls', ','.join(self.failed_urls))


class ExceptionStatsMiddleware:
    # process_exception is a downloader-middleware hook; it is never
    # called on a spider, so it belongs in a middleware class enabled
    # via DOWNLOADER_MIDDLEWARES. The class name here is illustrative.
    def process_exception(self, request, exception, spider):
        ex_class = "%s.%s" % (exception.__class__.__module__, exception.__class__.__name__)
        spider.crawler.stats.inc_value('downloader/exception_count', spider=spider)
        spider.crawler.stats.inc_value('downloader/exception_type_count/%s' % ex_class, spider=spider)
Source: http://stackoverflow.com/questions/13724730/how-to-get-the-scrapy-failure-urls
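For the exception counters to be recorded, ExceptionStatsMiddleware (the illustrative class above) has to be enabled in the project settings; the module path and priority below are assumptions about the project layout:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ExceptionStatsMiddleware": 543,
}

The collected values appear in the stats dump Scrapy prints at the end of a crawl; they can also be read programmatically, for example:

from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
crawler = process.create_crawler(MySpider)
process.crawl(crawler)
process.start()  # blocks until the crawl finishes
print(crawler.stats.get_value("failed_urls"))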