Traversing the page numbers of Chouti (抽屜)

阿新 • Published: 2018-11-05

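# -*- coding: utf-8 -*-
import io
import sys

import scrapy
from pyquery import PyQuery
from scrapy.http import Request
from scrapy.selector import Selector  # only used by the commented-out XPath variant below

# Re-wrap stdout so Chinese text prints correctly in a GB-encoded Windows console
# (the original assigned to the misspelled sys.stout, so this never took effect)
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="gb18030")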

class ChoutiSpider(scrapy.Spider):
    name = 'chouti'
    allowed_domains = ['chouti.com']
    start_urls = ['http://dig.chouti.com/']
    visited_list = set()  # URLs already requested, so no page is fetched twice

    def parse(self, response):
        content = str(response.body, encoding="utf-8")
        pq = PyQuery(content)
        # Alternative: print the text of every news item with PyQuery
        # item = pq.find("#content-list .item")
        # for i in item.items():
        #     print(i.find(".show-content ").text().strip())

        # Alternative: the same extraction with Scrapy's XPath Selector
        # hsx = Selector(response=response).xpath('//div[@id="content-list"]/div[@class="item"]')
        # for obj in hsx:
        #     a = obj.xpath('.//a[@class="show-content color-chag"]/text()').extract_first().strip()
        #     print(a)
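        # Pager links: every <li> in #dig_lcpage except the first (:gt(0) means index > 0)
        pages = pq.find("#dig_lcpage li:gt(0)")
        for page in pages.items():
            index_web = page.find("a").attr("href")
            if index_web is None:  # this <li> contains no link
                continue
            web = "https://dig.chouti.com%s" % index_web
            if web in self.visited_list:  # already requested
                continue
            self.visited_list.add(web)
            print(web)
            # Hand the request to the scheduler; its response is parsed by this same callback
            yield Request(url=web, callback=self.parse)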

(venv) D:\shan>scrapy crawl chouti --nolog
https://dig.chouti.com/all/hot/recent/2
https://dig.chouti.com/all/hot/recent/3
https://dig.chouti.com/all/hot/recent/4
https://dig.chouti.com/all/hot/recent/5
https://dig.chouti.com/all/hot/recent/6
https://dig.chouti.com/all/hot/recent/7
https://dig.chouti.com/all/hot/recent/8
https://dig.chouti.com/all/hot/recent/9
https://dig.chouti.com/all/hot/recent/10
https://dig.chouti.com/all/hot/recent/1
https://dig.chouti.com/all/hot/recent/11
https://dig.chouti.com/all/hot/recent/12
https://dig.chouti.com/all/hot/recent/13
https://dig.chouti.com/all/hot/recent/14
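The dedup-and-follow step inside parse() can be tried without touching the network. The sketch below is a hypothetical helper (new_page_urls is not part of Scrapy or of the original spider) that applies the same rules: skip pager items with no href, turn the relative href into an absolute URL, and ignore URLs already in the visited set. It uses urllib.parse.urljoin, which Scrapy's response.urljoin() also relies on. Keep in mind that Scrapy's scheduler already drops duplicate requests by default (unless dont_filter=True), so visited_list largely serves to keep the printed list free of repeats.

```python
from urllib.parse import urljoin


def new_page_urls(base_url, hrefs, visited):
    """Return absolute URLs for hrefs not seen before, recording them in visited.

    Hypothetical helper mirroring the loop in ChoutiSpider.parse:
    hrefs that are None (an <li> without a link) are skipped, relative
    hrefs are resolved against the current page URL, and URLs already
    in the visited set are dropped.
    """
    urls = []
    for href in hrefs:
        if href is None:  # pager item without an <a href>
            continue
        url = urljoin(base_url, href)
        if url in visited:  # already queued earlier
            continue
        visited.add(url)
        urls.append(url)
    return urls


if __name__ == "__main__":
    seen = set()
    print(new_page_urls("https://dig.chouti.com/", ["/all/hot/recent/2", None, "/all/hot/recent/2"], seen))
    # ['https://dig.chouti.com/all/hot/recent/2']
```

Passing the same visited set on every call is what stops a page reached from two different pagers from being queued twice.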

To limit how deep the recursion goes, set DEPTH_LIMIT = <maximum depth> in settings.py.
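Scrapy counts depth per request: start URLs are depth 0, a link followed from them is depth 1, and so on; with DEPTH_LIMIT = N, requests deeper than N are dropped, and 0 (the default) means no limit. The stdlib-only sketch below simulates that rule on a hypothetical pager in which page n links to up to two pages on either side. It is an illustration of the counting, not Scrapy's own code, and crawl and pager are made-up names.

```python
from collections import deque


def crawl(start, links, depth_limit=0):
    """Breadth-first crawl following DEPTH_LIMIT-style rules.

    The start page has depth 0, each followed link is one level deeper,
    and links that would exceed depth_limit are dropped (0 = no limit).
    Returns pages in the order they are "fetched".
    """
    visited = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        for nxt in links(page):
            if nxt in visited:
                continue
            if depth_limit and depth + 1 > depth_limit:
                continue  # too deep: the request is never scheduled
            visited.add(nxt)
            queue.append((nxt, depth + 1))
    return order


def pager(n, last=20):
    """Hypothetical pager: page n links to pages n-2..n+2 within 1..last."""
    return [p for p in range(n - 2, n + 3) if 1 <= p <= last and p != n]


if __name__ == "__main__":
    print(crawl(1, pager, depth_limit=2))  # [1, 2, 3, 4, 5]
```

Raising depth_limit lets the crawl walk further along the pager; leaving it at 0 visits all 20 hypothetical pages.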

Request headers are configured in settings.py as well (for example through DEFAULT_REQUEST_HEADERS or USER_AGENT).
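A settings.py excerpt covering both points might look like the following. The depth value, header values and User-Agent string are placeholders chosen for illustration, not values from the original post. DEFAULT_REQUEST_HEADERS is applied to every request that does not already set those headers, while the User-Agent has its own USER_AGENT setting.

```python
# settings.py (excerpt) -- all values below are illustrative placeholders

# Drop requests more than 2 links away from start_urls (0 = no limit).
DEPTH_LIMIT = 2

# Headers added to every request unless the request sets them itself.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}

# User-Agent sent with each request.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
```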
