『Scrapy』Shell Invocation & Selector Methods

阿新 • Published: 2017-08-25


Scrapy Shell


For example, after entering the command below, the shell drops into an interactive Python (or IPython) session:

scrapy shell "http://www.itcast.cn/channel/teacher.shtml"

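Note that the URL must be wrapped in double quotes; with single quotes the command fails. This applies at least to the Windows command prompt, where single quotes do not act as quoting characters.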

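The shell then lists the objects it currently holds so that you can inspect them. They have exactly the same structure as the objects we work with when writing a spider in a .py script, so the usual methods can be called on them directly.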


Scrapy Selectors

Selectors can be built either with XPath or with CSS expressions, as shown below:

>>> response.xpath('//title/text()')
[<Selector (text) xpath=//title/text()>]
>>> response.css('title::text')
[<Selector (text) xpath=//title/text()>]

Both approaches return selector nodes rather than plain strings, so in either case you call .extract() or .extract_first() to pull out the data part.


The extraction examples below use the following source:

<html>
 <head>
  <base href='http://example.com/' />
  <title>Example website</title>
 </head>
 <body>
  <div id='images'>
   <a href='image1.html'>Name: My image 1 <br /><img src='image1_thumb.jpg' /></a>
   <a href='image2.html'>Name: My image 2 <br /><img src='image2_thumb.jpg' /></a>
   <a href='image3.html'>Name: My image 3 <br /><img src='image3_thumb.jpg' /></a>
   <a href='image4.html'>Name: My image 4 <br /><img src='image4_thumb.jpg' /></a>
   <a href='image5.html'>Name: My image 5 <br /><img src='image5_thumb.jpg' /></a>
  </div>
 </body>
</html>

Extracting a tag attribute:

>>> response.xpath('//base/@href').extract()
[u'http://example.com/']

>>> response.css('base::attr(href)').extract()
[u'http://example.com/']

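To filter the tags on the target path, add a condition: contains(@href, "image") requires the href attribute to contain the string image. The CSS attribute selector [href*=image] does the same thing: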

response.xpath('//a[contains(@href, "image")]/@href').extract()
Out[1]: ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

response.xpath('//a[contains(@href, "image1")]/@href').extract()
Out[2]: ['image1.html']

response.css('a[href*=image]::attr(href)').extract()
Out[3]: ['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

response.css('a[href*=image2]::attr(href)').extract()
Out[4]: ['image2.html']

Combining the two (filter first, then extract an attribute of a child element):

>>> response.xpath('//a[contains(@href, "image")]/img/@src').extract()
[u'image1_thumb.jpg',
 u'image2_thumb.jpg',
 u'image3_thumb.jpg',
 u'image4_thumb.jpg',
 u'image5_thumb.jpg']

>>> response.css('a[href*=image] img::attr(src)').extract()
[u'image1_thumb.jpg',
 u'image2_thumb.jpg',
 u'image3_thumb.jpg',
 u'image4_thumb.jpg',
 u'image5_thumb.jpg']

Selectors also come with built-in regular-expression methods, re and re_first:

response.xpath('//a[contains(@href, "image")]/text()')
Out[8]: 
[<Selector xpath='//a[contains(@href, "image")]/text()' data='Name: My image 1 '>,
 <Selector xpath='//a[contains(@href, "image")]/text()' data='Name: My image 2 '>,
 <Selector xpath='//a[contains(@href, "image")]/text()' data='Name: My image 3 '>,
 <Selector xpath='//a[contains(@href, "image")]/text()' data='Name: My image 4 '>,
 <Selector xpath='//a[contains(@href, "image")]/text()' data='Name: My image 5 '>]

response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
Out[7]: ['My image 1 ', 'My image 2 ', 'My image 3 ', 'My image 4 ', 'My image 5 ']

response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')
Out[9]: 'My image 1 '
