Crawler Series---Learning the Scrapy Framework

阿新 • Published: 2018-04-02

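The project requires scraping product information from a certain website. I had written a spider myself with Requests, BeautifulSoup and similar libraries, and it stores the scraped data in a database.

It runs rather slowly, especially when it goes into the detail pages to collect information. As a beginner I don't really know how to improve it, so I'm just learning as I go.

There are quite a few crawler frameworks out there; now I plan to learn Scrapy and rewrite the spider with it.

Below are some notes I took while studying the official documentation.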

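My Scrapy environment is set up in Anaconda, so in PyCharm the Project Interpreter has to be set to the python.exe under Anaconda.

I often forget to set this, and then a lot of packages fail to import, because I installed most of them through the Anaconda environment.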
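When imports fail like this, it helps to check which interpreter PyCharm is actually using and whether that interpreter can see the packages. A minimal sketch using only the standard library; the helper name `interpreter_report` and the default package list are illustrative assumptions:

```python
import importlib.util
import sys


def interpreter_report(packages=("scrapy", "requests", "bs4")):
    """Return the running interpreter path and which packages it can import."""
    return {
        # Full path of the running python executable; with Anaconda selected
        # it should point inside the Anaconda install or one of its envs.
        "executable": sys.executable,
        # find_spec returns None when a top-level package cannot be found
        # by this interpreter, without actually importing the package.
        "available": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    report = interpreter_report()
    print(report["executable"])
    for name, ok in report["available"].items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it from inside PyCharm: if the printed path does not point into the Anaconda installation, or `scrapy` shows as missing, the Project Interpreter setting is the likely culprit.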

Below is the first example given in the documentation:

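```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Each quote on the page is wrapped in <div class="quote">.
        for quote in response.css('div.quote'):
            yield {
                # extract_first() returns None if nothing matches
                # (newer Scrapy versions also offer .get() for the same thing).
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        # Follow the "Next" link, if there is one, and parse that page the same way.
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
```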
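The `next_page` value taken from `li.next a` is usually a relative link. `response.follow` accepts it directly because it resolves the link against the current page's URL (a plain `scrapy.Request` needs an absolute URL). That resolution follows the same rules as the standard library's `urllib.parse.urljoin` (HTML responses also honor a `<base>` tag if the page has one). The sketch below shows only this URL arithmetic, not Scrapy itself; the href values are hypothetical examples:

```python
from urllib.parse import urljoin

# Pretend we are currently parsing page 2 of the humor tag.
current_page = "http://quotes.toscrape.com/tag/humor/page/2/"

# Hypothetical href values that could appear in <li class="next"><a href="...">.
root_relative = "/tag/humor/page/3/"  # leading "/": resolved from the site root
path_relative = "page/3/"             # no leading "/": resolved from the current directory

next_from_root = urljoin(current_page, root_relative)
next_from_path = urljoin(current_page, path_relative)

print(next_from_root)  # http://quotes.toscrape.com/tag/humor/page/3/
print(next_from_path)  # http://quotes.toscrape.com/tag/humor/page/2/page/3/
```

The second case shows why relative links should always be resolved against the page they came from rather than glued onto a fixed base URL by hand.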

In the Anaconda Prompt, enter the command:

scrapy runspider quote_spider.py -o quote.json

Note: run it from the directory where the file is located.
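Since both the spider file name and the `-o` output path are resolved relative to the current working directory, one way to avoid `cd`-ing by hand is to launch the command with an explicit working directory. A minimal sketch with the standard `subprocess` module; the helper names are hypothetical, and it assumes the `scrapy` command of your Anaconda environment is on `PATH`:

```python
import subprocess
from pathlib import Path


def build_runspider_command(spider_file, output_file):
    """Build the argument list for `scrapy runspider <file> -o <output>`."""
    return ["scrapy", "runspider", spider_file, "-o", output_file]


def run_spider_in(directory, spider_file="quote_spider.py", output_file="quote.json"):
    """Run the spider with `directory` as the working directory.

    The spider path and the -o output path are both relative, so the output
    file ends up in the same folder as the spider file.
    """
    directory = Path(directory)
    if not (directory / spider_file).is_file():
        raise FileNotFoundError(directory / spider_file)
    cmd = build_runspider_command(spider_file, output_file)
    # check=True raises CalledProcessError if scrapy exits with a non-zero status.
    return subprocess.run(cmd, cwd=directory, check=True)
```

Usage would be `run_spider_in("path/to/spider/folder")`, with the placeholder replaced by the folder that actually contains quote_spider.py.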

After a successful run, a file named quote.json is generated:

[
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
{"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
{"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
{"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
{"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
{"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
]
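The exported file is an ordinary JSON array, so it can be loaded with the standard `json` module, for example to inspect the data before writing it to a database. A small sketch; `load_quotes` and `quotes_per_author` are hypothetical helper names. One caveat: `-o` appends to an existing file, so running the command twice leaves two JSON arrays back to back and `json.load` will fail; newer Scrapy releases also accept `-O` to overwrite the file instead.

```python
import json
from collections import Counter
from pathlib import Path


def load_quotes(path="quote.json"):
    """Load the list of {"text": ..., "author": ...} dicts written by `-o quote.json`."""
    # json.load turns the \u201c / \u201d escapes in the file back into curly quotes.
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def quotes_per_author(quotes):
    """Count quotes per author, most frequent first (ties keep first-seen order)."""
    return Counter(q["author"] for q in quotes).most_common()
```

Applied to the file above, `quotes_per_author(load_quotes())` would start with `("Jane Austen", 2)`, since she is the only author listed twice.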

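When you run `scrapy runspider quote_spider.py -o quote.json`, Scrapy looks for a Spider definition inside that file and runs it through its crawler engine.

The crawl starts by sending requests to the URLs defined in the start_urls attribute and calling the default callback method parse, passing the response object as an argument. In the parse callback, we loop through the quote elements using a CSS selector, yield a Python dict with the extracted quote text and author, look for a link to the next page, and schedule another request using the same parse method as the callback.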
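The control flow just described (request the start URLs, call parse with each response, collect yielded dicts as items, queue yielded requests together with their callback) can be imitated in plain Python. This is only a toy model of that flow, not how Scrapy's engine is implemented: the real engine is asynchronous (built on Twisted), downloads pages concurrently and filters duplicate requests. `FAKE_SITE`, `Request`, `parse` and `crawl` here are hypothetical stand-ins:

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory "website": path -> (quotes on that page, next-page path or None).
FAKE_SITE = {
    "/tag/humor/": ([("Quote 1", "Jane Austen"), ("Quote 2", "Steve Martin")], "/tag/humor/page/2/"),
    "/tag/humor/page/2/": ([("Quote 3", "Dr. Seuss")], None),
}


@dataclass
class Request:
    """Toy request: a URL plus the callback that should handle its response."""
    url: str
    callback: Callable


def parse(response):
    """Mirrors the spider's parse(): yield one dict per quote, then the next-page Request."""
    quotes, next_page = response
    for text, author in quotes:
        yield {"text": text, "author": author}
    if next_page is not None:
        yield Request(next_page, parse)


def crawl(start_urls, site):
    """Toy engine: handle queued requests one by one until none are left."""
    queue = deque(Request(url, parse) for url in start_urls)
    items = []
    while queue:
        request = queue.popleft()
        response = site[request.url]  # stand-in for downloading the page
        for result in request.callback(response):
            if isinstance(result, Request):
                queue.append(result)  # schedule another request
            else:
                items.append(result)  # a scraped item
    return items
```

Calling `crawl(["/tag/humor/"], FAKE_SITE)` returns the three quote dicts from both fake pages in order. The second page is reached only because `parse` yielded a `Request` for it, which is the role `response.follow(next_page, self.parse)` plays in the real spider.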