Handling Project Data with Scrapy in Python: A Worked Example

阿新 • Published: 2020-11-24

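After processing data, we tend to leave it wherever it was produced, but that habit carries a risk. Once new data has been added, or for any number of other reasons, the file can be hard to track down when it is needed again, and redoing the whole collection and cleanup from scratch is clearly not realistic. This article looks at how a Scrapy-based Python crawler project handles its data.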

1. Pulling the project

$ git clone https://github.com/jonbakerfish/TweetScraper.git

$ cd TweetScraper/

$ pip install -r requirements.txt #add '--user' if you are not root

$ scrapy list

$ # If the output is 'TweetScraper', then you are ready to go.

2. Persisting the data

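According to the project documentation, there are three ways to persist the data: saving to files, saving to MongoDB, or saving to a MySQL database. Because the scraped data will be analyzed later, we store it in MySQL.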
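
Each of these options is a Scrapy item pipeline, that is, a class exposing a `process_item(item, spider)` method. The following is a minimal sketch of the file-based variant, not TweetScraper's actual code: the class name `SaveToFileSketch`, the `tweet_id` field, and the use of a temporary directory are all assumptions made for illustration.

```python
import json
import os
import tempfile


class SaveToFileSketch:
    """Hypothetical stand-in for a file-based item pipeline."""

    def __init__(self, save_dir):
        self.save_dir = save_dir
        os.makedirs(self.save_dir, exist_ok=True)

    def process_item(self, item, spider):
        # Scrapy calls process_item(item, spider) once per scraped item.
        # Assumption: every item has a unique "tweet_id" used as the file name.
        path = os.path.join(self.save_dir, str(item["tweet_id"]))
        with open(path, "w", encoding="utf-8") as f:
            json.dump(dict(item), f, ensure_ascii=False)
        return item  # pass the item on to the next pipeline, if any


# Usage with a plain dict standing in for a real Scrapy item
demo_dir = os.path.join(tempfile.mkdtemp(), "Data", "tweet")
pipeline = SaveToFileSketch(demo_dir)
pipeline.process_item({"tweet_id": "1", "text": "hello"}, spider=None)

with open(os.path.join(demo_dir, "1"), encoding="utf-8") as f:
    saved = json.load(f)
```

Returning the item from `process_item` is what lets several pipelines be chained, which is why switching storage backends only requires changing which pipeline is enabled.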

By default, scraped data is saved as JSON on disk under ./Data/tweet/, so the configuration file TweetScraper/settings.py needs to be modified.

ITEM_PIPELINES = {
    # 'TweetScraper.pipelines.SaveToFilePipeline': 100,
    # 'TweetScraper.pipelines.SaveToMongoPipeline': 100,  # replace `SaveToFilePipeline` with this to use MongoDB
    'TweetScraper.pipelines.SavetoMySQLPipeline': 100,  # replace `SaveToFilePipeline` with this to use MySQL
}

# settings for mysql
MYSQL_SERVER = "18.126.219.16"
MYSQL_DB = "scraper"
MYSQL_TABLE = "tweets"     # the table will be created automatically
MYSQL_USER = "root"        # MySQL user to use (should have INSERT access granted to the Database/Table)
MYSQL_PWD = "admin123456"  # MySQL user's password
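
To illustrate what a SQL-backed pipeline does with settings like these, here is a rough sketch under stated assumptions. It uses Python's built-in `sqlite3` in place of MySQL so that it runs without a database server, and the class name `SaveToSQLSketch` and the two-column schema are hypothetical; the real `SavetoMySQLPipeline` may be structured differently.

```python
import sqlite3


class SaveToSQLSketch:
    """Hypothetical SQL pipeline; sqlite3 stands in for MySQL here."""

    def __init__(self, db_path, table):
        self.conn = sqlite3.connect(db_path)
        # Assumption: the table name comes from trusted settings, not user input
        self.table = table
        # Create the table on first use, mirroring the
        # "the table will be created automatically" behaviour
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(tweet_id TEXT PRIMARY KEY, text TEXT)"
        )

    def process_item(self, item, spider):
        # Values are passed as parameters; OR IGNORE skips duplicate ids
        self.conn.execute(
            f"INSERT OR IGNORE INTO {self.table} (tweet_id, text) VALUES (?, ?)",
            (item["tweet_id"], item["text"]),
        )
        self.conn.commit()
        return item


# Usage: the second item reuses an existing id and is skipped
sql_pipeline = SaveToSQLSketch(":memory:", "tweets")
sql_pipeline.process_item({"tweet_id": "1", "text": "hello"}, spider=None)
sql_pipeline.process_item({"tweet_id": "1", "text": "hello again"}, spider=None)
rows = sql_pipeline.conn.execute("SELECT tweet_id, text FROM tweets").fetchall()
```

Keying the table on the item's id means re-running the crawler does not duplicate rows, which matters when the data is meant for later analysis.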

Further notes:

scrapy.cfg is the project's configuration file.
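
scrapy.cfg is an INI-style file whose `[settings]` section names the Python module holding the project settings. The sketch below parses a sample with the standard library; the sample contents are an assumption modelled on what `scrapy startproject` generates, and the actual file in TweetScraper may differ.

```python
import configparser

# Assumed contents, modelled on the file `scrapy startproject` generates
SAMPLE_CFG = """
[settings]
default = TweetScraper.settings

[deploy]
project = TweetScraper
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

# The module that Scrapy would load its settings from
settings_module = parser.get("settings", "default")
```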

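# A minimal spider that saves the body of each start page to a local file
import scrapy


class DmozSpider(scrapy.Spider):
    # scrapy.Spider is the current name for the deprecated scrapy.spider.BaseSpider
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        # Use the last path segment of the URL as the file name
        filename = response.url.split("/")[-2]
        # Write the raw response body to disk and close the file afterwards
        with open(filename, "wb") as f:
            f.write(response.body)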
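
The file name in `parse` comes from splitting the URL on "/" and taking the second-to-last piece. The short check below shows what that yields for the two start URLs, and why it depends on the trailing slash; the variable names are chosen for illustration only.

```python
start_urls = [
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
]

# With a trailing slash the last piece is "", so [-2] is the final path segment
filenames = [url.split("/")[-2] for url in start_urls]

# Without the trailing slash, [-2] picks the parent segment instead
no_slash = "http://www.dmoz.org/Computers/Programming/Languages/Python/Books"
parent_segment = no_slash.split("/")[-2]
```

Here `filenames` is `["Books", "Resources"]` and `parent_segment` is `"Python"`, so URLs without a trailing slash under the same parent would overwrite each other's files.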
