Installing and Using the Scrapy Crawler Framework

阿新 · Published: 2018-12-18

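1. Install with: pip install scrapy. On a Mac this command should normally succeed as-is, but on Windows and Ubuntu the install may fail with an error. On Windows it looks like this: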

building ‘twisted.test.raiser’ extension 
error: Microsoft Visual C++ 14.0 is required. Get it with “Microsoft Visual C++ Build Tools”: http://landinghub.visualstudio.com/visual-cpp-build-tools

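Then install Twisted from a prebuilt wheel (.whl) that matches your Python version and architecture (cp36 = CPython 3.6, win_amd64 = 64-bit Windows): pip install path/Twisted-18.9.0-cp36-cp36m-win_amd64.whl, for example: pip install C:\Users\Local\Programs\Python36\Scripts\Twisted-18.9.0-cp36-cp36m-win_amd64.whl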
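Picking the wrong wheel is a common reason this step fails ("is not a supported wheel on this platform"). The sketch below builds the tag a CPython interpreter on Windows expects, so you can compare it against the file name you downloaded. The helper names wheel_tag and wheel_matches are hypothetical, and the logic only covers plain CPython 3.x Windows wheels (it ignores abi3 and other special tags).

```python
import struct
import sys


def wheel_tag(major, minor, is_64bit):
    """Build the CPython-on-Windows wheel tag, e.g. 'cp36-cp36m-win_amd64'."""
    python_tag = f"cp{major}{minor}"
    # CPython 3.7 and earlier carried an 'm' (pymalloc) ABI suffix; 3.8 dropped it
    abi_tag = python_tag + ("m" if (major, minor) < (3, 8) else "")
    platform_tag = "win_amd64" if is_64bit else "win32"
    return f"{python_tag}-{abi_tag}-{platform_tag}"


def wheel_matches(filename, major, minor, is_64bit):
    """Return True if a .whl file name ends with the tag for the given interpreter."""
    return filename.endswith("-" + wheel_tag(major, minor, is_64bit) + ".whl")


if __name__ == "__main__":
    # A pointer size of 8 bytes means the running interpreter is 64-bit
    is_64bit = struct.calcsize("P") == 8
    tag = wheel_tag(sys.version_info.major, sys.version_info.minor, is_64bit)
    print("Look for a Twisted wheel ending in:", tag + ".whl")
```

For the file in the example above, wheel_matches("Twisted-18.9.0-cp36-cp36m-win_amd64.whl", 3, 6, True) is True, while the same file would not match a Python 3.7 interpreter.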

After that, running pip install scrapy again should succeed.
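To confirm the installation without starting a crawl, you can check that the relevant modules are importable. This is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the Windows-only win32api check reflects the pywin32 issue this post describes for running spiders on Windows.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["twisted", "scrapy"]
    # On Windows, spiders may also need win32api, which comes from the pywin32 package
    if sys.platform == "win32":
        required.append("win32api")
    missing = missing_modules(required)
    print("All installed" if not missing else "Missing: " + ", ".join(missing))
```

Running scrapy version in a terminal is an equally quick check once the install has gone through.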

2. Once installed, import scrapy and you can start using it. A minimal spider:

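import scrapy


# Define a class that inherits from scrapy.Spider
class MinimalSpider(scrapy.Spider):
    # Spider name (every spider needs one; "scrapy crawl <name>" uses it inside a project)
    name = 'minimalspider'

    # start_requests() issues the initial requests
    def start_requests(self):
        print('1.start_request...')
        # Placeholder: replace with the page(s) you want to crawl
        urls = ['https://example.com']

        # Option 1: yield the requests one at a time (a lazy generator)
        for url in urls:
            # With callback=None, Scrapy sends the response to self.parse
            req = scrapy.Request(url, callback=None)
            yield req

        # Option 2: build a list and return it all at once.
        # Neither option blocks: Scrapy schedules the requests asynchronously either way.
        # Use only one option per method: once a method contains "yield" it is a
        # generator, and "return resp" inside it would not hand the list to Scrapy.
        # resp = []
        # for url in urls:
        #     req = scrapy.Request(url=url, callback=None)
        #     resp.append(req)
        # return resp

    # parse() receives the downloaded response
    def parse(self, response):
        print('2.parse...')
        print(response)
        print(response.body)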

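Run it with: scrapy runspider file_name.py. On Windows this may fail with ImportError: No module named win32con or ImportError: No module named win32api. Python does not ship a library for accessing the Windows system API, so win32api has to be installed separately; it is provided by the pywin32 package.

Run pip install pywin32 to install it. If that still does not work, download the pywin32 build for your Python version from https://sourceforge.net/projects/pywin32/files/pywin32/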

3. Create a crawler project with Scrapy
(1) Create the project: scrapy startproject project_name
(2) Create a spider: cd project_name, then run scrapy genspider spider_name www.xxxx.com (the domain to crawl)
(3) Directory structure:

.
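├── hangzhounews   -- the project's Python package
│   ├── __init__.py
│   ├── __pycache__  -- Python's compiled bytecode cache (.pyc files)
│   │   ├── __init__.cpython-36.pyc
│   │   └── settings.cpython-36.pyc
│   ├── items.py     -- defines which fields to scrape (similar to Django models)
│   ├── middlewares.py  -- middlewares
│   ├── pipelines.py    -- pipelines that process the scraped data
│   ├── settings.py     -- project settings
│   └── spiders         -- package holding your own spiders
│       ├── __init__.py
│       ├── __pycache__
│       │   └── __init__.cpython-36.pyc
│       └── hangzhou.py -- a spider file
└── scrapy.cfg   -- project config file (points to the settings module; also used for deployment)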

Running scrapy in a terminal inside the project lists the available commands:

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command
