Python: A First Spider with the Scrapy Framework
阿新 • Published: 2018-11-25
Study notes:
Code:
```
D:\pycodes>scrapy startproject python123demo
New Scrapy project 'python123demo', using template directory 'c:\\users\\hwp\\appdata\\local\\programs\\python\\python37\\lib\\site-packages\\scrapy\\templates\\project', created in:
    D:\pycodes\python123demo

You can start your first spider with:
    cd python123demo
    scrapy genspider example example.com

D:\pycodes>
```
```
D:.
└─python123demo
    │  scrapy.cfg
    │
    └─python123demo
        │  items.py
        │  middlewares.py
        │  pipelines.py
        │  settings.py
        │  __init__.py
        │
        ├─spiders
        │  │  __init__.py
        │  │
        │  └─__pycache__
        └─__pycache__
```
Code:
```
D:\pycodes\python123demo>scrapy genspider demo python123.io
Created spider 'demo' using template 'basic' in module:
  python123demo.spiders.demo
```
This generates a file, demo.py:
```python
# -*- coding: utf-8 -*-
import scrapy

class DemoSpider(scrapy.Spider):  # the class name, DemoSpider, can be anything; it inherits from scrapy.Spider
    name = 'demo'
    allowed_domains = ['python123.io']  # the domain the user submitted on the command line: python123.io
    start_urls = ['http://python123.io/']  # the initial page(s) to crawl!

    def parse(self, response):  # the page-parsing method, empty for now!
        pass
```
Steps to produce it:
Modified version:
Code:
```python
# -*- coding: utf-8 -*-
import scrapy

class DemoSpider(scrapy.Spider):
    name = 'demo'
    # allowed_domains = ['python123.io']
    start_urls = ['http://python123.io/ws/demo.html']

    def parse(self, response):
        # Save the response body to a file named after the last URL segment.
        fname = response.url.split('/')[-1]
        with open(fname, 'wb') as f:
            f.write(response.body)
        self.log('Save file %s.' % fname)  # was `name`, an undefined variable; fixed to `fname`
```
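The filename used in parse() comes from plain string splitting on the URL, nothing Scrapy-specific; for example:

```python
# Take the last path segment of the URL as the local filename.
url = 'http://python123.io/ws/demo.html'
fname = url.split('/')[-1]
print(fname)  # → demo.html
```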
Run:
```
D:\pycodes\python123demo>scrapy crawl demo
```
But it raises an error!
No need to panic; a fix is described here: https://blog.csdn.net/weixin_42859280/article/details/84481289
You also need to install a dependency:
Link: https://pypi.org/project/pywin32/#files
After fixing it successfully:
The complete version of the demo.py code, contrasted with the plain one!
yield: what does it mean?~
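In short, yield turns a function into a generator: it hands back one value, suspends, and resumes from the same spot on the next iteration. This is why Scrapy callbacks can produce requests and items one at a time instead of building a whole list in memory. A minimal standalone example:

```python
def squares(n):
    # yield suspends the function here and hands one value back;
    # execution resumes from this exact point on the next iteration.
    for i in range(n):
        yield i * i

print(list(squares(5)))  # → [0, 1, 4, 9, 16]
```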
These are study notes, not technical documentation~