Scrapy configuration, and saving scraped data to a database, a CSV file, or a JSON file:
A basic Scrapy project:
scrapy startproject project_name
scrapy genspider spider_name domain
# e.g.: scrapy genspider baidu baidu.com
# this generates a baidu.py file
scrapy crawl spider_name
# runs the spider
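For reference, the spider file produced by genspider looks roughly like this (a minimal sketch of the default template; the URL and the parse body are placeholders to be filled in):

import scrapy

class BaiduSpider(scrapy.Spider):
    name = 'baidu'
    allowed_domains = ['baidu.com']
    start_urls = ['http://baidu.com/']

    def parse(self, response):
        # extraction logic goes here
        pass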
Another way (using the crawl template):
scrapy startproject project_name
scrapy genspider -t crawl spider_name domain
# e.g.: scrapy genspider -t crawl baidu baidu.com
# this generates a baidu.py file
scrapy crawl spider_name
# runs the spider
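The -t crawl option uses the CrawlSpider template, which adds link-following rules. The generated file looks roughly like this (a sketch; the allow=r'Items/' pattern is just the template's placeholder):

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class BaiduSpider(CrawlSpider):
    name = 'baidu'
    allowed_domains = ['baidu.com']
    start_urls = ['http://baidu.com/']

    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # extraction logic goes here
        return {}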
settings.py configuration when crawling through a browser (Selenium):
DOWNLOADER_MIDDLEWARES = {
    'Zhilian.middlewares.ZhilianDownloaderMiddleware': 543,
    # This is the downloader middleware; once enabled, every request the downloader
    # handles passes through this middleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
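A couple of other settings.py options are commonly adjusted alongside this (they are not part of the original configuration above, the values here are only examples):

ROBOTSTXT_OBEY = False   # many target sites disallow crawlers in robots.txt
DOWNLOAD_DELAY = 1       # throttle requests to reduce the chance of being blocked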
Configuration in the middlewares.py file:
from time import sleep
from selenium import webdriver
from scrapy.http import HtmlResponse

class ZhilianDownloaderMiddleware(object):
    def process_request(self, request, spider):
        # Called for each request that goes through the downloader middleware.
        print("download in progress...")
        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        # Called whenever the downloader fetches a url.
        # Intercept the download and fetch the page with selenium + webdriver instead.
        opt = webdriver.ChromeOptions()
        opt.add_argument("--headless")
        driver = webdriver.Chrome(options=opt)
        # issue a GET request with the browser
        driver.get(request.url)
        sleep(1)
        body = driver.page_source
        # build a response object from the page source rendered by the browser
        return HtmlResponse(url=driver.current_url, body=body,
                            encoding='utf-8', request=request)
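One thing to keep in mind: the version above launches a new headless Chrome for every request and never quits it. A possible variant (an assumption, not from the original post) creates the driver once and closes it when the spider finishes, using Scrapy's spider_closed signal:

from time import sleep
from selenium import webdriver
from scrapy import signals
from scrapy.http import HtmlResponse

class ZhilianDownloaderMiddleware(object):
    def __init__(self):
        opt = webdriver.ChromeOptions()
        opt.add_argument("--headless")
        self.driver = webdriver.Chrome(options=opt)   # one browser for the whole crawl

    @classmethod
    def from_crawler(cls, crawler):
        mw = cls()
        # quit the browser when the spider closes
        crawler.signals.connect(mw.spider_closed, signal=signals.spider_closed)
        return mw

    def spider_closed(self, spider):
        self.driver.quit()

    def process_request(self, request, spider):
        self.driver.get(request.url)
        sleep(1)
        return HtmlResponse(url=self.driver.current_url, body=self.driver.page_source,
                            encoding='utf-8', request=request)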
Supplement: saving items to a database, a CSV file, or a JSON file
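None of the pipelines below take effect until they are registered in settings.py. A minimal sketch (the module paths are assumptions based on the project names used in this post):

ITEM_PIPELINES = {
    'Intersting.pipelines.InterstingPipeline': 300,   # CSV pipeline (path assumed)
    # 'Zhilian.pipelines.ZhilianJSONPipeline': 300,   # JSON pipeline (path assumed)
}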
A CSV pipeline:

import csv

class InterstingPipeline(object):
    def open_spider(self, spider):
        self.csv_file = open("u148.csv", 'w', encoding='utf-8', newline='')
        self.csvItems = []

    def process_item(self, item, spider):
        # pipeline method, called once per item
        csv_item = []
        csv_item.append(item["author"])
        csv_item.append(item["title"])
        csv_item.append(item["img"])
        csv_item.append(item["abstract"])
        csv_item.append(item["time"])
        self.csvItems.append(csv_item)
        return item

    def close_spider(self, spider):
        writer = csv.writer(self.csv_file)
        writer.writerow(["author", "title", "img", "abstract", "time"])
        writer.writerows(self.csvItems)   # write all collected rows at once
        self.csv_file.close()
A JSON pipeline (the class name below is illustrative):

import json

class ZhilianJSONPipeline(object):
    def open_spider(self, spider):
        self.zhilian_json = open('zhilian.json', 'w', encoding='utf-8')
        self.items = []

    def process_item(self, item, spider):
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        self.zhilian_json.write(json.dumps(self.items))
        self.zhilian_json.close()
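If the items contain Chinese text, json.dumps escapes it to \uXXXX sequences by default; passing ensure_ascii=False keeps the output readable (a small optional tweak, not in the original code):

self.zhilian_json.write(json.dumps(self.items, ensure_ascii=False))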
A commented-out variant that writes into MySQL instead:
# def open_spider(self, spider):
#     # open a connection to the database
#     self.conn = pymysql.connect(host='127.0.0.1', port=3306, db='zhilian', user='cy',
#                                 password='123456', charset='utf8')
#     # create a cursor
#     self.cursor = self.conn.cursor()
# def process_item(self, item, spider):
#     # INSERT statement built with string formatting
#     sql = 'INSERT INTO zl VALUES(NULL, "%s", "%s", "%s", "%s", "%s", "%s")' % (
#         item['name'], item['salary'], item['fuli'], item['address'],
#         item['jingyan'], item['company'])
#     self.cursor.execute(sql)
#     self.conn.commit()
#     return item
# def close_spider(self, spider):
#     self.cursor.close()   # close the cursor
#     self.conn.close()     # close the database connection
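Building the SQL with % string formatting works, but it breaks on values containing quotes and is open to SQL injection. A sketch of the same insert using pymysql's parameterized execute (the class name is illustrative; the table and column layout are assumed to match the commented code above):

import pymysql

class ZhilianMySQLPipeline(object):
    def open_spider(self, spider):
        self.conn = pymysql.connect(host='127.0.0.1', port=3306, db='zhilian',
                                    user='cy', password='123456', charset='utf8')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # let the driver quote and escape the values
        sql = 'INSERT INTO zl VALUES(NULL, %s, %s, %s, %s, %s, %s)'
        self.cursor.execute(sql, (item['name'], item['salary'], item['fuli'],
                                  item['address'], item['jingyan'], item['company']))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()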