
Notes on scrapy-splash crawling

Running the splash_cebspider crawler

1. Install Python 3

2. Install Scrapy

3. Install scrapy-splash

Command: pip3 install scrapy-splash
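After installing, scrapy-splash also has to be enabled in the Scrapy project's settings.py. A typical configuration, as documented in the scrapy-splash README (the SPLASH_URL assumes the Splash Docker service below is running locally on port 8050):

```python
# settings.py -- wire scrapy-splash into the project.
# SPLASH_URL assumes Splash listens on localhost:8050 (see the Docker step).
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Spider middleware that avoids sending duplicate Splash arguments.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Dupefilter and cache storage that understand Splash requests.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```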

4. Install python-bloomfilter

Command: pip3 install pybloom-live

5. Install Docker

6. Start Docker

7. Pull the image

Command: docker pull scrapinghub/splash

8. Run the scrapinghub/splash service with Docker:

Command: docker run -d -p 8050:8050 scrapinghub/splash

-d: run the container in the background

-p: map the network port used inside the container to the host

9. Install the log-rotation tool cronolog

Reference: https://www.cnblogs.com/crazyzero/p/7435691.html

Download the package (https://files.cnblogs.com/files/crazyzero/cronolog-1.6.2.tar.gz):

# wget https://files.cnblogs.com/files/crazyzero/cronolog-1.6.2.tar.gz
# ll cronolog-1.6.2.tar.gz
-rw-r--r-- 1 root root 133591 Aug 25 2017 cronolog-1.6.2.tar.gz

Extract and enter the directory:

# tar xf cronolog-1.6.2.tar.gz
# cd cronolog-1.6.2

Compile and install:

# ./configure
# make
# make install

Verify that the cronolog binary was generated:

# ll /usr/local/sbin/cronolog
-rwxr-xr-x 1 root root 40446 Aug 11 11:55 /usr/local/sbin/cronolog

10. Run splash_cebspider in the background

Command:

nohup scrapy crawl splash_cebspider | nohup /usr/local/sbin/cronolog logs/splash_cebspider_%Y-%m-%d.log >> logs/cronolog.log 2>&1 &

 

Monitoring: nohup ./monitor.sh | nohup /usr/local/sbin/cronolog logs/splash_cebspider_%Y-%m-%d.log >> logs/cronolog.log 2>&1 &

Reference links:

https://www.cnblogs.com/shaosks/p/6950358.html

https://github.com/joseph-fox/python-bloomfilter