Notes ------ scrapy-splash crawler
Running the splash_cebspider crawler
1. Install Python 3
2. Install Scrapy
3. Install the scrapy-splash plugin (plus the Bloom-filter library used for deduplication)
Command: pip3 install scrapy-splash
Command: pip3 install pybloom-live
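pybloom-live provides Bloom filters, which crawlers commonly use to remember visited URLs in little memory at the cost of a small false-positive rate. As a minimal pure-Python sketch of the same idea (the real library's `BloomFilter`/`ScalableBloomFilter` classes use tuned sizing and hashing; the class below is only illustrative):

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter for URL deduplication (illustrative only)."""

    def __init__(self, size_bits=1 << 20, hashes=4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions from salted hashes of the item.
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

seen = TinyBloom()
seen.add("http://example.com/page1")
print("http://example.com/page1" in seen)  # True
print("http://example.com/page2" in seen)  # False (with overwhelming probability)
```

A Bloom filter can report a URL as seen when it was not (false positive), but never the reverse, which is an acceptable trade-off for crawl deduplication.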
4. Install Docker
5. Start Docker
6. Pull the Splash image
Command: docker pull scrapinghub/splash
7. Run the scrapinghub/splash service with Docker:
Command: docker run -d -p 8050:8050 scrapinghub/splash
-d : run the container in the background (detached mode)
-p : map the port used inside the container to a port on the host
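Once the Splash container is up, the Scrapy project must be pointed at it. The settings below are the standard wiring documented in the scrapy-splash README; `SPLASH_URL` assumes Splash runs on the same host as the crawler:

```python
# settings.py -- scrapy-splash wiring (per the scrapy-splash README)
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```

You can confirm the container is serving by opening http://localhost:8050 in a browser, which shows Splash's built-in test page.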
8. Install the log-rotation tool cronolog
Reference: https://www.cnblogs.com/crazyzero/p/7435691.html
Download the source tarball (https://files.cnblogs.com/files/crazyzero/cronolog-1.6.2.tar.gz):
# wget https://files.cnblogs.com/files/crazyzero/cronolog-1.6.2.tar.gz
Extract the archive and enter the directory:
# tar xf cronolog-1.6.2.tar.gz
# cd cronolog-1.6.2
Build and install (the install step places the binary under /usr/local/sbin, as verified below):
# ./configure
# make
# make install
Verify that the cronolog binary was installed:
# ll /usr/local/sbin/cronolog
-rwxr-xr-x 1 root root 40446 Aug 11 11:55 /usr/local/sbin/cronolog
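cronolog reads log lines on stdin and writes each one to a file whose name is expanded from a strftime-style template (here `logs/splash_cebspider_%Y-%m-%d.log`), switching to a new file automatically when the date changes. The filename expansion can be sketched with the standard library:

```python
from datetime import date

def cronolog_name(template: str, day: date) -> str:
    """Expand a cronolog-style strftime template for a given day."""
    return day.strftime(template)

print(cronolog_name("logs/splash_cebspider_%Y-%m-%d.log", date(2018, 8, 11)))
# logs/splash_cebspider_2018-08-11.log
```

This is why the crawl command below pipes stdout into cronolog: the spider keeps writing to one stream while cronolog handles daily rotation on the file side.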
9. Run splash_cebspider in the background
Command:
nohup scrapy crawl splash_cebspider | nohup /usr/local/sbin/cronolog logs/splash_cebspider_%Y-%m-%d.log >> logs/cronolog.log 2>&1 &
Monitoring: nohup ./monitor.sh | nohup /usr/local/sbin/cronolog logs/splash_cebspider_%Y-%m-%d.log >> logs/cronolog.log 2>&1 &
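The contents of monitor.sh are not shown here; a typical watchdog for this setup periodically checks whether the crawler process is still alive and restarts it if not. A hypothetical Python sketch of that loop body (the process pattern and restart command are assumptions, not taken from the actual script):

```python
import subprocess

def is_running(pattern: str) -> bool:
    """Return True if any process command line matches `pattern` (via pgrep -f)."""
    result = subprocess.run(["pgrep", "-f", pattern], capture_output=True)
    return result.returncode == 0

def monitor_once(pattern: str, restart_cmd: list) -> bool:
    """Restart the crawler if it is not running; return True if a restart happened."""
    if is_running(pattern):
        return False
    subprocess.Popen(restart_cmd)  # hypothetical restart; real script may differ
    return True

# Example check (names are hypothetical):
# monitor_once("scrapy crawl splash_cebspider",
#              ["scrapy", "crawl", "splash_cebspider"])
```

Piping the watchdog's output through cronolog, as in the command above, gives its log the same daily rotation as the spider's.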