Crawling SMZDM (什麼值得買) with Spring Boot + MyBatis-Plus + WebMagic + AMIS and Displaying the Results

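1. The WebMagic crawler framework

WebMagic is a simple, flexible Java crawler framework. With WebMagic you can quickly build a crawler that is efficient and easy to maintain.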

1.1 Documentation

Website: http://webmagic.io

Chinese documentation: http://webmagic.io/docs/zh/

English: http://webmagic.io/docs/en

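1.2 WebMagic architecture

WebMagic is built from four components, Downloader, PageProcessor, Scheduler, and Pipeline, which the Spider ties together. The four components correspond to the downloading, processing, URL management, and persistence stages of a crawler's life cycle.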
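
To make the division of labor concrete, here is a toy model of the four roles in plain Java. It is only a sketch: the interfaces, the MiniSpider class, and the in-memory "site" are all made up for illustration and are not WebMagic's real classes or signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Toy model of the four roles; these are NOT WebMagic's real interfaces.
public class MiniSpider {
    // Downloader role: fetch the body for a URL (null when it cannot be fetched).
    interface Downloader { String download(String url); }

    // PageProcessor role: return extracted items and append discovered links to `links`.
    interface PageProcessor { List<String> process(String body, List<String> links); }

    // Pipeline role: persist one extracted item.
    interface Pipeline { void persist(String item); }

    // Scheduler role: a FIFO queue of pending URLs plus a set for de-duplication.
    static class Scheduler {
        private final Queue<String> queue = new ArrayDeque<>();
        private final Set<String> seen = new HashSet<>();
        void push(String url) { if (seen.add(url)) queue.add(url); }
        String poll() { return queue.poll(); }
    }

    // Spider role: wires the four components into one crawl loop.
    static void crawl(String seed, Downloader d, PageProcessor p, Pipeline out) {
        Scheduler scheduler = new Scheduler();
        scheduler.push(seed);
        String url;
        while ((url = scheduler.poll()) != null) {
            String body = d.download(url);
            if (body == null) continue;
            List<String> links = new ArrayList<>();
            for (String item : p.process(body, links)) out.persist(item);
            for (String link : links) scheduler.push(link);
        }
    }

    // Runs the loop over a fake in-memory site and returns what was persisted.
    static List<String> demo() {
        // Each fake page body has the form "title|link1,link2".
        Map<String, String> site = Map.of(
                "a", "Page A|b,c",
                "b", "Page B|a",
                "c", "Page C|");
        List<String> stored = new ArrayList<>();
        crawl("a",
                site::get,
                (body, links) -> {
                    String[] parts = body.split("\\|", -1);
                    for (String l : parts[1].split(",")) if (!l.isEmpty()) links.add(l);
                    return List.of(parts[0]);
                },
                stored::add);
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The de-duplicating queue is what keeps the loop from revisiting page "a" when page "b" links back to it.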

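2. Integrating MyBatis-Plus and WebMagic with Spring Boot

2.1 Integrating WebMagic

Combining Spring Boot with WebMagic involves three main modules: the crawling module (Processor); the persistence module (Pipeline), which stores the crawled data in the database; and the scheduled-task module (Scheduled), which is responsible for crawling the site on a schedule.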

2.1.1 Add the Maven dependencies
<!-- Crawler framework -->
<dependency>
    <groupId>us.codecraft</groupId>
    <artifactId>webmagic-core</artifactId>
    <version>0.7.3</version>
</dependency>
<dependency>
    <groupId>us.codecraft</groupId>
    <artifactId>webmagic-extension</artifactId>
    <version>0.7.3</version>
</dependency>
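2.1.2 Crawling module: Processor

This Processor crawls SMZDM pages. It parses the page data, extracts the corresponding links and titles, and puts them into WebMagic's Page; the persistence module then takes them out and adds them to the database. The code is as follows: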

package com.dxz.spider.HttpUtil;

import com.dxz.spider.model.SmzdmModel;
import com.dxz.spider.util.TimeUtil;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Component;
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.processor.PageProcessor;

@Slf4j
@Component
public class SmzdmPageProcessor implements PageProcessor {

    // Part one: crawl configuration for the site, including encoding, crawl interval, retry count, etc.
    private Site site = Site.me()
            .setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")
            .setTimeOut(10 * 1000)
            .setRetryTimes(3)
            .setRetrySleepTime(3000);

    // process is the core hook for custom crawler logic; the extraction logic goes here
    @Override
    public void process(Page page) {
        // Part two: define how to extract information from the page and store it
        if (page.getUrl().regex("https://search.smzdm.com/\\?c=faxian&s=GU&v=b&p=\\d+").match()){
            page.addTargetRequests(page.getHtml().xpath("//ul[@id='J_feed_pagenation']/li/a").links().all());
            page.addTargetRequests(page.getHtml().xpath("//div[@class='feed-main-con']/ul[@id='feed-main-list']/li/div/div[@class='z-feed-content']/h5/a").links().all());
        }else {
            SmzdmModel smzdmModel = new SmzdmModel();
            String imgLocation = page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='info']/a/img[@class='main-img']/@src").get();
            // Get the item's URL
            String url= page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='info']/a/@href").get();
            String title= page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='info']/div[@class='info-right']/div[@class='title-box']/h1[@class='title']/text()").get();
            String price  = page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='info']/div[@class='info-right']/div[@class='title-box']/div[@class='price']/span/text()").get();
            String introduce = page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='item-name']/article/div[@class='baoliao-block']/p/text()").get();
            String baoliao =  page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='item-name']/article/p/text()").get();
            String time  = page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='info']/div[@class='info-right']/div[@class='info-details']/div[@class='author-info']/span[@class='time']/text()").get();
            String zhi = page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='item-name']/div[@class='score_rateBox']/div[@class='score_rate']/span[@id='rating_worthy_num']/text()").get();
            String buZhi = page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='item-name']/div[@class='score_rateBox']/div[@class='score_rate']/span[@id='rating_unworthy_num']/text()").get();
            String start = page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='item-name']/div[@class='operate_box']/div[@class='operate_icon']/a[@class='fav']/span/text()").get();
            String pl =page.getHtml().xpath("//section[@id='feed-wrap']/article/div[@id='feed-main']/div[@class='item-name']/div[@class='operate_box']/div[@class='operate_icon']/a[@class='comment']/em[@class='commentNum']/text()").get();
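            // Fall back to the baoliao text when the introduction is blank;
            // assigning the variable keeps the later setIntroduce(introduce) from overwriting it
            if (StringUtils.isBlank(introduce)){
                introduce = baoliao;
            }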
            time = TimeUtil.handSmzdm(time);
            smzdmModel.setUrl(url);
            smzdmModel.setTitle(title);
            smzdmModel.setPrice(price);
            smzdmModel.setIntroduce(introduce);
            smzdmModel.setFbtime(time);
            smzdmModel.setNoZhi(buZhi);
            smzdmModel.setZhi(zhi);
            smzdmModel.setStart(start);
            smzdmModel.setPl( pl);
            smzdmModel.setImgurl(imgLocation);
            // Store the crawl result: the key is smzdm and the value is the SmzdmModel object holding the crawled data
            page.putField("smzdm",smzdmModel);
        }
    }

    @Override
    public Site getSite() {
        return site;
    }
}
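
The list-page branch in process() hinges on a single regular expression, so it is worth checking that pattern in isolation. The sketch below reuses the pattern string from the listing with plain java.util.regex. Two things are assumptions: it uses whole-string matching, which may differ from how WebMagic's own regex selector matches, and the detail-page URL is a made-up example.

```java
import java.util.regex.Pattern;

public class ListPageUrlCheck {
    // Same pattern string as in SmzdmPageProcessor.process().
    static final Pattern LIST_PAGE =
            Pattern.compile("https://search.smzdm.com/\\?c=faxian&s=GU&v=b&p=\\d+");

    // True when the whole URL is a paginated search-result page.
    static boolean isListPage(String url) {
        return LIST_PAGE.matcher(url).matches();
    }

    public static void main(String[] args) {
        // A list page with a page number.
        System.out.println(isListPage("https://search.smzdm.com/?c=faxian&s=GU&v=b&p=3"));
        // Hypothetical detail-page URL, used only to show the non-matching branch.
        System.out.println(isListPage("https://www.smzdm.com/p/12345/"));
    }
}
```

Any URL that fails this check falls through to the else branch and is parsed as an item detail page.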
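2.1.3 Persistence module: Pipeline

The persistence module works together with MyBatis-Plus to save the data. It implements WebMagic's Pipeline interface; in the process method it retrieves the data produced by the crawling module and then calls the MyBatis-Plus save method. The code is as follows: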

package com.dxz.spider.HttpUtil;

import com.dxz.spider.model.HotWeeklyBlogs;
import com.dxz.spider.model.SmzdmModel;
import com.dxz.spider.service.SmzdmService;
import com.dxz.spider.service.WeeklyService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import us.codecraft.webmagic.ResultItems;
import us.codecraft.webmagic.Task;
import us.codecraft.webmagic.pipeline.Pipeline;

@Component
public class MysqlPipeline implements Pipeline {

    @Autowired
    private WeeklyService weeklyService;

    @Autowired
    private SmzdmService smzdmService;

    @Override
    public void process(ResultItems resultItems, Task task) {
        // Retrieve the results stored by the processor; like a Map, they are looked up by the keys smzdm and blogs
        HotWeeklyBlogs blogs = resultItems.get("blogs");
        SmzdmModel smzdmModel = resultItems.get("smzdm");
        if (blogs!=null){
            weeklyService.save(blogs);
        }else if (smzdmModel!=null){
            smzdmService.save(smzdmModel);
            System.out.println(smzdmModel.toString());
        }
    }
}

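2.1.4 Scheduled-task module: Scheduled

Use Spring Boot's built-in scheduling annotation @Scheduled to run the crawl once a minute, and call WebMagic's crawling module (the Processor) from the scheduled method. Spring cron expressions have six fields and begin with seconds, so once a minute is written @Scheduled(cron = "0 * * * * ?"); the all-wildcard form "* * * * * ?" fires every second. The code is as follows: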

package com.dxz.spider.HttpUtil;

import com.dxz.spider.WebMagicBugs.HttpClientDownloader;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import us.codecraft.webmagic.Spider;

@Component
@Slf4j
public class AllSpiderStarter {
    @Autowired
    private MysqlPipeline mysqlPipeline;

    // Six-field Spring cron, seconds first: fire at second 0 of every minute
    @Scheduled(cron = "0 * * * * ?")
    public void WeeklyScheduled(){
        log.info("Starting the crawl task");
        Spider.create(new SmzdmPageProcessor())
                .setDownloader(new HttpClientDownloader())
                // Seed with a list page that matches the URL pattern checked in SmzdmPageProcessor
                .addUrl("https://search.smzdm.com/?c=faxian&s=GU&v=b&p=1")
                .thread(5)
                .addPipeline(mysqlPipeline)
                .run();
    }
}
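
Because Spring's cron format starts with a seconds field, it is easy to misread an expression by one position. The small helper below simply labels the six fields of an expression so the difference between "every second" and "every minute" is visible. CronFields and label are hypothetical names for this sketch; it does not validate or evaluate the expression and is not part of Spring.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CronFields {
    // Field order used by Spring's @Scheduled cron expressions.
    static final String[] NAMES =
            {"second", "minute", "hour", "day-of-month", "month", "day-of-week"};

    // Splits a six-field expression and labels each field; only the field count is checked.
    static Map<String, String> label(String cron) {
        String[] parts = cron.trim().split("\\s+");
        if (parts.length != NAMES.length) {
            throw new IllegalArgumentException("expected 6 fields, got " + parts.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(label("* * * * * ?")); // second=* means every second
        System.out.println(label("0 * * * * ?")); // second=0 means once per minute
    }
}
```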

Add the @EnableScheduling annotation to the Spring Boot application class:

import com.dxz.spider.util.HotMonthWebMagic;
import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

@SpringBootApplication
@EnableScheduling
@MapperScan("com.dxz.spider.mapper")
public class SpiderApplication {
    public static void main(String[] args) {
        SpringApplication.run(SpiderApplication.class, args);
    }
}

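2.2 Integrating MyBatis-Plus

2.2.1 MyBatis-Plus

MyBatis-Plus (MP for short) is an enhancement tool for MyBatis. It only adds to MyBatis without changing it, and exists to simplify development and improve productivity.

Usage is essentially the same as MyBatis, but it ships with basic CRUD interfaces, so basic CRUD operations can be called directly.

Official site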

https://mp.baomidou.com/

2.2.2 Add the Maven dependency
<!-- Mybatis-plus -->
<dependency>
    <groupId>com.baomidou</groupId>
    <artifactId>mybatis-plus-boot-starter</artifactId>
    <version>3.0.5</version>
</dependency>
2.2.3 Write the Mapper, Service, and Model

The Model class for the data crawled from SMZDM:

package com.dxz.spider.model;

import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.Data;

/**
 * Database model for SMZDM items
 */
@Data
// TODO: the database table name; change as needed
@TableName("smzdm")
public class SmzdmModel {
    /** Title */
    private String title;
    /** Price */
    private String price;
    /** Introduction */
    private String introduce;
    /** Number of "worth it" votes */
    private String zhi;
    /** Number of "not worth it" votes */
    // TODO: the database column name; change as needed
    @TableField(value = "NoZhi")
    private String NoZhi;
    /** Number of favorites */
    private String start;
    /** Number of comments */
    private String pl;
    /** Publish time */
    private String fbtime;
    /** URL */
    private String url;

    /** Image URL */
    private String imgurl;
}

Write the Mapper class:

public interface SmzdmMapper extends BaseMapper<SmzdmModel> {
    @Select("select * from smzdm")
    List<SmzdmModel> selectAll();
}

Extending the BaseMapper<T> interface provides the basic CRUD methods. The Service class:

@Service
@Slf4j
public class SmzdmService extends ServiceImpl<SmzdmMapper, SmzdmModel> {
    public List<SmzdmModel> selectAll(){
        // ServiceImpl exposes the injected mapper as the protected field baseMapper
        return baseMapper.selectAll();
    }
}

Write application.properties:

spring.datasource.username=root
spring.datasource.password=123456
spring.datasource.url=jdbc:mysql://localhost:3306/dxzstudy?useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai
spring.datasource.driverClassName = com.mysql.cj.jdbc.Driver
# Location of the MyBatis XML mapper files
mybatis-plus.mapper-locations=classpath:mapperxml/*.xml

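Integration complete!

3. Building the view with AMIS

3.1 What is AMIS?

amis is a front-end low-code framework that generates pages from JSON configuration, which greatly reduces page development work and speeds up building front-end interfaces. With AMIS, a programmer who knows no front-end development can still build basic interfaces: knowing how to write JSON configuration is enough to get started quickly. An open-source gem from Baidu!

Reference documentation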

https://baidu.github.io/amis/docs/intro?page=1

3.2 Download the CSS and JS

Download sdk.css and sdk.js from the official site.

3.3 Write the HTML page

<!DOCTYPE html>
<html lang="zh">
<head>
    <meta charset="UTF-8"/>
    <title>什麼值得買</title>
    <meta name="referrer" content="no-referrer" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <meta
            name="viewport"
            content="width=device-width, initial-scale=1, maximum-scale=1"
    />
    <meta http-equiv="X-UA-Compatible" content="IE=Edge"/>
    <link rel="stylesheet" href="/static/sdk.css"/>
    <style>
        html,
        body,
        .app-wrapper {
            position: relative;
            width: 100%;
            height: 100%;
            margin: 0;
            padding: 0;
        }
    </style>
</head>
<body>
<div id="root" class="app-wrapper"></div>
<script src="/static/sdk.js"></script>
<script type="text/javascript">
    (function () {
        var amis = amisRequire('amis/embed');
        amis.embed('#root', {
            "$schema": "https://houtai.baidu.com/v2/schemas/page.json#",
            "type": "page",
            "title": "SMZDM Uniqlo Deals",
            "toolbar": [
                {
                    "type": "button",
                    "actionType": "dialog",
                    "label": "Add",
                    "icon": "fa fa-plus pull-left",
                    "primary": true,
                    "dialog": {
                        "title": "Add",
                        "body": {
                            "type": "form",
                            "name": "sample-edit-form",
                            "api": "",
                            "controls": [
                                {
                                    "type": "alert",
                                    "level": "info",
                                    "body": "No api endpoint is configured, so this form cannot actually be submitted!"
                                },
                                {
                                    "type": "text",
                                    "name": "text",
                                    "label": "Text",
                                    "required": true
                                },
                                {
                                    "type": "divider"
                                },
                                {
                                    "type": "image",
                                    "name": "image",
                                    "label": "Image",
                                    "required": true
                                },
                                {
                                    "type": "divider"
                                },
                                {
                                    "type": "date",
                                    "name": "date",
                                    "label": "Date",
                                    "required": true
                                },
                                {
                                    "type": "divider"
                                },
                                {
                                    "type": "select",
                                    "name": "type",
                                    "label": "Option",
                                    "options": [
                                        {
                                            "label": "Pretty",
                                            "value": "1"
                                        },
                                        {
                                            "label": "Happy",
                                            "value": "2"
                                        },
                                        {
                                            "label": "Startled",
                                            "value": "3"
                                        },
                                        {
                                            "label": "Nervous",
                                            "value": "4"
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                }
            ],
            "body": [
                {
                    "type": "form",
                    "title": "Filter",
                    "className": "m-t",
                    "wrapWithPanel": false,
                    "target": "service1",
                    "mode": "inline",
                    "controls": [
                        {
                            "type": "text",
                            "name": "keywords",
                            "placeholder": "Keyword",
                            "addOn": {
                                "type": "button",
                                "icon": "fa fa-search",
                                "actionType": "submit",
                                "level": "primary"
                            }
                        }
                    ]
                },
                {
                    "type": "crud",
                    "api": "http://localhost:8080/getAll",
                    "defaultParams": {
                        "perPage": 5
                    },
                    "columns": [
                        {
                            "name": "title",
                            "label": "Title",
                            "type": "text"
                        },
                        {
                            "name": "price",
                            "label": "Price",
                            "type": "text"
                        },
                        {
                            "name": "url",
                            "label": "Product link",
                            "type": "text"
                        },
                        {
                            "type": "image",
                            "label": "Item image",
                            "multiple": false,
                            "name": "imgurl",
                            "popOver": {
                                "title": "View full size",
                                "body": "<div class=\"w-xxl\"><img class=\"w-full\" src=\"${imgurl}\"/></div>"
                            }
                        },
                        {
                            "name": "fbtime",
                            "type": "date",
                            "label": "Publish date"
                        },
                        {
                            "type": "container",
                            "label": "Actions",
                            "body": [
                                {
                                    "type": "button",
                                    "icon": "fa fa-eye",
                                    "level": "link",
                                    "actionType": "dialog",
                                    "tooltip": "View",
                                    "dialog": {
                                        "title": "View",
                                        "body": {
                                            "type": "form",
                                            "controls": [
                                                {
                                                    "type": "static",
                                                    "name": "title",
                                                    "label": "Title"
                                                },
                                                {
                                                    "type": "divider"
                                                },
                                                {
                                                    "type": "static",
                                                    "name": "price",
                                                    "label": "Price"
                                                },
                                                {
                                                    "type": "divider"
                                                },
                                                {
                                                    "type": "static-image",
                                                    "label": "Image",
                                                    "name": "imgurl",
                                                    "popOver": {
                                                        "title": "View full size",
                                                        "body": "<div class=\"w-xxl\"><img class=\"w-full\" src=\"${imgurl}\"/></div>"
                                                    }
                                                },
                                                {
                                                    "type": "divider"
                                                },
                                                {
                                                    "name": "fbtime",
                                                    "type": "static",
                                                    "label": "Publish time"
                                                },
                                                {
                                                    "type": "divider"
                                                },
                                                {
                                                    "name": "url",
                                                    "type": "static",
                                                    "label": "Purchase link"
                                                }
                                            ]
                                        }
                                    }
                                },
                                {
                                    "type": "button",
                                    "icon": "fa fa-pencil",
                                    "tooltip": "Edit",
                                    "level": "link",
                                    "actionType": "drawer",
                                    "drawer": {
                                        "position": "left",
                                        "size": "lg",
                                        "title": "Edit",
                                        "body": {
                                            "type": "form",
                                            "name": "sample-edit-form",
                                            "controls": [
                                                {
                                                    "type": "alert",
                                                    "level": "info",
                                                    "body": "No api endpoint is configured, so this form cannot actually be submitted!"
                                                },
                                                {
                                                    "type": "hidden",
                                                    "name": "id"
                                                },
                                                {
                                                    "type": "text",
                                                    "name": "text",
                                                    "label": "Text",
                                                    "required": true
                                                },
                                                {
                                                    "type": "divider"
                                                },
                                                {
                                                    "type": "image",
                                                    "name": "image",
                                                    "multiple": false,
                                                    "label": "Image",
                                                    "required": true
                                                },
                                                {
                                                    "type": "divider"
                                                },
                                                {
                                                    "type": "date",
                                                    "name": "date",
                                                    "label": "Date",
                                                    "required": true
                                                },
                                                {
                                                    "type": "divider"
                                                },
                                                {
                                                    "type": "select",
                                                    "name": "type",
                                                    "label": "Option",
                                                    "options": [
                                                        {
                                                            "label": "Pretty",
                                                            "value": "1"
                                                        },
                                                        {
                                                            "label": "Happy",
                                                            "value": "2"
                                                        },
                                                        {
                                                            "label": "Startled",
                                                            "value": "3"
                                                        },
                                                        {
                                                            "label": "Nervous",
                                                            "value": "4"
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    }
                                },
                                {
                                    "type": "button",
                                    "level": "link",
                                    "icon": "fa fa-times text-danger",
                                    "actionType": "ajax",
                                    "tooltip": "Delete",
                                    "confirmText": "Are you sure you want to delete this? No api is configured, so confirming does nothing; better not to confirm.",
                                    "api": ""
                                }
                            ]
                        }
                    ]
                }
            ]
        });
    })();
</script>
</body>
</html>

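3.4 Configure Spring Boot

Write the back-end API. Only the query endpoint is written here; the edit and delete endpoints are left for you to implement.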

package com.dxz.spider.web;

import com.dxz.spider.model.SmzdmModel;
import com.dxz.spider.service.SmzdmService;
import com.dxz.spider.web.SmzdmVO.GoodsVO;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

@RestController
@Slf4j
public class SmzdmWeb {

    @Autowired
    private SmzdmService smzdmService;

    @RequestMapping(value = "/getAll",method = RequestMethod.GET)
    public GoodsVO selectByPage(){
        log.info("Request received for the SMZDM getAll endpoint");
        GoodsVO goodsVO = new GoodsVO();
        List<SmzdmModel> smzdmModels = smzdmService.selectAll();
        // Always return the response envelope, even when the list is empty;
        // returning null would leave the AMIS page with an empty response body
        goodsVO.setStatus(0);
        goodsVO.setMsg("Request succeeded");
        goodsVO.setData(smzdmModels);
        return goodsVO;
    }
}
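
The controller imports GoodsVO, but the article never shows that class. Judging only from the setters the controller calls (setStatus, setMsg, setData), a minimal version might look like the sketch below. Everything here is an assumption for illustration: the generic type parameter, the default values, and the ok factory method are not from the original project.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for GoodsVO; the article imports it but never shows it.
public class GoodsVO<T> {
    private int status;                              // AMIS treats status 0 as success
    private String msg = "";                         // message passed back to the page
    private List<T> data = Collections.emptyList();  // rows for the crud table

    public int getStatus() { return status; }
    public void setStatus(int status) { this.status = status; }
    public String getMsg() { return msg; }
    public void setMsg(String msg) { this.msg = msg; }
    public List<T> getData() { return data; }
    public void setData(List<T> data) { this.data = data; }

    // Convenience factory (an addition for this sketch): a success envelope around the rows.
    public static <T> GoodsVO<T> ok(List<T> rows) {
        GoodsVO<T> vo = new GoodsVO<>();
        vo.setStatus(0);
        vo.setMsg("ok");
        vo.setData(rows);
        return vo;
    }

    public static void main(String[] args) {
        GoodsVO<String> vo = GoodsVO.ok(List.of("item-1", "item-2"));
        System.out.println(vo.getStatus() + " " + vo.getMsg() + " " + vo.getData());
    }
}
```

With getters in place, Spring Boot's default JSON conversion should serialize the object into the status, msg, and data fields that the page reads.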

Write the view configuration:

package com.dxz.spider.config;

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
import org.springframework.web.servlet.config.annotation.ViewControllerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurationSupport;

@Configuration
public class WebConfig extends WebMvcConfigurationSupport {
    /**
     * Map static files
     * @param registry
     */
    @Override
    protected void addResourceHandlers(ResourceHandlerRegistry registry) {
        registry.addResourceHandler("/static/**").addResourceLocations("classpath:/static/");
        super.addResourceHandlers(registry);
    }

    /**
     * Map views
     * @param registry
     */
    @Override
    protected void addViewControllers(ViewControllerRegistry registry) {
        registry.addViewController("/smzdm").setViewName("smzdm");
        super.addViewControllers(registry);
    }

    /**
     * CORS configuration
     * @param registry
     */
    @Override
    protected void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/**")
                .allowedOrigins("http://localhost:8080")
                .allowedMethods("*")
                .allowedHeaders("*");
        super.addCorsMappings(registry);
    }
}

3.5 Run and view the result

All done!