A Simple Java Web Crawler Project

By 阿新 • Published: 2020-07-03


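This project is meant only for Java beginners to study and exchange ideas. I am a Java beginner myself, so parts of it do not follow good conventions. Corrections from more experienced developers are very welcome in the comments, and I will gladly learn from them.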

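Result preview:

Before getting into the details, here is what the finished project produces (the program's output while running, and the data it finally inserted):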

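Requirements:

I want to collect the information on every Chinese medicinal herb listed on a traditional Chinese medicine website and store it in my own database for analysis and study. Each herb record includes: name, aliases, functions and indications, physical characteristics, taste, channel tropism, source, and usage and dosage.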

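Page analysis:

The data we need lives on pages whose addresses start with "http://www.zysj.com.cn/zhongyaocai/yaocai", for example "http://www.zysj.com.cn/zhongyaocai/yaocai_a/anchundan.html", "http://www.zysj.com.cn/zhongyaocai/yaocai_a/anyou.html", and "http://www.zysj.com.cn/zhongyaocai/yaocai_l/longchuanhuajingye.html". What we need from such a page is the herb detail section.

Inspecting the page source shows which tags hold that content. As the extraction code later in this post reflects, the herb name is in the h1 element, and each field sits in an element whose class attribute is "drug" followed by a pinyin abbreviation, such as "drug bm" for aliases and "drug gnzz" for functions and indications.

With the analysis done, let's move on to the code!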
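The URL rule from the page analysis can be written down as a small predicate. This is only a sketch: the class name `HerbUrlRules` is made up for illustration, and the pattern assumes that every herb directory is `yaocai_` followed by a single lowercase letter, which is inferred from the three example links and not confirmed for the whole site.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the original project) encoding the URL rule from the analysis. */
final class HerbUrlRules {

    // Assumed shape: http://www.zysj.com.cn/zhongyaocai/yaocai_<letter>/<name>.html
    private static final Pattern HERB_PAGE = Pattern.compile(
            "^http://www\\.zysj\\.com\\.cn/zhongyaocai/yaocai_[a-z]/[^/]+\\.html$");

    /** Returns true when the URL looks like a herb page under the assumed pattern. */
    static boolean isHerbPage(String url) {
        return url != null && HERB_PAGE.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHerbPage("http://www.zysj.com.cn/zhongyaocai/yaocai_a/anyou.html"));
        System.out.println(isHerbPage("http://www.zysj.com.cn/zhongyaocai/index__1.html"));
    }
}
```

A regular expression is stricter than a plain prefix check, so it may reject pages that the project's own `startsWith` test would accept. Whether that is desirable depends on how the site actually names its pages.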

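Technologies used:

hutool toolkit (inserts the crawled data into the database)
druid (database connection pool)
HttpClient (fetches page content)
HtmlCleaner (extracts the useful parts from the fetched page content)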

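How the project runs, in brief:

I supply a link to the target site; the crawler then analyzes the pages, scrapes the data, and saves it on its own.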

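How it works:

Before explaining how it works, here is the structure of the project:

The Bean package holds the wrapper object used for database insertion.
The URLProcessor class in the URLProcessor package collects the links of the pages that hold the target data.
The ParsingPage class in the ParsingPage package fetches page content.
The GetContent class in the GetContent package extracts the data we want from the page content fetched by ParsingPage.
The SqlUtil package (its HerbalDao class is the one called in the code below) stores the extracted data in the database.
StrUtil performs some string processing on the extracted content.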
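The bean itself is not listed in this post. As a rough idea of its shape, here is a minimal sketch with only four of its properties. The setter names match calls that appear in the GetContent code (`setHerbal`, `setAlias`, `setNature`, `setTaste`), but the field types, the getters, and everything else are assumptions.

```java
/** Hypothetical sketch of the Herbal bean; the real class is not shown and has more fields. */
class Herbal {
    private String herbal;  // herb name
    private String alias;   // aliases, as one string
    private String nature;  // nature, e.g. cold or warm
    private String taste;   // taste, e.g. bitter

    public String getHerbal() { return herbal; }
    public void setHerbal(String herbal) { this.herbal = herbal; }

    public String getAlias() { return alias; }
    public void setAlias(String alias) { this.alias = alias; }

    public String getNature() { return nature; }
    public void setNature(String nature) { this.nature = nature; }

    public String getTaste() { return taste; }
    public void setTaste(String taste) { this.taste = taste; }

    @Override
    public String toString() {
        return "Herbal{herbal=" + herbal + ", alias=" + alias
                + ", nature=" + nature + ", taste=" + taste + "}";
    }
}
```

A plain bean like this keeps the extraction code and the database code independent of each other: GetContent only fills in the object, and the DAO only reads it.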

Below is the workflow diagram of the project:

Implementation:

The main method of the Main class starts the project.

package com;

import com.URLProcessor.URLProcessor;

public class Main {

    public static void main(String[] args) {
        // Create a URLProcessor and pass a valid link to its getContent method to start crawling
        new URLProcessor().getContent("http://www.zysj.com.cn/zhongyaocai/index__1.html");
    }

}

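URLProcessor class: this class has two methods, getContent and getUrl. (Its getContent does the same job as the ParsingPage class described above; my skills are limited and I have not managed to refactor the duplication away, so suggestions are welcome.) The class parses pages recursively to collect valid URLs. Each valid URL is handed to the GetContent class, which fetches the page through ParsingPage, for further processing. The code is below: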

package com.URLProcessor;

import com.Bean.Herbal;
import com.GetContent.GetContent;
import com.SqlUtil.HerbalDao;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/*
 * URL processor: collects the URLs to crawl
 */
public class URLProcessor {

    // Links that have already been scanned; prevents rescanning a link and looping forever
    List<String> scannedUrl=new ArrayList<String>();

    // Fetches the page content and hands it to the URL extractor
    public String getContent(String url){
        String result=null;
        // Build the GET request
        HttpGet httpGet=new HttpGet(url);
        // try-with-resources closes the client and the response, even when the request fails
        try (CloseableHttpClient httpClient= HttpClientBuilder.create().build();
             CloseableHttpResponse response=httpClient.execute(httpGet)) {
            // Take the response entity from the response
            HttpEntity responseEntity=response.getEntity();
            if (responseEntity!=null){
                // Read the page content; UTF-8 is only the fallback used when the response declares no charset
                result= EntityUtils.toString(responseEntity, StandardCharsets.UTF_8);
            }
        } catch (IOException e){
            e.printStackTrace();
        }
        // Pass the fetched content to the URL extractor (skipped when the request failed)
        if (result!=null){
            this.getUrl(result);
        }
        return result;
    }

    // URL extractor: pulls the links out of the page content and processes each new one
    public void getUrl(String htmlContent){
        HtmlCleaner cleaner = new HtmlCleaner();
        TagNode node = cleaner.clean(htmlContent);
        // Collect every hyperlink (a element) on the page
        Object[] ns = node.getElementsByName("a",true);
        // Walk through the links and keep the valid ones
        for(Object on : ns) {
            TagNode n = (TagNode) on;
            String href=n.getAttributeByName("href");
            // First filter: skip anchors without an href; a valid link is site-relative,
            // starts with /zhongyaocai/ and ends with .html
            if (href!=null&&href.startsWith("/zhongyaocai/")&&href.endsWith(".html")) {
                // Prepend the host to get the full link
                href = "http://www.zysj.com.cn" + href;
                // Continue only if this link has not been handled before
                if (!scannedUrl.contains(href)) {
                    // Record the link as scanned
                    scannedUrl.add(href);
                    // Second filter: herb pages start with "http://www.zysj.com.cn/zhongyaocai/yaocai_"
                    if (href.startsWith("http://www.zysj.com.cn/zhongyaocai/yaocai_"))
                    {
                        // A herb page: parse it and extract the data we need
                        Herbal herbal=new GetContent().getContent(href);
                        try {
                            // Insert the data into the database
                            new HerbalDao().mainAdd(herbal);
                        } catch (SQLException e) {
                            e.printStackTrace();
                        }
                    }
                    // Follow the links on that page and repeat the steps above
                    this.getContent(href);
                }
            }
        }
    }

}
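URLProcessor follows links by having getContent and getUrl call each other, so the call stack grows with every page that is followed. The same traversal can be written with an explicit queue and a visited set. The sketch below is not the author's code: `CrawlFrontier` and the `linksOf` parameter are hypothetical names, and `linksOf` stands in for "fetch the page and extract its links" so that the example runs without any network access.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical iterative alternative to the recursive traversal. */
final class CrawlFrontier {

    /** Visits every URL reachable from seed exactly once, breadth-first, and returns them in visit order. */
    static Set<String> crawl(String seed, Function<String, List<String>> linksOf) {
        Set<String> visited = new LinkedHashSet<>();  // hash-based membership test, keeps insertion order
        Deque<String> queue = new ArrayDeque<>();
        visited.add(seed);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            for (String next : linksOf.apply(url)) {
                if (visited.add(next)) {  // add returns false when the URL was already seen
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny in-memory "site": a links to b and c, b links back to a and to c
        Function<String, List<String>> links = url -> {
            switch (url) {
                case "a": return Arrays.asList("b", "c");
                case "b": return Arrays.asList("a", "c");
                default:  return Collections.emptyList();
            }
        };
        System.out.println(crawl("a", links)); // prints [a, b, c]
    }
}
```

Compared with the List used for scannedUrl, a set makes the "already scanned?" check independent of how many links have been collected, and the queue avoids deep recursion on large sites.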

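GetContent class: its getContent method fetches the page content for a given link, extracts the relevant data from it, assigns the values to a Herbal object, and returns that object.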

package com.GetContent;

import com.Bean.Herbal;
import com.ParsingPage.ParsingPage;
import com.Utils.StrUtil;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

import java.util.List;
import java.util.Map;

public class GetContent {

    public Herbal getContent(String url){
        // Object that holds the result and is returned at the end
        Herbal herbalTemp=new Herbal();
        // Fetch the page content through ParsingPage.getContent and keep it in content
        String content=new ParsingPage().getContent(url);
        // Nothing to parse when the fetch returned no content: return the empty object
        if (content==null){
            return herbalTemp;
        }
        // ====== Extraction of the individual fields starts here ======
        // Some values are normalized with my own StrUtil class
        HtmlCleaner cleaner=new HtmlCleaner();
        TagNode node=cleaner.clean(content);
        Object[] ns =node.getElementsByAttValue("class", "drug py", true, true);
        System.out.println("================================================================================================================");
        // Pinyin
        String py=null;
        if (ns.length>0){
            // Strip the quote entity, the field label as it appears on the page, and the footnote mark
            py=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("拼音","").replaceAll("①","");
            py=StrUtil.getAsciiToStr(py);
            herbalTemp.setPY(py);
            System.out.println("Pinyin: "+py);
        }
        // Pinyin initials
        String py_code= StrUtil.getBigStr(py);
        herbalTemp.setPy_code(py_code);
        System.out.println("Pinyin initials: "+py_code);

        // English name
        ns = node.getElementsByAttValue("class", "drug ywm", true, true);
        String ename=null;
        if (ns.length>0){
            ename=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("英文名","").replaceAll("①","");
            herbalTemp.setEname(ename);
            System.out.println("English name: "+ename);
        }
        // Name
        ns = node.getElementsByName("h1",true);
        String herbal=null;
        if (ns.length>0){
            herbal=((TagNode)ns[0]).getText().toString();
            herbalTemp.setHerbal(herbal);
            System.out.println("Name: "+herbal);
        }
        // Aliases
        String alias=null;
        ns = node.getElementsByAttValue("class", "drug bm", true, true);
        if (ns.length>0){
            alias=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("別名","").replaceAll("①","");
            herbalTemp.setAlias(alias);
            System.out.println("Aliases: "+alias);
        }
        // Source
        String source=null;
        ns = node.getElementsByAttValue("class", "drug ly", true, true);
        if (ns.length>0){
            source=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("來源","").replaceAll("①","");
            herbalTemp.setSource(source);
            System.out.println("Source: "+source);
        }
        // Physical characteristics
        String trait=null;
        ns = node.getElementsByAttValue("class", "drug xz", true, true);
        if (ns.length>0){
            trait=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("性狀","").replaceAll("①","");
            herbalTemp.setTrait(trait);
            System.out.println("Characteristics: "+trait);
        }

        // Nature and taste (one combined field on the page)
        String xw=null;
        ns = node.getElementsByAttValue("class", "drug xw", true, true);
        if (ns.length>0){
            xw=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("性味","").replaceAll("①","");
        }
        Map<String,String> s=StrUtil.getNatureAndTaste(xw);
        // Nature
        String nature=s.get("nature");
        // Taste
        String taste=s.get("taste");
        herbalTemp.setNature(nature);
        herbalTemp.setTaste(taste);
        System.out.println("Taste: "+taste+"\n"+"Nature: "+nature);

        // Channel tropism
        String meridians=null;
        ns = node.getElementsByAttValue("class", "drug gj", true, true);
        if (ns.length>0){
            meridians=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("歸經","").replaceAll("①","");
            herbalTemp.setMeridians(meridians);
            System.out.println("Channel tropism: "+meridians);
        }
        // Functions and indications (one combined field on the page)
        String function=null;
        ns = node.getElementsByAttValue("class", "drug gnzz", true, true);
        if (ns.length>0){
            function=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("功能主治","").replaceAll("①","");
        }
        // Efficacy: the text before the split keyword (such as "用於")
        String efficacy=null;
        // Indications: the text from the split keyword onward
        String indications=null;
        List<String> stringList=StrUtil.getEfficacyAndIndication(function);
        if (stringList.size()==1){
            // No split keyword found: the whole text is treated as efficacy
            efficacy=stringList.get(0);
            herbalTemp.setEfficacy(efficacy);
        }else if (stringList.size()==2){
            efficacy=stringList.get(0);
            indications=stringList.get(1);
            herbalTemp.setEfficacy(efficacy);
            herbalTemp.setIndications(indications);
        }
        System.out.println("Efficacy: "+efficacy+"\n"+"Indications: "+indications);

        // Usage and dosage
        String usagedosage=null;
        ns = node.getElementsByAttValue("class", "drug yfyl", true, true);
        if (ns.length>0){
            usagedosage=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("用法用量","").replaceAll("①","");
            herbalTemp.setUsagedosage(usagedosage);
            System.out.println("Usage and dosage: "+usagedosage);
        }
        // Processing method
        String process=null;
        ns = node.getElementsByAttValue("class", "drug pz", true, true);
        if (ns.length>0){
            process=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("炮製","").replaceAll("①","");
            herbalTemp.setProcess(process);
            System.out.println("Processing method: "+process);
        }
        // Identification method
        ns = node.getElementsByAttValue("class", "drug jb", true, true);
        String identify=null;
        if (ns.length>0){
            identify=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("鑑別","").replaceAll("①","");
            herbalTemp.setIdentify(identify);
            System.out.println("Identification method: "+identify);
        }
        // Storage
        ns = node.getElementsByAttValue("class", "drug zc", true, true);
        String store=null;
        if (ns.length>0){
            store=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("貯藏","").replaceAll("①","");
            herbalTemp.setStore(store);
            System.out.println("Storage: "+store);
        }

        // Precautions (mattersneedattention): the block labeled "注意", class "drug zy".
        // The original listing read this field from "drug jj" and the next one from "drug zy",
        // which stored the two values in each other's fields.
        ns = node.getElementsByAttValue("class", "drug zy", true, true);
        String mattersneedattention=null;
        if (ns.length>0){
            mattersneedattention=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("注意","").replaceAll("①","");
            herbalTemp.setMattersneedattention(mattersneedattention);
            System.out.println("Precautions: "+mattersneedattention);
        }
        // Contraindications: the block labeled "禁忌", class "drug jj"
        ns = node.getElementsByAttValue("class", "drug jj", true, true);
        String contraindications=null;
        if (ns.length>0){
            contraindications=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("禁忌","").replaceAll("①","");
            herbalTemp.setContraindications(contraindications);
            System.out.println("Contraindications: "+contraindications);
        }

        // Remarks
        ns = node.getElementsByAttValue("class", "drug bz", true, true);
        String remark=null;
        if (ns.length>0){
            remark=((TagNode)ns[0]).getText().toString().replaceAll("&quot;","").replaceAll("備註","").replaceAll("①","");
            herbalTemp.setRemark(remark);
            System.out.println("Remarks: "+remark);
        }
        return herbalTemp;
    }

//    public static void main(String[] args) {

//        new GetContent().getContent("http://www.zysj.com.cn/zhongyaocai/yaocai_b/banlangen.html");

//    }

}
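Every field in getContent goes through the same three replacements: remove the literal `&quot;` entity, remove the field label, remove the footnote mark "①". One possible way to factor that out is sketched below. The class and method names are hypothetical, and unlike the listing above this version also trims surrounding whitespace and treats the label as literal text instead of a regular expression.

```java
/** Hypothetical helper that bundles the cleanup GetContent applies to each field. */
final class FieldCleaner {

    /**
     * Removes the "&quot;" entity, every occurrence of the field label,
     * and the footnote mark, then trims the result.
     * Returns null when there is no raw text.
     */
    static String clean(String raw, String label) {
        if (raw == null) {
            return null;
        }
        return raw.replace("&quot;", "")
                  .replace(label, "")
                  .replace("①", "")
                  .trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("別名&quot;大藍根、大青根&quot;①", "別名"));
    }
}
```

With such a helper, each field block in getContent would shrink to one lookup, one call to the helper, and one setter call.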

ParsingPage class: fetches the full content of a page, given its link.

package com.ParsingPage;

import org.apache.http.HttpEntity;
import org.apache.http.HttpVersion;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ParsingPage {

    public String getContent(String url){
        String result=null;
        // Build the GET request
        HttpGet httpGet=new HttpGet(url);
        httpGet.setProtocolVersion(HttpVersion.HTTP_1_0);
        // try-with-resources closes the client and the response, even when the request fails
        try (CloseableHttpClient httpClient= HttpClientBuilder.create().build();
             CloseableHttpResponse response=httpClient.execute(httpGet)) {
            // Take the response entity from the response
            HttpEntity responseEntity=response.getEntity();
            if (responseEntity!=null){
                // UTF-8 is only the fallback used when the response declares no charset
                result= EntityUtils.toString(responseEntity, StandardCharsets.UTF_8);
            }
        } catch (IOException e){
            e.printStackTrace();
        }
        // null when the request failed or the response had no body
        return result;
    }

}
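Decoding matters for a Chinese-language site: when `EntityUtils.toString` is called without a charset, it uses the charset declared in the response and falls back to ISO-8859-1 if none is declared, which would garble Chinese text. The sketch below shows how a charset can be read from a Content-Type header value using only the standard library. `CharsetSniffer` is a hypothetical name, and this is an illustration of the idea, not how HttpClient implements it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper: picks the charset named in a Content-Type value, or a fallback. */
final class CharsetSniffer {

    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // A Content-Type value looks like "text/html; charset=UTF-8"
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: use the fallback
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(fromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

Choosing UTF-8 as the fallback is itself an assumption about the target site; a page that declares its encoding only in a meta tag would need additional handling.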

StrUtil class

package com.Utils;

import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

import java.util.*;

public class StrUtil {

    // Keeps only the uppercase ASCII letters; used to build the initials from capitalized pinyin
    public static String getBigStr(String string){
        StringBuilder result= new StringBuilder();
        if(string!=null){
            char[] chars=string.toCharArray();
            for (char aChar : chars) {
                if (aChar >= 'A' && aChar <= 'Z') {
                    result.append(aChar);
                }
            }
        }
        return result.toString();
    }

    // Splits the "functions and indications" text into [efficacy, indications];
    // returns a single element when no split keyword is found
    public static List<String> getEfficacyAndIndication(String string){
        List<String> result=new ArrayList<String>();
        if (string!=null){
            // Drop a leading book citation: everything up to the first full-width colon
            if (string.contains("《")){
                int end=string.indexOf("：")+1;
                string=string.substring(end);
            }
            // The indications start at the first keyword that matches, tried in this order
            int index=-1;
            if(string.contains("用於")){
                index=string.indexOf("用於");
            }else if (string.contains("主治")){
                index=string.indexOf("主治");
            }else if (string.contains("主")){
                index=string.indexOf("主");
            }else if (string.contains("治")){
                index=string.indexOf("治");
            }
            if (index!=-1){
                result.add(string.substring(0,index));
                result.add(string.substring(index));
            }else {
                result.add(string);
            }
        }
        return result;
    }

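    // Splits a combined "nature and taste" text into two "、"-separated lists,
    // stored under the keys "nature" and "taste"; returns an empty map for null input
    public static Map<String,String> getNatureAndTaste(String string){
        Map<String,String> stringList=new HashMap<>();
        if (string!=null){
            string=string.replaceAll("。","");
            // Natures: cold, hot, warm, cool, neutral
            String natureStr="寒熱溫涼平";
            // Tastes: pungent, sweet, sour, bitter, salty
            String taste="辛甘酸苦鹹";
            char[] chars=string.toCharArray();
            String nTemp="";
            String tTemp="";
            // Sort each character of the text into the nature list or the taste list
            for (char c:chars){
                for (char n:natureStr.toCharArray()){
                    if (c==n){
                        nTemp+= n + "、";
                    }
                }
                for (char t:taste.toCharArray()){
                    if (t==c){
                        tTemp+=t+"、";
                    }
                }
            }
            // Remove the trailing separator
            if (nTemp.length()!=0){
                nTemp=nTemp.substring(0,nTemp.length()-1);
            }
            if (tTemp.length()!=0){
                tTemp=tTemp.substring(0,tTemp.length()-1);
            }
            stringList.put("nature", nTemp);
            stringList.put("taste", tTemp);
        }
        return stringList;
    }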

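    // Replaces decimal character references such as "&#225;" with the characters they encode
    public static String getAsciiToStr(String string){
        String result="";
        if (string!=null){
            // Character codes in order of appearance
            List<Integer> list=getNumber(string);
            int i=0;
            // getStr leaves one ';' per reference; swap each for the decoded character
            for (char c:getStr(string).toCharArray()){
                if (c==';'&&i<list.size()){
                    result+=(char)(list.get(i)).intValue();
                    i++;
                }else {
                    // Ordinary character, or a ';' with no code left to pair it with
                    result+=c;
                }
            }
        }
        return result;
    }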

    // Collects every run of digits in the string as an integer, in order of appearance
    public static List<Integer> getNumber(String string){
        List<Integer> numberList=new ArrayList<Integer>();
        if (string!=null){
            StringBuilder temp= new StringBuilder();
            for (int i=0;i<string.length();i++){
                char c=string.charAt(i);
                if (c<'0'||c>'9'){
                    // A non-digit ends the current run
                    if (temp.length()>0){
                        numberList.add(Integer.parseInt(temp.toString()));
                    }
                    temp = new StringBuilder();
                }else{
                    temp.append(c);
                }
            }
            // Keep a run of digits that reaches the end of the string
            if (temp.length()>0){
                numberList.add(Integer.parseInt(temp.toString()));
            }
        }
        return numberList;
    }

    // Removes digits, "&#" prefixes and spaces, so each character reference is reduced to its ';'
    public static String getStr(String string){
        String result="";
        if (string!=null){
            char[] chars=string.toCharArray();
            for (char c:chars){
                if (c<'0'||c>'9'){
                    result+=c;
                }
            }
            result=result.replaceAll("&#","").replaceAll(" ","");
        }
        return result;
    }

    // Splits an alias string such as "大藍根、大青根" into the individual aliases
    public static List<String> getAlias(String string){
        List<String> list=new ArrayList<>();
        if (string!=null){
            // Cut off a trailing note or citation that starts with "（", "(" or "《"
            int index=-1;
            if (string.contains("（") ){
                index=string.indexOf("（");
                string=string.substring(0,index);
            }else if (string.contains("(")){
                index=string.indexOf("(");
                string=string.substring(0,index);
            }else if(string.contains("《")){
                index=string.indexOf("《");
                string=string.substring(0,index);
            }
            String[] parts=getSpecial(string);
            if (parts!=null){
                for (String s:parts){
                    if (getSpecial(s)!=null){
                        // The part still contains another separator: split it again
                        list.addAll(getAlias(s));
                    }else {
                        list.add(s);
                    }
                }
            }else {
                list.add(string);
            }
        }
        return list;
    }

    // Splits a channel tropism text into individual channels, each ending in "經"
    public static List<String> getMeridian(String string){
        List<String> result=new ArrayList<>();
        if (string!=null){
            // Drop a leading book citation: everything up to the first full-width colon
            if (string.contains("《")){
                int end=string.indexOf("：")+1;
                string=string.substring(end);
            }
            // Remove filler characters: "入", the full stop, the counts two to five, and "經" itself
            string=string.replaceAll("入","").replaceAll("。","")
                    .replaceAll("二","").replaceAll("三","")
                    .replaceAll("四","").replaceAll("五","")
                    .replaceAll("經","");
            String[] parts=getSpecial(string);
            if (parts!=null){
                for (String s:parts){
                    if (getSpecial(s)!=null){
                        // The recursive call already appends "經", so add its results unchanged
                        result.addAll(getMeridian(s));
                    }else {
                        result.add(s+"經");
                    }
                }
            }else {
                result.add(string+"經");
            }
        }
        return result;
    }

    // Splits on the first separator type found ("、", ",", "，" or "；");
    // returns null when the string contains none of them
    public static String[] getSpecial(String string){
        if (string.contains("、")){
            return string.split("、");
        }else if (string.contains(",")){
            return string.split(",");
        }else if (string.contains("，")){
            return string.split("，");
        }else if (string.contains("；")){
            return string.split("；");
        }else {
            return null;
        }
    }

    // Quick manual checks of the helpers
    public static void main(String[] args) {
        String str="筋活血，涼血，止血。主治貧血或失血過多，風溼痛，跌打損傷，胃痛，癤瘡癰腫，皮炎，溼疹。";
        List<String>a=getEfficacyAndIndication(str);
        for (String s:a){
            System.out.println(s);
        }
//        Map<String,String> s=StrUtil.getNatureAndTaste("苦，寒。");
//        System.out.println("Taste: "+s.get("taste")+" Nature: "+s.get("nature"));
        //á
//        String s="Bái Sháo";
//        System.out.println(getAsciiToStr(s));
    }

    // Returns the uppercase first pinyin letter of every character in the string
    public static String ChineseToFirstLetter(String c) {
        StringBuilder string = new StringBuilder();
        if (c != null) {
            for (int k = 0; k < c.length(); k++) {
                String str = converterToFirstSpell(String.valueOf(c.charAt(k)));
                // Skip characters that produced no pinyin
                if (!str.isEmpty()) {
                    string.append(str.toUpperCase().charAt(0));
                }
            }
        }
        return string.toString();
    }

    // Converts Chinese characters to lowercase pinyin without tone marks; other characters are copied unchanged.
    // Despite the name, the full pinyin of each character is appended, not only its first letter.
    public static String converterToFirstSpell(String chines) {
        String pinyinName = "";
        if (chines == null) {
            return pinyinName;
        }
        char[] nameChar = chines.toCharArray();
        HanyuPinyinOutputFormat defaultFormat = new HanyuPinyinOutputFormat();
        defaultFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE);
        defaultFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        for (int i = 0; i < nameChar.length; i++) {
            String s = String.valueOf(nameChar[i]);
            if (s.matches("[\\u4e00-\\u9fa5]")) {
                try {
                    String[] mPinyinArray = PinyinHelper.toHanyuPinyinStringArray(nameChar[i], defaultFormat);
                    // pinyin4j returns null for a character it has no reading for
                    if (mPinyinArray != null && mPinyinArray.length > 0) {
                        pinyinName += mPinyinArray[0];
                    }
                } catch (BadHanyuPinyinOutputFormatCombination e) {
                    e.printStackTrace();
                }
            } else {
                pinyinName += nameChar[i];
            }
        }
        return pinyinName;
    }

}
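getAsciiToStr, getNumber and getStr work together to turn decimal character references such as `&#225;` back into characters. The same idea can be expressed with one regular expression. The sketch below is an alternative and not the author's method: `NumericEntities` is a hypothetical name, and unlike getStr it leaves spaces and ordinary digits in the text untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical alternative to getAsciiToStr based on a single regular expression. */
final class NumericEntities {

    // Decimal character reference: "&#", one to seven digits, ";"
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    /** Replaces each decimal character reference with the character it encodes. */
    static String decode(String text) {
        if (text == null) {
            return "";
        }
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave references to invalid code points as they are
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("B&#225;i Sh&#225;o")); // prints Bái Sháo
    }
}
```

Because the digits are matched only inside a complete reference, a stray number or semicolon elsewhere in the text cannot shift the pairing between codes and positions.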
