Fetching web page data from a URL without search conditions (scraping web pages with Java)

阿新 • Published: 2018-11-30

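jsoup jar (Maven dependency). The code below also uses Apache HttpClient 4.x classes (org.apache.http.*), so the org.apache.httpcomponents:httpclient artifact must be on the classpath as well.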

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.http.HttpStatus;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


    /**
     * Fetches the full HTML source of the page at the given URL.
     *
     * @param url the address of the page to fetch
     * @return the page source, or null if the response status is not 200 or the body is empty
     * @throws IOException if the request, the read, or closing the connection fails
     * @throws ClientProtocolException if an HTTP protocol error occurs
     */
    public static String getHtmlByUrl(String url) throws ClientProtocolException, IOException {
        String html = null;
        // Create the HttpClient and send a GET request to the URL.
        // try-with-resources closes the response first and then the client, even on failure.
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {
            // Status code: 200 is success, anything else is treated as a failure
            int status = response.getStatusLine().getStatusCode();
            if (status == HttpStatus.SC_OK && response.getEntity() != null) {
                // Get the response body as an input stream
                InputStream content = response.getEntity().getContent();
                if (content != null) {
                    // Convert the stream to a string by hand to get the HTML source.
                    // Note: EntityUtils.toString on the entity would also work, but it can
                    // produce garbled characters, which is why this approach is used here.
                    html = getStreamString(content);
                }
            }
        }
        return html;
    }

    /**
     * Converts an input stream to a string.
     * The stream is decoded as gb2312, so this must match the encoding of the target page.
     * Returns null if the stream is null or cannot be read.
     */
    public static String getStreamString(InputStream tInputStream) {
        if (tInputStream != null) {
            try {
                BufferedReader tBufferedReader =
                        new BufferedReader(new InputStreamReader(tInputStream, "gb2312"));
                StringBuilder tStringBuilder = new StringBuilder();
                String sTempOneLine;
                // Read line by line, restoring the line break that readLine() strips
                while ((sTempOneLine = tBufferedReader.readLine()) != null) {
                    tStringBuilder.append(sTempOneLine).append('\n');
                }
                return tStringBuilder.toString();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
        return null;
    }


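    public static void main(String[] args) throws ClientProtocolException, IOException {
        // The URL of the page to fetch is read from the first command-line argument
        if (args.length == 0) {
            System.err.println("Usage: pass the URL of the page to fetch as the first argument");
            return;
        }
        String htmlByUrl = getHtmlByUrl(args[0]);
        if (htmlByUrl != null && !htmlByUrl.isEmpty()) {
            // Parse the content; the resulting Document can then be queried, e.g. with doc.select(...)
            Document doc = Jsoup.parse(htmlByUrl);
        }
    }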
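
getStreamString above always decodes the page as gb2312, which garbles any page served in another encoding. One possible refinement, shown here only as a sketch, is to take the charset from the response's Content-Type header and fall back to a default when none is given. The class name CharsetAwareReader and the methods charsetFrom and readAll are hypothetical names made up for this example; it uses only the JDK, and the header value is assumed to come from something like response.getEntity().getContentType().getValue() in HttpClient 4.x.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post
public class CharsetAwareReader {

    // Matches the charset parameter of a Content-Type value, e.g. "text/html; charset=GBK"
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in contentType, or fallback if none is named or it is unsupported. */
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType != null) {
            Matcher m = CHARSET_PARAM.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: use the fallback instead
                }
            }
        }
        return fallback;
    }

    /** Reads the whole stream as text in the given charset, one line at a time. */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 is used for the demo because every JVM is required to support it;
        // a real page might declare gb2312 or GBK instead.
        byte[] body = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = charsetFrom("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(body), cs));
    }
}
```

With this in place, getHtmlByUrl could pass the detected charset along instead of relying on a fixed one. Pages that declare their encoding only in a meta tag would still need separate handling.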
