
Fetching a Web Page


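The example below fetches the 360 browser start page. The code is general: changing the URL is all it takes to fetch a different page.

The code is as follows: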

package 爬取網頁;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;

public class Main {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://hao.360.cn/?h_lnk"); // address of the page to fetch
        // try-with-resources closes both streams even if reading or writing fails
        try (BufferedReader bufferedReader = new BufferedReader(
                 new InputStreamReader(url.openStream(), "utf-8")); // must match the page's encoding
             BufferedWriter bufferedWriter = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream("C:/a/360.html"), "utf-8"))) {
            String msg;
            while ((msg = bufferedReader.readLine()) != null) {
                // System.out.println(msg);
                bufferedWriter.append(msg);
                bufferedWriter.newLine();
            }
        }
    }
}
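The comment on the reader says the charset has to match the page's encoding. One way to avoid hard-coding it is to read the charset from the HTTP Content-Type header. The following is only a sketch under that assumption: the class name CharsetSniffer and the method fromContentType are hypothetical, not part of the original program, and pages that declare their encoding only in an HTML meta tag are not handled.

```java
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSniffer {
    // Hypothetical helper: extracts the charset from a Content-Type value such as
    // "text/html; charset=utf-8". Returns `fallback` when the header is missing,
    // has no charset parameter, or names a charset this JVM does not support.
    public static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws Exception {
        URLConnection conn = URI.create("https://hao.360.cn/?h_lnk").toURL().openConnection();
        Charset charset = fromContentType(conn.getContentType(), StandardCharsets.UTF_8);
        System.out.println("Charset to pass to InputStreamReader: " + charset);
    }
}
```

The returned Charset can be passed to InputStreamReader in place of the "utf-8" string.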

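Running the program creates the file 360.html in the folder a on drive C (the folder C:/a must already exist, because FileOutputStream does not create missing directories). Open the file to view the 360 page.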


Changing the file extension to txt lets you view the page's source code.
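Since only the URL and the output file differ from one page to the next, the program can be written with both as parameters. The following is a minimal sketch, not the original author's code: the class name PageSaver and the method save are hypothetical, and the output folder is created when missing, which is a deliberate change from the original.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PageSaver {
    // Hypothetical helper: copies the text behind `url` into `target`, line by line,
    // decoding and encoding with the same `charset`.
    public static void save(URL url, Path target, Charset charset) throws IOException {
        // Create the output folder if it does not exist yet
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(url.openStream(), charset));
             BufferedWriter writer = Files.newBufferedWriter(target, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the original example; override them on the command line
        String address = args.length > 0 ? args[0] : "https://hao.360.cn/?h_lnk";
        String output = args.length > 1 ? args[1] : "C:/a/360.html";
        save(URI.create(address).toURL(), Paths.get(output), StandardCharsets.UTF_8);
    }
}
```

Passing an output name that ends in .txt saves the page source directly as a text file, so no renaming is needed afterwards.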

