Java: A Web Page Image Crawler Based on URL Streams
阿新 • Published: 2019-01-03
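Tips
In a page's markup, an element that starts with <img is an image element, and the string inside its src="..." attribute is the address of the image resource.
For example, right-clicking an image and choosing Inspect Element shows:
<img data-v-0d738edb="" src="https://avatar.csdn.net/9/9/A/1_preyhard.jpg?1543834708" alt="" class="head">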
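The rule above (find the <img tag, then take whatever sits between the quotes after src=) can also be written as a regular expression. The following is only a minimal sketch and not the method used in the code later in this post: the class name ImgSrcExtractor and the method extractSrc are hypothetical names chosen for illustration, and the pattern assumes the attribute value is wrapped in single or double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
public class ImgSrcExtractor {
    // Matches an <img tag followed (inside the same tag) by src="..." or src='...'.
    // Group 1 is the quote character, group 2 is the address between the quotes.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the src value of every <img tag found in the given markup.
    public static List<String> extractSrc(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String tag = "<img data-v-0d738edb=\"\" "
                + "src=\"https://avatar.csdn.net/9/9/A/1_preyhard.jpg?1543834708\" "
                + "alt=\"\" class=\"head\">";
        // Prints the list of addresses found in the tag.
        System.out.println(extractSrc(tag));
    }
}
```

Requiring whitespace before src keeps attributes such as data-src from being mistaken for the real address. A regular expression is still a shortcut; for messy real-world pages a proper HTML parser is the more robust choice.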
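Steps
1. Open a URL stream to fetch the content of the whole web page.
2. Filter the image resource addresses out of that content, then open a URL stream for each address and write the image data into a new image file.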
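Step 2 boils down to copying an input stream into a file. As a rough sketch of that step alone, the snippet below uses java.nio.file.Files.copy instead of a hand-written read loop. The class ImageSaver and its methods save and download are hypothetical names introduced here for illustration, and the demo in main uses in-memory bytes rather than a real image so that it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original post.
public class ImageSaver {
    // Copies everything readable from `in` into `target`, replacing any existing
    // file, and returns the number of bytes written.
    public static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens a URL stream on imageUrl and stores its bytes as dir/fileName.
    public static Path download(String imageUrl, Path dir, String fileName) throws IOException {
        Files.createDirectories(dir); // make sure the target folder exists
        Path target = dir.resolve(fileName);
        try (InputStream in = new URL(imageUrl).openStream()) {
            save(in, target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Offline demo: four in-memory bytes stand in for the image data.
        Path tmp = Files.createTempFile("demo", ".bin");
        long written = save(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), tmp);
        System.out.println(written + " bytes written to " + tmp);
    }
}
```

With a real address, download(address, dir, name) would perform step 2 for one image; try-with-resources closes the URL stream even if the copy fails.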
Code
package westos2;

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Random;

public class client {
    public static void main(String[] args) throws IOException {
        Random random = new Random();
        // Step 1: open a URL stream on the page and read it line by line.
        HttpURLConnection connection = (HttpURLConnection) new URL(
                "https://tieba.baidu.com/p/2256306796?red_tag=1781367364").openConnection();
        try (BufferedReader buffer = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String s;
            while ((s = buffer.readLine()) != null) {
                if (s.contains("<img")) {
                    show(s, random);
                }
            }
        }
    }

    // Step 2: take the src address of each <img tag on the line and download it.
    private static void show(String s, Random random) throws IOException {
        int imgindex = s.indexOf("<img");
        String s1 = s.substring(imgindex);
        int srcindex = s1.indexOf("src=");
        if (srcindex == -1 || srcindex + 5 > s1.length()) {
            return; // no usable src attribute after this <img
        }
        String s2 = s1.substring(srcindex + 5); // skip src= and the opening quote
        int yinindex = s2.indexOf("\"");
        if (yinindex == -1) {
            return; // the closing quote is not on this line
        }
        String s3 = s2.substring(0, yinindex);
        System.out.println(s3);
        if (s3.startsWith("http")) {
            HttpURLConnection url = (HttpURLConnection) new URL(s3).openConnection();
            // A random number serves as the file name; the target folder must already exist.
            String i = random.nextInt() + "";
            try (InputStream in = url.getInputStream();
                 OutputStream out = new FileOutputStream(
                         "C:\\Users\\Administrator\\Desktop\\jpg\\" + i + ".png")) {
                byte[] bytes = new byte[1024 * 8];
                int read;
                while ((read = in.read(bytes)) != -1) {
                    out.write(bytes, 0, read);
                }
            }
        }
        // Recurse on the rest of the line in case it contains more <img tags.
        String str = s2.substring(yinindex);
        if (str.contains("<img")) {
            show(str, random);
        }
    }
}
Result