PHP: Matching and Downloading Network Resources

阿新 • Published: 2018-12-19

<?php
/**
 * Match and download network resources
 */
header("Content-type: text/html;charset=utf-8");
error_reporting(E_ALL ^E_NOTICE^E_WARNING);
class DownloadFileFromWebsite{
	private $img_ext_arr=array('WEBP','BMP','JPG','GIF','JPEG','PSD','EPS','PNG','RAW','EMF','ICO');
	public  $file_dir;
	public  $matches_x;
	function __construct($web_url,$root_dir){
		if(empty($web_url) || empty($root_dir)){
			exit('Invalid arguments!');
		}
		$this->file_dir=$this->creatDirByWebUrl($web_url,$root_dir);
		$this->matches_x=$this->pregMatchUrl($web_url);
	}
	
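	// Fetch the content at a URL, appending any query parameters
	protected  function getContentByUrl($url,$param_arr)
	{
		$param_string=http_build_query($param_arr);
		if($param_string!==''){
			// Join with "?" or "&" depending on whether the URL already has a query string
			$url.=(strpos($url,'?')===false ? '?' : '&').$param_string;
		}
		return file_get_contents($url);
	}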
	// Download a file and append the outcome to the log
	protected function dowmload_file($file_url, $save_to,$logDir,$encodeName)
	{
		$file_name=basename($file_url);
		$content = file_get_contents($file_url);
		// Only write when the fetch succeeded, so a failure leaves $result falsy
		$result=($content===false) ? false : file_put_contents($save_to, $content);
		if($result){
			$status='Download succeeded!';
			$state='';
		}else{
			if(is_file($save_to)){
				unlink($save_to);// remove the empty file left by a failed download
			}
			$status='Download failed!!!';
			$state='[✘]';
		}
		// File names use the file system's encoding (GBK on Windows); convert back to UTF-8 for the log.
		// Checking the prefix avoids treating "Darwin" (macOS) as Windows.
		$encode = strtoupper(substr(PHP_OS,0,3))==='WIN' ? 'GBK' : 'UTF-8';
		$filename=iconv($encode,'UTF-8',$encodeName);
		$ext=pathinfo($save_to,PATHINFO_EXTENSION);
		$log_record=$state.'Image file: '.$file_name.'   ------  ['.$filename.'.'.$ext.']   ------   '.date('Y-m-d H:i:s').' ------ Size: '.round((int)$result/1024,2).'KB ------'.$status.PHP_EOL;
		echo $log_record.'<br>';
		file_put_contents($logDir,$log_record,FILE_APPEND);
	}
	// Create the target directory, named after the page URL
	private function creatDirByWebUrl($url,$root_dir){
		// Split the URL into its scheme and the remainder (host plus path)
		if(!preg_match('/^(https?):\/\/(.+)$/',$url,$match_web_url)){
			echo $url.': unsupported URL!<br/>';
			return false;
		}
		$dir_path=str_replace('/','_',$match_web_url[2]);
		$this->file_dir=$root_dir.'/'.$match_web_url[1]."_".$dir_path;
		if(!is_dir($this->file_dir.'/img')){
			$status=mkdir($this->file_dir.'/img',0777,true);
			if(!$status){
				echo $dir_path.': failed to create directory!<br/>';
				return false;
			}
		}
		return $this->file_dir;
	}
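	// Find the resources to download, according to the matching rule
	public function pregMatchUrl($url){
		$param_arr=array();
		$string_html=$this->getContentByUrl($url,$param_arr);
		// Capture the src URL (group 1) and alt text (group 2) of each <img> tag; the U modifier makes the quantifiers lazy
		preg_match_all('/<img src=["\'](https?:\/\/.*)["\'] alt=["\'](.*)["\'].*\/?>/U',$string_html,$matches_x);
		return $matches_x;
	}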
	
	// Normalise each file extension, then download the resource
	public function renameDownloadFiles(){
		// File names must use the file system's encoding (GBK on Windows)
		$encode = strtoupper(substr(PHP_OS,0,3))==='WIN' ? 'GBK' : 'UTF-8';
		// The alternation matches one whole extension name at the start of the suffix
		$str_ext='/^(?:'.join('|',$this->img_ext_arr).')/';
		foreach($this->matches_x[1] as $k=>$v){
			$arr=explode('.',basename($v));
			$ext=strtoupper(end($arr));
			$this->matches_x[2][$k] = iconv('UTF-8', $encode, $this->matches_x[2][$k]);
			if(!in_array($ext,$this->img_ext_arr)){
				// Handle special suffixes: strip site-specific noise appended to the extension, e.g. "jpg!"
				if(!preg_match($str_ext,$ext,$match_ext)){
					continue;// not a recognised image type, skip it
				}
				$ext=$match_ext[0];
			}
			$this->dowmload_file($v,$this->file_dir.'/img/'.$this->matches_x[2][$k].'.'.strtolower($ext),$this->file_dir.'/log.txt',$this->matches_x[2][$k]);
		}
	}
}
//$web_url='http://www.duok******.com/';
$pageCount=20;
for($a=1;$a<=$pageCount;$a++){
	$web_url='http://www.duok*****.com/list/1-'.$a;
	$root_dir='./DuoKan_DownloadFile';
	$obj=new DownloadFileFromWebsite($web_url,$root_dir);
	$obj->renameDownloadFiles();
}
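
The pattern in pregMatchUrl only matches an <img> tag when src is the first attribute and alt follows it directly, so tags with a different attribute order are missed. The sketch below shows one possible way to loosen that, and to pull a clean extension out of a URL with trailing noise. It is not part of the original class: the function names extractImages and imageExtension, and the example.com URLs, are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: collect [url => alt text] pairs from <img> tags,
 * whatever order the attributes appear in.
 */
function extractImages(string $html): array
{
    $images = [];
    // Grab every <img ...> tag first, then read its attributes one by one.
    preg_match_all('/<img\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Require an absolute http(s) URL, quoted with matching quotes.
        if (!preg_match('/\ssrc=(["\'])(https?:\/\/.*?)\1/i', $tag, $src)) {
            continue;
        }
        // A missing alt attribute yields an empty string.
        $alt = preg_match('/\salt=(["\'])(.*?)\1/i', $tag, $m) ? $m[2] : '';
        $images[$src[2]] = $alt;
    }
    return $images;
}

/**
 * Hypothetical helper: return the lower-case image extension of a URL,
 * ignoring trailing noise such as "jpg!small"; null if none is recognised.
 * $allowed holds upper-case extension names, as in $img_ext_arr.
 */
function imageExtension(string $url, array $allowed): ?string
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext  = strtoupper(pathinfo($path, PATHINFO_EXTENSION));
    // Anchor at the start so only the leading extension name is accepted.
    $pattern = '/^(?:' . implode('|', array_map('preg_quote', $allowed)) . ')/';
    return preg_match($pattern, $ext, $m) ? strtolower($m[0]) : null;
}

// Usage: the alt text comes before src here, which the original pattern would miss.
$html = '<img alt="Cover" class="c" src="http://example.com/a/b.jpg!small">';
foreach (extractImages($html) as $url => $alt) {
    echo $alt . '.' . imageExtension($url, ['JPG', 'PNG']) . PHP_EOL;
}
```

Reading the attributes tag by tag keeps each pattern short and avoids depending on attribute order. For heavily malformed HTML a real parser would be more reliable than regular expressions.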
