What? PHP Can Be Used to Write Crawlers Too?
阿新 • Published: 2018-02-13
PHP crawler code (crawling my own OJ problem set as the example)
<?php
// Pass 1: request problem pages from ID 1000 upward until one does not exist
$title = array();  // problem ID => heading shown on the page
$pages = array();  // problem ID => raw HTML, reused in pass 2 instead of fetching again
for ($i = 1000; ; $i++) {
    $url = "http://localhost/JudgeOnline/problem.php?pid=$i"; // put your OJ address here
    $info = file_get_contents($url);
    if ($info === false) break; // request failed, stop crawling
    // No <title> means this problem does not exist, so stop here
    if (!preg_match('|<title>(.*?)</title>|i', $info, $m) || !$m[1]) break;
    // Get the problem heading, e.g. "1000 : A+B問題"
    preg_match('|<h1 class="text-center">(.*?)</h1>|i', $info, $m);
    $title[$i] = isset($m[1]) ? $m[1] : '';
    $pages[$i] = $info;
}
$pnum = $i - 1000; // total number of problems
echo "A total of $pnum problems<br>";

// Pass 2: save every page as <problem ID>.html (created or overwritten)
for ($i = 1000; $i <= 999 + $pnum; $i++) {
    echo "Get P$i \"" . $title[$i] . "\"<br>";
    file_put_contents("$i.html", $pages[$i]);
}
?>
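The regular expressions in the crawler work on this particular page, but `.` in `(.*?)` does not match line breaks unless the `s` modifier is set, so a heading written across several lines would be missed. Below is a minimal sketch of the same heading extraction using PHP's built-in DOM extension instead of a regex. It assumes the heading keeps the exact `class="text-center"` attribute seen in the saved page, and the function name `extractProblemHeading` is hypothetical, not part of the original code.

```php
<?php
// Hypothetical helper (not part of the original crawler): return the text of
// the first <h1 class="text-center">, or null when the page has none.
function extractProblemHeading(string $html): ?string
{
    $dom = new DOMDocument();
    // Keep libxml's warnings about HTML5 markup out of the page output.
    $previous = libxml_use_internal_errors(true);
    // The leading XML declaration makes libxml read the bytes as UTF-8,
    // so Chinese headings are not garbled.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Exact class match, like the original regex.
    $nodes = $xpath->query('//h1[@class="text-center"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // textContent works even when the heading spans several lines.
    return trim($nodes->item(0)->textContent);
}

// This heading spans lines, which the original regex would not match.
$sample = "<h1 class=\"text-center\">\n1000 : A+B問題\n</h1>";
echo extractProblemHeading($sample), "\n"; // 1000 : A+B問題
```

If you would rather keep the regexes, adding the `s` modifier (`|...|is`) is the smaller change; a DOM parser mainly pays off once you need more than one field from the page.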
Output in the browser:
A total of 21 problems
Get P1000 "1000 : A+B問題"
Get P1001 "1001 : 求累加和"
Get P1002 "1002 : n的階乘"
Get P1003 "1003 : 階乘和"
Get P1004 "1004 : 第k小整數"
Get P1005 "1005 : 求a/b的高精度值"
Get P1006 "1006 : 麥森數mason"
Get P1007 "1007 : 旅行"
Get P1008 "1008 : 團夥(team)"
Get P1009 "1009 : 打擊犯罪"
Get P1010 "1010 : 家譜(gen)"
Get P1011 "1011 : 搭配購買"
Get P1012 "1012 : 合並果子"
Get P1013 "1013 : 編輯距離"
Get P1014 "1014 : 獎學金"
Get P1015 "1015 : 過河卒"
Get P1016 "1016 : Hello World"
Get P1017 "1017 : 計算器大法好……"
Get P1018 "1018 : 測試題目"
Get P1019 "1019 : 度熊的全1串"
Get P1020 "1020 : 快速排序"
Contents of the saved 1000.html:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<!-- SEO -->
<meta name="description" content="MasterOJ is an online judge system for ACM/ICPC">
<meta name="keywords" content="OJ,Online Judge,MasterOJ,ACM,ICPC">
<!-- Icons -->
<link rel="icon" href="./sitefiles/favicon.ico">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
<meta name="msapplication-TileColor" content="#FEF2E6">
<meta name="msapplication-TileImage" content="./sitefiles/favicon.png">
<!-- Bootstrap CSS -->
<link rel="stylesheet" href="./sitefiles/css/bootstrap.min.css">
<link rel="stylesheet" href="./sitefiles/css/prettify.css" type="text/css">
<link rel="stylesheet" href="./sitefiles/css/font-awesome.min.css" type="text/css">
<link rel="stylesheet" href="./sitefiles/css/nprogress.css">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="./sitefiles/js/html5shiv.js"></script>
<script src="./sitefiles/js/respond.min.js"></script>
<![endif]-->
<!--[if lt IE 7]>
<link rel="stylesheet" href="./sitefiles/css/font-awesome-ie7.css" type="text/css">
<![endif]-->
<link rel="stylesheet" href="./sitefiles/css/bearkidframe.css" type="text/css">
<!-- javascripts -->
<script src="./sitefiles/js/jquery.min.js"></script>
<script src="./sitefiles/js/bootstrap.min.js"></script>
<script src="./sitefiles/js/prettify.js"></script>
<script src="./sitefiles/js/nprogress.js"></script>
<title>題目描述 - MasterOJ</title>
</head>
<body>
<nav class="navbar navbar-default">
<div class="container">
<!-- Brand and toggle get grouped for better mobile display -->
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">MasterOJ</a>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="hindex.php">首頁</a></li>
<li><a href="problemset.php">題庫</a></li>
<li><a href="status.php">狀態</a></li>
<li><a href="ranklist.php">排名</a></li>
<li><a href="contestlist.php">比賽</a></li>
<!--<li><a href="document.php">Document</a></li>-->
<li><a href="discuss.php">論壇</a></li>
<!--<li><a href="#" onclick="javascript:alert('該功能未開發!')">資源下載</a></li>-->
<li><a href="./download/">資源下載</a></li>
<li><a href="game/">Games</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="./registerpage.php"><i class="fa fa-user-plus"></i> 註冊</a></li>
<li><a href="./loginpage.php"><i class="fa fa-sign-in"></i> 登陸</a></li>
</ul>
</div><!-- /.navbar-collapse -->
</div><!-- /.container-fluid -->
</nav>
<div class="container">
<h1 class="text-center">1000 : A+B問題</h1>
<p class="text-center">
時間限制:<span class="label label-primary">1 Sec</span> 內存限制:<span class="label label-primary">256 MiB</span><br/>
提交:<span class="label label-info">8</span> 答案正確:<span class="label label-success">5</span>
</p>
<p class="text-center">
<a id="oj-p-submit" class="btn btn-primary" href="./problemsubmit.php?pid=1000" role="button">提交</a>
<a class="btn btn-primary" href="./problemstatistics.php?pid=1000" role="button">狀態</a>
<a class="btn btn-primary" href="./discuss.php?pid=1000" role="button">論壇</a>
<!--<a class="btn btn-primary" href="" role="button">題解</a>-->
</p>
<h3><a data-toggle="collapse" data-target="#problemDesc">題目描述</a></h3>
<div class="collapse in" id="problemDesc" aria-expanded="true">
<pre><p>計算A+B的值(A,B,A+B<=2147483647)</p></pre>
</div>
<h3><a data-toggle="collapse" data-target="#problemInput">輸入</a></h3>
<div class="collapse in" id="problemInput" aria-expanded="true">
<pre><p>兩個整數A和B<br></p></pre>
</div>
<h3><a data-toggle="collapse" data-target="#problemOut">輸出</a></h3>
<div class="collapse in" id="problemOut" aria-expanded="true">
<pre><p>輸出A+B</p></pre>
</div>
<h3 id="bl-p-datain"><a data-toggle="collapse" data-target="#dataIn">樣例輸入</a></h3>
<div class="collapse in" id="dataIn" aria-expanded="true">
<div class="zero-clipboard">
<span id="bl-p-copy" class="btn-clipboard" onclick="copyToClipboard(document.getElementById('dataInContent').innerHTML);">復制</span>
</div>
<pre id="dataInContent">1 2</pre>
</div>
<h3><a data-toggle="collapse" data-target="#dataOut">樣例輸出</a></h3>
<div class="collapse in" id="dataOut" aria-expanded="true">
<div class="zero-clipboard">
<span class="btn-clipboard" onclick="copyToClipboard(document.getElementById('dataOutContent').innerHTML);">復制</span>
</div>
<pre id="dataOutContent">3</pre>
</div>
<h3><a data-toggle="collapse" data-target="#problemHint">提示</a></h3>
<div class="collapse" id="problemHint" aria-expanded="true">
<pre></pre>
</div>
<!--
<h3><a data-toggle="collapse" data-target="#problemTag">標簽</a></h3>
<div class="collapse" id="problemTag" aria-expanded="true">
<div class="well">
<span class="label label-default"><span>
</div>
</div>
-->
<h3><a data-toggle="collapse" data-target="#problemSrc">標簽</a></h3>
<div class="collapse" id="problemSrc" aria-expanded="true">
<pre>初級</pre>
</div>
</div><!--main wrapper end-->
<footer class="footer">
<div class="container">
<p style="float: left;" align="left">
<span id="clock">服務器時間: Loading...</span><br/>
<a class="bl-footer-link" href="document.php">FAQ</a> | <a class="bl-footer-link" href="document.php?f=rule">EULA</a><!--| <i class="fa fa-code"></i><i class="fa fa-download"></i><i class="fa fa-github"></i> <i class="fa fa-money"></i><i class="fa fa-book"></i><i class="fa fa-lock"></i><i class="fa fa-qq"></i><i class="fa fa-weixin"></i><i class="fa fa-facebook"></i><i class="fa fa-check"></i><i class="fa fa-circle"></i><i class="fa fa-circle-o"></i><i class="fa fa-clock-o"></i><i class="fa fa-user"></i><i class="fa fa-inbox"></i><i class="fa fa-tags"></i><i class="fa fa-cogs"></i><i class="fa fa-sign-out"></i><i class="fa fa-history"></i><i class="fa fa-edit"></i><i class="fa fa-search"></i><i class="fa fa-laptop"></i><i class="fa fa-paper-plane"></i><i class="fa fa-paper-plane-o"></i><i class="fa fa-flag"></i><i class="fa fa-heart"></i></a>-->
<p style="float: right; margin-right: 15px;" class="hidden-xs" align="right">
Copyright © 1999~2017 <a class='bl-footer-link' href='hindex.php'>MasterOJ</a>.<br/>
All Rights reserved
</p>
</div>
</footer>
<script>
var delta=new Date("2018/02/12 22:56:52").getTime()-new Date().getTime();
function clock() {
  var h,m,s,finalText,week,year,mon,day;
  var realTime = new Date(new Date().getTime() + delta);
  year = realTime.getYear() + 1900;
  if (year > 3000) year-=1900;
  mon = realTime.getMonth()+1;
  day = realTime.getDate();
  week = realTime.getDay();
  h=realTime.getHours();
  m=realTime.getMinutes();
  s=realTime.getSeconds();
  finalText="服務器時間: "+year+"/"+mon+"/"+day+" "+(h>=10?h:"0"+h)+":"+(m>=10?m:"0"+m)+":"+(s>=10?s:"0"+s);
  document.getElementById('clock').innerHTML=finalText;
  setTimeout("clock()", 1000);
}
clock();
</script>
<script type="text/javascript">
function copyToClipboard(s){
  //alert(s);
  if(window.clipboardData){
    window.clipboardData.setData("Text",s);
    alert("已經復制到剪切板!");
  }else if(navigator.userAgent.indexOf("Opera") != -1) {
    window.location = s;
  }else if(window.netscape) {
    try {
      netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
    } catch (e) {
      alert("被瀏覽器拒絕!\n請在瀏覽器地址欄輸入'about:config'並回車\n然後將'signed.applets.codebase_principal_support'設置為'true'");
    }
    var clip = Components.classes['@mozilla.org/widget/clipboard;1'].createInstance(Components.interfaces.nsIClipboard);
    if (!clip) return;
    var trans = Components.classes['@mozilla.org/widget/transferable;1'].createInstance(Components.interfaces.nsITransferable);
    if (!trans) return;
    trans.addDataFlavor('text/unicode');
    var str = new Object();
    var len = new Object();
    var str = Components.classes["@mozilla.org/supports-string;1"].createInstance(Components.interfaces.nsISupportsString);
    var copytext = s;
    str.data = copytext;
    trans.setTransferData("text/unicode",str,copytext.length*2);
    var clipid = Components.interfaces.nsIClipboard;
    if (!clip) return false;
    clip.setData(trans,null,clipid.kGlobalClipboard);
    alert("已經復制到剪切板!");
  }
}
$(window).load(function(){ prettyPrint(); })
</script>
</body>
</html>
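Because the saved page marks each section with an `id` (`problemDesc`, `dataInContent` and `dataOutContent` in the dump above), the files written by the crawler can be parsed offline without requesting the OJ again. Below is a minimal sketch, assuming every problem page uses those same ids; the function name `parseSavedProblem` and the array keys are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: pull the description and sample I/O out of a page saved
// by the crawler, assuming the MasterOJ ids seen in 1000.html.
function parseSavedProblem(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);      // silence HTML5 warnings
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html); // read the input as UTF-8
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);

    // Trimmed text of the first element with the given id, or null if missing.
    $textById = function (string $id) use ($xpath): ?string {
        $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));
        return ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    };

    return [
        'description'   => $textById('problemDesc'),
        'sample_input'  => $textById('dataInContent'),
        'sample_output' => $textById('dataOutContent'),
    ];
}

// Usage with a fragment shaped like 1000.html; for a real file, pass
// file_get_contents('1000.html') instead.
$html = '<div id="dataIn"><pre id="dataInContent">1 2</pre></div>'
      . '<div id="dataOut"><pre id="dataOutContent">3</pre></div>';
$p = parseSavedProblem($html);
echo $p['sample_input'], ' -> ', $p['sample_output'], "\n"; // 1 2 -> 3
```

Looking elements up by `id` rather than by position keeps the sketch working even if the OJ reorders its sections, though it will return null for any field whose id differs on another page.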
So PHP can be used to write crawlers after all.