PHP Crawler: Scraping Rule-Violating Threads from the Baidu Tieba Front Page
阿新 • Published: 2018-11-25
Since this is my first attempt, the code is a bit verbose. Still, this article is mainly for readers who have no idea what a crawler is. o(∩_∩)o
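```php
<?php
$url = 'http://tieba.baidu.com/f?ie=utf-8&kw=php&fr=search'; // Forum URL
$html = file_get_contents($url);                              // Fetch the page HTML
if ($html === false) {
    exit('Failed to fetch the page');
}

$dom = new DOMDocument();
@$dom->loadHTML($html); // loadHTML warns about non-standard markup, so suppress the warnings
$xpath = new DOMXPath($dom);

$condition = "php|小白"; // Keywords to look for in thread titles, separated by |
$ex_condition = explode('|', $condition);
$str = '';
$count = count($ex_condition) - 1;
foreach ($ex_condition as $key => $value) { // Build the XPath condition
    if ($key < $count) {
        $str .= "contains(@title, '" . $value . "') or ";
    } else {
        $str .= "contains(@title, '" . $value . "')";
    }
}

$elements['title'] = $xpath->query("//div[@class='threadlist_lz clearfix']/div/a[" . $str . "]");       // Matching title links
$elements['href']  = $xpath->query("//div[@class='threadlist_lz clearfix']/div/a[" . $str . "]/@href"); // Their href attributes

// query() returns false on an invalid expression, so check that instead of is_null($elements)
if ($elements['title'] !== false && $elements['href'] !== false) {
    foreach ($elements['title'] as $key => $title) {
        $href = $elements['href']->item($key)->textContent;
        // Close the opening tag with '> and the element with </a>
        echo "<a href='http://tieba.baidu.com" . htmlspecialchars($href, ENT_QUOTES) . "'>"
            . htmlspecialchars($title->textContent) . "</a><br>";
    }
}
```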
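If you want to try the filtering logic without hitting tieba.baidu.com, here is a minimal sketch that runs against a small made-up HTML snippet. It assumes, like the XPath above, that thread links sit under `div[@class='threadlist_lz clearfix']/div/a` and carry a `title` attribute; Tieba's real markup may well have changed since 2018. The helper names `xpathLiteral`, `buildTitlePredicate` and `findThreads` are hypothetical. The sketch also swaps the "is this the last key?" check for `implode()`, and quotes each keyword so that one containing an apostrophe no longer breaks the query.

```php
<?php
// Hypothetical helper: quote a string as an XPath 1.0 literal.
// XPath 1.0 has no escape sequences, so a value containing both
// ' and " has to be assembled with concat().
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    return "concat('" . implode("', \"'\", '", explode("'", $s)) . "')";
}

// Hypothetical helper: turn "php|小白" into
// contains(@title, 'php') or contains(@title, '小白')
function buildTitlePredicate(string $condition): string
{
    $parts = [];
    foreach (explode('|', $condition) as $keyword) {
        $keyword = trim($keyword);
        if ($keyword !== '') { // skip empty entries such as in "php||小白"
            $parts[] = 'contains(@title, ' . xpathLiteral($keyword) . ')';
        }
    }
    return implode(' or ', $parts); // implode() puts " or " only between items
}

// Hypothetical helper: return title/href pairs whose title contains a keyword.
function findThreads(string $html, string $condition): array
{
    $predicate = buildTitlePredicate($condition);
    if ($predicate === '') {
        return []; // "a[]" would be an invalid XPath expression
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//div[@class='threadlist_lz clearfix']/div/a[" . $predicate . "]");
    $threads = [];
    if ($nodes === false) {
        return $threads;
    }
    foreach ($nodes as $a) {
        $threads[] = ['title' => $a->getAttribute('title'), 'href' => $a->getAttribute('href')];
    }
    return $threads;
}

// Made-up sample markup that only mimics the class names used in the article.
$sample = <<<HTML
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<div class="threadlist_lz clearfix"><div><a href="/p/1" title="php 入門問題">php 入門問題</a></div></div>
<div class="threadlist_lz clearfix"><div><a href="/p/2" title="Java 討論">Java 討論</a></div></div>
<div class="threadlist_lz clearfix"><div><a href="/p/3" title="小白求助">小白求助</a></div></div>
</body></html>
HTML;

foreach (findThreads($sample, 'php|小白') as $t) {
    echo $t['title'], ' => http://tieba.baidu.com', $t['href'], PHP_EOL;
}
```

With the keywords `php|小白`, the first and third sample links match and the "Java" one is skipped. Reading `title` and `href` from the same element also avoids running a second query and pairing the two result lists by index.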
The result looks like this: