
Parsing HTML in PHP

Have you ever wanted to get a list of the links contained in an HTML page? Or a list of images, the title, or any other non-nested tag for that matter? Then this is the class for you!

Example:

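<?php
// Load the parser class; phpHTMLParser.php must be on the include path.
include("phpHTMLParser.php");

// Fetch the page to parse.
$content = file_get_contents("http://www.onderstekop.nl/");
$parser = new phpHTMLParser($content);

// Keep track of the 'a' and 'title' tags only.
$HTMLObject = $parser->parse_tags(array("a", "title"));

// Print the target and the inner HTML of every link that has an href.
$aTags = $HTMLObject->getTagsByName("a");
foreach ($aTags as $a) {
   if ($a->href != "") {
      echo $a->href . "<br/>";
      echo $a->innerHTML . "<br/><br/>";
   }
}
?>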



In this example the parser keeps track of only the 'a' and 'title' tags, and only the 'a' tag objects are requested afterwards. Running this code parses the HTML page obtained from http://www.onderstekop.nl/, returns an object containing the parsed tags, and prints each link's URL followed by its description. This makes dealing with web pages pretty simple, because you can work with a page in an object-oriented way instead of going through it character by character or relying on complicated, error-prone regular expressions.
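For comparison, here is a minimal sketch of the same object-oriented idea using PHP's bundled DOM extension (DOMDocument) rather than phpHTMLParser. It is not how phpHTMLParser works internally. The function name extractLinks and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper, not part of phpHTMLParser: returns the href and
// text of every <a> tag that has a non-empty href attribute.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often invalid; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = ['href' => $href, 'text' => trim($a->textContent)];
        }
    }
    return $links;
}

// Made-up sample page so the sketch runs without network access.
$sample = '<html><head><title>Demo</title></head><body>'
        . '<a href="/news">News</a>'
        . '<a name="top">Anchor without href</a>'
        . '<a href="http://example.com/">Example <b>site</b></a>'
        . '</body></html>';

foreach (extractLinks($sample) as $link) {
    echo $link['href'] . ' => ' . $link['text'] . "\n";
}
```

As in the phpHTMLParser example, links without an href are skipped, and no regular expressions are involved.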

Some other features


Each tag object returned by a getTagsByName call currently supports href and innerHTML (as shown above), as well as id, src and innerTag (which returns all of the tag's attributes as a single string).
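The attributes listed above could be read for image tags along the following lines. This is only a sketch built from the calls in the first example: it assumes that phpHTMLParser.php sits next to the script and that 'img' tags are handled the same way as 'a' tags. The markup in $content is a made-up placeholder.

```php
<?php
// Assumption: phpHTMLParser.php (see the download link) is in this directory.
$library = __DIR__ . '/phpHTMLParser.php';
if (!is_file($library)) {
    exit("phpHTMLParser.php not found; download it first.\n");
}
require_once $library;

// Placeholder markup so the sketch does not depend on a live site.
$content = '<p><img id="logo" src="/img/logo.png" alt="Logo"></p>';

$parser = new phpHTMLParser($content);
$HTMLObject = $parser->parse_tags(array("img"));

foreach ($HTMLObject->getTagsByName("img") as $img) {
    echo "id: " . $img->id . "\n";
    echo "src: " . $img->src . "\n";
    // innerTag is described as holding all attributes as one string.
    echo "attributes: " . $img->innerTag . "\n";
}
```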

Another feature, most useful for dumping results and debugging, is the output() function available on the object returned by parse() or parse_tags() ($HTMLObject in our example). For even more debugging output, you can set $debug = true in the PHP file itself.

Download phpHTMLParser