HTML Parser - Grabs the link URLs and link texts from a web page and puts them into an array

<?php
// Returns the preg_match_all() result for every <a href="..."> link in $document:
//   [0] full matches, [1] quoted href values, [2] link texts.
function parse_links($document) {

  # Zero or more whitespace characters
  $S0 = '\s*';

  # One or more whitespace characters
  $S1 = '\s+';

  # Anchor tag start
  $anch1 = '<a' . $S1;

  # href= pattern
  $href1 = 'href' . $S0 . '=' . $S0;

  # Quoted strings (single or double), captured as group 1
  $q1 = "'[^']*'";
  $q2 = '"[^"]*"';
  $q = "($q1|$q2)";

  # Full link pattern; the link text is captured as group 2
  $link_RE = "$anch1$href1$q$S0>\s*(.*?)</a>";

  # '#' delimits the pattern, 'i' makes it case-insensitive
  preg_match_all("#$link_RE#i", $document, $matches);
  return $matches; // returns an array

} // end function parse_links()

//
// DEMO OF HOW TO USE THE FUNCTION

// grab a webpage (fetching a URL this way requires allow_url_fopen)
$str = file_get_contents('http://del.icio.us');

// call the parse_links function
$linkarray = parse_links($str);

// loop through the link array, outputting the URL + Link Text
for ($i = 0; $i < count($linkarray[0]); $i++) {
    // the captured URL still includes its quotes, so strip them
    echo trim($linkarray[1][$i], "'\"") . ' ' . $linkarray[2][$i] . "<br>";
}

?>
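
The shape of the array that parse_links() returns is easy to misread, so here is a minimal sketch of it. The sample HTML and the variable names are made up for illustration, and the pattern is only along the lines of the one the function builds, written out in one piece. It assumes the default result ordering of preg_match_all(), where index 0 holds the whole matches and each further index holds one capture group.

```php
<?php
// Hypothetical sample input, not taken from the original post.
$html = '<a href="http://example.com/">Example</a> <A HREF=\'/about\'>About us</A>';

// Roughly the pattern parse_links() assembles, as a single string.
$pattern = '#<a\s+href\s*=\s*(\'[^\']*\'|"[^"]*")\s*>\s*(.*?)</a>#i';

preg_match_all($pattern, $html, $matches);

// $matches[0]: whole matches, $matches[1]: quoted URLs, $matches[2]: link texts.
$links = array();
foreach ($matches[1] as $i => $quotedUrl) {
    $url = trim($quotedUrl, "'\"");   // strip the surrounding quotes
    $links[$url] = $matches[2][$i];
    echo $url . ' => ' . $matches[2][$i] . "\n";
}
```

Note that group 1 keeps the quote characters around the URL, which is why the sketch trims them before use.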

Comments on this post

nassausky posts on May 14, 2007 at 11:59
What do the #'s and the i mean around $link_RE in this expression? #$link_RE#i

I can't find any documentation on how or why it is written like that and not just $link_RE.
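
On the question above: PHP's preg_* functions expect the pattern to be wrapped in delimiters, and the two # characters play that role here. Letters placed after the closing delimiter are pattern modifiers, and i makes the match case-insensitive. Using # rather than the more common / presumably avoids having to escape the slash in </a>. A small sketch, with made-up input and variable names, showing the difference:

```php
<?php
// Hypothetical input for illustration.
$text = 'See <A HREF="/docs">the docs</A>.';

// "/" as delimiter: the slash in </a> has to be escaped.
$withSlashes = '/<a\s[^>]*>(.*?)<\/a>/i';

// "#" as delimiter: the slash needs no escaping.
$withHashes = '#<a\s[^>]*>(.*?)</a>#i';

// Same pattern without the trailing "i": matching is case-sensitive.
$caseSensitive = '#<a\s[^>]*>(.*?)</a>#';

$r1 = preg_match($withSlashes, $text, $m1);   // matches the uppercase tag
$r2 = preg_match($withHashes, $text, $m2);    // same result, different delimiter
$r3 = preg_match($caseSensitive, $text);      // no match: <A is not <a

echo "$r1 $r2 $r3\n";
```

The first two patterns behave identically; only the third fails to match, because the sample tag is written in uppercase.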
