<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DZone Snippets: tidy code</title>
    <link>http://snippets.dzone.com/posts</link>
    <pubDate>Sat, 17 May 2008 16:20:17 GMT</pubDate>
    <description>DZone Snippets: tidy code</description>
    <item>
      <title>Converting XHTML to XML</title>
      <link>http://snippets.dzone.com/posts/show/5127</link>
      <description>Based on the code from &lt;a href="http://www.ibm.com/developerworks/library/x-tiptidy.html"&gt;'Convert from HTML to XML with HTML Tidy'&lt;/a&gt;, this code will read an xhtml file and extract text to gallery.xml as instructed by xhtml2xml.xml&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/usr/bin/ruby&lt;br /&gt;  &lt;br /&gt;  require 'tidy'&lt;br /&gt;  require 'projxslt'&lt;br /&gt;  &lt;br /&gt;  FILE_PATH = "../"&lt;br /&gt;  &lt;br /&gt;  class Xhtml2Xml&lt;br /&gt;    def convert()&lt;br /&gt;      project = 'xhtml2xml'&lt;br /&gt;      filein = 'xhtml2xml.xml'&lt;br /&gt;      filehtml = 'gallery.html'&lt;br /&gt;      filexml = 'gallery_xhtml.xml'&lt;br /&gt;      xslfile_temp = 'gallery.xsl'&lt;br /&gt;      xslfile = 'xhtml2xml.xsl'&lt;br /&gt;      fileout = 'gallery.xml'&lt;br /&gt;      tidy_config = 'tidy.txt'&lt;br /&gt;      &lt;br /&gt;      project_path = FILE_PATH + project + '/'&lt;br /&gt;      tidy_config_path = project_path + tidy_config&lt;br /&gt;      filein_path = project_path + filein&lt;br /&gt;      filehtml_path = project_path + filehtml&lt;br /&gt;      filexml_path = project_path + filexml&lt;br /&gt;      xslfile_temp_path = project_path + xslfile_temp&lt;br /&gt;      xslfile_path = project_path + xslfile&lt;br /&gt;      fileout_path = project_path + fileout&lt;br /&gt;      &lt;br /&gt;      Tidy.path = '/usr/lib/libtidy.so'&lt;br /&gt;&lt;br /&gt;      file = File.new(filehtml_path,'r')&lt;br /&gt;      buffer = file.read&lt;br /&gt;      xml = Tidy.open(:show_warnings=&gt;true) do |tidy|&lt;br /&gt;        tidy.options.output_xml = true&lt;br /&gt;        tidy.load_config(tidy_config_path)&lt;br /&gt;        puts tidy.options.show_warnings&lt;br /&gt;        xml = tidy.clean(buffer)&lt;br /&gt;        puts tidy.errors&lt;br /&gt;        puts tidy.diagnostics&lt;br /&gt;        xml&lt;br /&gt;      end&lt;br /&gt;      &lt;br /&gt;      #strip out the html 
document type declaration and save the file&lt;br /&gt;      html_declaration = xml[/&lt;!([^&gt;]*&gt;){2}/]&lt;br /&gt;      save_file(filexml_path, xml.gsub(html_declaration,'&lt;html&gt;'))    &lt;br /&gt;      transform(filein_path, xslfile_path, xslfile_temp_path)&lt;br /&gt;      transform(filexml_path, xslfile_temp_path, fileout_path)&lt;br /&gt;      &lt;br /&gt;    end&lt;br /&gt;    &lt;br /&gt;    def transform(xml_filepath, xsl_filepath, save_filepath)&lt;br /&gt;      pxsl = Projxslt.new(xml_filepath, xsl_filepath)&lt;br /&gt;      outfile = pxsl.transform&lt;br /&gt;      save_file(save_filepath, outfile)&lt;br /&gt;    end&lt;br /&gt;    &lt;br /&gt;    def save_file(filepath, buffer)&lt;br /&gt;      file = File.new(filepath,'w') &lt;br /&gt;      file.puts buffer&lt;br /&gt;      file.close&lt;br /&gt;    end    &lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  if __FILE__ == $0&lt;br /&gt;    h2x = Xhtml2Xml.new()&lt;br /&gt;    h2x.convert()&lt;br /&gt;  end&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;file: xhtml2xml.xml&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;root element="gallery"&gt;&lt;br /&gt;  &lt;summary&gt;&lt;br /&gt;    &lt;field element="title" xpath="head/title"/&gt;&lt;br /&gt;  &lt;/summary&gt;&lt;br /&gt;  &lt;record xpath="body/center/table/tr/td" element="photo"&gt;&lt;br /&gt;    &lt;field xpath="font/br[3]/preceding-sibling::text()[1]" element="title"&gt;&lt;/field&gt;&lt;br /&gt;    &lt;field xpath="/html/body/table/tr/td[2]/font/br[3]/preceding-sibling::text()[1]" element="date"&gt;&lt;/field&gt;&lt;br /&gt;    &lt;field xpath="font/br[1]/preceding-sibling::text()[1]" element="image"&gt;&lt;/field&gt;&lt;br /&gt;    &lt;field xpath="font/br[2]/preceding-sibling::text()[1]" element="description"&gt;&lt;/field&gt;&lt;br /&gt;  &lt;/record&gt;&lt;br /&gt;&lt;/root&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;file:xhtml2xml.xsl (transforms the file xhtml2xml.xml to file gallery.xsl)&lt;br /&gt;&lt;code&gt;
&lt;br /&gt;&lt;?xml version="1.0"?&gt;&lt;br /&gt;&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"&gt;&lt;br /&gt;  &lt;xsl:template match="root"&gt;&lt;br /&gt;    &lt;xsl:variable name="colon"&gt;&lt;xsl:text&gt;:&lt;/xsl:text&gt;&lt;/xsl:variable&gt;&lt;br /&gt;    &lt;br /&gt;    &lt;xsl:element name="xsl:stylesheet"&gt;&lt;br /&gt;      &lt;xsl:attribute name="xmlns{$colon}xsl"&gt;&lt;br /&gt;        &lt;xsl:text&gt;http://www.w3.org/1999/XSL/Transform&lt;/xsl:text&gt;&lt;br /&gt;      &lt;/xsl:attribute&gt;&lt;br /&gt;      &lt;xsl:attribute name="version"&gt;&lt;br /&gt;        &lt;xsl:text&gt;1.0&lt;/xsl:text&gt;&lt;br /&gt;      &lt;/xsl:attribute&gt;&lt;xsl:text&gt;&lt;br /&gt;      &lt;/xsl:text&gt;&lt;br /&gt;&lt;br /&gt;&lt;xsl:element name="xsl:output"&gt;&lt;br /&gt;  &lt;xsl:attribute name="method"&gt;&lt;br /&gt;    &lt;xsl:text&gt;xml&lt;/xsl:text&gt;&lt;br /&gt;  &lt;/xsl:attribute&gt;&lt;br /&gt;  &lt;xsl:attribute name="indent"&gt;&lt;br /&gt;    &lt;xsl:text&gt;yes&lt;/xsl:text&gt;&lt;br /&gt;  &lt;/xsl:attribute&gt;&lt;br /&gt;&lt;/xsl:element&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt;&lt;br /&gt;&lt;xsl:element name="xsl:template"&gt;&lt;br /&gt;      &lt;xsl:attribute name="match"&gt;&lt;br /&gt;        &lt;xsl:text&gt;html&lt;/xsl:text&gt;&lt;br /&gt;      &lt;/xsl:attribute&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt;      &lt;xsl:element name="{@element}"&gt;&lt;br /&gt;      &lt;xsl:apply-templates select="summary"/&gt;&lt;br /&gt;&lt;br /&gt;      &lt;xsl:element name="xsl{$colon}for-each"&gt;&lt;br /&gt;        &lt;xsl:attribute name="select"&gt;&lt;br /&gt;          &lt;xsl:value-of select="record/@xpath"/&gt;&lt;br /&gt;        &lt;/xsl:attribute&gt;&lt;xsl:text&gt;&lt;br /&gt;    &lt;/xsl:text&gt;              &lt;br /&gt;&lt;br /&gt;  &lt;xsl:for-each select="record/field"&gt;&lt;br /&gt;    &lt;xsl:element 
name="xsl:variable"&gt;&lt;br /&gt;      &lt;xsl:attribute name="name"&gt;&lt;br /&gt;        &lt;xsl:value-of select="@element"/&gt;&lt;br /&gt;      &lt;/xsl:attribute&gt;&lt;br /&gt;      &lt;xsl:attribute name="select"&gt;&lt;br /&gt;        &lt;xsl:value-of select="@xpath"/&gt;&lt;br /&gt;      &lt;/xsl:attribute&gt;&lt;br /&gt;    &lt;/xsl:element&gt;&lt;xsl:text&gt;&lt;br /&gt;    &lt;/xsl:text&gt;&lt;br /&gt;  &lt;/xsl:for-each&gt;&lt;br /&gt;&lt;xsl:text&gt;&lt;br /&gt;    &lt;/xsl:text&gt;&lt;br /&gt;&lt;br /&gt;        &lt;xsl:element name="{record/@element}"&gt;&lt;br /&gt;       &lt;xsl:for-each select="record/field"&gt;&lt;br /&gt;              &lt;xsl:element name="{@element}"&gt;&lt;xsl:text&gt;&lt;br /&gt;        &lt;/xsl:text&gt;&lt;br /&gt;            &lt;xsl:element name="xsl:value-of"&gt;&lt;br /&gt;              &lt;xsl:attribute name="select"&gt;&lt;xsl:text&gt;normalize-space($&lt;/xsl:text&gt;&lt;br /&gt;                &lt;xsl:value-of select="@element"/&gt;&lt;br /&gt;                &lt;xsl:text&gt;)&lt;/xsl:text&gt;                &lt;br /&gt;              &lt;/xsl:attribute&gt;&lt;br /&gt;          &lt;/xsl:element&gt;  &lt;xsl:text&gt;&lt;br /&gt;      &lt;/xsl:text&gt;&lt;br /&gt;          &lt;/xsl:element&gt;&lt;br /&gt;&lt;br /&gt;    &lt;/xsl:for-each&gt;&lt;br /&gt;&lt;/xsl:element&gt;&lt;xsl:text&gt;&lt;br /&gt;  &lt;/xsl:text&gt;&lt;br /&gt;&lt;/xsl:element&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt; &lt;br /&gt;  &lt;/xsl:element&gt;&lt;br /&gt;&lt;/xsl:element&gt; &lt;!-- template match --&gt;&lt;br /&gt;&lt;/xsl:element&gt; &lt;!-- gallery --&gt;&lt;br /&gt;  &lt;/xsl:template&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;xsl:template match="summary/field"&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt;      &lt;xsl:element name="xsl:element"&gt;&lt;br /&gt;        &lt;xsl:attribute name="name"&gt;&lt;br /&gt;          &lt;xsl:value-of 
select="@element"/&gt;&lt;br /&gt;        &lt;/xsl:attribute&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt;        &lt;xsl:element name="xsl:value-of"&gt;&lt;br /&gt;          &lt;xsl:attribute name="select"&gt;&lt;br /&gt;            &lt;xsl:value-of select="@xpath"/&gt;&lt;br /&gt;          &lt;/xsl:attribute&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt;        &lt;/xsl:element&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt;      &lt;/xsl:element&gt;&lt;xsl:text&gt;&lt;br /&gt;&lt;/xsl:text&gt;&lt;br /&gt;&lt;/xsl:template&gt;&lt;br /&gt;&lt;/xsl:stylesheet&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;output: gallery.xml (this file is the product of gallery_xhtml.xml and gallery.xsl)&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;?xml version="1.0"?&gt;&lt;br /&gt;&lt;gallery&gt;&lt;br /&gt;  &lt;title&gt;Journey to Windsor&lt;/title&gt;&lt;br /&gt;  &lt;photo&gt;&lt;br /&gt;    &lt;title&gt;Windsor Castle&lt;/title&gt;&lt;br /&gt;    &lt;date&gt;July 2003&lt;/date&gt;&lt;br /&gt;    &lt;image&gt;dscn0824.jpg&lt;/image&gt;&lt;br /&gt;    &lt;description&gt;&lt;br /&gt;      A bright, red mailbox inside the castle. It seems oddly familiar in an historic setting.&lt;br /&gt;    &lt;/description&gt;&lt;br /&gt;  &lt;/photo&gt;&lt;br /&gt;&lt;/gallery&gt;&lt;br /&gt;&lt;/code&gt;</description>
      <pubDate>Sun, 10 Feb 2008 15:35:40 GMT</pubDate>
      <guid>http://snippets.dzone.com/posts/show/5127</guid>
      <author>jrobertson (James Robertson)</author>
    </item>
    <item>
      <title>Convert from HTML to XHTML with HTML Tidy</title>
      <link>http://snippets.dzone.com/posts/show/5121</link>
      <description>This HTML Tidy example converts an HTML file into XHTML, which is well-formed XML. The -asxhtml option selects XHTML output, and -numeric writes numeric rather than named entities.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;tidy -asxhtml -numeric &lt; index.html &gt; index.xml&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Example taken from &lt;a href="http://www.ibm.com/developerworks/library/x-tiptidy.html"&gt;Tip: Convert from HTML to XML with HTML Tidy&lt;/a&gt; [ibm.com]</description>
      <pubDate>Fri, 08 Feb 2008 16:51:09 GMT</pubDate>
      <guid>http://snippets.dzone.com/posts/show/5121</guid>
      <author>jrobertson (James Robertson)</author>
    </item>
    <item>
      <title>Tidy with Ruby</title>
      <link>http://snippets.dzone.com/posts/show/5120</link>
      <description>Ruby interface to the HTML Tidy Library Project (tidy.sf.net).&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;  require 'tidy'&lt;br /&gt;  Tidy.path = '/usr/lib/libtidy.so'&lt;br /&gt;  html = '&lt;html&gt;&lt;title&gt;title&lt;/title&gt;Body&lt;/html&gt;'&lt;br /&gt;  xml = Tidy.open(:show_warnings=&gt;true) do |tidy|&lt;br /&gt;    tidy.options.output_xml = true&lt;br /&gt;    puts tidy.options.show_warnings&lt;br /&gt;    xml = tidy.clean(html)&lt;br /&gt;    puts tidy.errors&lt;br /&gt;    puts tidy.diagnostics&lt;br /&gt;    xml&lt;br /&gt;  end&lt;br /&gt;  puts xml&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;output&lt;br /&gt;&lt;code&gt;&lt;br /&gt;true&lt;br /&gt;line 1 column 1 - Warning: missing &lt;!DOCTYPE&gt; declaration&lt;br /&gt;line 1 column 7 - Warning: plain text isn't allowed in &lt;head&gt; elements&lt;br /&gt;Info: Document content looks like XHTML 1.0 Transitional&lt;br /&gt;2 warnings, 0 errors were found!&lt;br /&gt;&lt;br /&gt;&lt;html&gt;&lt;br /&gt;&lt;head&gt;&lt;br /&gt;&lt;meta name="generator"&lt;br /&gt;content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" /&gt;&lt;br /&gt;&lt;title&gt;title&lt;/title&gt;&lt;br /&gt;&lt;/head&gt;&lt;br /&gt;&lt;body&gt;Body&lt;/body&gt;&lt;br /&gt;&lt;/html&gt;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Note: I couldn't get it to run on Ubuntu 7.10 or 7.04 (LoadError: no such file to load -- tidy), but it ran fine on Gentoo.&lt;br /&gt;&lt;br /&gt;Reference: http://tidy.rubyforge.org/&lt;br /&gt;&lt;br /&gt;*Update 18-Mar-08 16:15*&lt;br /&gt;I got it working on Ubuntu; I simply needed to add require 'rubygems' before require 'tidy'.</description>
      <pubDate>Fri, 08 Feb 2008 16:28:45 GMT</pubDate>
      <guid>http://snippets.dzone.com/posts/show/5120</guid>
      <author>jrobertson (James Robertson)</author>
    </item>
    <item>
      <title>Tidy Remote HTML (using a web service)</title>
      <link>http://snippets.dzone.com/posts/show/4218</link>
      <description>// Clean up some code using a web service. If you need to do this more quickly, I suggest using a local Tidy installation&lt;br /&gt;// rather than my web service, but this is nice and easy. :)&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;function tidied($url) {&lt;br /&gt;  /* Cleans up a page via Tidy, returning the cleaned up html as a string&lt;br /&gt;   * By Logan Koester &lt;logan@logankoester.com&gt; 2007-06-28&lt;br /&gt;   * Props to http://infohound.net/tidy */&lt;br /&gt;  // urlencode() keeps characters such as ? and # in $url from corrupting the q parameter&lt;br /&gt;  return file_get_contents("http://logankoester.com/tools/tidy.php?q=" . urlencode($url));&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;</description>
      <pubDate>Thu, 28 Jun 2007 10:16:46 GMT</pubDate>
      <guid>http://snippets.dzone.com/posts/show/4218</guid>
      <author>logankoester (Logan Koester)</author>
    </item>
  </channel>
</rss>
