require 'tidy'

Tidy.path = '/usr/lib/libtidy.so'
html = '<html><title>title</title>Body</html>'
xml = Tidy.open(:show_warnings=>true) do |tidy|
  tidy.options.output_xml = true
  puts tidy.options.show_warnings
  xml = tidy.clean(html)
  puts tidy.errors
  puts tidy.diagnostics
  xml
end
puts xml
Output:
true
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 7 - Warning: plain text isn't allowed in <head> elements
Info: Document content looks like XHTML 1.0 Transitional
2 warnings, 0 errors were found!
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<title>title</title>
</head>
<body>Body</body>
</html>
Note: I couldn't get it to run on Ubuntu 7.10 or 7.04 (LoadError: no such file to load -- tidy); however, it ran fine on Gentoo.
Reference: http://tidy.rubyforge.org/
*update 18-Mar-08 16:15*
I got it working on Ubuntu. I simply needed to add require 'rubygems' before require 'tidy'.
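To make the same script run both where tidy loads directly (as on my Gentoo box) and where it is only installed as a gem (as on Ubuntu), something like the following should work. This is only a rough sketch: require_with_rubygems is a helper name I made up, not part of tidy or RubyGems, and it assumes the LoadError is caused solely by RubyGems not being loaded yet.

```ruby
# Hypothetical helper (not part of tidy or RubyGems): try a plain require
# first; if that raises LoadError, load RubyGems and try once more.
# Returns true if the library is available, false otherwise.
def require_with_rubygems(name)
  require name
  true
rescue LoadError
  require 'rubygems'
  begin
    require name
    true
  rescue LoadError
    false
  end
end

if require_with_rubygems('tidy')
  puts 'tidy loaded'
else
  puts 'tidy not found; is the gem installed?'
end
```

On newer Ruby versions RubyGems is loaded automatically, so the fallback branch only matters on older setups like the Ubuntu ones mentioned above.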