About this user

Andrew Pennebaker http://mcandre.devjavu.com/wiki

itcrowdquote.rb

// Prints a quote from Channel 4's "The IT Crowd"

#!/usr/bin/env ruby

require "rubygems"
require "open-uri"
require "hpricot"
require "htmlentities"

coder = HTMLEntities.new

doc = open("http://www.channel4.com/entertainment/tv/microsites/I/itcrowd/quote_generator/") { |f| Hpricot(f) }

section = doc/"blockquote"/"p"
(section/"cite").remove
quote = section.inner_html

# remove leading whitespace
quote = quote.gsub(/^\s+/, "")

# collapse trailing whitespace into a single line separator
quote = quote.gsub(/\s+$/, $/)

# remove trailing dash
quote = quote.gsub(/\s\-\s+$/, $/).chomp

# decode HTML entities
quote = coder.decode(quote)

puts quote

html2txt.py

#!/usr/bin/env python

# Python 2 only: sgmllib and htmllib were removed in Python 3.

__author__="Andrew Pennebaker (andrew.pennebaker@gmail.com)"
__date__="10 Dec 2006"
__copyright__="Copyright 2006 Andrew Pennebaker"
__license__="GPL"
__version__="0.0.1"
__credits__="Based on http://mail.python.org/pipermail/python-list/2004-November/291562.html"
__URL__="http://snippets.dzone.com/posts/show/3127"

import htmllib
from sgmllib import SGMLParser

import sys

class html2txt(SGMLParser):
	"""Strips tags from HTML, keeping the text and decoding named entities."""

	def reset(self):
		SGMLParser.reset(self)
		self.pieces=[]

	def handle_data(self, text):
		self.pieces.append(text)

	# Ignore all tags; only the text between them is kept.
	def unknown_starttag(self, tag, attributes):
		pass

	def unknown_endtag(self, tag):
		pass

	def handle_entityref(self, ref):
		try:
			self.pieces.append(htmllib.HTMLParser.entitydefs[ref])
		except KeyError:
			# Unknown entity: pass it through undecoded.
			self.pieces.append("&"+ref)

	def output(self):
		return "".join(self.pieces)

if __name__=="__main__":
	# Read HTML from standard input and print the extracted text.
	html=sys.stdin.read()

	parser = html2txt()
	parser.feed(html)
	parser.close()

	print parser.output()
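
The html2txt.py snippet depends on sgmllib and htmllib, which exist only in Python 2. For readers on Python 3, the same idea (collect text, ignore tags, decode entities) might be sketched with the standard library's html.parser module. This is an assumption-laden port, not the author's code: the class name Html2Txt, the helper html_to_text, and the handle_charref method for numeric references are all additions made for illustration.

```python
# Hypothetical Python 3 port of html2txt.py; not part of the original snippet.
from html.entities import name2codepoint
from html.parser import HTMLParser


class Html2Txt(HTMLParser):
    """Collect the text of an HTML document, ignoring all tags."""

    def __init__(self):
        # convert_charrefs=False routes entities to handle_entityref and
        # handle_charref, mirroring the structure of the SGMLParser version.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_entityref(self, name):
        # Decode known named entities; pass unknown ones through as "&name",
        # as the original snippet does.
        if name in name2codepoint:
            self.pieces.append(chr(name2codepoint[name]))
        else:
            self.pieces.append("&" + name)

    def handle_charref(self, name):
        # Numeric references arrive as "65" (decimal) or "x41" (hexadecimal).
        try:
            if name.lower().startswith("x"):
                code = int(name[1:], 16)
            else:
                code = int(name)
            self.pieces.append(chr(code))
        except (ValueError, OverflowError):
            self.pieces.append("&#" + name)

    def output(self):
        return "".join(self.pieces)


def html_to_text(html):
    """Illustrative helper: parse an HTML string and return its text."""
    parser = Html2Txt()
    parser.feed(html)
    parser.close()
    return parser.output()
```

To use the sketch as a filter in the way the original script is used, one could call print(html_to_text(sys.stdin.read())) after importing sys.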