Mechanize / Hpricot / Scraping setup
require 'rubygems'
require 'cgi'
require 'open-uri'
require 'hpricot'
require 'mechanize'

agent = WWW::Mechanize.new
doc = Hpricot(agent.get(the_url).parser.to_s)
str = "<html>This and <b>that</b> and <br />and <span class='something'>the other</span>?</html>"
# Strip every opening, closing, and self-closing tag, leaving only the text
puts str.gsub(/<\/?[^>]*>/, "")
class String
  def count_words
    n = 0
    scan(/\b\S+\b/) { n += 1 }
    n
  end
end
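The same regex also works without reopening String. A minimal sketch, assuming a standalone helper is acceptable; `word_count` is a hypothetical name, not part of the original snippet:

```ruby
# Hypothetical helper: counts words with the same regex as count_words,
# but as a plain method instead of a monkey patch on String.
def word_count(text)
  # scan returns an array of all matches; its size is the word count
  text.scan(/\b\S+\b/).size
end

puts word_count("This and that and the other?")
```

Trailing punctuation is not counted as a word of its own, because a match has to end on a word boundary.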
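require 'rubygems'
require 'cgi'
require 'open-uri'
require 'hpricot'

# Escape each search term and join the terms with "+"
q = %w{meine kleine suchanfrage}.map { |w| CGI.escape(w) }.join("+")
url = "http://www.google.com/search?q=#{q}"
doc = Hpricot(open(url).read)
# Take the link of the first search result
lucky_url = (doc/"div[@class='g'] a").first["href"]
# Pass the URL as a separate argument: a single-quoted string would not interpolate it
system 'open', lucky_url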
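The query string can also be built with the standard library's `URI.encode_www_form` instead of escaping and joining the terms by hand. A sketch only; `search_url` is a hypothetical helper name, and it makes no network request:

```ruby
require 'uri'

# Hypothetical helper (not in the original snippet): builds the search URL
# from a list of terms and lets the standard library do the escaping.
def search_url(terms)
  # encode_www_form escapes the value and encodes spaces as "+"
  query = URI.encode_www_form(q: terms.join(" "))
  "http://www.google.com/search?#{query}"
end

puts search_url(%w{meine kleine suchanfrage})
```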
# Prepend "http://" unless the address is blank or already starts with "http"
def url=(addr)
  super((addr.blank? || addr.starts_with?('http')) ? addr : "http://#{addr}")
end
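The setter relies on `blank?` and `starts_with?`, which come from Rails (ActiveSupport). Outside Rails, the same normalization can be sketched in plain Ruby; `normalize_url` is a hypothetical name, and the nil/whitespace check is an approximation of `blank?`:

```ruby
# Hypothetical plain-Ruby version of the normalization in the setter above.
def normalize_url(addr)
  # Approximates ActiveSupport's blank?: nil, empty, or whitespace-only
  return addr if addr.nil? || addr.strip.empty?
  # Leave addresses that already carry a scheme starting with "http" alone
  addr.start_with?('http') ? addr : "http://#{addr}"
end

puts normalize_url("example.com")
puts normalize_url("https://example.com")
```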