Parse RSS from a specific URL

Use this class to parse RSS at a specific URL.

require 'net/http'        # Net::HTTP.get is used in #parse
require 'uri'             # URI.parse is used in #parse
require 'rexml/document'
class ParseRss
	def initialize(url)
		@url = url
	end
	
	def parse
		@content = Net::HTTP.get(URI.parse(@url))
		xml = REXML::Document.new(@content)
		data = {}
		data['title'] = xml.root.elements['channel/title'].text
		data['home_url'] = xml.root.elements['channel/link'].text
		data['rss_url'] = @url
		data['items'] = []
		xml.elements.each('//item') do |item|
			it = {}
			it['title'] = item.elements['title'].text
			it['link'] = item.elements['link'].text
			it['description'] = item.elements['description'].text
			if item.elements['dc:creator']
				it['author'] = item.elements['dc:creator'].text
			end
			if item.elements['dc:date']
				it['publication_date'] = item.elements['dc:date'].text
			elsif item.elements['pubDate']
				it['publication_date'] = item.elements['pubDate'].text
			end
			data['items'] << it
		end
		data
	end
end


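Used like so: ParseRss.new('http://someurl.com/rss').parse. It returns a hash with the feed's title, home URL, and RSS URL, plus an 'items' array holding one hash per entry (title, link, description, and, when present, author and publication date).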
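To make the shape of the returned hash concrete, here is a minimal sketch that runs the same extraction steps against an inlined feed instead of fetching one over HTTP. The sample feed, the example.com URLs, and the helper name parse_feed_string are all made up for illustration; they are not part of the original class.

```ruby
require 'rexml/document'

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_FEED = <<~XML
  <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
      <title>Example Feed</title>
      <link>http://example.com/</link>
      <item>
        <title>First post</title>
        <link>http://example.com/1</link>
        <description>Hello</description>
        <dc:creator>alice</dc:creator>
        <pubDate>Tue, 28 Nov 2006 19:42:00 GMT</pubDate>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: same extraction steps as ParseRss#parse, minus the HTTP fetch.
def parse_feed_string(content, url)
  xml = REXML::Document.new(content)
  data = {
    'title'    => xml.root.elements['channel/title'].text,
    'home_url' => xml.root.elements['channel/link'].text,
    'rss_url'  => url,
    'items'    => []
  }
  xml.elements.each('//item') do |item|
    entry = {
      'title'       => item.elements['title'].text,
      'link'        => item.elements['link'].text,
      'description' => item.elements['description'].text
    }
    # Author and date are optional, so only set them when the element exists.
    creator = item.elements['dc:creator']
    entry['author'] = creator.text if creator
    date = item.elements['dc:date'] || item.elements['pubDate']
    entry['publication_date'] = date.text if date
    data['items'] << entry
  end
  data
end

feed = parse_feed_string(SAMPLE_FEED, 'http://example.com/rss')
puts feed['title']
feed['items'].each { |entry| puts "#{entry['title']} by #{entry['author']}" }
```

Keeping the fetch separate from the parsing like this also makes the extraction easy to try out on a saved copy of a feed.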

Comments on this post

moneypenny posts on Nov 28, 2006 at 19:42
Here it is using symbols instead of strings for the returned data.

require 'net/http'
require 'uri'
require 'rexml/document'
class ParseRss
  def initialize( url )
    @url = url
  end
  
  def parse
    @content = Net::HTTP.get(
      URI.parse( @url )
    )
    
    xml = REXML::Document.new( @content )
    
    data = {
      :title => xml.root.elements['channel/title'].text,
      :home_url => xml.root.elements['channel/link'].text,
      :rss_url => @url,
      :items => []
    }

    xml.elements.each( '//item' ) do |raw_item|
      item = {
        :title => raw_item.elements['title'].text,
        :link => raw_item.elements['link'].text,
        :description => raw_item.elements['description'].text
      }

      if raw_item.elements['dc:creator']
        item[:author] = raw_item.elements['dc:creator'].text
      end

      if raw_item.elements['dc:date']
        item[:publication_date] = raw_item.elements['dc:date'].text
      elsif raw_item.elements['pubDate']
        item[:publication_date] = raw_item.elements['pubDate'].text
      end
      
      data[:items] << item
    end

    data
  end
end

derek_harmel posts on Feb 21, 2007 at 16:28
Here's a revision that does things in a more Ruby-like (DRY) way. I also made it a public class method, since there's not really any reason to create an instance here. Hash keys have been converted to symbols, and "dc:" is stripped out if found.

class RSSParser
  require 'net/http'
  require 'rexml/document'
  def self.run(url)
    xml = REXML::Document.new Net::HTTP.get(URI.parse(url))
    data = {
      :title    => xml.root.elements['channel/title'].text,
      :home_url => xml.root.elements['channel/link'].text,
      :rss_url  => url,
      :items    => []
    }
    xml.elements.each '//item' do |item|
      new_item = {}
      # REXML's Element#name is the local name, so a "dc:" prefix is already gone.
      item.elements.each { |e| new_item[e.name.to_sym] = e.text }
      data[:items] << new_item
    end
    data
  end
end
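
A quick way to check how the "dc:" prefix is handled: as far as I can tell, REXML's Element#name returns only the local part of a tag name, while Element#expanded_name keeps the prefix. The sketch below uses a made-up item and a hypothetical helper, item_to_hash, to show the symbol keys that come out.

```ruby
require 'rexml/document'

# Hypothetical single item, with the Dublin Core namespace declared inline.
ITEM_XML = '<item xmlns:dc="http://purl.org/dc/elements/1.1/">' \
           '<title>First post</title><dc:creator>alice</dc:creator></item>'

# Hypothetical helper: build a symbol-keyed hash from an item's child elements.
def item_to_hash(item)
  hash = {}
  item.elements.each { |e| hash[e.name.to_sym] = e.text }
  hash
end

item = REXML::Document.new(ITEM_XML).root
creator = item.elements['dc:creator']
puts creator.name           # local name only: creator
puts creator.expanded_name  # prefixed name: dc:creator
p item_to_hash(item)
```

Note that with this approach the keys follow the feed's own element names, so the date ends up under :date or :pubDate rather than a single :publication_date key.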
