
About this user

James Robertson http://www.r0bertson.co.uk


Transforming XML into RSS

The XML file prepared by the previous code snippet (the Ruby scraper listed further down this page) can now be transformed into RSS using the XSL stylesheet below.


file: gang2rss.xsl
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

	<xsl:output method="xml" encoding="iso-8859-1" indent="yes"  />

	<xsl:template match="rss">

		<rss version="2.0">
			<channel>
				<title>The Gang</title>
				<link>http://newsgang.net/audio/</link>
				<description>The Gang podcast</description>
				<language>en</language>

				<xsl:apply-templates select="item" />
			</channel>
		</rss>

	</xsl:template>

	<xsl:template match="item">

	<item>
		<title><xsl:value-of select="title"/></title>
		<link>http://newsgang.net<xsl:value-of select="href"/></link>
		<description><xsl:value-of select="date"/></description>
		<enclosure>http://newsgang.net<xsl:value-of select="href_audio"/></enclosure>
	</item>

	</xsl:template>	

</xsl:stylesheet>


To transform the XML file with this stylesheet from the command line, you would type:

xsltproc gang2rss.xsl thegang_rss.xml

output
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
  <channel>
    <title>The Gang</title>
    <link>http://newsgang.net/audio/</link>
    <description>The Gang podcast</description>
    <language>en</language>
    <item>
      <title>TheGangXII-II</title>
      <link>http://newsgang.net/gangitem/id=6501</link>
      <description>Jan 25</description>
      <enclosure>http://newsgang.net/gangitem/id=6501&amp;from=audio</enclosure>
    </item>
    <item>
      <title>TheGangXII-I</title>
      <link>http://newsgang.net/gangitem/id=6499</link>
      <description>Jan 25</description>
      <enclosure>http://newsgang.net/gangitem/id=6499&amp;from=audio</enclosure>
    </item>
    <item>
      <title>NewsGangLive01.24.08</title>
      <link>http://newsgang.net/gangitem/id=6445</link>
      <description>Jan 24</description>
      <enclosure>http://newsgang.net/gangitem/id=6445&amp;from=audio</enclosure>
    </item>
    <item>
      <title>NewsGangLiveII</title>
      <link>http://newsgang.net/gangitem/id=6377</link>
      <description>Jan 23</description>
      <enclosure>http://newsgang.net/gangitem/id=6377&amp;from=audio</enclosure>
    </item>
  </channel>
</rss>
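
The mapping the stylesheet performs can also be sketched in plain Ruby with REXML, which may help when xsltproc is not available. This is only an illustration: the helper names (add_text_element, to_rss) and the one-item sample input are assumptions, not part of the original snippet, and only the title and link of the channel are written.

```ruby
require 'rexml/document'

# Sample input in the shape produced by thegang.rb (one item from the extract).
SOURCE = <<~XML
  <rss>
    <item><title>TheGangXII-II</title><href_audio>/gangitem/id=6501&amp;from=audio</href_audio><href>/gangitem/id=6501</href><date>Jan 25</date></item>
  </rss>
XML

BASE = 'http://newsgang.net'

# Hypothetical helper: append a child element holding the given text.
def add_text_element(parent, name, text)
  parent.add_element(name).text = text
end

# Hypothetical helper: mirrors the two templates in gang2rss.xsl.
def to_rss(source_xml)
  input = REXML::Document.new(source_xml)
  output = REXML::Document.new
  channel = output.add_element('rss', { 'version' => '2.0' }).add_element('channel')
  add_text_element(channel, 'title', 'The Gang')
  add_text_element(channel, 'link', "#{BASE}/audio/")

  input.root.elements.each('item') do |src|
    item = channel.add_element('item')
    add_text_element(item, 'title', src.elements['title'].text)
    add_text_element(item, 'link', BASE + src.elements['href'].text)
    add_text_element(item, 'description', src.elements['date'].text)
    # Same as the stylesheet: the URL is written as element text.
    add_text_element(item, 'enclosure', BASE + src.elements['href_audio'].text)
  end
  output
end

rss_doc = to_rss(SOURCE)
puts rss_doc
```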

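Note: The enclosure URL in this example does not reference the media file directly. RSS 2.0 also defines the enclosure through url, length and type attributes rather than element text, so feed readers may not pick it up as written.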
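
For comparison, a minimal sketch of an enclosure in the attribute form, built with REXML. The media URL, size and MIME type below are hypothetical placeholders; the real values would have to come from the media file itself.

```ruby
require 'rexml/document'

# Hypothetical values: a direct media URL, its size in bytes, and its MIME type.
MEDIA_URL  = 'http://example.com/audio/episode.mp3'
MEDIA_SIZE = 12_345_678
MEDIA_TYPE = 'audio/mpeg'

item = REXML::Element.new('item')
item.add_element('title').text = 'Example episode'
# The enclosure carries its data in attributes and has no text content.
item.add_element('enclosure',
                 { 'url' => MEDIA_URL, 'length' => MEDIA_SIZE.to_s, 'type' => MEDIA_TYPE })

puts item
```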

see also: http://en.wikipedia.org/wiki/RSS_(file_format)

Scrape an XHTML document using Ruby

A simple Ruby script that scrapes an XHTML file and saves the selected content to an XML file, ready for transformation into an RSS feed. This example uses the XHTML page from http://newsgang.net/audio/, saved locally as 'thegang.xml'; the result is written to 'thegang_rss.xml'.

#!/usr/bin/ruby
# file: thegang.rb

require 'rexml/document'
include REXML

class TheGang
  def initialize()
  end
  
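  # Reads the saved page and writes the selected content to thegang_rss.xml.
  def rssify()
    # Parse the locally saved XHTML page; the block form closes the file.
    doc = File.open('thegang.xml','r') { |file| Document.new(file) }
    rss_doc = Document.new
    root = Element.new('rss')
    rss_doc.add_element(root)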
    
    doc.root.elements.each("body/div/ul/li/h2/a") do |node|    
      o_rssitem = Element.new('item')
      o_li = node.parent.parent
      
      o_rsstitle = Element.new('title')
      # Strip all whitespace (newlines and spaces) from the link text.
      o_rsstitle.text = node.text.gsub(/\s+/,'')
      o_rssitem.add_element(o_rsstitle)
      
      o_rsshref_audio = Element.new('href_audio')
      o_rsshref_audio.text = node.attributes.get_attribute('href').to_s.gsub('amp;&','')      
      o_rssitem.add_element(o_rsshref_audio)
      
      o_rsshref = Element.new('href')
      o_rsshref.text = o_rsshref_audio.text.gsub('&amp;from=audio','')      
      o_rssitem.add_element(o_rsshref)
      
      o_rssdate = Element.new('date')
      o_rssdate.text = "#{o_li.elements["p/span[1]"].text} #{o_li.elements["p/span[2]"].text}"
      o_rssitem.add_element(o_rssdate)
      rss_doc.root.add_element(o_rssitem)
      
    end

    # Write the intermediate XML; the block form closes the file.
    File.open('thegang_rss.xml','w') { |file| file.puts rss_doc }
  end
end


if __FILE__ == $0
  gang = TheGang.new
  gang.rssify
end


see also: www.dapper.net

output (extract)
<rss>
  <item><title>TheGangXII-II</title><href_audio>/gangitem/id=6501&amp;from=audio</href_audio><href>/gangitem/id=6501</href><date>Jan 25</date></item>
  <item><title>TheGangXII-I</title><href_audio>/gangitem/id=6499&amp;from=audio</href_audio><href>/gangitem/id=6499</href><date>Jan 25</date></item>
  <item><title>NewsGangLive01.24.08</title><href_audio>/gangitem/id=6445&amp;from=audio</href_audio><href>/gangitem/id=6445</href><date>Jan 24</date></item>
  <item><title>NewsGangLiveII</title><href_audio>/gangitem/id=6377&amp;from=audio</href_audio><href>/gangitem/id=6377</href><date>Jan 23</date></item>
  ...
</rss>
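
How the XPath selection and the parent lookup fit together can be tried on a small inline document. The page fragment below is hypothetical, shaped only to match the path used in the script; the real markup at newsgang.net may differ.

```ruby
require 'rexml/document'

# Hypothetical page fragment matching the path body/div/ul/li/h2/a.
PAGE = <<~XHTML
  <html>
    <body>
      <div>
        <ul>
          <li>
            <h2><a href="/gangitem/id=6501&amp;from=audio">The Gang XII-II</a></h2>
            <p><span>Jan</span><span>25</span></p>
          </li>
        </ul>
      </div>
    </body>
  </html>
XHTML

doc = REXML::Document.new(PAGE)
entries = []

doc.root.elements.each('body/div/ul/li/h2/a') do |link|
  li = link.parent.parent # a -> h2 -> li
  entries << {
    title: link.text.gsub(/\s+/, ''),
    href:  link.attributes['href'], # attribute value, entities already decoded
    date:  "#{li.elements['p/span[1]'].text} #{li.elements['p/span[2]'].text}"
  }
end

p entries
```

Each link is matched first, and the enclosing li is then reached through parent.parent so that the date spans next to the heading can be read from the same list entry.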

A simple RSS Reader and Podcatcher

Written in Ruby, this class reads an RSS feed and downloads the latest enclosure if one exists.

require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'

class Rssreader
  def initialize(url)
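    source = url # url or local file
    # URI.open (from open-uri) fetches URLs and falls back to Kernel#open for
    # local paths; plain open() no longer fetches URLs on Ruby 3.0+.
    content = URI.open(source) { |s| s.read } # raw content of the rss feed
    @rss = RSS::Parser.parse(content, false)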
  end

  # returns up to the first 3 titles from the rss feed, e.g. "[a | b | c]"
  def get_summary()
    '[' + @rss.items.first(3).map { |item| item.title }.join(' | ') + ']'
  end

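  # true if the latest (first) item carries an enclosure
  def enclosure?
    item = @rss.items.first
    !item.nil? && item.respond_to?(:enclosure) && !item.enclosure.nil?
  end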

  def get_enclosure_url
    enclosure = @rss.items[0].enclosure
    enclosure.url
  end

  # saves the resource at url to filename (binary mode; the block closes the file)
  def rwget(url, filename)
    File.open(filename, 'wb') do |file|
      file.write URI.open(url, 'User-Agent' => 'Ruby-wget').read
    end
  end

  def download_enclosure()
    if self.enclosure? then
      enclosure_url = self.get_enclosure_url()
      local_filename = File.basename(enclosure_url)
      #puts local_filename
      if not File.exist?(local_filename) then
        puts 'downloading enclosure ...'
        self.rwget(enclosure_url, local_filename)
        puts 'download completed'
      else
        puts 'enclosure downloaded already'
      end
    end
  end    
end

if __FILE__ == $0
  url = "http://mysite.com/gwd/feed/lugradio.rss"
  rss = Rssreader.new(url)
  puts rss.get_summary()
  rss.download_enclosure()
end
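
The parsing side of the class can be tried without a network connection by handing RSS::Parser an inline feed. The feed below is a hypothetical example (titles and URLs are made up); it only shows how the titles, the enclosure URL and the local filename are derived.

```ruby
require 'rss'

# Hypothetical two-item feed; only the first item has an enclosure.
FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <link>http://example.com/</link>
      <description>Hypothetical feed for illustration</description>
      <item>
        <title>Episode 2</title>
        <link>http://example.com/ep2</link>
        <enclosure url="http://example.com/audio/ep2.mp3" length="12345" type="audio/mpeg"/>
      </item>
      <item>
        <title>Episode 1</title>
        <link>http://example.com/ep1</link>
      </item>
    </channel>
  </rss>
XML

feed = RSS::Parser.parse(FEED, false) # false: skip validation, as in Rssreader
titles = feed.items.map(&:title)

latest = feed.items.first
enclosure_url = latest.enclosure && latest.enclosure.url
local_filename = File.basename(enclosure_url)

puts titles.join(' | ')
puts enclosure_url
puts local_filename
```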