Feed Joiner

Posted on May 3, 2005

What is it?

Feed Joiner is a script that joins several RSS or ATOM feeds into a single ATOM feed.

Introduction and motivation

For some time now I’ve wanted to take all the feeds I create from the various instances of my on-line presence and clump them together into a single master feed.

Well, one day I decided to get to work and just do it.

This is the result. So far it is very experimental and it is kind of ugly.

Wishlist and To-Dos

This is what I hope to improve in the near future:

  • Make it auto-detect whether a feed is ATOM or some kind of RSS and treat it accordingly;
  • Sort all the feed items on the master feed (the date on each individual item is good enough for a good reader to sort it, but it would be more elegant to pre-sort them in the feed itself);
  • Handle failures in some sensible way (I still have to figure out what that actually means: do I drop the items from a source feed that has a problem? Do I use the previous version of that source feed, which I’d have to cache somewhere? Do I just fail to create the master feed?);
  • Make it easier to configure, especially by taking out all those hard-coded strings and options;
  • Make it all around more robust.
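
As a rough idea of how the first item (auto-detection) might work, here is a minimal sketch that guesses the feed type from the name of the document's root element. The helper name guess_feed_type is hypothetical and not part of the script below, and the regular expression is a shortcut rather than real XML parsing: it assumes the content starts with an optional XML declaration, comments or DOCTYPE followed by the root element, and it would need extra care for byte-order marks or unusual prologs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script below): guess the feed type
# from the name of the root element.
# Returns 'atom', 'rss', or nothing when the document is not recognised.
sub guess_feed_type {
    my ($content) = @_;
    return unless defined $content;

    # Skip the XML declaration, comments and DOCTYPE, then capture the
    # name of the first real element.
    my ($root) = $content =~ m{
        \A \s*
        (?: <\?.*?\?> \s* | <!--.*?--> \s* | <!DOCTYPE[^>]*> \s* )*
        < ([\w:.-]+)
    }xs;
    return unless defined $root;

    $root =~ s/^.*://;    # drop any namespace prefix (rdf:RDF becomes RDF)

    return 'atom' if $root eq 'feed';
    return 'rss'  if $root eq 'rss' || $root eq 'RDF';    # RSS 0.9x/2.0 or RSS 1.0
    return;
}

print guess_feed_type('<?xml version="1.0"?><feed version="0.3"></feed>'), "\n";    # prints "atom"
print guess_feed_type('<rss version="2.0"><channel/></rss>'), "\n";                 # prints "rss"
```

With something like this in place, the type key in the configuration hash could become an optional override instead of a requirement.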

The source code

Anyway, here is the current source code for your perusal (and no, I never claimed it was pretty, but it sure gets the job done!):

#!/usr/bin/perl -w

use strict;

use LWP::Simple;
use XML::RAI;
use XML::Atom::Syndication;
use XML::Atom::SimpleFeed;

my %urls = (
  'http://nowhereland.nunonunes.org/atom.xml' => {
    type => 'atom',
    name => 'Nowhereland',
  },
  'http://www.flickr.com/services/feeds/photos_public.gne?id=92591068@N00&tags=nunonunesispy&format=atom_03' => {
    type => 'atom',
    name => 'iSpy',
  },
  'http://del.icio.us/rss/nunonunes' => {
    type => 'rss',
    name => 'Links',
  },
);

my @newsitems;

# Let's get all the data from all the sources
print STDERR "Starting the master feed refresh...\n";

foreach my $url (keys %urls) {

  my $content = get($url);
  my $type = $urls{$url}{type};
  my $feed_title = $urls{$url}{name};

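  # LWP::Simple's get() returns undef on failure and does not set $!,
  # so report the feed name and URL instead
  die "Error getting the feed from $feed_title ($url)\n" unless $content;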

  if ($type eq 'atom') {
    my $atomic = XML::Atom::Syndication->instance;

    my $doc = $atomic->parse($content);
    foreach ($doc->query('//entry')) {
      my $newsitem = undef;

      # Prefix every title with the name of the source feed
      $newsitem->{title} = "[$feed_title]";
      $newsitem->{title} .= " ".$_->query('title')->text_value
        if $_->query('title');

      $newsitem->{link} = $_->query('link/@href')
        if $_->query('link/@href');

      $newsitem->{modified} = $_->query('modified')->text_value
        if $_->query('modified');

      $newsitem->{summary} = $_->query('summary')->text_value
        if $_->query('summary');

      $newsitem->{content} = $_->query('content')->text_value
        if $_->query('content');

      $newsitem->{issued} = $_->query('issued')->text_value
        if $_->query('issued');

      $newsitem->{id} = $_->query('id')->text_value
        if $_->query('id');

      $newsitem->{created} = $_->query('created')->text_value
        if $_->query('created');

      push @newsitems, $newsitem;
    }
  }
  elsif ($type eq 'rss') {
    my $rai = XML::RAI->parse($content);

    foreach my $item ( @{$rai->items} ) {
      my $newsitem = undef;

      $newsitem->{title} =  "[$feed_title] ".$item->title;

      $newsitem->{link} = $item->link if $item->link;

      $newsitem->{content} = $item->content if $item->content;

      $newsitem->{issued} = $item->issued if $item->issued;

      push @newsitems, $newsitem;
    }

  }

}

# Now let's build the aggregate feed...

my $atomfeed = XML::Atom::SimpleFeed->new(
  title => "Planet Nuno Nunes",
  link => "http://nunonunes.org/",
  tagline => "All things regarding Nuno Nunes - A collection of all the relevant feeds.",
  author => {
    name => "Nuno Nunes",
    url => "http://nunonunes.org/",
  }
)
or die "Error creating the aggregate feed: $!";

foreach (@newsitems) {
  $atomfeed->add_entry(
    %$_,
    author => {
      name => "Nuno Nunes",
      url => "http://nunonunes.org/",
    }
  )
  or die "Error adding item (\"".$_->{title}."\") to aggregate feed: $!";
}

$atomfeed->print;

print STDERR "Done refreshing the master feed.\n";

1;
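
The pre-sorting item from the wishlist could be sketched along these lines, run on @newsitems just before the aggregate feed is built. This is only a sketch under assumptions: the sample data and the item_date helper are made up for illustration, and the plain string comparison is only chronological if every timestamp is a W3CDTF string with the same UTC offset (e.g. all ending in "Z"). Feeds that mix offsets would need their dates parsed first.

```perl
use strict;
use warnings;

# Made-up sample items, shaped like the hashes pushed onto @newsitems above.
my @newsitems = (
    { title => '[Links] A bookmark',   issued   => '2005-05-01T10:00:00Z' },
    { title => '[Nowhereland] A post', issued   => '2005-05-03T08:30:00Z' },
    { title => '[iSpy] A photo',       modified => '2005-05-02T12:00:00Z' },
    { title => '[Links] Undated item' },
);

# Hypothetical helper: pick the best available timestamp for an item,
# or the empty string when it has none.
sub item_date {
    my ($item) = @_;
    for my $field (qw(issued modified created)) {
        return $item->{$field} if defined $item->{$field};
    }
    return '';
}

# Newest first; undated items end up last because '' sorts lowest.
my @sorted = sort { item_date($b) cmp item_date($a) } @newsitems;

print "$_->{title}\n" for @sorted;
```

In the script itself, the loop that calls add_entry would then iterate over the sorted list instead of @newsitems.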