What is it?
Feed Joiner is a script that joins several RSS or ATOM feeds into a single ATOM feed.
Introduction and motivation
For some time now I’ve wanted to take all the feeds generated by the various parts of my on-line presence and clump them together into a single master feed.
Well, one day I decided to get to work and just do it.
This is the result. So far it is very experimental and kind of ugly.
Wishlist and To-Dos
This is what I hope to improve in the near future:
- Make it auto-detect whether a feed is ATOM or some kind of RSS and treat it accordingly;
- Sort all the feed items in the master feed (the date on each individual item is enough for a decent feed reader to sort them, but it would be more elegant to pre-sort them in the feed itself);
- Handle failures in some sensible way (I still have to figure out what that actually means: do I drop the items from a source feed that has a problem? Do I use the previous version of that source feed, which means caching it somewhere? Or do I just fail to create the master feed?);
- Make it easier to configure, in particular by moving all those hard-coded strings and options out of the script;
- Make it more robust all around.
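For the auto-detection item, one possible approach is to sniff the root element of the downloaded document instead of relying on the hard-coded `type` in `%urls`: Atom feeds use `<feed>`, RSS 0.91, 0.92 and 2.0 use `<rss>`, and RSS 0.90 and 1.0 use `<rdf:RDF>`. The sketch below is only that, a sketch: `detect_feed_type` is a hypothetical helper that is not part of the script, and it uses a core-Perl regular expression rather than a real XML parser, so it assumes the feeds are reasonably well-formed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Feed Joiner): guess whether a downloaded
# document is an Atom or an RSS feed by looking at its root element.
# Atom uses <feed>; RSS 0.91/0.92/2.0 use <rss>; RSS 0.90/1.0 use <rdf:RDF>.
# This is a regex heuristic, not a real XML parse.
# Returns 'atom', 'rss', or an empty return (undef) if it cannot tell.
sub detect_feed_type {
    my ($content) = @_;
    return unless defined $content;

    # Drop the XML declaration, processing instructions, comments and a
    # simple DOCTYPE, so the first remaining tag should be the root element.
    (my $body = $content) =~ s/<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^>]*>//gs;
    return unless $body =~ /<\s*([A-Za-z_][\w.:-]*)/;

    # Compare the local name only, ignoring any namespace prefix.
    (my $local = $1) =~ s/^.*://;
    return 'atom' if $local eq 'feed';
    return 'rss'  if $local eq 'rss' || $local eq 'RDF';
    return;
}

my $sample = qq{<?xml version="1.0" encoding="utf-8"?>\n}
           . qq{<feed version="0.3" xmlns="http://purl.org/atom/ns#"><title>x</title></feed>};
print detect_feed_type($sample), "\n";    # prints "atom"
```

In the fetch loop, the configured type could then become a fallback, e.g. `my $type = detect_feed_type($content) || $urls{$url}{type};`.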
The source code
Anyway, here is the current source code for your perusal (and no, I never claimed it was pretty, but it sure gets the job done!):
```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;
use XML::RAI;
use XML::Atom::Syndication;
use XML::Atom::SimpleFeed;

# Source feeds: URL => { type => 'atom' or 'rss', name => prefix for item titles }
my %urls = (
    'http://nowhereland.nunonunes.org/atom.xml' => {
        type => 'atom',
        name => 'Nowhereland',
    },
    'http://www.flickr.com/services/feeds/photos_public.gne?id=92591068@N00&tags=nunonunesispy&format=atom_03' => {
        type => 'atom',
        name => 'iSpy',
    },
    'http://del.icio.us/rss/nunonunes' => {
        type => 'rss',
        name => 'Links',
    },
);

# Author used for the master feed and for every entry in it
my %author = (
    name => "Nuno Nunes",
    url  => "http://nunonunes.org/",
);

my @newsitems;

# Let's get all the data from all the sources
print STDERR "Starting the master feed refresh...\n";
foreach my $url (keys %urls) {
    my $type       = $urls{$url}{type};
    my $feed_title = $urls{$url}{name};

    # LWP::Simple's get() returns undef on failure and does not set $!,
    # so report the URL rather than an unrelated system error.
    my $content = get($url);
    die "Error getting the feed from $feed_title ($url)" unless $content;

    if ($type eq 'atom') {
        my $atomic = XML::Atom::Syndication->instance;
        my $doc    = $atomic->parse($content);
        foreach my $entry ($doc->query('//entry')) {
            my $newsitem = {};
            $newsitem->{title} = "[$feed_title]";
            $newsitem->{title} .= ' ' . $entry->query('title')->text_value
                if $entry->query('title');
            # The href attribute of the entry's link element
            $newsitem->{link} = $entry->query('link/@href')
                if $entry->query('link/@href');
            # Copy the remaining Atom 0.3 elements as plain text, if present
            foreach my $field (qw(modified summary content issued id created)) {
                my $node = $entry->query($field);
                $newsitem->{$field} = $node->text_value if $node;
            }
            push @newsitems, $newsitem;
        }
    }
    elsif ($type eq 'rss') {
        my $rai = XML::RAI->parse($content);
        foreach my $item (@{ $rai->items }) {
            # Avoid an "uninitialized value" warning for untitled items
            my $title    = defined $item->title ? $item->title : '';
            my $newsitem = { title => "[$feed_title] $title" };
            $newsitem->{link}    = $item->link    if $item->link;
            $newsitem->{content} = $item->content if $item->content;
            $newsitem->{issued}  = $item->issued  if $item->issued;
            push @newsitems, $newsitem;
        }
    }
    else {
        warn "Skipping $feed_title: unknown feed type '$type'";
    }
}

# Now let's build the aggregate feed...
my $atomfeed = XML::Atom::SimpleFeed->new(
    title   => "Planet Nuno Nunes",
    link    => "http://nunonunes.org/",
    tagline => "All things regarding Nuno Nunes - A collection of all the relevant feeds.",
    author  => { %author },
) or die "Error creating the aggregate feed";

foreach my $newsitem (@newsitems) {
    $atomfeed->add_entry(
        %$newsitem,
        author => { %author },
    ) or die "Error adding item (\"$newsitem->{title}\") to aggregate feed";
}

$atomfeed->print;
print STDERR "Done refreshing the master feed.\n";
```
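Pre-sorting the master feed, another item from the wishlist, could be done just before the aggregate feed is built, by ordering `@newsitems` on the item dates. The sketch below assumes the dates are W3CDTF (ISO 8601) strings, which is what the Atom 0.3 `modified`, `issued` and `created` elements carry; that the RSS dates coming out of XML::RAI have the same form is an assumption I have not verified here. The helpers `w3cdtf_to_epoch`, `item_epoch` and `sort_newest_first` are hypothetical and rely only on the core `Time::Local` module (plus the `//` operator, so Perl 5.10 or later).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: convert a W3CDTF / ISO 8601 timestamp such as
# "2005-03-01T12:34:56+01:00" into seconds since the epoch (UTC).
# Returns 0 for anything it cannot parse.
sub w3cdtf_to_epoch {
    my ($date) = @_;
    return 0 unless defined $date;
    my ($year, $mon, $day, $hour, $min, $sec, $tz) = $date =~ m{
        ^(\d{4})-(\d{2})-(\d{2})                      # date
        (?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?)?  # optional time
        \s*(Z|[+-]\d{2}:?\d{2})?$                     # optional zone
    }x or return 0;

    # timegm() dies on out-of-range values (e.g. month 13), so trap that.
    my $epoch = eval {
        timegm($sec // 0, $min // 0, $hour // 0, $day, $mon - 1, $year);
    };
    return 0 unless defined $epoch;

    # Shift the local time to UTC using the zone offset, if there is one.
    if (defined $tz && $tz ne 'Z') {
        my ($sign, $oh, $om) = $tz =~ /^([+-])(\d{2}):?(\d{2})$/;
        my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
        $epoch -= $offset;
    }
    return $epoch;
}

# Best available date for an item: Atom 0.3 entries may have 'modified',
# 'issued' and 'created'; the RSS branch of the script only sets 'issued'.
sub item_epoch {
    my ($item) = @_;
    for my $field (qw(modified issued created)) {
        return w3cdtf_to_epoch($item->{$field}) if defined $item->{$field};
    }
    return 0;
}

# Newest first. Undated or unparseable items get epoch 0, so they sort
# after any item dated 1970 or later. The Schwartzian transform parses
# each date only once.
sub sort_newest_first {
    return map  { $_->[1] }
           sort { $b->[0] <=> $a->[0] }
           map  { [ item_epoch($_), $_ ] } @_;
}

my @items = (
    { title => '[A] older',  issued   => '2005-03-01T10:00:00Z' },
    { title => '[B] newer',  issued   => '2005-03-01T10:30:00+00:00' },
    { title => '[C] offset', modified => '2005-03-01T11:15:00+01:00' },
);
print "$_->{title}\n" for sort_newest_first(@items);
# prints "[B] newer", "[C] offset", "[A] older", one per line
```

In the script it would be a single line, `@newsitems = sort_newest_first(@newsitems);`, placed right before the loop that calls `add_entry`. Converting to epoch seconds, rather than comparing the date strings directly, is what lets items with different time zone offsets be ordered correctly.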