NunoNunes.org


The Feed Joiner

tags: coding,programming,scripting

While waiting for the rest of the lunch party, I decided to check out the feed joiner script and, to my surprise, things appear to be a-OK!

So much so that I’ve decided to integrate my wiki’s recently edited pages feed into it, just for the heck of it. And it works! This particular feed doesn’t carry any useful information, but it worked like a charm the first time.

Also, I’ve put up a small page about it, with the source code, on the wiki. It is here.

And now: to lunch!

About this entry

Originally written on May 03, 2005 @ 14:08

Feed Joiner

What is it?

Feed Joiner is a script that joins several RSS or ATOM feeds into a single ATOM feed.

Introduction and motivation

For some time now I’ve needed to take all the feeds that I create from the various instances of my on-line presence and clump them together into a master feed.

Well, one day I just decided to get to work and just do it.

This is the result. So far it is very experimental and it is kind of ugly.

Wishlist and To-Dos

This is what I hope to improve in the near future:

  • Make it auto-detect whether a feed is ATOM or some kind of RSS and treat it accordingly;
  • Sort all the feed items on the master feed (the date on each individual item is good enough for a good reader to sort it, but it would be more elegant to pre-sort them in the feed itself);
  • Handle failures in some sensible way (I still have to figure out what that actually means: do I drop the items from a source feed that has a problem? Do I use the previous version of the source feed (I’d have to cache it somewhere for that)? Do I just fail to create the master feed?);
  • Make it easier to configure, especially by taking out all those hard-coded strings and options;
  • Make it all around more robust.
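
The first two items on the list can be sketched with core Perl only. This is a rough illustration, not code from the script: guess_feed_type, w3cdtf_to_epoch and sort_newest_first are made-up names, the type check only looks at the name of the root element, and the sort assumes each item carries a W3CDTF timestamp (such as 2005-05-03T14:08:00Z) in its issued or modified field.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: guess the feed type from the root element name.
# Returns 'atom', 'rss' or nothing. This is a rough regex check, not a parse.
sub guess_feed_type {
    my ($content) = @_;
    return unless defined $content;

    # The first "<name" skips the XML declaration, comments and DOCTYPE,
    # because those start with "<?" or "<!". A namespace prefix is ignored.
    my ($root) = $content =~ /<(?:[A-Za-z_][\w.-]*:)?([A-Za-z_][\w.-]*)/
        or return;

    return 'atom' if $root eq 'feed';                  # Atom
    return 'rss'  if $root eq 'rss' or $root eq 'RDF'; # RSS 0.9x/2.0, RSS 1.0
    return;
}

# Hypothetical helper: turn a W3CDTF timestamp into epoch seconds.
# Anything that cannot be parsed counts as 0, so it sorts last.
sub w3cdtf_to_epoch {
    my ($stamp) = @_;
    return 0 unless defined $stamp;

    my ($y, $mo, $d, $h, $mi, $s, $tz) = $stamp =~
        /^(\d{4})-(\d\d)-(\d\d)(?:T(\d\d):(\d\d)(?::(\d\d)(?:\.\d+)?)?(Z|[+-]\d\d:\d\d)?)?$/
        or return 0;

    # timegm croaks on impossible dates, hence the eval.
    my $epoch = eval { timegm($s || 0, $mi || 0, $h || 0, $d, $mo - 1, $y) };
    return 0 unless defined $epoch;

    # Convert local time with an offset to UTC.
    if (defined $tz and $tz =~ /^([+-])(\d\d):(\d\d)$/) {
        my $offset = ($2 * 60 + $3) * 60;
        $epoch -= ($1 eq '+' ? $offset : -$offset);
    }
    return $epoch;
}

# Hypothetical helper: newest items first, undated items at the end.
sub sort_newest_first {
    my (@items) = @_;
    return map  { $_->[1] }
           sort { $b->[0] <=> $a->[0] }
           map  { [ w3cdtf_to_epoch($_->{issued} || $_->{modified}), $_ ] } @items;
}
```

With something like this, the type entry in %urls would become optional, and @newsitems could be passed through sort_newest_first before the aggregate feed is built.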

The source code

Anyway, here is the current source code for your perusal (and no, I never claimed it was pretty, but it sure gets the job done!):

#!/usr/bin/perl -w

use strict;

use LWP::Simple;
use XML::RAI;
use XML::Atom::Syndication;
use XML::Atom::SimpleFeed;

my %urls = (
  'http://nowhereland.nunonunes.org/atom.xml' => {
    type => 'atom',
    name => 'Nowhereland',
  },
  'http://www.flickr.com/services/feeds/photos_public.gne?id=92591068@N00&tags=nunonunesispy&format=atom_03' => {
    type => 'atom',
    name => 'iSpy',
  },
  'http://del.icio.us/rss/nunonunes' => {
    type => 'rss',
    name => 'Links',
  },
);

my @newsitems;

# Let's get all the data from all the sources
print STDERR "Starting the master feed refresh...\n";

foreach my $url (keys %urls) {

  my $content = get($url);
  my $type = $urls{$url}{type};
  my $feed_title = $urls{$url}{name};

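  # LWP::Simple's get() returns undef on failure and does not set $!
  die "Error getting the feed from $feed_title ($url)\n"
    unless defined $content and length $content;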

  if ($type eq 'atom') {
    my $atomic = XML::Atom::Syndication->instance;

    my $doc = $atomic->parse($content);
    foreach my $entry ($doc->query('//entry')) {
      my $newsitem = {};

      # Prefix every title with the name of the source feed
      $newsitem->{title} = "[$feed_title]";
      $newsitem->{title} .= " ".$entry->query('title')->text_value
        if $entry->query('title');

      $newsitem->{link} = $entry->query('link/@href')
        if $entry->query('link/@href');

      $newsitem->{modified} = $entry->query('modified')->text_value
        if $entry->query('modified');

      $newsitem->{summary} = $entry->query('summary')->text_value
        if $entry->query('summary');

      $newsitem->{content} = $entry->query('content')->text_value
        if $entry->query('content');

      $newsitem->{issued} = $entry->query('issued')->text_value
        if $entry->query('issued');

      $newsitem->{id} = $entry->query('id')->text_value
        if $entry->query('id');

      $newsitem->{created} = $entry->query('created')->text_value
        if $entry->query('created');

      push @newsitems, $newsitem;
    }
  }
  elsif ($type eq 'rss') {
    my $rai = XML::RAI->parse($content);

    foreach my $item ( @{$rai->items} ) {
      my $newsitem = undef;

      $newsitem->{title} =  "[$feed_title] ".$item->title;

      $newsitem->{link} = $item->link if $item->link;

      $newsitem->{content} = $item->content if $item->content;

      $newsitem->{issued} = $item->issued if $item->issued;

      push @newsitems, $newsitem;
    }

  }

}

# Now let's build the aggregate feed...

my $atomfeed = XML::Atom::SimpleFeed->new(
  title => "Planet Nuno Nunes",
  link => "http://nunonunes.org/",
  tagline => "All things regarding Nuno Nunes - A collection of all the relevant feeds.",
  author => {
    name => "Nuno Nunes",
    url => "http://nunonunes.org/",
  }
)
or die "Error creating the aggregate feed: $!";

foreach (@newsitems) {
  $atomfeed->add_entry(
    %$_,
    author => {
      name => "Nuno Nunes",
      url => "http://nunonunes.org/",
    }
  )
  or die "Error adding item (\"".$_->{title}."\") to aggregate feed: $!";
}

$atomfeed->print;

print STDERR "Done refreshing the master feed.\n";

1;
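
As written, the script dies as soon as one source feed cannot be fetched. One of the options raised in the wishlist, reusing the previous version of a feed, could look roughly like the sketch below. It is only an assumption about how this might be done: fetch_or_cached is a made-up name, the cache is just one file per URL in a directory that must already exist, and the fetcher is passed in as a code reference so that the sketch does not depend on any particular HTTP module.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Hypothetical helper: fetch a feed, falling back to the last good copy.
# $fetch is any code ref that returns the feed body, or undef on failure.
# Returns the content, or nothing if there is no fresh or cached copy.
sub fetch_or_cached {
    my ($url, $fetch, $cache_dir) = @_;
    my $cache_file = File::Spec->catfile($cache_dir, md5_hex($url) . '.xml');

    my $content = $fetch->($url);
    if (defined $content and length $content) {
        # Fresh copy: remember it for the next time the source is down.
        # The UTF-8 layer round-trips whatever string the fetcher returned.
        if (open my $out, '>:encoding(UTF-8)', $cache_file) {
            print {$out} $content;
            close $out;
        }
        else {
            warn "Could not write cache file $cache_file: $!\n";
        }
        return $content;
    }

    # Fetch failed: reuse the previous copy if there is one.
    if (open my $in, '<:encoding(UTF-8)', $cache_file) {
        local $/;    # slurp the whole file
        my $cached = <$in>;
        close $in;
        warn "Fetch failed for $url, using the cached copy\n";
        return $cached;
    }

    warn "Fetch failed for $url and there is no cached copy, skipping\n";
    return;
}

# Possible use inside the main loop (the cache directory is an assumption):
#   my $content = fetch_or_cached($url, \&LWP::Simple::get, '/tmp/feedjoiner-cache');
#   next unless defined $content;
```

A feed that has never been fetched successfully is simply skipped, so one broken source no longer prevents the master feed from being built.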

About this entry

Originally written on May 03, 2005 @ 13:50

A bit of evening hacking

Having stayed home this evening (one of the few evenings when I actually have nothing planned outside), I decided to see if I could do something about a particular hitch I’ve been having for a while — getting all the relevant feeds I generate consolidated into one big feed.

I have the weblog (Nowhereland) and its feed, I have the links feed (powered by del.icio.us), I have the (sorry excuse for an attempt at a) photolog - ispy [Update: This site is no longer alive] (which is being changed in order to be powered by flickr) and a few more which I’m not sure I’ll want to get into the main feed right now.

These days I haven’t been really inspired to write in the weblog, but I have been active in both the photolog and the links gathering arena (not to mention the wiki, but this will have to wait until a second round), so it really bugs me that it appears as if I am just slacking off and have fallen below the radar when I’m actually getting stuff out there.

Also, I know my readers want to know everything that is happening to me, not just each part individually. All three of them. Me included. And my mom. And my wife. Or maybe not…

So while I will almost certainly retain the individual feeds for each source, I will start to push the aggregate feed as the master feed for my site.

This will help with the outsourcing of parts of my site, as it allows me to change providers of services if and when I see fit without affecting (much) the way people get my content.

So anyway, I just did a quick round-up of interesting Perl modules and cobbled a prototype script together to get the ball rolling. It most certainly will fail in spectacular and exciting ways before I can trust it enough to urge people to make the switch, but in the meantime it is already up at http://nunonunes.org/atom.xml. It is alpha code (I’ve written it, copied it to the development server and now I’m going to go to bed without even trying it out because otherwise this would last until very, very late) and it will be up and down as I toy with it and kick it into some sort of shape, but if you want to play with it too, the source code is up in the wiki.

It is actually kind of fun to see how this sort of stuff still gets me in a zone (briefly, of course, as it didn’t take that long to write this). I was hacking away in front of the TV (something I rarely do) and I was watching Eurosport. The finals for the world snooker tournament were on and I love to watch those.
But then I looked up and the tournament was over (didn’t even get to know who won!) and there was a football match on (that is soccer if you are of the American persuasion).
What’s so funny about that, you ask? Well, I absolutely hate football. The only good thing I find about the game is that on match days the streets tend to be a bit more clear of traffic, and yet I was coding with a match going on on the TV and didn’t even notice it for a good few minutes.
Amazing! Now if it were a few years ago this kind of concentration could have gone on for hours but for now this is good enough anyway. :-)

Update: Turns out my old(ish) RedHat 9 installation didn’t have all the modules I needed to get it working. And installing all the modules required me to upgrade the version of libxml. At this hour I just wasn’t up to it, so the prototype lives instead at http://nowhereland.nunonunes.org/full-atom.xml. And let me tell you, installing Perl modules on a machine with this kind of CPU power is a serious turn-on!

About this entry

Originally written on May 03, 2005 @ 01:52

The content of this site is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 License.