Lately I’ve been thinking quite a bit about things related to the “Semantic Web”.
This sudden new interest came about during a conversation I was having with José Castro (leader of Lisbon.pm), in which he was telling me about his plans for a service he would like to provide on the Lisbon.pm homepage: automatically listing all the CPAN modules “owned” by the group’s mongers.
As we started talking about this, we concluded right away that in order to achieve that goal we needed a list of all the members and their respective PAUSE accounts, and instead of listing that information in a file somewhere we wanted to use something a little more dynamic. This meant that each member would have to host a file (probably some sort of XML formatted file) somewhere on the web (maybe even on the group’s server if needed) where {s}he stated something about h[er|im]self, which the software that runs the site would parse and then take the necessary steps to gather the rest of the information from CPAN based on that.
Well, José was thinking about using something along the lines of what the London.pm group uses (apparently they went through a similar process some time ago). They developed an XML vocabulary that they use for this purpose.
All well and good, but it really felt a bit like re-inventing the wheel to me. The thing is that I had had conversations like this before, although in different contexts, and lately I have, by some twist of fate, been listening to too many recorded presentations about the Semantic Web, FoaF, SPARQL and the like to pass up such a good opportunity.
In fact it was Melo who first got me onto the FoaF stuff some time ago and the notion basically lay dormant in the back of my mind just waiting for an opportunity to be used.
So what am I on about with this FoaF, SPARQL and Semantic Web stuff?
Very simply put (very simply; go and look at the websites I link to, or google for it, if you want to know more), all of this enables machines to gather information about resources on the web (and even things off the web, like people and information about them) and structure it into something that can be manipulated automatically and from which inferences can be drawn automatically.
FoaF (an acronym for Friend-of-a-Friend) allows people to state things about themselves, their web-based resources, their relationships to other people or other resources, etc.
Now FoaF is very useful for this project as a descriptive language, but it is still just a piece of XML sitting there waiting to be parsed, and we all know how much fun it is to parse XML, right?
Enter SPARQL, a query language designed to be “the query language for the web”, which basically allows us to query multiple data sources in various formats and get back the results we need. It was a happy coincidence indeed that, just as I was preparing to write down the extensions I would need to the existing Perl FoaF parser, I happened to listen to an interview with Elias Torres about SPARQL and how it applies to a lot of the things I need right now.
But wait, all of that is quite a mouthful, so what does it actually mean in this specific context?
Well, it means that if all the mongers who are interested create a FoaF file about themselves in which they state that they have a PAUSE account with ID “FOO”, then we can build a program on the server that goes out, queries each monger’s FoaF file, checks whether that file references a PAUSE account and, if so, goes out to CPAN, gets the list of modules owned by that account and does whatever we decide we want to do with it.
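Just to make that concrete, a monger’s FoaF file could state the PAUSE account using the standard FoaF account terms, something along these lines (a hypothetical sketch; the exact properties we settle on may well end up different):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person>
        <foaf:name>Some Monger</foaf:name>
        <!-- the PAUSE account, stated as a FoaF online account -->
        <foaf:holdsAccount>
          <foaf:OnlineAccount>
            <foaf:accountServiceHomepage rdf:resource="http://www.cpan.org/"/>
            <foaf:accountName>FOO</foaf:accountName>
          </foaf:OnlineAccount>
        </foaf:holdsAccount>
      </foaf:Person>
    </rdf:RDF>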
Right, that’s the easy stuff done right there, figuring out which parts should be used. Now for the real fun part…
First off, I had to make sure the technology was all there to begin with. The Semantic Web has been in a gaseous state for much too long, and I had to make sure that it wasn’t all just vapour anymore.
So I went out and tried to find out about FoaF, and in the process even created my own personal FoaF file. Not bad at all; it seems that this part, at least, is good enough to go (even if some of the namespaces it uses are still in experimental stages and deemed unstable).
Then I decided to look at extracting the information without writing a special-purpose XML parser. As I said earlier, there is a module out there that does that, but it only supports part of what I needed, and I decided that, should I take that route, I would have to write a couple of extensions to it.
But then I heard the interview with Elias Torres and decided to give SPARQL a look.
It was just a matter of hopping over to http://sparql.org/ and trying out the SPARQLer demo with my own FoaF file. I validated that I could indeed get some of the information I needed out of it with very little effort, and then realized that there are already some Perl modules for using SPARQL on CPAN.
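To give an idea of what that looks like, the sort of query I was pasting into the demo, pulling names and PAUSE IDs out of a FoaF file shaped like the sketch above, is roughly this (again, just an illustration, not the final query):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?pause
    WHERE {
      ?person a foaf:Person ;
              foaf:name ?name ;
              foaf:holdsAccount ?account .
      ?account foaf:accountServiceHomepage <http://www.cpan.org/> ;
               foaf:accountName ?pause .
    }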
All I have to do now is look into those modules to see how usable they are, and with any luck I should have something knocked together real soon (or maybe not that soon; SPARQL doesn’t quite seem to be a trivial language to master).
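So the idea doesn’t stay completely abstract, here is the rough shape of what I have in mind on the Perl side. I’m assuming something like RDF::Query for the SPARQL part and RDF::Trine for parsing and storing the RDF (both real modules on CPAN, though they may not be the ones I end up using), and I haven’t actually tried them yet, so treat this as a sketch of the approach rather than working code; the FoaF URL is obviously made up too:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use RDF::Trine;    # assumed: parses RDF/XML and gives us an in-memory model
    use RDF::Query;    # assumed: the SPARQL engine

    # The mongers' FoaF files; in the real thing this list would come
    # from wherever the group keeps its member information.
    my @foaf_urls = ('http://example.org/~monger/foaf.rdf');

    # Same query as above: names plus PAUSE account IDs.
    my $sparql = <<'END_SPARQL';
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?pause
    WHERE {
      ?person a foaf:Person ;
              foaf:name ?name ;
              foaf:holdsAccount ?account .
      ?account foaf:accountServiceHomepage <http://www.cpan.org/> ;
               foaf:accountName ?pause .
    }
    END_SPARQL

    for my $url (@foaf_urls) {
        # Load the FoaF file into a temporary in-memory model.
        my $model = RDF::Trine::Model->temporary_model;
        RDF::Trine::Parser->parse_url_into_model($url, $model);

        # Run the SPARQL query against that model.
        my $query   = RDF::Query->new($sparql);
        my $results = $query->execute($model);

        while (my $row = $results->next) {
            my $name  = $row->{name}->literal_value;
            my $pause = $row->{pause}->literal_value;
            print "$name has PAUSE ID $pause\n";

            # TODO: go off to CPAN, fetch the list of modules owned by
            # $pause and do whatever we decide to do with that list.
        }
    }

The CPAN part is left as a TODO on purpose; how exactly we get from a PAUSE ID to the list of modules (scraping search.cpan.org, parsing the 02packages index, something else entirely) is still up in the air.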
I’ll try and write up some notes on the process as I go along, and maybe even do a more structured article on the whole semantic web thing and how it fits into our solution for the articles section of Lisbon.pm’s site (assuming there will be one… José, are you listening?). ;-)