Awasu » Microformats
Wednesday 5th July 2006 10:26 AM [General]

I was going to write about this when 2.2.3 was released (in about two weeks) but there's a bit of a discussion going on right now about it so here's my $0.02 now.

It all started when DeWitt Clinton, a senior developer at Amazon's A9 project, wrote about the advantages of Atom over RSS:

But what if you wanted to put something interesting inside a syndicated content feed? What if you wanted to put valid XHTML in a feed? You went through the trouble of writing XHTML, why should it be flattened to an opaque blob of "maybe plain text maybe escaped HTML but I'm not really sure"?

What if you added semantic microformat markup to your HTML? If you're using an opaque data format, then you may as well have spared yourself the effort, as no client will know it's there (my emphasis).

Or what if you wanted to put some other structured data in your syndicated content feed? Geospacial data, perhaps. Product data. Or perhaps Google's GData format. If it's syndicated over RSS, no one will ever know.

This mirrors almost exactly what I said in the 2.2.3.alpha2 release announcement:

More and more feed publishers are now embedding specialized information in their feeds such as licensing details, geocodes, or publisher-specific metadata (e.g. Digg or Furl). Thing is, most feed readers simply ignore it all because they don't even realize it's there 🙄

Now, the last thing I want is to get involved in the RSS vs. Atom format wars so I'll limit my comments to a single statement of fact: there are some things you can do with Atom that you can't with RSS.

Microformats

Microformats are the new kid on the XML block, providing a way to embed small amounts of information in things like RSS or Atom feeds. A bunch of them have been published here and they have been very cleverly designed to allow them to be embedded in such a way that they can be read by both a human (i.e. they will appear in a browser page) and by a computer (i.e. they are accessible by parsing the XML).
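
To make the "readable by both" idea a bit more concrete, here's a rough sketch in Python. The XHTML fragment and the coordinates in it are made up for illustration; the class names (geo, latitude, longitude) are the ones used by the geo microformat. A browser would just show the sentence, while a program can dig the numbers out by parsing the markup. The helper names (has_class, extract_geo) are hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment marked up with the "geo" microformat.
# A browser simply renders the sentence; the class attributes carry the meaning.
FRAGMENT = """
<div>
  <p>We met at
    <span class="geo">
      <span class="latitude">-33.8568</span>,
      <span class="longitude">151.2153</span>
    </span>
  </p>
</div>
"""

def has_class(elem, name):
    """Return True if `name` is one of the element's space-separated classes."""
    return name in elem.get("class", "").split()

def extract_geo(xhtml):
    """Return a list of (latitude, longitude) pairs found in the fragment."""
    root = ET.fromstring(xhtml)
    results = []
    for elem in root.iter():
        if not has_class(elem, "geo"):
            continue
        lat = lon = None
        # Look inside the geo element for its latitude/longitude parts.
        for child in elem.iter():
            if has_class(child, "latitude"):
                lat = float(child.text)
            elif has_class(child, "longitude"):
                lon = float(child.text)
        if lat is not None and lon is not None:
            results.append((lat, lon))
    return results

print(extract_geo(FRAGMENT))
```

This only works because the fragment is well-formed XML; real-world HTML would need a more forgiving parser.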

If you think about it, the whole concept of RSS/Atom is a bit of a kludge. The same information is being published by a web site in two different formats, HTML and XML, for no reason other than that computers can't really understand HTML and my Mom definitely can't understand XML. Being able to embed information in a format that is accessible to both parties will play a major part in (what I predict will be) the success of microformats. Already, major players such as Yahoo! and Technorati are using microformats to embed information in normal web pages. For now, that information is published in a format suitable for humans, but it's only a matter of time before tools are written that go to these pages and extract it.

Why change

XHTML was never widely adopted because there was no real reason to change. Being able to reliably parse XHTML pages was always touted as one of the main benefits of converting but really, who cares about markup? Having useful information embedded in a page, accessible to both humans and computers, is a much more compelling reason.

The same applies for Atom over RSS. People often say that users shouldn't have to care about which format they're using and that's exactly right. It's also precisely why Atom hasn't made much headway in replacing RSS as the dominant format 🙄 Most publishing systems come with RSS set as the default and it's good enough. People don't have a reason to change and so they don't.

But they do now.

Yummy data

Syndication is about moving information around and if all you want to shift is a bit of HTML, then RSS is fine (assuming that it doesn't contain any angle brackets or ampersands, of course 🙄). But as soon as you want to do anything remotely serious like embed geocodes, details of upcoming events, or your own proprietary information, then you'll find RSS just isn't up to the job. You either embed it as an opaque blob of "maybe plain text maybe escaped HTML but I'm not really sure" that is intended for human eyes but not accessible by a computer, or as XML that a program can get at but that you would never want to present to a user.

Embedding microformats in an Atom feed lets you have your cake and eat it too. It'll take a while for people to catch on to this but once they start seeing the benefits of microformats and realize that RSS can't handle it, they'll switch.
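
Here's a small sketch of the difference, again in Python. Both feed snippets are made up for illustration. The RSS item carries its HTML escaped inside the description, so an XML parser sees one long string. The Atom entry uses type="xhtml" content (wrapped in an XHTML div, as the Atom spec requires), so the same markup shows up as real elements that a program can walk.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical RSS item: the HTML is escaped, so the parser sees only text.
RSS_ITEM = """
<item>
  <title>Meetup</title>
  <description>&lt;span class="geo"&gt;&lt;span class="latitude"&gt;-33.8568&lt;/span&gt;&lt;/span&gt;</description>
</item>
"""

# Hypothetical Atom entry: type="xhtml" content is real markup inside the feed.
ATOM_ENTRY = f"""
<entry xmlns="{ATOM_NS}">
  <title>Meetup</title>
  <content type="xhtml">
    <div xmlns="{XHTML_NS}">
      <span class="geo"><span class="latitude">-33.8568</span></span>
    </div>
  </content>
</entry>
"""

rss_desc = ET.fromstring(RSS_ITEM).find("description")
atom_content = ET.fromstring(ATOM_ENTRY).find(f"{{{ATOM_NS}}}content")

# The RSS description has no child elements, just a string of markup.
rss_children = len(list(rss_desc))

# The Atom content contains actual span elements the parser can see.
atom_spans = atom_content.findall(f".//{{{XHTML_NS}}}span")

print("RSS child elements:", rss_children)
print("RSS text:", rss_desc.text)
print("Atom span classes:", [s.get("class") for s in atom_spans])
```

A client could of course unescape the RSS string and parse it a second time, but it first has to guess that the blob is HTML at all, which is exactly the "not really sure" problem.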

And when people start embedding all this yummy data in their feeds, you need a client that recognizes that it's there and is able to understand it.

That's where Awasu joins the party... 😎

Just in case this post wasn't long enough for ya, DeWitt also has a very good piece on how the web is decentralizing and the effect that is having on our data and how we use it. Because ultimately, that's what it's all about: doing what you want to do with your data.
