Words by c.z.robertson

Worse is better in RSS

2003-04-25 09:47:01 UTC

A few days ago I changed the template for the RSS file on this blog, switching from the complex 1.0 syntax of the Content Module to the new content:encoded syntax (which seems to be perpetually in draft status).

The old syntax looked like this:

<content:items>
  <rdf:Bag>
    <rdf:li>
      <content:item>
        <content:format
          rdf:resource="http://www.w3.org/1999/xhtml" />
        <rdf:value>
          <!-- entity-encoded html here -->
        </rdf:value>
      </content:item>
    </rdf:li>
  </rdf:Bag>
</content:items>

The new syntax looks like this:

<content:encoded>
  <!-- entity-encoded html here -->
</content:encoded>
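
As a rough illustration of how little a consumer has to do with this form, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The item fragment and the names ITEM and encoded_html are made up for the example; they are not taken from any particular aggregator.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS 1.0 Content Module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical item fragment; only the content:encoded element matters here.
ITEM = """\
<item xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <content:encoded>&lt;p&gt;Hello, &lt;em&gt;world&lt;/em&gt;&lt;/p&gt;</content:encoded>
</item>
"""


def encoded_html(item_xml):
    """Return the HTML string carried by content:encoded, or None if absent."""
    item = ET.fromstring(item_xml)
    node = item.find("{%s}encoded" % CONTENT_NS)
    if node is None or node.text is None:
        return None
    # The XML parser has already undone the entity encoding, so this is
    # plain HTML text that can be handed straight to an HTML renderer.
    return node.text.strip()


html = encoded_html(ITEM)
print(html)
```

The result is an ordinary string of HTML, with no further tree walking needed on the RSS side.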

The main reason for making this change was simple pragmatism. I don't know of any RSS aggregators that handle the old format, though most seem to handle the new one. I'll also be changing the default templates in Catkin to do this in the next release (whenever that may be).

I was somewhat reluctant to make this switch. The original syntax is significantly more powerful, and it feels more correct to me. Though I wasn't taking advantage of it here, it allows you to make proper use of the fact that the content might already be in an XML format and to include it without escaping all the special characters. The content can then be parsed at the same time as the RSS, and it's more easily available to other XML-based processing. The original format also allows you to have the content in multiple formats, and to say what the format is. These should be good things.

But, as often happens, I hadn't properly taken into account the worse-is-better rule. When you start thinking about it in those terms, the drawbacks of the original syntax become glaringly obvious:

  • The ability to have XML markup either as part of the RSS or encoded is a burden for implementors. They have to write code to handle both cases.
  • Furthermore, making the markup into part of the RSS completely ignores the most frequent use case. 99% of the time, the content is just going to be passed off to an HTML renderer. And the renderer doesn't want a DOM tree or a stream of SAX events. It's perfectly capable of doing the parsing itself.
  • Writing the correct sequence of elements for the rdf:Bag is tricky. I personally have a fairly high tolerance for these sorts of things, but I'm weird in that respect.
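
To make the first two points concrete, here is a hedged Python sketch of what a consumer of the original syntax might have to write, again using the standard library's xml.etree.ElementTree. It assumes a simplified reading of the format: a value is treated as inline markup if rdf:value has child elements, and as entity-encoded text otherwise. The helper names (q, old_style_contents), the SAMPLE feed fragment, and the un-namespaced inline p element are all invented for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def q(ns, name):
    """Build an ElementTree-style qualified name, e.g. {uri}name."""
    return "{%s}%s" % (ns, name)


def old_style_contents(item):
    """Yield (format_uri, html_string) for each content:item under item."""
    path = "/".join([q(CONTENT_NS, "items"), q(RDF_NS, "Bag"),
                     q(RDF_NS, "li"), q(CONTENT_NS, "item")])
    for content_item in item.findall(path):
        fmt = content_item.find(q(CONTENT_NS, "format"))
        fmt_uri = fmt.get(q(RDF_NS, "resource")) if fmt is not None else None
        value = content_item.find(q(RDF_NS, "value"))
        if value is None:
            continue
        if len(value):
            # Case 1 (assumed): the markup is part of the RSS tree, so it
            # has to be serialized back into a string for the renderer.
            html = (value.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in value)
        else:
            # Case 2: entity-encoded text, already unescaped by the parser.
            html = value.text or ""
        yield fmt_uri, html.strip()


# Hypothetical feed fragment with one escaped and one inline value.
SAMPLE = """\
<item xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <content:items>
    <rdf:Bag>
      <rdf:li>
        <content:item>
          <content:format rdf:resource="http://www.w3.org/1999/xhtml" />
          <rdf:value>&lt;p&gt;Escaped&lt;/p&gt;</rdf:value>
        </content:item>
      </rdf:li>
      <rdf:li>
        <content:item>
          <content:format rdf:resource="http://www.w3.org/1999/xhtml" />
          <rdf:value><p>Inline</p></rdf:value>
        </content:item>
      </rdf:li>
    </rdf:Bag>
  </content:items>
</item>
"""

results = list(old_style_contents(ET.fromstring(SAMPLE)))
for fmt_uri, html in results:
    print(fmt_uri, html)
```

Both branches end up producing the same kind of string for the renderer, which is the point of the second criticism: the inline form is parsed into a tree only to be flattened again.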

Despite these criticisms, there's still a part of me that prefers the original syntax. It is right that one XML tree should be included as a subtree of another. It's also right that it should be possible to include the content in a number of different formats. But, according to worse-is-better, completeness can and should be sacrificed for simplicity.

As far as multiple formats are concerned, in simplifying the syntax, they threw the baby out with the bathwater. This may well have something to do with the use of the RDF model. They can't just slap an attribute onto the content:encoded element like any other XML language would. This is one of the problems of using RDF for general-purpose markup languages. There are some things for which a tree structure is just more appropriate than a set of RDF triples.

I'm now belatedly coming to the conclusion that RDF in RSS is a mistake. But still, it's better than RSS 2.