I’ve been playing with xml:base support in Atom and can’t help but feel it was a bad idea.
XML base allows a feed publisher to put a ‘base’ URI on an element, after which every URI inside that element (including its children) can be relative. For example, you could use a base of http://feedblog.org/ and then write every link as archive/2005/foo.html, which would be expanded to http://feedblog.org/archive/2005/foo.html.
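Here is a minimal sketch of what a parser has to do just for atom:link elements, using Python's standard-library ElementTree and urljoin. The feed snippet, the nested xml:base on the entry, and the document URL http://feedblog.org/atom.xml are all hypothetical. The point is that every xml:base is itself resolved against the base it inherits, so the parser has to carry the current base down the whole tree.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: a feed-level base plus a relative base on the entry.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://feedblog.org/">
  <entry xml:base="archive/2005/">
    <link href="foo.html"/>
  </entry>
  <link href="index.html"/>
</feed>"""

def resolve_links(elem, base):
    """Yield absolute hrefs for every atom:link, honouring nested xml:base."""
    # An xml:base on this element is resolved against the inherited base.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag == ATOM + "link" and "href" in elem.attrib:
        yield urljoin(base, elem.get("href"))
    for child in elem:
        yield from resolve_links(child, base)

document_url = "http://feedblog.org/atom.xml"  # hypothetical feed location
for url in resolve_links(ET.fromstring(FEED), document_url):
    print(url)
# http://feedblog.org/archive/2005/foo.html
# http://feedblog.org/index.html
```

Note that this only covers link hrefs; a real parser also has to apply the same logic to every other URI-bearing attribute and to content.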
This ends up putting needless complexity into Atom parsers. Is it really so hard to create full URLs? If it is, maybe you shouldn’t be producing Atom feeds.
Handling it inside an Atom parser, on the other hand, is fairly difficult. For example, the content element can carry its own base URI. If the publisher puts escaped (double-encoded) HTML inside the post, you have to decode the content, parse out all the anchors, and then expand every URL.
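The decode, parse, expand steps above can be sketched roughly like this, again with only the Python standard library. The entry, its xml:base, and the inherited base URL are hypothetical, and the sketch only collects resolved anchor URLs; actually rewriting the HTML, or handling img src and other attributes, would be more work still.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical entry whose content is escaped HTML with its own xml:base.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="html" xml:base="http://feedblog.org/archive/2005/">
    &lt;p&gt;See &lt;a href="foo.html"&gt;this post&lt;/a&gt;
    and &lt;a href="/about.html"&gt;about&lt;/a&gt;.&lt;/p&gt;
  </content>
</entry>"""

class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)

def absolute_anchor_urls(content_elem, inherited_base):
    # Step 1: the XML parser has already undone one level of escaping,
    # so .text now holds the publisher's HTML markup as a string.
    html = content_elem.text or ""
    # Step 2: work out the base URI that applies inside <content>.
    base = urljoin(inherited_base, content_elem.get(XML_BASE, ""))
    # Step 3: parse the HTML and resolve each anchor against that base.
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base, href) for href in collector.hrefs]

entry = ET.fromstring(ENTRY)
content = entry.find("{http://www.w3.org/2005/Atom}content")
print(absolute_anchor_urls(content, "http://feedblog.org/atom.xml"))
# ['http://feedblog.org/archive/2005/foo.html', 'http://feedblog.org/about.html']
```

For simplicity the sketch passes the document URL straight in as the inherited base; a full parser would first walk the feed and entry bases as in the link example.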
In the end this pushes a lot of complexity onto Atom parsers without much of a win for Atom producers. We can’t even get RSS producers to use correct encoding; I don’t think we should give them any more features they could screw up…
I’m willing to bet that this is one major feature most Atom publishers will overlook. 99% of feeds in the wild won’t use xml:base, but we’re still going to have to support the 1% that do.
Another problem with xml:base: when no base is set explicitly, the xml:base specification falls back to the URI of the document resource itself for expanding relative URLs. That means the same feed can yield totally different URLs if it’s parsed from a cache instead of the network, so all of your caching infrastructure has to be Atom-aware. Not pretty.
For example, if we had a feed with relative URLs and no explicit xml:base attribute, the base would be the feed’s own URL.
If the feed contains a relative URL such as foo.html, then parsing it from the network resolves it against the feed’s URL to http://feedblog.org/foo.html, which is correct.
But if I were to store the file in /tmp and parse it from there, the document’s resource URL would become a file:// URL under /tmp, and after parsing the same relative URL would resolve to a file:// URL under /tmp as well.
That isn’t the same as http://feedblog.org/foo.html.
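To make the cache problem concrete, here is a small sketch that resolves one relative link against two document locations with Python's urljoin. The feed URL http://feedblog.org/atom.xml, the cached path /tmp/atom.xml, and the relative link foo.html are hypothetical stand-ins, assuming the feed carries no xml:base at all.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

RELATIVE_LINK = "foo.html"  # hypothetical relative URL, no xml:base in scope

# Hypothetical locations: where the feed was fetched from vs. where a cache wrote it.
network_url = "http://feedblog.org/atom.xml"
cached_url = PurePosixPath("/tmp/atom.xml").as_uri()  # "file:///tmp/atom.xml"

# With no xml:base, the document's own URL is the base.
from_network = urljoin(network_url, RELATIVE_LINK)
from_cache = urljoin(cached_url, RELATIVE_LINK)

print(from_network)                # http://feedblog.org/foo.html
print(from_cache)                  # file:///tmp/foo.html
print(from_network == from_cache)  # False
```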
This means that your caching infrastructure now has to be rewritten to support Atom.
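One way a cache could be made Atom-aware is to store the original network URL next to the cached bytes and always parse against that URL rather than the cache location. The sketch below is a hypothetical in-memory design (CachedFeed, FeedCache, and links_from_cache are illustrative names, not any existing library); an on-disk cache would need to persist the source URL in the same way.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM_LINK = "{http://www.w3.org/2005/Atom}link"

@dataclass(frozen=True)
class CachedFeed:
    """Hypothetical cache entry: the raw bytes plus the URL they came from."""
    source_url: str  # the network URL, NOT where the bytes are stored
    body: bytes

class FeedCache:
    """Minimal in-memory stand-in for an Atom-aware cache."""

    def __init__(self):
        self._entries = {}

    def store(self, source_url, body):
        self._entries[source_url] = CachedFeed(source_url, body)

    def load(self, source_url):
        return self._entries[source_url]

def links_from_cache(cached):
    """Resolve atom:link hrefs against the remembered source URL."""
    root = ET.fromstring(cached.body)

    def walk(elem, base):
        base = urljoin(base, elem.get(XML_BASE, ""))
        if elem.tag == ATOM_LINK and "href" in elem.attrib:
            yield urljoin(base, elem.get("href"))
        for child in elem:
            yield from walk(child, base)

    # The source URL stands in for the document URI, wherever the bytes live.
    return list(walk(root, cached.source_url))

cache = FeedCache()
cache.store("http://feedblog.org/atom.xml",
            b'<feed xmlns="http://www.w3.org/2005/Atom"><link href="foo.html"/></feed>')
print(links_from_cache(cache.load("http://feedblog.org/atom.xml")))
# ['http://feedblog.org/foo.html']
```

The design choice is simply that the cache key and the parse base are the same network URL, which is exactly the extra bookkeeping a generic byte cache doesn't do.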
If you’re following along at home, the URL for the xml:base spec is: