I’ve been playing with xml:base support in Atom and can’t help but feel it was a bad idea.
XML Base allows a feed publisher to declare a ‘base’ URI on an element, and every URI within that element’s scope can then be relative. For example, you could use a base of http://feedblog.org/ and then every link could be archive/2005/foo.html, which would then be expanded to http://feedblog.org/archive/2005/foo.html.
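Here's a minimal sketch of that expansion using Python's standard urllib.parse.urljoin. The variable names are mine, and the base is written as a full URL with a scheme, since resolution needs an absolute base to work from.

```python
from urllib.parse import urljoin

# Base URI as a publisher might declare it via xml:base (illustrative value).
base = "http://feedblog.org/"

# A relative link as it would appear in the feed.
relative = "archive/2005/foo.html"

# Standard relative-reference resolution against the base.
resolved = urljoin(base, relative)
print(resolved)  # http://feedblog.org/archive/2005/foo.html
```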
Tim Bray’s feed is a good example. I think he’s trying to be a bit snarky here of course. He has every combination of xml:base you could think of (though I’m sure there are more).
This ends up putting needless complexity into Atom parsers. Is it really so hard to create full URLs? If it is, maybe you shouldn’t be producing Atom feeds.
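To give a sense of the bookkeeping involved, here's a rough sketch of what a parser has to do when xml:base shows up at more than one level. It uses Python's xml.etree.ElementTree; the sample feed, the retrieval URL, and the resolved_links helper are all made up for illustration and aren't taken from any real feed or library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: one xml:base on the feed, a second (relative) one on the entry.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://feedblog.org/">
  <link href="index.html"/>
  <entry xml:base="archive/2005/">
    <link href="foo.html"/>
  </entry>
</feed>"""

def resolved_links(elem, base):
    """Yield absolute hrefs for every atom:link, honouring nested xml:base."""
    # An xml:base value may itself be relative to the base inherited from the parent.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag == ATOM + "link" and elem.get("href") is not None:
        yield urljoin(base, elem.get("href"))
    for child in elem:
        yield from resolved_links(child, base)

# The second argument stands in for the URL the feed was retrieved from.
links = list(resolved_links(ET.fromstring(FEED), "http://example.com/feed.atom"))
print(links)
```

Every element has to carry its inherited base down to its children, which is exactly the state a parser wouldn't need if publishers wrote full URLs.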
On the other hand it’s a bit difficult to do this within Atom parsers. For example the content element can have a base URI. If they’re using double encoded XML within the post then you have to decode the content, parse all the anchors out, and then expand all the URLs.
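As a sketch of those three steps (decode, find the anchors, expand), something like the following would do, assuming the content arrives as escaped HTML. The AnchorCollector class, the sample content, and the base URL are hypothetical; it only collects the resolved links and doesn't rewrite the markup, which a real aggregator would also have to do.

```python
from html import unescape
from html.parser import HTMLParser
from urllib.parse import urljoin

class AnchorCollector(HTMLParser):
    """Collect every <a href> in a fragment, resolved against a base URI."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base, value))

# Escaped markup as it might sit inside a content element (made-up sample).
escaped_content = '&lt;p&gt;See &lt;a href="foo.html"&gt;this post&lt;/a&gt;.&lt;/p&gt;'

# Step 1: decode. Steps 2 and 3: parse the anchors and expand each href.
collector = AnchorCollector("http://feedblog.org/archive/2005/")
collector.feed(unescape(escaped_content))
collector.close()
print(collector.links)
```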
In the end it seems like pushing a lot of complexity into the hands of Atom parsers with not much of a win for Atom producers. We can’t even get RSS producers to use correct encoding. I don’t think we should give them any more features which they could screw up…
I’m willing to bet that this is one major feature that’s going to be overlooked by most Atom publishers. 99% of feeds in the wild won’t use xml:base but we’re going to have to include support for the 1% of feeds that do.
Update:
Another problem with xml:base. Since the xml:base specification allows the document’s resource to become the base for expanding URLs this means that you could have a feed with totally different URLs if parsed from a cache. This then means that all of your caching infrastructure has to be Atom-aware. Not pretty.
For example, if we had a feed with relative URLs and no explicit xml:base attribute, then the base would become the feed’s URL:
http://feedblog.org/index.atom
If I then had a relative URL within the feed such as:
foo.html
Then when I’m parsing from the network it would resolve to:
http://feedblog.org/foo.html
which is correct.
Then if I were to store the file in /tmp I would have a document resource URL of:
file:///tmp/index.atom
After parsing, my relative URL would resolve to:
file:///tmp/foo.html
which isn’t the same as http://feedblog.org/foo.html.
This means that your caching infrastructure now has to be rewritten to support Atom.
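The mismatch is easy to reproduce with urljoin. This is just a sketch using a path-relative reference (foo.html); a root-relative one such as /foo.html would resolve to file:///foo.html from the cached copy, which is no better.

```python
from urllib.parse import urljoin

relative = "foo.html"

# Base is the URL the feed was fetched from.
from_network = urljoin("http://feedblog.org/index.atom", relative)

# Base is the location of the cached copy on disk.
from_cache = urljoin("file:///tmp/index.atom", relative)

print(from_network)  # http://feedblog.org/foo.html
print(from_cache)    # file:///tmp/foo.html
```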
Update 2:
If you’re following along at home the URL to the xml:base spec is:
-
1
Trackback on Dec 6th, 2005 at 1:05 pm
Atom and xml:base
Kevin Burton has a post up called xml:base was a BAD addition to Atom, where he writes “On the other…
December 5, 2005 at 4:37 pm
Um, you do remember that people put relative URLs in RSS, too, don’t you? The only difference is that in Atom, you know exactly how to resolve them, and if the result isn’t what the publisher intended, it’s possible to determine who is at fault. With RSS, it’s anybody’s guess: the HTTP specs probably say that they should be resolved relative to the URL where you fetched the feed, while most people just typing relative URLs in a textarea and then clicking them in their HTML have them relative to the channel/link.
I agree that xml:base complicates things over something like just saying “all relative urls are relative to the feed’s rel=’alternate’ link,” but even ignoring the edge cases where that won’t work, with xml:base there’s a chance that someday your parser will handle at least part of it for you, while with an Atom-only base URL, you’ll always have to write it yourself.
December 5, 2005 at 5:35 pm
Hey Phil… thanks for the feedback!
> Um, you do remember that people put relative URLs in RSS, too, don’t you? The
> only difference is that in Atom, you know exactly how to resolve them, and if
> the result isn’t what the publisher intended, it’s possible to determine who
> is at fault.
Yes. But with RSS they can fix the problem just by using fully specified URLs.
We didn’t need to add a feature to fix this problem.
You’re also talking about a set of publishers that is clearly confused so I’m
not sure they’d know about xml:base to begin with (or figure out how to use it).
In my experience relative URLs within RSS are very rare. I’ve never actually
implemented a feature to expand based on the document URL since this problem has
never been reported in the wild. I think it might have been reported once while
I was at Rojo but it was for a small feed and I think the publisher fixed it.
> With RSS, it’s anybody’s guess: the HTTP specs probably say that they should
> be resolved relative to the URL where you fetched the feed, while most people
> just typing relative URLs in a textarea and then clicking them in their HTML
> have them relative to the channel/link.
Well, that’s another bug, isn’t it? Where is it resolved to? The blog URL or the
permalink URL?
> I agree that xml:base complicates things over something like just saying “all
> relative urls are relative to the feed’s rel=’alternate’ link,” but even
> ignoring the edge cases where that won’t work, with xml:base there’s a chance
> that someday your parser will handle at least part of it for you, while with
> an Atom-only base URL, you’ll always have to write it yourself.
Actually, my concern is that added complexity here will just break. RSS worked
because it was simple. Dead simple. I don’t think it’s too big of an issue to
force users to use fully specified URLs.
December 5, 2005 at 6:11 pm
That problem would only be a problem for a caching infrastructure that did not also already conveniently cache the original feed url. ;)
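A sketch of what that might look like: the cache writes a small sidecar file next to the feed and resolves against the recorded URL rather than the file's own location. The function names, the file layout, and the source_url key are all invented for illustration.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urljoin

def store_feed(cache_dir, feed_url, body):
    """Save the feed body plus a sidecar file recording where it came from."""
    path = Path(cache_dir) / "index.atom"
    path.write_text(body, encoding="utf-8")
    sidecar = Path(str(path) + ".meta.json")
    sidecar.write_text(json.dumps({"source_url": feed_url}), encoding="utf-8")
    return path

def resolve_from_cache(path, relative):
    """Resolve a relative URL against the original feed URL, not the cache path."""
    sidecar = Path(str(path) + ".meta.json")
    meta = json.loads(sidecar.read_text(encoding="utf-8"))
    return urljoin(meta["source_url"], relative)

with tempfile.TemporaryDirectory() as cache_dir:
    cached = store_feed(cache_dir, "http://feedblog.org/index.atom", "<feed/>")
    resolved = resolve_from_cache(cached, "foo.html")

print(resolved)  # http://feedblog.org/foo.html
```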
And agreed about xml:base being a generally bad addition to Atom, and the reason is complexity. We could, in isolation, justify any feature based on the benefit of that feature. However, if we consider the simplicity of a technology as a “feature”, we often gain value by leaving out fringe case support when it adds complexity disproportionately to the benefit.
December 5, 2005 at 6:24 pm
Michael.
Yes… I did think of the preservation of the original feed URL right after I posted. I was going to update the post but I’m on flakey wifi. :-/
December 6, 2005 at 2:01 am
I think I’m feeling sick now… why can’t someone just develop something with extremely strict specs, so that it’s easy to handle everything?
December 11, 2005 at 10:49 am
Julian: that’s what Atom is.
Kevin: sorry, but you’re just plain wrong:
No, that’s not what the xml:base spec does. It’s what relative URI resolution does. xml:base is a red herring here.
Au contraire, xml:base gives you a way to avoid having to make all your infrastructure base-URL-aware, by allowing you to record the base URL in-band instead of storing it out-of-band. Say I receive your feed from
http://feedblog.org/index.atom
I can then add xml:base="http://feedblog.org/index.atom" to the atom:feed element before saving the feed to /tmp and voilà, next time I parse the file from /tmp/index.atom, the base URL becomes
http://feedblog.org/index.atom
anyway, because xml:base overrides the location I got it from.
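A rough sketch of that stamping step with Python's xml.etree.ElementTree follows. The stamp_base helper and the sample feed are hypothetical; if the publisher already put an xml:base on the root, the sketch resolves it against the fetch URL instead of overwriting it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Serialize Atom elements without an invented prefix (a process-wide setting).
ET.register_namespace("", "http://www.w3.org/2005/Atom")

def stamp_base(feed_xml, fetched_from):
    """Return the feed with an absolute xml:base recorded on its root element."""
    root = ET.fromstring(feed_xml)
    # Resolve any existing (possibly relative) xml:base against the fetch URL.
    existing = root.get(XML_BASE, "")
    root.set(XML_BASE, urljoin(fetched_from, existing))
    return ET.tostring(root, encoding="unicode")

# Made-up feed with a relative link and no xml:base of its own.
FEED = '<feed xmlns="http://www.w3.org/2005/Atom"><link href="foo.html"/></feed>'

stamped = stamp_base(FEED, "http://feedblog.org/index.atom")
stamped_base = ET.fromstring(stamped).get(XML_BASE)
print(stamped_base)  # http://feedblog.org/index.atom
```

Once the stamped copy is written to disk, a parser that honours xml:base resolves foo.html the same way regardless of where the file is read from.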
xml:base addresses a real need in the only way possible.
If you want to argue against relative URIs in general, fine – unfortunately, that doesn’t make much sense in practice either.
December 11, 2005 at 11:21 am
Isn’t Atom a bad addition to the world? ;)
January 6, 2006 at 7:04 am
1. If double encoded XML (really there’s no such thing) is causing you trouble, stop doing it. That’s the problem, not xml:base.
2. XOM handles xml:base for you transparently and easily. There’s no reason an Atom client should be worrying about this directly.
3. Your spam prevention is locking out blind users. Very uncool.