Niall notes that the IE team is forcing publishers to use valid feeds.

Which is, of course, an interesting idea. It’s not in any way, shape, or form practical, though, and it certainly doesn’t obey Postel’s Law.

Our years of experience with HTML in Internet Explorer have taught us the long-term pain that results from being too liberal with what you accept from others.

It’s not about being too liberal – it’s about being liberal enough. How many feeds have they thrown against their parser? Honestly? I don’t see any numbers here. If you were to throw a strict XML parser against REAL WORLD feeds, only 85% or so would parse. People make stupid mistakes (even brilliant people). The Zen is to be liberal enough without breaking anything. This is the brilliance of Postel’s Law.

What about a whitespace preamble before the XML declaration? Are you going to strip that? It’s not allowed in XML, and it’s an easy fix. What about a comment preamble before the root element? This isn’t allowed in the XML spec but I can tell you from experience that this is a REAL problem with feeds in the wild. What about HTML entities within XML? These can easily be converted into Unicode characters but they’ll break your parser otherwise.
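To make that concrete, here is a minimal sketch of the kind of repair pass I mean, in Python. It is not the Rojo parser; the function names (`repair_feed`, `parse_liberally`) are made up for illustration. It assumes the feed has already been decoded to a string, and it only handles two of the problems above: whitespace or comments sitting ahead of the `<?xml` declaration, and named HTML entities that XML doesn’t define.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# The five entities XML itself predefines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Whitespace and/or comments sitting in front of the <?xml ... ?> declaration.
PREAMBLE_RE = re.compile(r"\A(?:\s|<!--.*?-->)+(?=<\?xml)", re.DOTALL)


def replace_html_entities(text):
    """Turn named HTML entities (&nbsp;, &eacute;, ...) into numeric references."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = html.entities.name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: leave it for the parser to reject
        return "&#%d;" % codepoint
    # Caveat: this also rewrites entity-looking text inside CDATA sections.
    return ENTITY_RE.sub(substitute, text)


def repair_feed(text):
    """Apply the cheap fixes: drop the preamble, then rewrite HTML entities."""
    text = PREAMBLE_RE.sub("", text, count=1)
    return replace_html_entities(text)


def parse_liberally(text):
    """Try a strict parse first; only repair the feed if that fails."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return ET.fromstring(repair_feed(text))
```

Trying the strict parse first means well-formed feeds are never touched, so the repairs can only help feeds that would have failed anyway.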

The RSS Team blog is still shipping Atom 0.3 even though this version is now invalid. Even if it were still valid, their feed doesn’t pass the RSS validator. To add insult to injury, their HTML 4.0 doesn’t validate either – 59 errors.

They’re not being intellectually honest by using encoded HTML in their feed. What are they afraid of? Use XML encoded content! They can’t because the content in their feeds isn’t well-formed so it would break all RSS parsers.

Now of course I think this is a little strict but then again I’m not the one arguing for the requirement for well-formed feeds.

This debate is as old as the hills. From your customers’ perspective, if Bloglines, Rojo, or any other aggregator can parse an XML feed and IE can’t, then IE is broken. End of story.

I feel their pain though. I hate being the one making this decision. At Rojo I would always argue for a strict parser but sooner or later a feed would break with some trivial problem and I’d examine the feed only to find out that they had made an amazingly simple error.

My advice is to build your parser and then throw it against real world feeds. Aggregate the blogosphere for 48 hours. Not just a few feeds but a few million. Rojo aggregated 2.5M feeds on my parser. Log EVERY failure in your parser and the XML content that generated the error. You’ll probably notice a power-law distribution across the types of errors. Just focus on the important errors and move on. I’m sure you’ll notice that fixing one common error will result in hundreds of thousands of feeds that can now parse.
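The logging step can be sketched in a few lines of Python. This is an illustration and not what Rojo ran: `survey_feeds` is a hypothetical name, and it assumes you already have the fetched feeds as (url, text) pairs. It uses the standard library’s strict parser and groups failures by the expat error message.

```python
import collections
import xml.etree.ElementTree as ET
from xml.parsers import expat


def survey_feeds(feeds):
    """Parse every feed strictly and tally the failures by error type.

    feeds: iterable of (url, xml_text) pairs (an assumed input shape).
    Returns (Counter of error kinds, list of (url, kind, xml_text) failures).
    """
    error_counts = collections.Counter()
    failures = []
    for url, text in feeds:
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            # exc.code is the expat error number; ErrorString gives its name.
            kind = expat.ErrorString(exc.code)
            error_counts[kind] += 1
            # Keep the offending XML so the failure can be reproduced later.
            failures.append((url, kind, text))
    return error_counts, failures
```

Calling `error_counts.most_common(10)` on the result gives the short list of error types worth fixing first.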

The feed industry will be better for it and so will Microsoft.


  1. What they’re announcing in that post isn’t that they will only accept valid feeds, it’s:

    “We will only support feeds that are well-formed XML.”

    And that’s not an insignificant difference.

    Certainly there are ways they could work around common XML errors, and I’ll be interested to see if they stick to their commitment when it comes to embedded HTML within feeds. But only handling well-formed XML certainly makes the parsing process a lot simpler and so, hopefully, more efficient. Hopefully Microsoft making that decision will also encourage feed publishers and the authors of publishing tools to work harder to make sure their feeds are well-formed.



