[whatwg] HTML5 doctypes incompatible with XHR if named entities present
bzbarsky at MIT.EDU
Wed Nov 11 21:33:17 PST 2009
On 11/11/09 11:57 PM, Aryeh Gregor wrote:
> A number of popular web apps output mostly well-formed XML, as far as
> I know: vBulletin, WordPress, etc.
I assume you meant "mostly" as in "most of the pages are well-formed",
not "pages are mostly well-formed", since the latter is useless, right?
I did a brief survey of obvious sites fitting those descriptions that I
had in my browser history at the moment. These were not-well-formed:
So either you're looking at a totally different dataset or "mostly" is a
bit of a stretch....
> Not even close to most websites, of course, but a significant number, I'd think.
Sure. 0.01% of all websites is a "significant number". I just think
well-formedness is broken often enough, and easy enough to break by
accident, that relying on it for screen scraping is not likely to be
happening on a wide scale....
>> Yes, but browsers would have to add explicit support for it.
> That mostly defeats the point -- they could equally add explicit
> support for non-XML responseXML first.
> This makes it sound like if Wikipedia switches to HTML5 and isn't
> willing to break all screen-scrapers on principle, we'll have to use
> an obsolete but conforming doctype.
Or stop using HTML named entities, yes.
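
For illustration only, here is a minimal Python sketch of the failure mode
being discussed. It assumes the standard-library xml.etree.ElementTree
parser is a reasonable stand-in for a generic non-validating XML parser
(it is not what any browser's XHR implementation actually runs), and the
helper name is_well_formed is made up for this example. With the HTML5
doctype there is no DTD to declare &nbsp;, so the parser rejects it, while
the equivalent numeric character reference parses fine.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The HTML5 doctype declares no entities, so &nbsp; is undefined to an XML parser.
named = "<!DOCTYPE html><html><body><p>a&nbsp;b</p></body></html>"

# The same content using a numeric character reference instead.
numeric = "<!DOCTYPE html><html><body><p>a&#160;b</p></body></html>"

named_ok = is_well_formed(named)
numeric_ok = is_well_formed(numeric)

# When parsing succeeds, the reference resolves to U+00A0 (no-break space).
paragraph_text = ET.fromstring(numeric).find("body/p").text
```

Under those assumptions, named_ok should come out False and numeric_ok True,
which is the "stop using HTML named entities" option in practice.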