[whatwg] Article: Growing pains afflict HTML5 standardization
julian.reschke at gmx.de
Mon Jul 12 08:41:36 PDT 2010
On 12.07.2010 16:43, Mike Wilcox wrote:
> On Jul 12, 2010, at 8:39 AM, Nils Dagsson Moskopp wrote:
>>> That's a little different. Google purposely uses unstandardized,
>>> incorrect HTML in ways that still render in a browser in order to
>>> make it more difficult for screen scrapers. They also "break it" in a
>>> different way every week.
>> Assuming this is true (which I find difficult to believe), wouldn't a
>> screen scraper based on the HTML5 parsing algorithm defeat this
>> purpose ?
> Honestly, I don't know. But W3 defaulted to an HTML5 validator:
True, but a parser conforming to the spec (*) would handle those errors,
so in this case obfuscation wouldn't work. Essentially, any code using
that parser would see the same information as an off-the-shelf web browser.
> Besides the protecting of their API, Google also will scratch and claw
> to save every byte.
Understood. There's an ongoing controversy about whether it makes sense
to make things like these invalid (just stating, not offering an opinion).
> They are the gold standard of a high performance
> website. While this may or may not explain the things that don't
> validate, what it does say is that nothing coming from google.com
> is accidental.
I believe some time ago a certain Google employee actually *did* state
that some of the conformance problems were unintentional. (Yes, I did
spend a few minutes trying to find that statement, but without success.)
Best regards, Julian
(*) Implementing error recovery, which IMHO isn't required.