[whatwg] Discrepancies between HTML and ES rules for parsing an integer or float

Aryeh Gregor ayg at aryeh.name
Fri Aug 5 08:43:50 PDT 2011

On Fri, Aug 5, 2011 at 1:57 AM, Jonas Sicking <jonas at sicking.cc> wrote:
> It would make sense to me to match ES here. The main concern is of
> course website compat. Could someone detail what the differences would
> be compared to what implementations/the HTML5 spec do now?

As far as I know, the only difference between the HTML and ES
algorithms is handling of whitespace outside HTML's small ASCII set
(mostly non-ASCII whitespace): ES treats it as whitespace, HTML does
not.  Specifically, ES treats StrWhiteSpaceChar as leading whitespace:


That includes any Unicode "space separator" (Zs), which in particular
changes over time (which seems to be Hixie's main objection IIUC).
HTML uses "skip whitespace":


Which, if you follow the breadcrumbs, means only [ \t\n\r\f].  So it's
almost never going to make any difference in practice; we're talking
only about corner cases.
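
To make the corner cases concrete, here is a rough sketch comparing ES
parseInt with an HTML-style parser.  The helper htmlParseInt and the
constant HTML_LEADING_SPACE are hypothetical names made up for this
illustration; the helper is a simplification that only models the
leading-whitespace step and an optional minus sign, not the full HTML
algorithm.

```javascript
// HTML "space characters": space, tab, LF, CR, FF (the [ \t\n\r\f] set).
const HTML_LEADING_SPACE = /^[ \t\n\r\f]*/;

// Hypothetical helper, not taken from any spec text: a simplified take on
// HTML integer parsing. Returns null where the HTML rules would report an
// error. Sign handling is kept minimal (optional "-" only).
function htmlParseInt(input) {
  const rest = input.replace(HTML_LEADING_SPACE, "");
  const match = /^-?[0-9]+/.exec(rest);
  return match ? Number(match[0]) : null;
}

// Leading whitespace: space + tab, vertical tab, no-break space, em space.
const samples = [" \t42", "\x0b42", "\u00a042", "\u200342"];
for (const s of samples) {
  // Columns: escaped input, ES result, HTML-style result.
  console.log(JSON.stringify(s), parseInt(s, 10), htmlParseInt(s));
}
```

With these inputs, parseInt yields 42 in every case, while the
HTML-style helper yields 42 only for the first sample and null for the
other three.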

I have a simple test-case at
<http://www.w3.org/Bugs/Public/show_bug.cgi?id=12296#c4> that shows
all browsers strip leading \x0b (vertical tab) when converting DOM
attributes to ints, which matches ES and not HTML.

> For parsing floats this would not seem like a problem though since
> attributes containing floats is relatively new IIRC.

Yes, that's correct.  There's definitely no compat issue here with
floats, but really there's not going to be any with ints either, since
it's going to be exceedingly rare that anyone will put Unicode
whitespace in DOM attributes that are reflected as integers and then
rely on them working.  So it's just a question of whether we'd prefer
the algorithms to match or not.

