[whatwg] Discrepancies between HTML and ES rules for parsing an integer or float

Wed Aug 3 11:21:59 PDT 2011

Hixie just WONTFIXed two bugs that I thought might be of interest:

http://www.w3.org/Bugs/Public/show_bug.cgi?id=12220
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12296

Basically, HTML defines some algorithms for parsing integers, floats,
etc., which are used in converting content attributes to IDL
attributes for reflection (among other things):

http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#numbers

The algorithms for parsing integers and floats are almost exactly the
same as ECMAScript's parseInt() and parseFloat(), down to some of the
language being copied word-for-word, but with subtle differences
involving (at least) whitespace handling.  IMO, this is bad for
several reasons:

* It's confusing to both authors and implementers to have multiple
almost identical algorithms.  Nobody's going to expect the discrepancy
in the corner cases where it matters.
* It's confusing to people reading the spec for there to be these
extra algorithms defined, whose relationship to the ES algorithms is
not obvious.  The HTML and ES algorithms are written in entirely
different styles, and it's hard to tell what the differences are from
side-by-side inspection.
* In at least some cases, all browsers match ES and none match the
spec -- see <http://www.w3.org/Bugs/Public/show_bug.cgi?id=12296#c4>.
* Browsers will have to maintain the ES algorithms as well as the HTML
algorithms, so even if the HTML algorithms are superior, it doesn't
save anyone the effort of understanding or implementing the ES
algorithms.
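
To make the whitespace discrepancy concrete, here is a rough sketch,
not the spec text.  htmlStyleParseInteger is a hypothetical helper I
made up for illustration.  It assumes that HTML's "space characters"
are only space, tab, LF, FF and CR, and it simplifies sign handling
(accepting an optional leading "-" or "+" is my assumption, not
something checked against the spec).  ES's parseInt() strips a larger
whitespace set, which includes U+00A0 and U+000B.

```javascript
// Hypothetical helper: a simplified, HTML-style integer parse.
// Assumption: only these five characters are skipped as leading whitespace.
const HTML_SPACE_PREFIX = /^[ \t\n\f\r]*/;

function htmlStyleParseInteger(input) {
  // Skip leading HTML space characters only.
  const rest = input.replace(HTML_SPACE_PREFIX, "");
  // Assumption: optional sign, then one or more ASCII digits.
  const match = /^[-+]?[0-9]+/.exec(rest);
  // null stands in for the spec's "return an error".
  return match === null ? null : parseInt(match[0], 10);
}

// Inputs: ordinary spaces, a no-break space, and a vertical tab.
const samples = ["  42", "\u00A042", "\u000B42"];
for (const s of samples) {
  console.log(JSON.stringify(s), parseInt(s, 10), htmlStyleParseInteger(s));
}
```

Under those assumptions the two agree on "  42", but for the U+00A0
and U+000B inputs parseInt() yields 42 while the HTML-style sketch
reports an error.  That is the kind of corner case I mean.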

So I think HTML should just defer to ES here.  Hixie disagrees, and
has resolved both bugs twice now, so I'm not going to reopen them
myself at this point.  However, I'd like to hear from implementers
whether they're willing to implement the spec as it stands, or whether
they want the spec algorithms to be identical to ES's algorithms.