[whatwg] Encoding Standard (mostly complete)

Glenn Maynard glenn at zewt.org
Wed Apr 18 06:40:33 PDT 2012

"This is a decoder error" seems odd; it's descriptive language ("this thing
must be made true") rather than imperative ("do this thing").  I'd suggest
the imperative language "Emit a decoder error" and "Emit an encoder error".

"If code point is equal or greater than lower boundary" is more naturally
"greater than or equal to" (and "less than or equal to").  That said, this
would be much clearer with interval syntax:

"If code point is in the range [*lower boundary*, 0x10FFFF] and is not in
the range [0xD800, 0xDFFF], emit code point (and continue)."

which I think is easier to read, and also makes it clear that the "0xD800
to 0xDFFF" is a closed interval (0xD800 and 0xDFFF are included).
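
To make the closed-interval reading concrete, here is a minimal sketch of
that check in Python.  The function and parameter names are my own, chosen
for illustration; they are not taken from the spec.

```python
def is_emittable(code_point: int, lower_boundary: int) -> bool:
    """Return True if code_point is in [lower_boundary, 0x10FFFF]
    and is not in the surrogate range [0xD800, 0xDFFF].

    Both intervals are closed: the endpoints are included.
    """
    in_range = lower_boundary <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_range and not is_surrogate
```

With this reading, 0xD800 and 0xDFFF are both rejected, while 0xD7FF and
0xE000 pass as long as they are at or above the lower boundary.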

> An encoder contains one or more encoder error points. Unless stated
> otherwise the encoder is terminated at that point.

Encoding form data, at least, doesn't abort on the first error; any
unrepresentable code points are encoded as &x1234;.  (It would sure be
nice if encoding to non-Unicode-based encodings would just *always* use
that syntax for non-ASCII, so the encoders could be dropped, but I guess
that would trigger bugs in pages that are currently masked...)  Is there
any encoding path in browsers that does give up on the first error?
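
The two behaviours can be illustrated with Python's built-in codec error
handlers.  This is only an analogy, not how any browser implements form
submission: "xmlcharrefreplace" substitutes a decimal numeric character
reference for each unrepresentable character and keeps going, while "strict"
gives up at the first one.  The helper names below are made up for the
example.

```python
def encode_with_references(text: str, encoding: str) -> bytes:
    # Replace each unencodable character with &#NNNN; (decimal) and continue.
    return text.encode(encoding, errors="xmlcharrefreplace")


def encode_or_abort(text: str, encoding: str) -> bytes:
    # Raise UnicodeEncodeError at the first unencodable character.
    return text.encode(encoding, errors="strict")


if __name__ == "__main__":
    sample = "caf\u00e9 \u4e2d"
    print(encode_with_references(sample, "windows-1252"))
    try:
        encode_or_abort(sample, "windows-1252")
    except UnicodeEncodeError as exc:
        print("aborted at index", exc.start)
```

Note that Python writes the reference in decimal, so U+1234 comes out as
&#4660; rather than in a hexadecimal form.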

Glenn Maynard
