[whatwg] Encoding Standard (mostly complete)
Anne van Kesteren
annevk at opera.com
Wed Apr 18 10:12:46 PDT 2012
On Wed, 18 Apr 2012 15:40:33 +0200, Glenn Maynard <glenn at zewt.org> wrote:
> "This is a decoder error" seems odd; it's descriptive language ("this
> thing must be made true") rather than declarative ("do this thing").
> I'd suggest the declarative language "Emit a decoder error" and "Emit an
> encoder error".
Yes. Awesome suggestion implemented.
> "If code point is equal or greater than lower boundary" is more naturally
> "greater than or equal to" (and "less than or equal to"). That said,
> this would be much clearer with interval syntax:
> "If code point is in the range [*lower boundary*, 0x10FFFF] and is not in
> the range [0xD800, 0xDFFF], emit code point (and continue)."
> which I think is easier to read, and also makes it clear that the "0xD800
> to 0xDFFF" is a closed interval (0xD800 and 0xDFFF are included).
Then we'd first have to introduce interval syntax to the English language.
I suppose we could do that in the Terminology section if you think it
would be better.
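The proposed condition can be expressed as a small Python sketch. Python's chained comparisons happen to read like closed intervals, with both endpoints included. The function names are hypothetical. The sketch assumes that "lower boundary" is the smallest code point a UTF-8 sequence of the given length may encode (0x800 for three bytes), so any smaller value is an overlong form. It does not validate continuation bytes.

```python
from typing import Optional


def should_emit(code_point: int, lower_boundary: int) -> bool:
    """Hypothetical helper: True if code_point is in [lower_boundary, 0x10FFFF]
    and not in [0xD800, 0xDFFF]. Both intervals are closed (ends included)."""
    in_range = lower_boundary <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_range and not is_surrogate


def decode_three_byte(b0: int, b1: int, b2: int) -> Optional[int]:
    """Hypothetical helper: combine a three-byte UTF-8 sequence into a code point.

    Assumes b0 is a three-byte lead byte and b1, b2 are continuation bytes;
    that is NOT checked here. Returns None where a decoder error would be emitted.
    """
    code_point = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
    # 0x800 is the assumed lower boundary for three-byte sequences.
    return code_point if should_emit(code_point, 0x800) else None
```

For example, E2 82 AC yields 0x20AC (the euro sign). ED A0 80 decodes to the surrogate 0xD800 and is rejected, as is E0 80 80, an overlong encoding of U+0000.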
>> An encoder contains one or more encoder error points. Unless stated
>> otherwise the encoder is terminated at that point.
> Encoding form data, at least, doesn't abort on the first error; any
> unrepresentable code points are encoded as &#1234;. (It would sure be
> nice if encoding to non-Unicode-based encodings would just *always* use
> that syntax for non-ASCII, so the encoders could be dropped, but I guess
> that would trigger bugs in pages that are currently masked...) Is there
> any encoding path in browsers that does give up on the first error?
Giving up on the first error has been proposed for the API.
In URLs you do not get "&#...;" (though in WebKit you do); instead you get
"?" (IE at the network layer, Opera earlier on) or the UTF-8
representation (Gecko is totally weird).
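The encoder error behaviours mentioned above can be sketched roughly in Python. In mode "fatal" the encoder terminates at the first encoder error. In mode "html" it emits a decimal "&#...;" reference, as <form> does. In mode "question" it emits "?", as IE does at the network layer. The function and mode names are hypothetical, and Gecko's UTF-8 fallback is not modelled. Encoding one character at a time is only valid for stateless encodings such as windows-1252. It would be wrong for something like ISO-2022-JP.

```python
def encode(text: str, encoding: str, mode: str) -> bytes:
    """Hypothetical sketch of encoder error handling for a stateless encoding."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Encoder error: ch cannot be represented in `encoding`.
            if mode == "fatal":
                # Terminate the encoder at the first error.
                raise ValueError(f"encoder error at U+{ord(ch):04X}") from None
            elif mode == "html":
                # <form>-style: decimal numeric character reference.
                out += f"&#{ord(ch)};".encode("ascii")
            elif mode == "question":
                # IE-style URL behaviour: substitute "?".
                out += b"?"
            else:
                raise ValueError(f"unknown mode {mode!r}") from None
    return bytes(out)
```

Here "cp1252" is Python's codec name for windows-1252. Python's built-in "xmlcharrefreplace" error handler gives the same result as the "html" mode. For example, "\u0416".encode("cp1252", "xmlcharrefreplace") returns b"&#1046;".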
Maybe we should align URLs with <form> here and use "&#...;" throughout if
that is compatible with content. That probably deserves a discussion in its
own thread.
I do not know of any cases beyond URLs, <form>, and the proposed API that
require an encoder in the platform.
Anne van Kesteren