[whatwg] StringEncoding: Allowed encodings for TextEncoder

Mon Aug 13 09:08:24 PDT 2012

Sorry if this is a dupe; I replied to this from my phone and an incorrect
address, and my earlier reply isn't showing in the archives.

On Fri, Aug 10, 2012 at 9:16 PM, Jonas Sicking <jonas at sicking.cc> wrote:

> The spec now contains the following text:
>
> "NOTE: Because only UTF encodings are supported, and because of the
> algorithm used to convert a DOMString to a sequence of Unicode
> characters, no input can cause the encoding process to emit an encoder
> error."
>
> This is not correct. A DOMString is not a sequence of Unicode
> characters, it's a UTF16 encoded string (this is per EcmaScript). Thus
> it can contain unpaired surrogates and so the encoding process can
> result in encoder errors.
>
> As I've suggested earlier, I think we should deal with this by simply
> emitting Unicode replacement characters for these encoder errors (i.e.
> for unpaired surrogates).
>

Already accounted for. Note the phrase:

> and because of the algorithm used to convert a DOMString to a sequence of
> Unicode characters

This refers to the normative text that generates a sequence of Unicode code
points from a DOMString by reference to the algorithm in WebIDL [1], which
handles unpaired surrogates etc.
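
For illustration, here is roughly what that conversion does. This is only a
sketch of my reading of the algorithm in [1], not the spec text itself; the
function name toCodePoints is made up for this example. It walks the UTF-16
code units, combines valid surrogate pairs, and substitutes U+FFFD for any
unpaired surrogate, which is why the encoder never sees one.

```typescript
// Hypothetical helper (the name is not from any spec): converts the UTF-16
// code units of a string into Unicode code points, replacing each unpaired
// surrogate with U+FFFD (REPLACEMENT CHARACTER).
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  const n = s.length;
  for (let i = 0; i < n; i++) {
    const c = s.charCodeAt(i);
    if (c < 0xd800 || c > 0xdfff) {
      // Not a surrogate: the code unit is the code point.
      out.push(c);
    } else if (c >= 0xdc00) {
      // Low surrogate with no preceding high surrogate.
      out.push(0xfffd);
    } else if (i === n - 1) {
      // High surrogate at the very end of the string.
      out.push(0xfffd);
    } else {
      const d = s.charCodeAt(i + 1);
      if (d >= 0xdc00 && d <= 0xdfff) {
        // Valid pair: combine into a supplementary code point.
        out.push(0x10000 + ((c & 0x3ff) << 10) + (d & 0x3ff));
        i++; // skip the low surrogate we just consumed
      } else {
        // High surrogate not followed by a low surrogate.
        out.push(0xfffd);
      }
    }
  }
  return out;
}
```

Every value in the result is a valid code point, so a UTF-8 or UTF-16 encoder
fed from it has nothing it cannot encode.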

This informative text should say "Unicode code points" rather than "Unicode
characters", though. I'm fixing that now, and referencing [1] from the note
as well.

[1] http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode