[whatwg] StringEncoding: Allowed encodings for TextEncoder
jsbell at chromium.org
Mon Aug 13 09:08:24 PDT 2012
Sorry if this is a dupe; I replied to this from my phone and an incorrect
address, and my earlier reply isn't showing in the archives.
On Fri, Aug 10, 2012 at 9:16 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> The spec now contains the following text:
> "NOTE: Because only UTF encodings are supported, and because of the
> algorithm used to convert a DOMString to a sequence of Unicode
> characters, no input can cause the encoding process to emit an encoder
> error."
> This is not correct. A DOMString is not a sequence of Unicode
> characters, it's a UTF16 encoded string (this is per EcmaScript). Thus
> it can contain unpaired surrogates and so the encoding process can
> result in encoder errors.
> As I've suggested earlier, I think we should deal with this by simply
> emitting Unicode replacement characters for these encoder errors (i.e.
> for unpaired surrogates).
Already accounted for. Note the phrase:
> and because of the algorithm used to convert a DOMString to a sequence of
> Unicode characters
This refers to the normative text that generates a sequence of Unicode code
points from a DOMString by reference to the algorithm in WebIDL, which
handles unpaired surrogates etc.
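
For illustration only, here is a minimal TypeScript sketch of that kind of
conversion: it walks the UTF-16 code units of a string, combines valid
surrogate pairs, and substitutes U+FFFD for any unpaired surrogate. The
function name `toCodePoints` is hypothetical and not taken from any spec; the
normative steps are the ones in WebIDL, not this code.

```typescript
// Sketch, not normative: turn a UTF-16 string into Unicode code points,
// replacing unpaired surrogates with U+FFFD (REPLACEMENT CHARACTER).
// `toCodePoints` is a hypothetical name chosen for this example.
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0xd800 || c > 0xdfff) {
      // Not a surrogate: the code unit is the code point.
      out.push(c);
    } else if (c >= 0xdc00) {
      // Trail surrogate with no preceding lead surrogate.
      out.push(0xfffd);
    } else if (i === s.length - 1) {
      // Lead surrogate at the very end of the string.
      out.push(0xfffd);
    } else {
      const d = s.charCodeAt(i + 1);
      if (d >= 0xdc00 && d <= 0xdfff) {
        // Valid pair: combine into a supplementary code point.
        out.push(0x10000 + ((c - 0xd800) << 10) + (d - 0xdc00));
        i++; // Skip the trail surrogate we just consumed.
      } else {
        // Lead surrogate not followed by a trail surrogate.
        out.push(0xfffd);
      }
    }
  }
  return out;
}
```

Because the output contains no surrogates, a UTF encoder fed this sequence
never sees a value it cannot encode, which is the point the note is making.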
This informative text should say "Unicode code points" rather than "Unicode
characters", though. Fixing now, and referencing WebIDL even in the note.