[whatwg] StringEncoding: Allowed encodings for TextEncoder

Tue Aug 7 12:07:28 PDT 2012

On 8/7/2012 12:48 PM, Joshua Bell wrote:
> When Anne's spec appeared I gutted mine and deferred wherever possible 
> to his. One consequence of that was getting the other encodings "for 
> free" as far as the spec writing goes. If we achieve consensus that we 
> only want to support UTF encodings we can add the restrictions. There 
> are use cases for supporting other encodings (parsing legacy data file 
> formats, for example), but that could be deferred. 

My main use case, and the only one I'm going to argue for, is being able 
to handle mail messages with this API, and the primary concern here is 
decoding. I'll agree with other sentiments in this thread that I don't 
particularly care about encoding to anything other than UTF-8 (it might 
be nice, but I can live without it); it's being able to decode $CHARSET 
that I'm concerned about. As far as edge cases in this scenario are 
concerned, it pretty much boils down to "I want to produce the same JS 
string that would be output if I looked at the text content of the 
document data:text/plain;charset=<charset>,<data>".
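
A minimal sketch of that use case, assuming a TextDecoder constructor
that accepts a charset label and a TextEncoder that only emits UTF-8.
The exact shape of the API is an assumption here, not something this
thread has settled, and decodeBody is a hypothetical helper name used
only for illustration:

```typescript
// Hypothetical helper: decode the raw bytes of a mail body part using
// the charset label taken from its Content-Type header.
function decodeBody(bytes: Uint8Array, charset: string): string {
  // Assumption: non-fatal mode, so malformed input becomes U+FFFD
  // instead of throwing, mirroring how a browser renders a document.
  const decoder = new TextDecoder(charset, { fatal: false });
  return decoder.decode(bytes);
}

// "café" encoded as windows-1252 (0xE9 is e-acute in that charset).
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
const text = decodeBody(legacyBytes, "windows-1252");

// Re-encoding only ever targets UTF-8.
const utf8 = new TextEncoder().encode(text);
```

Decoding takes whatever $CHARSET the message declares, while encoding
needs no charset argument at all, which is the asymmetry argued for
above.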

When encoding, I think it is absolutely necessary to enforce uniform
guidelines for the output. When decoding, however, I think that most
differences (beyond concerns like the BOM) are a result of "buggy" 
content creators as opposed to the browser media. Given that HTML 
display has apparently tolerated differences in charset decoding for 
legacy charsets, I suppose it is possible to live with a difference of 
exact character decoding for various charsets--in other words, turning 
the charset document into an advisory list of both minimum charsets to 
support and how to do so.

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth