[whatwg] StringEncoding: Allowed encodings for TextEncoder
Joshua Cranmer
Pidgeot18 at verizon.net
Tue Aug 7 12:07:28 PDT 2012
On 8/7/2012 12:48 PM, Joshua Bell wrote:
> When Anne's spec appeared I gutted mine and deferred wherever possible
> to his. One consequence of that was getting the other encodings "for
> free" as far as the spec writing goes. If we achieve consensus that we
> only want to support UTF encodings we can add the restrictions. There
> are use cases for supporting other encodings (parsing legacy data file
> formats, for example), but that could be deferred.
My main use case, and the only one I'm going to argue for, is being able
to handle mail messages with this API, and the primary concern here is
decoding. I'll agree with other sentiments in this thread that I don't
particularly care about encoding to anything other than UTF-8 (it might
be nice, but I can live without it); it's being able to decode $CHARSET
that I'm concerned about. As far as edge cases in this scenario are
concerned, it pretty much boils down to "I want to produce the same JS
string that would be output if I looked at the text content of the
document data:text/plain;charset=<charset>,<data>".
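A minimal sketch of this use case, written against the TextDecoder interface that browsers and Node.js ship today (which may differ from the 2012 draft under discussion). The helper names `charsetFromContentType` and `decodeMailBody` are hypothetical, and the body part is assumed to have already been transfer-decoded (base64 or quoted-printable removed). Non-fatal decoding is used so malformed bytes become U+FFFD, roughly what a browser would display for the equivalent data: URL.

```typescript
// Hypothetical helper: extract the charset parameter from a MIME
// Content-Type value such as 'text/plain; charset="ISO-8859-2"'.
function charsetFromContentType(contentType: string): string {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  // RFC 2045 makes US-ASCII the default charset for text/* bodies;
  // the Encoding Standard maps the "us-ascii" label to windows-1252.
  return match ? match[1] : "us-ascii";
}

// Hypothetical helper: decode a raw, already transfer-decoded body part.
function decodeMailBody(bytes: Uint8Array, contentType: string): string {
  const label = charsetFromContentType(contentType);
  // fatal: false (the default) replaces malformed sequences with U+FFFD
  // instead of throwing; a leading UTF-8 BOM is stripped by default.
  const decoder = new TextDecoder(label, { fatal: false });
  return decoder.decode(bytes);
}

// Usage: 0xE9 is "é" in windows-1252.
const legacy = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
console.log(decodeMailBody(legacy, 'text/plain; charset="windows-1252"'));
```

Because the label comes straight from the message header, an unknown charset will make the TextDecoder constructor throw, so a real client would need a policy for that case.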
When encoding, I think it is absolutely necessary to enforce uniform
guidelines for the output. When decoding, however, I think that most
differences (beyond concerns like the BOM) are the result of "buggy"
content creators rather than of the browsers. Given that HTML
display has apparently tolerated differences in charset decoding for
legacy charsets, I suppose it is possible to live with differences in
exact character decoding for various charsets. In other words, the
charset document would become an advisory list of both the minimum
charsets to support and how to decode them.
--
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth