[whatwg] StringEncoding: Allowed encodings for TextEncoder

Tue Aug 7 10:55:28 PDT 2012

On Tue, Aug 7, 2012 at 10:47 AM, Glenn Maynard <glenn at zewt.org> wrote:
> On Tue, Aug 7, 2012 at 11:48 AM, Joshua Bell <jsbell at chromium.org> wrote:
>>
>> It doesn't appear we reached consensus - there was some desire expressed
>> to scope to UTF-8, then perhaps expand to include UTF-16, definite consensus
>> that any encoding supported should be handled by both encode and decode,
>> then comments about XHR and form data encodings, but then the discussion
>> wandered into stateful vs. stateless encodings which took us off topic. So
>> Glenn's comment below pretty much reboots the conversation where it was:
>
>
> I don't agree that we necessarily need to support both encode and decode for
> every encoding.
>
> For example, an MP3 tag editor supporting legacy ID3 tags may want to be
> able to decode ISO-8859-1, since it allows tags in that encoding.  However,
> there's no reason to ever write MP3 tags in anything but Unicode--they only
> need decode support for 8859-1, not encode.
>
> This pattern of decode support for legacy, but only encoding to Unicode,
> seems common today.  Many email clients today (not a use case, just a
> comparison) also decode from any encoding but send only in UTF-8.
>
> That's not to say there are no use cases for encoding to other encodings, but
> it's much easier to relax the restriction later and allow them if we really
> need to than it is to go the other way, and I think there's a danger of
> perpetuating legacy encodings if we're not careful.

Yup, that matches my feelings exactly.
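
To make the ID3 case concrete, here is a minimal sketch of "decode legacy,
encode only to Unicode". It assumes a TextDecoder that accepts an encoding
label and a TextEncoder that only emits UTF-8; those constructor shapes and
the helper name transcodeLegacyTag are assumptions for illustration, not
settled API.

```typescript
// Hypothetical helper: read a legacy ISO-8859-1 tag value and return it
// as UTF-8 bytes. Only decode support is needed for the legacy encoding.
function transcodeLegacyTag(legacyBytes: Uint8Array): Uint8Array {
  // Decode side: the legacy encoding is named explicitly by its label.
  const decoder = new TextDecoder("iso-8859-1");
  const text = decoder.decode(legacyBytes);

  // Encode side: no encoding argument, the output is always UTF-8.
  return new TextEncoder().encode(text);
}

// "Café" in ISO-8859-1 is four bytes; 0xE9 is the single byte for "é".
const legacy = new Uint8Array([0x43, 0x61, 0x66, 0xe9]);
const utf8 = transcodeLegacyTag(legacy);
// utf8 holds five bytes: 0x43 0x61 0x66 0xC3 0xA9
```

Nothing in this path ever needs to write ISO-8859-1, which is the asymmetry
Glenn describes.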

>> There are also cross-browser differences in handling decoding of certain
>> code points in certain encodings. Exposing those encodings in a new API
>> would either require that the browser vendors expose those differences
>> (bleah) or implement a compatibility switch in the affected codecs (bleah).
>
> The real fix for this would be for browsers to implement the encodings in
> the correct, interoperable way when exposed by this API, even if that means
> that this API interprets data differently than e.g. the HTML parser.  MS has
> made it clear that they won't touch their encodings in any way, due to
> legacy support, but hopefully that doesn't apply to a new API with no legacy
> at all.  (If you want to find that out you'll need to ask on webapps or
> through some other channel, since they're not on this list.)

I'm hoping that browsers in general will be able to converge on the
encoding databases that they have, both in which encodings are
supported and in the encoding tables used for those encodings.
Anne's spec is a great first step in that direction. It'll definitely
take time before we have full convergence, but I see no reason that we
couldn't get there eventually. We were able to get there with HTML5
parsing after all :-)

/ Jonas