[whatwg] Web Encodings

Ian Hickson ian at hixie.ch
Sat Aug 29 18:47:34 PDT 2009

On Wed, 19 Aug 2009, Anne van Kesteren wrote:
> Today every browser implements their own encoding label matching 
> algorithm, supports their own list of encodings, their own list of 
> encoding label aliases, and everything sort of works, but not really.
> HTML5 solves part of this problem by defining exactly how to identify an 
> encoding label alias in a text/html stream. It also defines which 
> encoding label matching algorithm to use, UTS22, but we found out that 
> this is incompatible with (existing) sites that specify EUC_JP at the 
> HTTP level and actually want to be decoded per UTF-8 according to a 
> <meta> in the text/html stream. This works fine if you have a strict 
> encoding label matching algorithm, but with UTS22, EUC_JP and EUC-JP 
> become the same thing, while only the latter is the actual encoding 
> label.

I've backed off UTS22. I think we need the IANA list updated, though, to 
include the aliases browsers support. I understand you are working on 
this? I would like to remove the table in the HTML5 spec that defines such 
mappings, once that is done.
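
To make the EUC_JP case above concrete, here is a minimal sketch in Python of 
strict versus UTS22-style loose label matching. The function names and the 
two-entry label table are hypothetical and exist only for illustration; they 
are not any browser's real alias list. The loose key follows the UTS22 charset 
alias matching rule as I understand it (drop non-alphanumerics, lowercase, 
delete zeros not preceded by a digit), and the strict comparison is simplified 
to a trimmed, case-insensitive match.

```python
import re

# Hypothetical, deliberately tiny label table (illustration only).
KNOWN_LABELS = {"euc-jp": "EUC-JP", "utf-8": "UTF-8"}


def uts22_key(label):
    """Loose matching key in the style of UTS22 charset alias matching."""
    # 1. Delete everything except ASCII letters and digits, 2. lowercase.
    stripped = re.sub(r"[^A-Za-z0-9]", "", label).lower()
    # 3. Delete each run of zeros that is not preceded by a digit.
    return re.sub(r"(?<![0-9])0+", "", stripped)


def strict_lookup(label):
    """Strict matching (simplified): trimmed, case-insensitive comparison."""
    return KNOWN_LABELS.get(label.strip().lower())


def loose_lookup(label):
    """Loose matching: two labels match when their UTS22 keys are equal."""
    key = uts22_key(label)
    for known, encoding in KNOWN_LABELS.items():
        if uts22_key(known) == key:
            return encoding
    return None


def pick_encoding(http_label, meta_label, lookup):
    """The HTTP-level label wins if recognised; otherwise use the <meta> one."""
    return lookup(http_label) or lookup(meta_label)


if __name__ == "__main__":
    # Strict: EUC_JP is not a known label, so the <meta> declaration is used.
    print(pick_encoding("EUC_JP", "utf-8", strict_lookup))
    # Loose: EUC_JP collapses to the same key as EUC-JP and overrides <meta>.
    print(pick_encoding("EUC_JP", "utf-8", loose_lookup))
```

Under these assumptions the strict matcher ignores the unrecognised HTTP-level 
label and the page is decoded per its <meta>, while the loose matcher treats 
EUC_JP as EUC-JP and never consults the <meta>.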

> Another problem HTML5 does not solve is giving a definitive list of 
> encodings clients have to implement to be compatible with a large body 
> of Web content. This means new clients will have to reverse engineer 
> that list from existing clients which I think is bad.

If you can get browser vendors to agree on a comprehensive and accurate 
list, I'm happy to add it to the spec. But unless a plurality of browser 
vendors actually decide to standardise on a single set of encodings, I 
don't know that it makes sense to spec something here.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
