[whatwg] A comment to character encoding declaration

Ian Hickson ian at hixie.ch
Thu May 22 02:23:42 PDT 2008


On Fri, 7 Mar 2008, Alexey Proskuryakov wrote:
> On Mar 3, 2008, at 6:11 PM, Jjgod Jiang wrote:
> > [...] I think we can suggest clients to simply treat encodings like 
> > these as their biggest superset, for instance, treat GB2312 as 
> > GB18030.
> 
> In my testing, it appears that IE 7 and Firefox 2 do treat GBK as an 
> equivalent of GB2312, but this cannot be said about GB18030. In 
> particular, 0x80 and 0xA2E3 are treated differently.

On Wed, 19 Mar 2008, Henri Sivonen wrote:
> 
> According to source code[1], WebKit trunk also changes GB_2312-80 to 
> GBK. Gecko aliases gb_2312-80 to GB2312 (due to FrontPage output 
> according to source comment).
> 
> Also, WebKit changes KS_C_5601-1987 and EUC-KR to windows-949-2000. 
> Gecko aliases[2] KS_C_5601-1987 to x-windows-949 (due to FrontPage 
> output according to source comment). However, Gecko doesn't use its 
> alias mechanism to alias EUC-KR to windows-949. I haven't tested if 
> EUC-KR is treated equivalently to windows-949 by other means.
> 
> Yet another weird alias tidbit supported both by Gecko and WebKit source 
> as well as Googling the subject: Looks like x-x-big5 needs to be an 
> alias for Big5 due to FrontPage output.
> 
> [1] http://trac.webkit.org/projects/webkit/browser/trunk/WebCore/platform/text/TextCodecICU.cpp#L90
> [2] http://mxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetalias.properties#335

So what I'm reading from the above (and other similar e-mails not quoted 
above) is that we should introduce the following mappings:

   GB2312 -> GBK
   GB_2312-80 -> GBK
   EUC-KR -> Windows-949
   KS_C_5601-1987 -> Windows-949
   x-x-big5 -> Big5

Is that correct?

I've added these mappings to the spec. Let me know if you have any more 
information, e.g. an exact list of what should be a conformance error in 
each of those cases. Also, if you have any useful references for GB2312 
and Big5, let me know; I couldn't find anything to reference for them.
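
For illustration only, the mappings listed above could be applied as a 
case-insensitive lookup on the declared label before picking a decoder. 
This is a minimal sketch, not spec text: the names ENCODING_OVERRIDES and 
effective_encoding are hypothetical, and passing unlisted labels through 
unchanged is an assumption.

```python
# Hypothetical override table: declared label -> encoding to decode with.
# Keys are lowercased so the lookup is case-insensitive (an assumption).
ENCODING_OVERRIDES = {
    "gb2312": "GBK",
    "gb_2312-80": "GBK",
    "euc-kr": "Windows-949",
    "ks_c_5601-1987": "Windows-949",
    "x-x-big5": "Big5",
}


def effective_encoding(declared: str) -> str:
    """Return the encoding a client would decode with for a declared label.

    Labels that are not in the override table are returned as given
    (minus surrounding whitespace).
    """
    label = declared.strip()
    return ENCODING_OVERRIDES.get(label.lower(), label)
```

With this table, effective_encoding("GB2312") yields "GBK", while a label 
such as "UTF-8" is returned as-is.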

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


