[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

Ian Hickson ian at hixie.ch
Tue Jul 7 01:25:58 PDT 2009

On Tue, 9 Jun 2009, Anne van Kesteren wrote:
> On Tue, 09 Jun 2009 01:42:57 +0200, Øistein E. Andersen <liszt at coq.no> wrote:
> > Le 5 juin 09, Anne van Kesteren écrivit :
> >>
> >> Is the implication here that Shift_JIS and Shift-JIS are distinct 
> >> [...]?
> >
> > No, Shift-JIS and Windows-932 are commonly used names/labels for the 
> > encodings that are registered as Shift_JIS and Windows-31J 
> > (respectively) in the IANA charset registry. Sorry for the confusion 
> > caused.
>
> So should HTML5 mention that Windows-932 maps to Windows-31J? (It does 
> not appear in the IANA registry.)

I've added this mapping too, just in case.

On Tue, 9 Jun 2009, Øistein E. Andersen wrote:
> That is an interesting question. My (apparently wrong) understanding was 
> that the table was merely supposed to provide mappings between 
> encodings, since such mappings are inappropriate in non-HTML contexts 
> and cannot be added to the IANA registry. It might be useful to 
> include a set of MIME charset strings which cannot be or have not yet 
> been registered (e.g., x-x-big5, x-sjis, windows-932) as well as 
> information on how CJK character sets are implemented in practice, both 
> of which seem to be necessary for compatibility.
> Such information does not fit comfortably in the current table, though.

Added x-sjis. What are the other mappings that would be good?

On Tue, 9 Jun 2009, Øistein E. Andersen wrote:
> > 
> > I believe you misunderstand the purpose of this table. The idea is to 
> > give a mapping of _labels_ to encodings, not encodings to encodings. 
> > I've clarified the text to this effect.
>
> You seem to have added "specified by a label" to the phrase which now 
> reads "an encoding specified by a label given in the first column of the 
> following table" without changing the column heading ("Input encoding") 
> and without defining what a "label" actually is. The reference to 
> "encoding aliasing" is also intact, which seems misleading if the table 
> is not supposed to map between encodings.

I've split the table in two to avoid this issue.
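
For illustration only, the label-to-encoding idea can be sketched roughly 
as below. This is not the spec's table: the dictionary name, the function 
name, and the case-insensitive, whitespace-trimming lookup are all 
assumptions made for the example. The two entries are just the pairs 
mentioned earlier in this thread.

```python
# Hypothetical label table. Keys are labels as they appear in content;
# values are the IANA-registered names discussed in this thread.
LABEL_TO_ENCODING = {
    "shift-jis": "Shift_JIS",
    "windows-932": "Windows-31J",
}


def resolve_label(label):
    """Return the registered encoding name for a label, or None if unknown.

    Matching here ignores case and surrounding whitespace; that rule is an
    assumption of this sketch, not something stated in the thread.
    """
    key = label.strip().lower()
    return LABEL_TO_ENCODING.get(key)


if __name__ == "__main__":
    for candidate in ("Shift-JIS", " Windows-932 ", "x-x-big5"):
        print(repr(candidate), "->", resolve_label(candidate))
```

The point of keying on labels rather than encodings is that an 
unregistered string such as windows-932 can be listed without implying 
that it names a distinct encoding.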

Earlier, you wrote:
> GB2312 and GB_2312-80 technically refer to the *character set* GB 
> 2312-80, [...]. GBK, on the other hand, is an encoding.

As far as I can tell, GB2312 and GB_2312-80 are two different encodings 
according to IANA.

On Wed, 10 Jun 2009, Anne van Kesteren wrote:
> I would prefer them being added to the IANA registry.

I've noted that I should do that.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'