[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]
Øistein E. Andersen
liszt at coq.no
Thu Oct 22 13:23:43 PDT 2009
On 22 Oct 2009, at 17:15, NARUSE, Yui wrote:
> First, JIS-X-0208 and JIS-X-0212 are not in IANA Charsets,
I am not sure what you mean; they are both listed at
<http://www.iana.org/assignments/character-sets>:
Name: JIS_C6226-1983 [RFC1345,KXS2]
MIBenum: 63
Source: ECMA registry
Alias: iso-ir-87
Alias: x0208
Alias: JIS_X0208-1983
Alias: csISO87JISX0208
Name: JIS_X0212-1990 [RFC1345,KXS2]
MIBenum: 98
Source: ECMA registry
Alias: x0212
Alias: iso-ir-159
Alias: csISO159JISX02121990
> moreover those correct names as spec are JIS X 0208 and JIS X 0212.
(The IANA registry is internally inconsistent and often disagrees with
official standards when it comes to capitalisation, dashes/hyphens,
underscores and spaces, so it is difficult to get this right. Please
excuse me for not always paying due attention to such details in
e-mails. Of course, the specifications should follow either IANA or the
official standard as appropriate, depending on what they are referring to.)
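To illustrate how such spelling differences could be tolerated when matching names, here is a minimal sketch. The names and aliases are copied from the two registry entries quoted above; the helpers `normalise` and `resolve`, and the rule of ignoring case, spaces, hyphens and underscores, are assumptions made for this example and are not taken from IANA or from the HTML5 draft.

```python
import re
from typing import Optional

# Names and aliases copied from the two registry entries quoted above.
REGISTRY = {
    "JIS_C6226-1983": ["iso-ir-87", "x0208", "JIS_X0208-1983", "csISO87JISX0208"],
    "JIS_X0212-1990": ["x0212", "iso-ir-159", "csISO159JISX02121990"],
}

def normalise(label: str) -> str:
    """Lower-case a label and drop spaces, hyphens and underscores.

    Hypothetical matching rule, chosen only for illustration.
    """
    return re.sub(r"[\s_-]+", "", label).lower()

# Map every normalised name or alias to its registered name.
LOOKUP = {
    normalise(label): name
    for name, aliases in REGISTRY.items()
    for label in [name, *aliases]
}

def resolve(label: str) -> Optional[str]:
    """Return the registered name for a label, or None if it is unknown."""
    return LOOKUP.get(normalise(label))
```

With this rule, "JIS X 0212-1990" and "jis_x0208-1983" both resolve to a registered name, whereas the bare standard name "JIS X 0208" (without a year) does not, since no such alias is listed.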
> Second, JIS_C6226-1983, JIS_X0212-1990, and EBCDICs are not
> ASCII compatible. So they are out of discouraged; mustn't use.
EBCDIC is clearly not ASCII-compatible and may be unique amongst the
character sets in the IANA registry in providing the full ASCII
repertoire in a different arrangement.
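A small sketch of what "the full ASCII repertoire in a different arrangement" means in practice. It assumes Python's `cp037` codec as a representative EBCDIC code page; that choice is mine, and other EBCDIC variants differ in detail.

```python
import string

# cp037 is one EBCDIC code page shipped with Python; chosen only as an example.
EBCDIC = "cp037"

# All printable ASCII characters, including the space.
printable = string.ascii_letters + string.digits + string.punctuation + " "

# Every printable ASCII character can be encoded in the EBCDIC code page
# and decoded back unchanged, so the repertoire is fully present.
round_trip = printable.encode(EBCDIC).decode(EBCDIC)

# The byte values are arranged differently, however, so mark-up encoded in
# EBCDIC does not look like mark-up to a parser that assumes ASCII...
markup_as_ebcdic = "<p>".encode(EBCDIC)

# ...and ASCII mark-up read as EBCDIC turns into unrelated characters.
misread = b"<p>".decode(EBCDIC)

print(round_trip == printable)
print(markup_as_ebcdic != b"<p>")
print(misread != "<p>")
```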
JIS_C6226-1983 and JIS_X0212-1990 as defined in RFC1345 (i.e., on
their own) do not contain basic ASCII characters at all, so it makes
little sense to use them for HTML documents without adding ASCII or
the ASCII-based JIS C 6220-1969, which would give something like
EUC-JP or ISO-2022-JP. JIS_C6226-1983 contains wide versions of ASCII
characters, but those are not interpreted as HTML mark-up (unless I am
mistaken). JIS_X0212-1990 does not contain ASCII, kana or basic kanji,
so it is of extremely limited usefulness on its own even in a
plain-text setting. Warning against completely useless encodings seems
pointless.
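The difference between the ASCII characters and their wide counterparts can be sketched as follows. The example assumes Python's `euc_jp` codec as a stand-in for "JIS X 0208 plus ASCII"; it only shows the bytes involved and says nothing about how any particular browser parses them.

```python
ascii_lt = "<"        # U+003C LESS-THAN SIGN
wide_lt = "\uff1c"    # U+FF1C FULLWIDTH LESS-THAN SIGN (from JIS X 0208)

# In EUC-JP, the ASCII character keeps its single ASCII byte...
ascii_bytes = ascii_lt.encode("euc_jp")

# ...whereas the wide version is a distinct two-byte sequence with the
# high bit set on both bytes, so it never collides with ASCII mark-up.
wide_bytes = wide_lt.encode("euc_jp")

print(ascii_bytes, len(ascii_bytes))
print(wide_bytes.hex(), len(wide_bytes))
```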
Many other encodings in the IANA registry are ASCII-incompatible in
different ways; what I do not understand is what makes the ones
currently mentioned in the HTML5 draft particularly harmful.
> Finally, Why ISO 2022 series is discouraged is not clear.
We agree on this point.
> Anyway, most of charsets defined RFC 1345 are not clear.
> Conversion table between [those charsets and] Unicode is needed.
Quite. Anne van Kesteren, I and several others are currently trying
to document how browsers handle different encodings at
<http://wiki.whatwg.org/wiki/Web_Encodings>, and defining mappings to
Unicode is one of the goals. Your contribution would be much
appreciated.
--
Øistein E. Andersen