[whatwg] ISO-8859-* and the C1 control range

Øistein E. Andersen html5 at xn--istein-9xa.com
Tue Jun 5 10:40:34 PDT 2007


Neither "ISO-8859-11" nor "Windows-874" appears in the list
of IANA-approved character sets:
    http://www.iana.org/assignments/character-sets
On the other hand, "TIS-620" (identical to ISO-8859-11
except that 0xA0 is left undefined) has been sanctioned by IANA.
Perhaps Henri Sivonen could add a test for TIS-620?
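
(The 0xA0 difference is easy to check against any pair of mapping
tables. The sketch below uses the tables that ship with Python, under
the codec names "tis_620" and "iso8859_11". Those tables are only one
semi-official mapping and say nothing about what browsers do. The
helper names decode_byte and differing_bytes are made up for the
illustration.)

```python
def decode_byte(value, codec):
    """Return the character one byte decodes to, or None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def differing_bytes(codec_a, codec_b):
    """Map each byte value on which the two single-byte codecs disagree to both results."""
    diff = {}
    for value in range(256):
        a = decode_byte(value, codec_a)
        b = decode_byte(value, codec_b)
        if a != b:
            diff[value] = (a, b)
    return diff


if __name__ == "__main__":
    for value, (a, b) in sorted(differing_bytes("tis_620", "iso8859_11").items()):
        print(f"0x{value:02X}: tis_620={a!r} iso8859_11={b!r}")
```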

(To do this properly, what we really ought to do is look for
C1 and undefined characters in all IANA charsets and semi-official
mappings to Unicode and check 1) whether the gaps can be filled
by borrowing from other encodings, and 2) whether browsers
actually do so. It would probably be acceptable to require
specific treatment for ISO-8859-1 bytes, given the encoding's
special status and the fact that numeric character references
(NCRs) need this treatment anyway,
but it seems difficult to defend exceptions for one Thai encoding
without actually investigating whether similar measures might
be appropriate for other encodings as well.)
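
Step 1 of such a survey could be roughed out as follows. This is only
a sketch over Python's bundled single-byte tables, not over the IANA
registry itself. The base/donor pairs are my own picks for
illustration ("cp874" standing in for Windows-874, "cp1252" for the
usual ISO-8859-1 superset), and the names gaps and fillable are
invented here. Step 2, what browsers actually do, still needs real
tests.

```python
C1 = range(0x80, 0xA0)  # code points U+0080..U+009F


def decode_byte(value, codec):
    """Return the character one byte decodes to, or None if undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def gaps(codec):
    """Byte values the codec leaves undefined or maps onto a C1 control."""
    found = []
    for value in range(256):
        ch = decode_byte(value, codec)
        if ch is None or ord(ch) in C1:
            found.append(value)
    return found


def fillable(base, donor):
    """Gap bytes of `base` for which `donor` supplies a non-C1 character."""
    filled = {}
    for value in gaps(base):
        ch = decode_byte(value, donor)
        if ch is not None and ord(ch) not in C1:
            filled[value] = ch
    return filled


if __name__ == "__main__":
    # Hypothetical pairs; a real survey would walk every registered charset.
    for base, donor in [("iso8859_1", "cp1252"), ("iso8859_11", "cp874")]:
        filled = fillable(base, donor)
        print(f"{base}: {len(gaps(base))} gap bytes, {len(filled)} fillable from {donor}")
        for value, ch in sorted(filled.items()):
            print(f"  0x{value:02X} -> U+{ord(ch):04X}")
```

Bytes that stay in the gap list after borrowing (neither table
defines them) are the ones where an exception for one encoding would
be hardest to justify.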

-- 
Øistein E. Andersen



