[whatwg] ISO-8859-* and the C1 control range
Øistein E. Andersen
html5 at xn--istein-9xa.com
Tue Jun 5 07:11:35 PDT 2007
On Jun 5, 2007, at 11:38, Kristof Zelechovski wrote:
> And why not:?
> 2c) If the declared encoding was ISO-8859-2, replace that
> character with the [corresponding] character [... from] Windows-1250.
On Jun 5, 2007, at 11:51, Henri Sivonen wrote:
> that's not what [browsers] do, so apparently it is not
> required for compatibility
A more fundamental reason is that the two encodings are
incompatible. Of the nine Windows-125* encodings, eight
have ISO-8859-* counterparts, and four of these are subsets
of the corresponding Windows-125* encoding once the C1
range 0x80--0x9F is set aside:
Windows-1250 vs. ISO-8859-2 (Eastern European):
The range 0xC0--0xFF is the same in both encodings,
but 0xA0--0xBF, which includes letters, differs in many
positions.
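One way to check this is to decode every byte of each range with both encodings and compare. The sketch below assumes that Python's built-in `iso8859_2` and `cp1250` codec tables match the encodings discussed here; `differing_bytes` is a hypothetical helper name.

```python
from typing import List

def differing_bytes(lo: int, hi: int, codec_a: str, codec_b: str) -> List[int]:
    """Return the byte values in [lo, hi) that the two codecs decode differently."""
    diffs = []
    for value in range(lo, hi):
        raw = bytes([value])
        # errors="replace" maps bytes a codec leaves undefined to U+FFFD
        if raw.decode(codec_a, errors="replace") != raw.decode(codec_b, errors="replace"):
            diffs.append(value)
    return diffs

# Upper range: identical in both encodings.
print([hex(v) for v in differing_bytes(0xC0, 0x100, "iso8859_2", "cp1250")])  # []
# 0xA0--0xBF: e.g. 0xA9 is Š in ISO-8859-2 but © in Windows-1250.
print([hex(v) for v in differing_bytes(0xA0, 0xC0, "iso8859_2", "cp1250")])
```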
Windows-1251 vs. ISO-8859-5 (Cyrillic):
Completely incompatible. Most notably, Cyrillic letters
from the modern Russian alphabet (32 uppercase and 32
lowercase) are shifted by 0x10.
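The 0x10 shift can be confirmed with Python's standard `cp1251` and `iso8859_5` codecs (assuming they reflect the two encodings as described). The 64 letters are taken here as U+0410..U+044F, i.e. А..я without Ё/ё, which sit elsewhere in both encodings.

```python
# U+0410..U+042F are А..Я and U+0430..U+044F are а..я: the 32 + 32 letters
# of the modern Russian alphabet, not counting Ё/ё.
letters = [chr(cp) for cp in range(0x0410, 0x0450)]

# Byte offset between the Windows-1251 and ISO-8859-5 encodings of each letter.
shifts = {ch.encode("cp1251")[0] - ch.encode("iso8859_5")[0] for ch in letters}
print(shifts)  # {16}, i.e. every letter is shifted by 0x10

# Consequence: Windows-1251 text read as ISO-8859-5 becomes different characters.
word = "Привет"
print(word.encode("cp1251").decode("iso8859_5"))
```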
Windows-1252 vs. ISO-8859-1 (Western European):
Superset.
Windows-1253 vs. ISO-8859-7 (Greek):
Almost compatible. Unfortunately, a few bytes in the
range 0xA0--0xBF are assigned to different characters,
and the accented capital Alpha is positioned differently.
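The Alpha case can be shown with Python's `iso8859_7` and `cp1253` codecs (assumed here to match the two encodings): the same letter maps to different bytes, and the ISO byte means something else in Windows-1253.

```python
alpha_tonos = "\u0386"  # GREEK CAPITAL LETTER ALPHA WITH TONOS (Ά)
print(alpha_tonos.encode("iso8859_7"))  # b'\xb6'
print(alpha_tonos.encode("cp1253"))     # b'\xa2'

# Byte 0xB6 read as Windows-1253 is the pilcrow sign, not Ά.
print(b"\xb6".decode("cp1253"))  # ¶

# The letters of this word, by contrast, have the same bytes in both encodings.
word = "Καλημέρα"
print(word.encode("iso8859_7") == word.encode("cp1253"))  # True
```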
Windows-1254 vs. ISO-8859-9 (Turkish):
Superset.
Windows-1255 vs. ISO-8859-8 (Hebrew):
Superset.
Windows-1256 vs. ISO-8859-6 (Arabic):
Arabic consonants seem to have the same code points,
but the short-vowel marks are at incompatible positions.
Windows-1256 also contains lowercase French accented letters
and even the œ ligature, whereas ISO-8859-6 leaves many
bytes undefined.
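A small sketch of these differences, assuming Python's `iso8859_6` and `cp1256` codec tables match the encodings discussed; `encodable` is a hypothetical helper.

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text can be encoded with the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

fatha = "\u064e"  # ARABIC FATHA, a short-vowel mark
print(fatha.encode("iso8859_6"), fatha.encode("cp1256"))  # two different bytes

# French letters and the oe ligature exist only in Windows-1256.
for text in ["é", "ç", "œ"]:
    print(text, encodable(text, "cp1256"), encodable(text, "iso8859_6"))

# Bytes in 0xA0--0xFF that ISO-8859-6 leaves undefined (decoding fails).
undefined = []
for value in range(0xA0, 0x100):
    try:
        bytes([value]).decode("iso8859_6")
    except UnicodeDecodeError:
        undefined.append(value)
print(len(undefined))
```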
Windows-1257 vs. ISO-8859-13 (Baltic):
Superset.
Windows-1258 (Vietnamese):
No corresponding ISO-8859-* encoding.
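To make the consequence for proposal 2c concrete, here is a sketch of a hypothetical decoder (`decode_c1_from_windows` is an illustrative name, not part of any spec) that decodes as ISO-8859-* but reads bytes 0x80--0x9F with the corresponding Windows-125* codec. It assumes Python's built-in codec tables reflect the encodings. For a superset pair such as ISO-8859-1/Windows-1252 the rule agrees with plain Windows-1252 decoding; for ISO-8859-2/Windows-1250 it does not, because 0xA9 is Š in one and © in the other.

```python
def decode_c1_from_windows(data: bytes, iso_codec: str, win_codec: str) -> str:
    """Hypothetical rule 2c: decode as ISO-8859-*, but read bytes in the
    C1 range 0x80-0x9F with the Windows-125* codec instead."""
    chars = []
    for value in data:
        codec = win_codec if 0x80 <= value <= 0x9F else iso_codec
        # errors="replace" yields U+FFFD for bytes the codec leaves undefined
        chars.append(bytes([value]).decode(codec, errors="replace"))
    return "".join(chars)

# Windows-1250 text "Š©" (bytes 0x8A, 0xA9) mislabelled as ISO-8859-2:
data = bytes([0x8A, 0xA9])
print(decode_c1_from_windows(data, "iso8859_2", "cp1250"))  # ŠŠ
print(data.decode("cp1250"))                                # Š©

# For ISO-8859-1/Windows-1252 the rule matches plain Windows-1252 decoding.
all_bytes = bytes(range(256))
print(decode_c1_from_windows(all_bytes, "latin-1", "cp1252")
      == all_bytes.decode("cp1252", errors="replace"))      # True
```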
--
Øistein E. Andersen