[whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason

Wed Jul 31 18:41:56 PDT 2013

On Thu, 1 Aug 2013, Martin Janecke wrote:
> 
> I don't see any sense in making a document that is declared as 
> ISO-8859-1 and encoded as ISO-8859-1 non-conforming. Just because the 
> ISO-8859-1 code points are a subset of windows-1252? So is US-ASCII. 
> Should an US-ASCII declaration also be non-conforming then -- even if 
> the document only contains bytes from the US-ASCII range? What's the 
> benefit?
> 
> I assume this is supposed to be helpful in some way, but to me it just 
> seems wrong and confusing.

If you avoid the bytes that are different in ISO-8859-1 and Win1252, the 
spec now allows you to use either label. (As well as "cp1252", "cp819", 
"ibm819", "l1", "latin1", "x-cp1252", etc.)
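
As a rough illustration of which bytes those are, here is a minimal Python sketch. It assumes Python's "cp1252" and "iso-8859-1" codecs as stand-ins for the two encodings; the names differing_bytes and is_label_neutral are made up for this example. Python's cp1252 codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, whereas the Encoding Standard maps them to the same C1 controls as ISO-8859-1, so the sketch skips them.

```python
def differing_bytes():
    """Return the byte values that decode differently in the two codecs."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")
        try:
            win1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Undefined in Python's cp1252 codec; skipped (see note above).
            continue
        if latin1 != win1252:
            diffs.append(value)
    return diffs


DIFFERING = frozenset(differing_bytes())


def is_label_neutral(data: bytes) -> bool:
    """True if the data avoids every byte on which the two codecs disagree."""
    return not any(b in DIFFERING for b in data)


if __name__ == "__main__":
    print([hex(b) for b in sorted(DIFFERING)])
    print(is_label_neutral("café".encode("iso-8859-1")))
    print(is_label_neutral(b"Wait\x85"))
```

All of the differing bytes fall in the range 0x80 to 0x9F; everything else decodes identically under either label.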

The part that I find problematic is that if you use byte 0x85 from 
Windows 1252 (U+2026 "…" HORIZONTAL ELLIPSIS), and then label the document 
as "ansi_x3.4-1968", "ascii", "iso-8859-1", "iso-ir-100", "iso8859-1", 
"iso_8859-1:1987", "us-ascii", or a number of other options, it'll still 
be valid, and it'll work exactly as if you'd labeled it "windows-1252". 
This despite the fact that in ASCII and in ISO-8859-1, byte 0x85 does not 
map to U+2026. It maps to U+0085 in 8859-1, and it is undefined in ASCII 
(since ASCII is a 7-bit encoding).
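
To make the mismatch concrete, here is a small Python sketch, assuming Python's own codecs, which implement the encodings as originally defined rather than the label mapping in the Encoding Standard. The variable names are only illustrative.

```python
data = b"\x85"

# Windows-1252: 0x85 is HORIZONTAL ELLIPSIS (U+2026).
as_win1252 = data.decode("cp1252")

# ISO-8859-1: 0x85 is the C1 control character U+0085.
as_latin1 = data.decode("iso-8859-1")

# ASCII: 0x85 is outside the 7-bit range, so decoding fails.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

if __name__ == "__main__":
    print(f"windows-1252: U+{ord(as_win1252):04X}")
    print(f"iso-8859-1:   U+{ord(as_latin1):04X}")
    print(f"ascii decodes: {ascii_ok}")
```

A decoder following the spec would give the first result for all three labels.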

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'