[whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason
ian at hixie.ch
Wed Jul 31 18:41:56 PDT 2013
On Thu, 1 Aug 2013, Martin Janecke wrote:
> I don't see any sense in making a document that is declared as
> ISO-8859-1 and encoded as ISO-8859-1 non-conforming. Just because the
> ISO-8859-1 code points are a subset of windows-1252? So is US-ASCII.
> Should an US-ASCII declaration also be non-conforming then -- even if
> the document only contains bytes from the US-ASCII range? What's the
> I assume this is supposed to be helpful in some way, but to me it just
> seems wrong and confusing.
If you avoid the bytes that are different in ISO-8859-1 and Win1252, the
spec now allows you to use either label. (As well as "cp1252", "cp819",
"ibm819", "l1", "latin1", "x-cp1252", etc.)
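To make "the bytes that are different" concrete, here is a rough sketch that enumerates them using Python's built-in codecs. The helper names (differing_bytes, label_independent) are made up for illustration. It also assumes Python's "cp1252" codec is a close enough stand-in: that codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, whereas the Encoding Standard's windows-1252 maps them to the matching C1 controls, so the standard's own index would report slightly fewer differences.

```python
def differing_bytes():
    """Return the byte values that decode differently in Latin-1 and cp1252."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")
        # Python's cp1252 leaves a few bytes undefined; "replace" turns those
        # into U+FFFD instead of raising, so they are counted as different.
        win1252 = raw.decode("cp1252", errors="replace")
        if latin1 != win1252:
            diffs.append(value)
    return diffs


def label_independent(data):
    """True if the bytes decode identically under either label (hypothetical helper)."""
    risky = set(differing_bytes())
    return not any(b in risky for b in data)


diffs = differing_bytes()
plain = label_independent("café".encode("latin-1"))
with_ellipsis = label_independent(b"wait\x85")
```

Under these assumptions the differing bytes all fall in the range 0x80 to 0x9F, which is why a document that stays out of that range can carry either label.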
The part that I find problematic is that if you use byte 0x85 from
Windows 1252 (U+2026 "…" HORIZONTAL ELLIPSIS), and then label the document
as "ansi_x3.4-1968", "ascii", "iso-8859-1", "iso-ir-100", "iso8859-1",
"iso_8859-1:1987", "us-ascii", or a number of other options, it'll still
be valid, and it'll work exactly as if you'd labeled it "windows-1252".
This despite the fact that in ASCII and in ISO-8859-1, byte 0x85 does not
map to U+2026. It maps to U+0085 in 8859-1, and it is undefined in ASCII
(since ASCII is a 7-bit encoding).
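For illustration, a minimal sketch of what byte 0x85 means under each encoding's own definition, using Python's built-in codecs. Note that Python applies the strict definitions rather than the Encoding Standard's label mapping, so this shows what the labels nominally promise, not what a browser does with them. The helper decode_or_none is a name invented for this example.

```python
ELLIPSIS_BYTE = b"\x85"


def decode_or_none(data, codec):
    """Decode with the given codec, or return None if the bytes are invalid in it."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


as_win1252 = decode_or_none(ELLIPSIS_BYTE, "cp1252")     # HORIZONTAL ELLIPSIS
as_latin1 = decode_or_none(ELLIPSIS_BYTE, "iso-8859-1")  # C1 control U+0085
as_ascii = decode_or_none(ELLIPSIS_BYTE, "ascii")        # outside the 7-bit range
```

Under the spec as described above, a document labeled with any of the three gets the first result.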
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'