[whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason

NARUSE, Yui naruse at airemix.jp
Sat Aug 3 10:19:22 PDT 2013


2013/8/1 Ian Hickson <ian at hixie.ch>:
> On Thu, 1 Aug 2013, Martin Janecke wrote:
>>
>> I don't see any sense in making a document that is declared as
>> ISO-8859-1 and encoded as ISO-8859-1 non-conforming. Just because the
>> ISO-8859-1 code points are a subset of windows-1252? So is US-ASCII.
>> Should an US-ASCII declaration also be non-conforming then -- even if
>> the document only contains bytes from the US-ASCII range? What's the
>> benefit?
>>
>> I assume this is supposed to be helpful in some way, but to me it just
>> seems wrong and confusing.
>
> If you avoid the bytes that are different in ISO-8859-1 and Win1252, the
> spec now allows you to use either label. (As well as "cp1252", "cp819",
> "ibm819", "l1", "latin1", "x-cp1252", etc.)
>
> The part that I find problematic is that if you use byte 0x85 from
> Windows 1252 (U+2026 "…" HORIZONTAL ELLIPSIS), and then label the document
> as "ansi_x3.4-1968", "ascii", "iso-8859-1", "iso-ir-100", "iso8859-1",
> "iso_8859-1:1987", "us-ascii", or a number of other options, it'll still
> be valid, and it'll work exactly as if you'd labeled it "windows-1252".
> This despite the fact that in ASCII and in ISO-8859-1, byte 0x85 does not
> map to U+2026. It maps to U+0085 in 8859-1, and it is undefined in ASCII
> (since ASCII is a 7 bit encoding).

The ISO-8859-1 vs. Windows-1252 issue sounds like a minor one, because
0x85 in ISO-8859-1 is Next Line (NEL).
As far as I know, 0x85/U+0085 is used only on some IBM systems.
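
To make the three readings of byte 0x85 concrete, here is a small Python
sketch. It is only an illustration: the codec names are Python's own
("cp1252" stands in for windows-1252 and "latin-1" for ISO-8859-1), and
decode_or_none is a hypothetical helper, not anything from the spec.

```python
# Decode the single byte 0x85 under the three labels discussed above.
BYTE = b"\x85"

def decode_or_none(data, codec):
    """Return the decoded text, or None if the bytes are invalid in codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

as_win1252 = decode_or_none(BYTE, "cp1252")   # U+2026 HORIZONTAL ELLIPSIS
as_latin1 = decode_or_none(BYTE, "latin-1")   # U+0085 NEXT LINE (C1 control)
as_ascii = decode_or_none(BYTE, "ascii")      # None: outside the 7-bit range

for label, text in [("windows-1252", as_win1252),
                    ("iso-8859-1", as_latin1),
                    ("us-ascii", as_ascii)]:
    print(label, "undefined" if text is None else "U+%04X" % ord(text))
```

A browser following the Encoding Standard treats all three labels as
windows-1252, so only the first reading is what users actually see.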

For Japanese encodings there is the Shift_JIS vs. Windows-31J issue, which
has annoyed people for a long time.
Windows-31J has many additional characters that aren't included in Shift_JIS,
and many of its Unicode mappings differ from those of Shift_JIS.
But many existing Web pages specify "Shift_JIS" and use characters
that exist only in Windows-31J.
Therefore, if people want to label a document as true Shift_JIS,
there is no way to do so in the existing framework.
It would need a new mechanism, for example a new meta specifier like <META
i-want-to-truly-specify-charset-as="Shift_JIS">,
so that browsers recognize the document's encoding as true Shift_JIS.
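
The two kinds of difference can be seen with a short Python sketch. This
assumes Python's "shift_jis" codec as a stand-in for true Shift_JIS
(JIS X 0208 based) and its "cp932" codec as a stand-in for Windows-31J;
the helper decode_or_none is hypothetical and only for illustration.

```python
WAVE = b"\x81\x60"        # same bytes, different Unicode mapping
CIRCLED_1 = b"\x87\x40"   # NEC extension character, only in Windows-31J

def decode_or_none(data, codec):
    """Return the decoded text, or None if the bytes are invalid in codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

# Different mapping for the same byte sequence.
wave_sjis = decode_or_none(WAVE, "shift_jis")   # U+301C WAVE DASH
wave_31j = decode_or_none(WAVE, "cp932")        # U+FF5E FULLWIDTH TILDE

# Character that exists only in the Windows-31J table.
one_sjis = decode_or_none(CIRCLED_1, "shift_jis")   # None: not in JIS X 0208
one_31j = decode_or_none(CIRCLED_1, "cp932")        # U+2460 CIRCLED DIGIT ONE

for name, text in [("wave/shift_jis", wave_sjis), ("wave/cp932", wave_31j),
                   ("one/shift_jis", one_sjis), ("one/cp932", one_31j)]:
    print(name, "undefined" if text is None else "U+%04X" % ord(text))
```

A page labeled "Shift_JIS" that contains the second byte sequence only
works because browsers decode it with the Windows-31J table.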

But such people should use UTF-8 instead of introducing such a new mechanism.

-- 
NARUSE, Yui  <naruse at airemix.jp>

