[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]
ian at hixie.ch
Wed Jun 3 14:58:08 PDT 2009
On Sun, 12 Apr 2009, Øistein E. Andersen wrote:
> On 2 Sep 2008, at 06:06, Ian Hickson wrote:
> > On Wed, 30 Jul 2008, Øistein E. Andersen wrote:
> > >
> > > 1. Opera, Firefox and Safari all handle US-ASCII as Windows-1252.
> > > IE7, on the other hand, simply ignores the high bit (as it does for
> > > a few other 7-bit encodings, by the way). Perhaps this
> > > alias could be dropped from the other browsers.
> >
> > Ignoring the high bit seems like a dangerous security bug; dropping any
> > character with a high bit as U+FFFD seems unnecessarily drastic.
>
> According to a test I did using browsershots.org, IE8 actually seems to do
> this (8-bit characters are rendered as squares), which looks like an argument
> in favour of the more `drastic' option.
>
> > I've made the spec go with the O/F/S behaviour here.
>
> This has the advantage of not adding ASCII as a separate encoding, and
> Windows-1252 is presumably one of the encodings most often mislabelled as
> ASCII. However, IE has ignored the high bit at least since 5.01 (IE4 via
> browsershots.org treats it as CP1252, but this could well be
> locale-dependent), so there may not be that many mislabelled pages. Has
> anyone got a list of pages which are labelled as ASCII and contain 8-bit
> characters?
>
> This is probably not very important. U+FFFD is `purer', Windows-1252 has the
> potential of rescuing a few pages. It is however essential that 8-bit
> characters be considered not conforming since they do not in fact work (as
> Windows-1252 bytes) in IE5-IE8. This is currently the case, but I think Henri
> Sivonen has argued that `misinterpretation for compatibility' should not be
> considered a conformance error (which would probably be fairly harmless for
> other mappings).

I (and the spec) agree with you here, that these should be reported as
errors.
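
For concreteness, the three behaviours discussed in this thread for content labelled US-ASCII can be sketched in Python. This is only an illustration: the function names are hypothetical, and it models the behaviours as described above rather than what any browser literally implements. The byte string in the usage example was chosen to show one plausible reason why ignoring the high bit is risky, since bytes that are harmless as Windows-1252 can turn into markup.

```python
def decode_as_windows_1252(data: bytes) -> str:
    """Opera/Firefox/Safari behaviour as described: treat US-ASCII as Windows-1252."""
    # Python's cp1252 codec leaves a few bytes (e.g. 0x81) undefined,
    # so fall back to U+FFFD for those instead of raising.
    return data.decode("cp1252", errors="replace")


def decode_ignoring_high_bit(data: bytes) -> str:
    """IE7 behaviour as described: clear bit 7 of every byte, then read as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")


def decode_high_bytes_as_fffd(data: bytes) -> str:
    """The `drastic' option: every byte >= 0x80 becomes U+FFFD."""
    return data.decode("ascii", errors="replace")


if __name__ == "__main__":
    sample = b"\xbcscript\xbe"  # hypothetical input; 0xBC and 0xBE have the high bit set
    print(decode_as_windows_1252(sample))     # ¼script¾
    print(decode_ignoring_high_bit(sample))   # <script>
    print(decode_high_bytes_as_fffd(sample))  # U+FFFD, "script", U+FFFD
```

With the high bit ignored, 0xBC and 0xBE collapse to 0x3C and 0x3E, so the sample decodes to a script start tag, whereas the other two behaviours leave it inert.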
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'