[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]
Ian Hickson
ian at hixie.ch
Fri Oct 23 15:25:54 PDT 2009
On Fri, 23 Oct 2009, Øistein E. Andersen wrote:
> On 23 Oct 2009, at 04:20, Ian Hickson wrote:
> > On Wed, 21 Oct 2009, Øistein E. Andersen wrote:
> > >
> > > ASCII-compatibility:
> > > The note in §2.1.5 Character encodings seems to say that [...]
> > > ISO-2022[-*] are ASCII-compatible, whereas HZ-GB-2312 is not, and I
> > > cannot find anything in Section 2.1.5 that would explain this difference.
> >
> > HZ-GB-2312 uses the byte ASCII uses for "~" as the escape character.
> > ISO-2022-* uses the control codes. That's the difference.
>
> '~'/0x7E is not (and should not be, as far as I can tell) relevant for HTML5's
> concept of ASCII compatibility.
Good point. I've moved HZ-GB-2312 over to the ASCII-compatible side.
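The difference described above (HZ-GB-2312 shifts with the printable byte "~", ISO-2022-* shifts with control codes) can be seen with Python's built-in "hz" and "iso2022_jp" codecs. This is a minimal sketch that assumes those two codecs as stand-ins for the encodings under discussion; the sample character and the helper name are illustrative choices, not anything from the spec.

```python
# Sketch: compare how HZ-GB-2312 and ISO-2022-JP switch into
# double-byte mode, using only Python's built-in codecs.
TEXT = "中"  # illustrative: present in both GB 2312 and JIS X 0208

hz_bytes = TEXT.encode("hz")          # HZ: shift sequences "~{" ... "~}"
jp_bytes = TEXT.encode("iso2022_jp")  # ISO-2022-JP: ESC-based shifts

def is_printable_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is in 0x20..0x7E,
    i.e. the data contains no control codes such as ESC (0x1B)."""
    return all(0x20 <= b <= 0x7E for b in data)

if __name__ == "__main__":
    # HZ output stays entirely within printable ASCII bytes;
    # ISO-2022-JP output contains ESC control bytes.
    print("HZ-GB-2312 :", hz_bytes, is_printable_ascii(hz_bytes))
    print("ISO-2022-JP:", jp_bytes, is_printable_ascii(jp_bytes))
```

Whether the "~" shift actually matters for HTML5's notion of ASCII compatibility is the separate question raised in the reply above.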
> The added note certainly helps, but it is vague (does "[m]ost of these
> encodings" mean "all the encodings mentioned above apart from UTF-32"?)
> and inaccurate (Philip Taylor's example does not rely on "bugs").
>
> Given that the set of encodings is open-ended, I still think it would be
> preferable to make the rationale (a definition of what makes an encoding
> problematic) primary and mention actual encodings as examples. This
> could give something like the following: "Encodings in which a series of
> bytes in the range 0x20..0x7E may encode characters other than the
> corresponding characters in the range U+20..U+7E represent a potential
> security vulnerability since a browser that does not support the
> encoding (or does not support the label used to declare the encoding, or
> does not use the same mechanism to detect the encoding of unlabelled
> content) might end up interpreting technically benign plain text content
> as HTML tags and JavaScript. In particular, this applies to encodings
> in which the bytes corresponding to '<script>' in ASCII may encode a
> different string. Authors should not use such encodings, which are known
> to include.... In addition, authors should not use UTF-32 ...."
> Alternatively, fixing the current note would help and might be
> sufficient, albeit not ideal.
I've reworded the spec based on your suggestion. Thanks!
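The criterion in the suggested wording (a run of bytes in 0x20..0x7E that may decode to something other than the same ASCII characters) can be probed mechanically. The sketch below is an illustration under stated assumptions: the probe strings, the helper names, and the use of Python's codecs are hypothetical choices, and passing a finite set of probes does not show that an encoding is safe.

```python
# Sketch of the proposed test: does a run of printable-ASCII bytes
# decode under a given codec to exactly the characters ASCII gives?
# PROBES and both helper names are illustrative assumptions.

PROBES = [
    b"<script>",      # plain markup bytes, no shift sequences
    b"~{<script>~}",  # the same bytes wrapped in HZ's "~{" ... "~}"
]

def reads_as_ascii(codec: str, probe: bytes) -> bool:
    """True if `probe` (bytes 0x20..0x7E only) decodes under `codec`
    to the same text that ASCII would produce."""
    if not all(0x20 <= b <= 0x7E for b in probe):
        raise ValueError("probe must contain only bytes 0x20..0x7E")
    # errors="replace" makes an invalid sequence count as a mismatch
    # instead of raising an exception.
    return probe.decode(codec, errors="replace") == probe.decode("ascii")

def is_problematic(codec: str) -> bool:
    """True if any probe decodes differently from ASCII."""
    return not all(reads_as_ascii(codec, p) for p in PROBES)

if __name__ == "__main__":
    for codec in ("ascii", "utf-8", "iso2022_jp", "hz"):
        verdict = "problematic" if is_problematic(codec) else "ok on these probes"
        print(codec, verdict)
```

Under HZ, the bytes spelling "<script>" inside "~{" ... "~}" decode as GB 2312 characters, so a browser that does not recognise the HZ label would see a script tag where an HZ-aware one sees plain text. ISO-2022-JP passes these particular probes only because its shifts need ESC (0x1B), which lies outside the probed range.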
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'