[whatwg] More prohibited characters for unquoted attributes are needed
Geoffrey Sneddon
gsneddon at opera.com
Tue Oct 13 07:02:24 PDT 2009
Ian Hickson wrote:
> On Mon, 7 Sep 2009, Aryeh Gregor wrote:
>> On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon
>> <foolistbar at googlemail.com> wrote:
>>> Apparently Hixie had previously said he didn't want to change this as it
>>> will become a non-issue over time. I think it does matter due to the
>>> security issues it presents in existing UAs. Conforming markup (using
>>> elements/attributes allowed in HTML 4.01) should not cause JS to execute in
>>> one browser but not in another.
>> I agree with you as an author. I wrote an HTML output function in
>> MediaWiki assuming that what the standard says is known to be
>> interoperable, which is apparently wrong. If I hadn't been keeping up
>> with HTML 5, I would have introduced an XSS vulnerability because of
>> some browsers' handling of `.
>>
>> If the problem will go away with time, then perhaps a later version of
>> the standard could make such unquoted attributes conforming, once
>> there's no more problem with them.
>
> As far as I can tell, this is an IE bug; treating "`" as an attribute
> quoting character is non-conforming in any version of HTML so far, it
> seems. I'm certainly not going to make it non-conforming to stumble into
> any IE bug or difference in parsing between IE and previous specs or other
> browsers; we'd just end up with an asinine set of conformance
> requirements.
I agree that it's pointless to make it non-conforming to hit any parsing
bug, but I would argue that we should make non-conforming as many cases
as is sensible where they open up security holes in websites on legacy
UAs, assuming the website uses an HTML 5 parser/sanitizer/serializer.
> For example, should this be non-conforming?
>
> <!DOCTYPE html>
> <title>Test</title>
> <form>
> <label>Search: <input type=text></label>
> <input type=submit>
> </form>
>
> This perfectly innocent piece of HTML content (HTML2-compliant except for
> the DOCTYPE) results in a non-tree DOM in IE8. Should we make it
> non-conforming?
No, because that markup opens up no security hole.
> Similarly, IE conditional comments make it trivial to trigger scripts in
> IE but not another UA; indeed people do this on purpose. Should we make
> those non-conforming also?
They are a harder issue, but I think it is probably fair enough to
assume that most sanitizers drop comments for such reasons, hence making
them fine to leave as conforming also.
> As I understand it, the attack here is a site that allows the user to
> input text that is used verbatim in two attributes, such that the user can
> set the first attribute's value to:
>
> `
>
> ...and the second to:
>
> ` onload='...payload...' end=x
>
> ...with the assumption that the site is going to not quote the first one,
> and quote the second one with double quotes:
(This is the default behaviour of Python html5lib, FWIW: the first is
not quoted as it contains no whitespace characters or U+003E (>); the
second is quoted because it does contain whitespace.)
> <body title=` class="` onload='...payload...' end=x">
>
> ...which in IE, for some reason, gets treated as:
>
> <body title=' class="'
> onload='...payload...'
> end='x"'>
Indeed, this is the attack I (and others) am concerned about.
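
To make the quoting decision concrete, here is a minimal Python sketch.
It is not html5lib's actual code: serialize_attr, start_tag and the
strict flag are hypothetical names, and the "legacy" rule is only my
reading of the behaviour described above (quote on whitespace or
U+003E). The strict rule additionally treats ` (and ", ', =, <) as
requiring quotes.

```python
import re

# Assumed legacy rule: quote only on whitespace or U+003E (>).
LEGACY_UNSAFE = re.compile(r"[\t\n\x0c\r >]")
# Stricter rule: also quote on ", ', =, < and ` (the backtick is the IE case).
STRICT_UNSAFE = re.compile(r"[\t\n\x0c\r >\"'=<`]")


def serialize_attr(name, value, strict=True):
    """Return name=value, quoting the value only when the chosen rule requires it."""
    unsafe = STRICT_UNSAFE if strict else LEGACY_UNSAFE
    # An empty value must always be quoted.
    if value and not unsafe.search(value):
        return f"{name}={value}"
    # Escape & and " so the value cannot close the double-quoted attribute.
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'


def start_tag(tag, attrs, strict=True):
    """Serialize a start tag from a list of (name, value) pairs."""
    parts = [serialize_attr(n, v, strict) for n, v in attrs]
    return "<" + " ".join([tag] + parts) + ">"


# The two user-controlled values from the attack described above.
attrs = [("title", "`"), ("class", "` onload='...payload...' end=x")]
print(start_tag("body", attrs, strict=False))
print(start_tag("body", attrs, strict=True))
```

With strict=False the first line printed is the markup from the attack;
with strict=True the title value is emitted as title="`", so a legacy UA
has no unquoted ` to misread as an opening quote.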
> I've disallowed ` in unquoted attribute values for now, but I think we
> should revert this once IE has fixed this bug for a few years.
Right, once versions of IE with this bug have faded out of existence I
think this will become a non-issue. I also expect that'll be a while
yet, though, and I highly doubt that time will have come even by the
time HTML 5 goes to REC. Furthermore, if there are similar attacks to
this, I think they should similarly be made non-conforming.
--
Geoffrey Sneddon — Opera Software
<http://gsnedders.com/>
<http://www.opera.com/>