[whatwg] The problem of duplicate ID as a security issue

Wed Jun 6 15:20:18 PDT 2007

On Fri, 10 Mar 2006, Alexey Feldgendler wrote:
>
> Does the current version of the spec define what happens to elements 
> with duplicate ID values?

No. It's something we should consider for fixes to DOM3 Core, though.

> The problem of duplicate ID isn't just another issue where it's nice to 
> have some well-defined error recovery just for uniformity. There are 
> cases when duplicate IDs should be viewed as a security concern.
> 
> Consider a script which augments the HTML page after it has been parsed 
> by attaching event listeners to elements in the DOM tree, inserting new 
> nodes into the tree etc. This is common practice, for example, for many 
> web-based WYSIWYG editors. In this scenario, any method the script uses 
> for identification of the DOM nodes subject to augmentation is 
> vulnerable to possible spoofing by user-supplied content present on the 
> same page.
> 
> For example, imagine a script which finds a button by ID and attaches an 
> event listener to it. A possible markup looks like this:
> 
> <div>
>    ...blog entry body...
> </div>
> <button id="addtomemories">Add this entry to memories</button>
> <script>
> document.getElementById('addtomemories').addEventListener('click',
> doSomeNiceAJAX);
> </script>
> 
> So, a malicious blog author can make the following entry:
> 
> I have found a <a href="#" id="addtomemories">cool website</a>.
> 
> Depending on how the browser handles duplicate IDs, either or both of the 
> following unwanted effects may occur:
> 1. Clicking the link in the blog entry adds the entry to memories list 
> of the reader.
> 2. Clicking the real "Add this entry to memories" button does nothing.
> 
> One can think of other examples, possibly more dangerous. Other methods 
> of identification (by tag name, by class, by CSS selector as proposed 
> recently) are also vulnerable.
> 
> This kind of attack is hard to circumvent through use of HTML cleaners 
> because id="addtomemories" looks like an innocent attribute, like an 
> anchor for navigation.

It's not that hard to avoid. You can whitelist which attribute values are 
allowed (e.g. only IDs consisting of "comment" followed by the comment 
number followed by 1 to 10 characters in the range a-z).
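
As a rough sketch of that rule (the function name isAllowedId and the exact 
pattern are assumptions made for illustration, not anything defined by a 
spec), a check might look like this:

```javascript
// Hypothetical helper: accept an ID only if it is "comment", then the
// number of the comment it appears in, then 1 to 10 characters in a-z.
function isAllowedId(id, commentNumber) {
  // Only non-negative integers are valid comment numbers, so the number
  // can be embedded in the pattern without any escaping.
  if (!Number.isInteger(commentNumber) || commentNumber < 0) {
    return false;
  }
  const pattern = new RegExp('^comment' + commentNumber + '[a-z]{1,10}$');
  return typeof id === 'string' && pattern.test(id);
}
```

With this check, id="comment42note" would pass for comment 42, while 
id="addtomemories" would be rejected.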

> Preventing such attacks with an HTML cleaner would require either making a 
> full list of all "forbidden" IDs, class names etc, or imposing Draconian 
> rules upon user-supplied content, completely disallowing such useful 
> attributes as id and class.

I'm not really convinced there's that much use in user-supplied IDs and 
classes, but the rules needn't be that draconian. The server could 
automatically prepend the commentN string to IDs and classes.
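
A minimal sketch of that prefixing step, assuming the server already has the 
comment number at hand (prefixName and prefixClassList are hypothetical names 
used only for this example):

```javascript
// Hypothetical helper: namespace a user-supplied ID or class name by
// prepending "comment" plus the comment number.
function prefixName(name, commentNumber) {
  return 'comment' + commentNumber + name;
}

// Apply the same prefix to every token of a class attribute value.
function prefixClassList(classValue, commentNumber) {
  return classValue
    .split(/\s+/)
    .filter(token => token.length > 0)   // ignore leading/trailing whitespace
    .map(token => prefixName(token, commentNumber))
    .join(' ');
}
```

Here id="addtomemories" in comment 7 would become id="comment7addtomemories" 
and can no longer shadow the page's own button. Names that start with a digit 
could still collide across comments, so this probably wants to be combined 
with a character restriction such as the a-z one mentioned earlier.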

To be safe, a server's cleaning code must whitelist everything -- 
elements, attribute names, attribute values, element contents, etc. It's 
not trivial, but that's no excuse for not doing it.
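
To illustrate the shape of such a whitelist (a sketch only: the ALLOWED table, 
the clean function and the tiny { tag, attrs, children } tree format are all 
assumptions for this example, and a real cleaner would work on the output of a 
real HTML parser):

```javascript
// Hypothetical whitelist: element name -> attribute name -> value check.
// Anything not listed here is rejected.
const ALLOWED = {
  p: {},
  em: {},
  a: { href: value => /^https?:\/\//.test(value) },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Clean a tree of { tag, attrs, children } objects; text nodes are strings
// and are assumed to be escaped when the tree is serialised.
function clean(node) {
  if (typeof node === 'string') return node;
  if (!has(ALLOWED, node.tag)) return null;        // unknown element: drop it
  const allowedAttrs = ALLOWED[node.tag];
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    // Keep an attribute only if it is listed and its value passes the check.
    if (has(allowedAttrs, name) && allowedAttrs[name](value)) {
      attrs[name] = value;
    }
  }
  const children = (node.children || [])
    .map(clean)
    .filter(child => child !== null);
  return { tag: node.tag, attrs, children };
}
```

The point of the design is that the default is rejection: an id, an onclick or 
a javascript: URL is dropped because it was never listed, not because someone 
remembered to forbid it.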

> Necessary but not sufficient. Duplicate IDs aren't caught by a 
> validating parser, so custom code is needed to enforce many of the 
> requirements. For example, if one was trying to ensure that all IDs are 
> unique, then the ID values within the user-supplied code would have to 
> be checked for duplicates among them, too.

This is already the case, yes.
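
For what it's worth, that check is short. A sketch, assuming the cleaner has 
already collected the IDs used by the page template and the IDs found in the 
user-supplied markup (findDuplicateIds is a hypothetical name):

```javascript
// Hypothetical helper: return the user-supplied IDs that collide either
// with an ID already used by the page or with an earlier user-supplied ID.
function findDuplicateIds(pageIds, userIds) {
  const seen = new Set(pageIds);
  const duplicates = new Set();
  for (const id of userIds) {
    if (seen.has(id)) {
      duplicates.add(id);
    }
    seen.add(id);
  }
  return [...duplicates];
}
```

A cleaner could reject or rename anything this returns before the content is 
ever written into the page.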

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'