[whatwg] The problem of duplicate ID as a security issue

Thu Mar 9 22:45:29 PST 2006

Does the current version of the spec define what happens to elements with  
duplicate ID values?

Duplicate IDs aren't just another case where well-defined error recovery
would be nice to have for the sake of uniformity. There are cases when
duplicate IDs should be viewed as a security concern.

Consider a script which augments the HTML page after it has been parsed by  
attaching event listeners to elements in the DOM tree, inserting new nodes  
into the tree etc. This is common practice, for example, for many  
web-based WYSIWYG editors. In this scenario, any method the script uses
to identify the DOM nodes subject to augmentation is vulnerable to
spoofing by user-supplied content present on the same page.

For example, imagine a script which finds a button by ID and attaches an  
event listener to it. A possible markup looks like this:

<div>
     ...blog entry body...
</div>
<button id="addtomemories">Add this entry to memories</button>
<script>
document.getElementById('addtomemories').addEventListener('click', doSomeNiceAJAX);
</script>

So, a malicious blog author can make the following entry:

I have found a <a href="#" id="addtomemories">cool website</a>.

Depending on how the browser handles duplicate IDs, either or both of the
following unwanted effects may occur:
1. Clicking the link in the blog entry adds the entry to the reader's
memories list.
2. Clicking the real "Add this entry to memories" button does nothing.
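
To make the first effect concrete, here is a minimal sketch in plain
JavaScript. It does not use a real DOM: el() and firstById() are
hypothetical helpers, and the sketch assumes a browser that resolves a
duplicate ID by returning the first match in document order.

```javascript
// Toy model of a DOM tree as plain objects; not a browser API.
function el(tag, id, children = []) {
  return { tag, id, children };
}

// Depth-first, document-order search returning the first element with the
// given id. ASSUMPTION: this mirrors a browser that resolves duplicate IDs
// by picking the first match in document order.
function firstById(node, id) {
  if (node.id === id) return node;
  for (const child of node.children) {
    const hit = firstById(child, id);
    if (hit) return hit;
  }
  return null;
}

// The page from the example: the user-supplied entry precedes the real button.
const page = el('body', null, [
  el('div', null, [el('a', 'addtomemories')]), // attacker's link in the entry
  el('button', 'addtomemories'),               // the site's real button
]);

// The element the click listener would end up attached to.
const target = firstById(page, 'addtomemories');
console.log(target.tag); // prints "a"
```

Under that assumption the listener lands on the attacker's link and the
real button is left without one, which is both effects at once.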

One can think of other examples, possibly more dangerous. Other methods of  
identification (by tag name, by class, by CSS selector as proposed  
recently) are also vulnerable.

This kind of attack is hard to circumvent through use of HTML cleaners
because id="addtomemories" looks like an innocent attribute, like an
anchor for navigation. Preventing such attacks with an HTML cleaner would
require either making a full list of all "forbidden" IDs, class names etc,
or imposing Draconian rules upon user-supplied content, completely
disallowing useful attributes such as id and class.

How to address this security issue is an open question. Always using  
carefully constructed XPath expressions for finding the nodes may be a  
solution because an XPath expression can specify the whole path starting  
from the root, like /html/body/button[@id="addtomemories"] (though
careless XPath expressions like //*[@id="addtomemories"] can be vulnerable
as well).
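
A rough sketch of the anchored lookup, using the DOM Level 3 XPath call
document.evaluate(). The helper name findAnchored and the constant
ANCHORED_PATH are made up for illustration, and the path assumes the
button is a direct child of <body>, as in the markup above.

```javascript
// Path anchored at the document root, taken from the example above.
// ASSUMPTION: the real button is a direct child of <body>.
const ANCHORED_PATH = '/html/body/button[@id="addtomemories"]';

// Hypothetical helper: `doc` is expected to be a browser Document.
function findAnchored(doc, path) {
  // 9 is the numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE.
  const result = doc.evaluate(path, doc, null, 9, null);
  return result.singleNodeValue; // null when nothing matches
}

// Possible usage in a page script (not executed here):
// const button = findAnchored(document, ANCHORED_PATH);
// if (button) button.addEventListener('click', doSomeNiceAJAX, false);
```

A link with the same ID inside the blog entry's <div> does not match
this path, because it is neither a button nor a direct child of <body>.
The protection holds only as long as user content cannot appear at that
exact position in the tree.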

Another solution may be to define functions like getElementById(),  
getElementsByTagName() etc so that they don't cross sandbox boundaries  
during their recursive search, at least by default. (If the sandbox  
proposal makes it to the spec, of course.)
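
As an illustration only, such a lookup could behave like the sketch
below. It runs on a toy tree of plain objects rather than a real DOM;
the sandbox flag, node() and getElementByIdOutsideSandboxes() are all
hypothetical names, not part of any spec or proposal text.

```javascript
// Toy model: plain objects stand in for DOM nodes. The `sandbox` flag is a
// hypothetical marker for a subtree that holds untrusted content.
function node(tag, id, children = [], sandbox = false) {
  return { tag, id, children, sandbox };
}

// Document-order search by id that never descends into a sandboxed
// subtree, so ids inside user-supplied content are invisible to it.
function getElementByIdOutsideSandboxes(root, id) {
  if (root.sandbox) return null;
  if (root.id === id) return root;
  for (const child of root.children) {
    const hit = getElementByIdOutsideSandboxes(child, id);
    if (hit) return hit;
  }
  return null;
}

const sandboxedPage = node('body', null, [
  node('div', null, [node('a', 'addtomemories')], true), // sandboxed blog entry
  node('button', 'addtomemories'),                       // the site's real button
]);

const found = getElementByIdOutsideSandboxes(sandboxedPage, 'addtomemories');
console.log(found.tag); // prints "button"
```

With the entry marked as a sandbox, the duplicate ID inside it is never
seen and the lookup reaches the real button.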

Ideas are welcome.

-- Opera M2 9.0 TP2 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station at SW-Soft, Inc. [ICQ: 115226275]  
<alexey at feldgendler.ru>