[whatwg] The problem of duplicate ID as a security issue
alexey at feldgendler.ru
Thu Mar 9 22:45:29 PST 2006
Does the current version of the spec define what happens to elements with
duplicate ID values?
The problem of duplicate ID isn't just another issue where it's nice to
have some well-defined error recovery just for uniformity. There are cases
when duplicate IDs should be viewed as a security concern.
Consider a script which augments the HTML page after it has been parsed by
attaching event listeners to elements in the DOM tree, inserting new nodes
into the tree etc. This is common practice, for example, for many
web-based WYSIWYG editors. In this scenario, any method the script uses
for identification of the DOM nodes subject to augmentation is vulnerable
to possible spoofing by user-supplied content present on the same page.
For example, imagine a script which finds a button by ID and attaches an
event listener to it. A possible markup looks like this:
...blog entry body...
<button id="addtomemories">Add this entry to memories</button>
So, a malicious blog author can make the following entry:
I have found a <a href="#" id="addtomemories">cool website</a>.
Depending on how the browser handles duplicate IDs, either or both of the
following unwanted effects may occur:
1. Clicking the link in the blog entry adds the entry to memories list of
2. Clicking the real "Add this entry to memories" button does nothing.
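To make the first effect concrete, here is a minimal sketch in Python, using xml.etree.ElementTree as a stand-in for the browser DOM. It assumes the lookup returns the first match in document order, which is only one possible behavior for duplicate IDs. The page markup and the helper get_element_by_id are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the user-supplied entry precedes the site's own button.
PAGE = """<html><body>
  <div class="entry">
    I have found a <a href="#" id="addtomemories">cool website</a>.
  </div>
  <button id="addtomemories">Add this entry to memories</button>
</body></html>"""

def get_element_by_id(root, wanted):
    """Return the first element in document order with the given id, or None."""
    for element in root.iter():
        if element.get("id") == wanted:
            return element
    return None

root = ET.fromstring(PAGE)
target = get_element_by_id(root, "addtomemories")

# Under the first-match assumption, the script would attach its listener
# to the attacker's link rather than to the site's button.
print(target.tag)
```

A lookup that returned the last match instead would pick the real button here, but the attacker could then exploit any place where user content follows the target element.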
One can think of other examples, possibly more dangerous. Other methods of
identification (by tag name, by class, by CSS selector as proposed
recently) are also vulnerable.
This kind of attack is hard to circumvent through use of HTML cleaners
because id="addtomemories" looks like an innocent attribute, like an
anchor for navigation. Preventing such attacks with an HTML cleaner would
require either making a full list of all "forbidden" IDs, class names etc,
or imposing Draconian rules upon user-supplied content, completely
disallowing useful attributes such as id and class.
How to address this security issue is an open question. Always using
carefully constructed XPath expressions for finding the nodes may be a
solution because an XPath expression can specify the whole path starting
from the root, like /html/body/button[@id="addtomemories"] (though
careless XPath expressions like //*[@id="addtomemories"] can be vulnerable).
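The difference between the two kinds of expression can be sketched in Python. This is an illustration under assumptions: xml.etree.ElementTree supports only a subset of XPath and evaluates paths relative to an element, so "./body/button[...]" evaluated at the html root stands in for the absolute path. The page markup is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: user content sits in a container, the real button
# is a direct child of <body>.
PAGE = """<html><body>
  <div class="entry">
    I have found a <a href="#" id="addtomemories">cool website</a>.
  </div>
  <button id="addtomemories">Add this entry to memories</button>
</body></html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Careless: matches any element carrying the id, anywhere in the tree.
careless = root.findall(".//*[@id='addtomemories']")

# Anchored: relative to <html>, this stands in for
# /html/body/button[@id="addtomemories"].
anchored = root.findall("./body/button[@id='addtomemories']")

careless_tags = [e.tag for e in careless]
anchored_tags = [e.tag for e in anchored]
print(careless_tags)  # includes the attacker's link
print(anchored_tags)  # only the site's button
```

The anchored form only helps as long as user-supplied markup cannot appear at the anchored position itself, for example as a button placed directly under body.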
Another solution may be to define functions like getElementById(),
getElementsByTagName() etc so that they don't cross sandbox boundaries
during their recursive search, at least by default. (If the sandbox
proposal makes it to the spec, of course.)
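A rough sketch of such a boundary-respecting search, again in Python with xml.etree.ElementTree standing in for the DOM. The element name "sandbox", the cross_sandboxes parameter and the page markup are assumptions made for illustration, not something the spec defines.

```python
import xml.etree.ElementTree as ET

SANDBOX_TAG = "sandbox"  # hypothetical element name for untrusted content

PAGE = """<html><body>
  <sandbox>
    I have found a <a href="#" id="addtomemories">cool website</a>.
  </sandbox>
  <button id="addtomemories">Add this entry to memories</button>
</body></html>"""

def get_element_by_id(element, wanted, cross_sandboxes=False):
    """Depth-first search by id that skips sandbox subtrees by default."""
    if element.get("id") == wanted:
        return element
    for child in element:
        if child.tag == SANDBOX_TAG and not cross_sandboxes:
            continue  # do not descend into user-supplied content
        found = get_element_by_id(child, wanted, cross_sandboxes)
        if found is not None:
            return found
    return None

root = ET.fromstring(PAGE)
default_hit = get_element_by_id(root, "addtomemories")
crossing_hit = get_element_by_id(root, "addtomemories", cross_sandboxes=True)

print(default_hit.tag)   # the site's button; the sandbox is never entered
print(crossing_hit.tag)  # the link inside the sandbox, found first
```

Making the non-crossing search the default means that existing scripts would be protected without changes, while a script that really needs to reach into user content has to ask for it explicitly.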
Ideas are welcome.
-- Opera M2 9.0 TP2 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station at SW-Soft, Inc. [ICQ: 115226275]
<alexey at feldgendler.ru>