[whatwg] The problem of duplicate ID as a security issue

Thu Mar 16 03:45:54 PST 2006

On Wed, 15 Mar 2006 19:26:03 +0600, Mihai Sucan <mihai.sucan at gmail.com>  
wrote:

>> Sandboxes are quite special things, so we'll need a DOMSandbox anyway.  
>> But instead of adding things like getElementById() to the DOMSandbox  
>> interface, I tend to make the "fake document" which is visible from  
>> inside the sandbox a member of the sandbox itself. The call will look  
>> like sandbox.document.getElementById().

> As Ric said, having <sandbox>es treated "too similar" to a document is  
> overkill.

A DOMDocument interface has to be exposed to the contained scripts anyway,  
why not also make it accessible from the outside?
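
To illustrate the shape of the call, here is a rough sketch in Python using a toy element tree rather than a real DOM. The names Element, FakeDocument and Sandbox are made up for the illustration, and scoping the lookup to the sandbox subtree is my assumption about how the "fake document" would behave, not something already specified.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """Toy stand-in for a DOM element (hypothetical, not a real DOM API)."""
    tag: str
    id: Optional[str] = None
    children: List["Element"] = field(default_factory=list)

    def walk(self) -> Iterator["Element"]:
        """Yield this element and all of its descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


class FakeDocument:
    """The document object seen by scripts inside a sandbox (assumed design)."""

    def __init__(self, root: Element) -> None:
        self._root = root

    def get_element_by_id(self, wanted: str) -> Optional[Element]:
        # Only the sandbox subtree is searched; the outer page is invisible.
        # Handling of duplicate IDs is deliberately left out of this sketch.
        for element in self._root.walk():
            if element.id == wanted:
                return element
        return None


class Sandbox:
    """A sandbox that exposes its fake document as a member."""

    def __init__(self, root: Element) -> None:
        self.root = root
        self.document = FakeDocument(root)


# The outer page has its own "login" element; the sandbox has a "comment".
sandbox_root = Element("sandbox", children=[Element("p", id="comment")])
page = Element("body", children=[Element("form", id="login"), sandbox_root])
sandbox = Sandbox(sandbox_root)

inside = sandbox.document.get_element_by_id("comment")
outside = sandbox.document.get_element_by_id("login")
```

With this arrangement the same object serves both the contained scripts and the outer page, and a lookup through sandbox.document cannot reach elements outside the sandbox.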

>> (A wild thought: maybe enforce ID uniqueness only for <!DOCTYPE html>?)

> I think enforcing ID uniqueness in standards mode would be good, but  
> that would still probably break (very?) few pages. Those web authors  
> should have to "live with it", because they want standards-compliant  
> sites.

I'm not speaking about enforcing ID uniqueness at the time of parsing the  
page, but only at the time of calling getElementById(). I believe it will  
break very few pages, if any.

I know that many web applications have bugs like this: they have a CSS  
rule like "#titlebar { font-weight: bold; }" and a single titlebar on the  
page. After that, the requirements change, and they have more than one  
titlebar on a page. To make the rule apply to all titlebars, they give  
them all the same ID (instead of switching the CSS rules to a class). While
such documents are non-conforming, they should not, in my opinion, cause
parse errors even in standards mode. Here is why: duplicate IDs are wrong,  
but it's obvious what the author means, and it's easy to do "what the  
author intended".

Usually in such applications the scripts don't call getElementById() for  
those ID values which occur more than once. If they occasionally do, it's  
really a programming bug. I don't believe that there are applications that  
really rely on the particular behavior in this case, though I admit that  
there are possibly some that have this bug unnoticed and still work. I  
think that this case should trigger an exception in standards mode  
because, for this bug, there is no obvious fix to apply, and we don't know  
"what the author meant" -- does he want to do something to the first  
element with the specified ID, the second, or both?
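
A minimal sketch of the behavior I have in mind, written in Python against a toy element tree rather than a real DOM. Element, DuplicateIdError and get_element_by_id are hypothetical names for the illustration. Duplicate IDs are accepted when the tree is built and only the ambiguous lookup fails.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


class DuplicateIdError(Exception):
    """Raised when a lookup finds more than one element with the wanted ID."""


@dataclass
class Element:
    """Toy stand-in for a DOM element (hypothetical, not a real DOM API)."""
    tag: str
    id: Optional[str] = None
    children: List["Element"] = field(default_factory=list)

    def walk(self) -> Iterator["Element"]:
        """Yield this element and all of its descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


def get_element_by_id(root: Element, wanted: str) -> Optional[Element]:
    """Return the unique element with the given ID, or None if there is none.

    Uniqueness is checked here, at call time, not when the tree is built.
    """
    matches = [element for element in root.walk() if element.id == wanted]
    if len(matches) > 1:
        # No way to tell which element the author meant, so refuse to guess.
        raise DuplicateIdError(f"{len(matches)} elements have id {wanted!r}")
    return matches[0] if matches else None


# Building a tree with a duplicated ID succeeds: no parse-time error.
page = Element("body", children=[
    Element("div", id="titlebar"),
    Element("div", id="content"),
    Element("div", id="titlebar"),
])

content = get_element_by_id(page, "content")   # unique ID, works as usual

try:
    get_element_by_id(page, "titlebar")        # ambiguous, raises
    ambiguous_lookup_failed = False
except DuplicateIdError:
    ambiguous_lookup_failed = True
```

Pages that merely style the duplicated ID keep working, because the error only appears when a script asks for an ID that is actually ambiguous.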

> Side note and wild guess: We are probably forgetting that the beauty of
> the web is actually allowing everyone to contribute, be it bad code or  
> better code. Wanting something *that* strict is like disproving one of  
> the essential concepts contributing to the success of the web.

Simply picking the last matching node is actually hiding a bug and letting  
it go unnoticed. (Why the last one? Why not the first, for example?)

>> And, by the way, blog entries aren't the only place where sandboxing  
>> can be applied in blogs. For example, LiveJournal allows user-defined
>> journal styles which are written by the users in a self-invented  
>> programming language which outputs HTML. That HTML goes through the  
>> HTML cleaner afterwards, of course. Many people would love to add
>> dynamic menus, AJAX comment folding, etc. to their styles. This could be
>> partly solved with a set of predefined "toys", but actually the entire  
>> LiveJournal styling system is about user-initiated development. Those  
>> with programming skills write new styles, and other users may take and  
>> use them.

> I did not see LiveJournal, so I don't know what kind of features they  
> offer.
>
> <sandbox> would probably do "the trick" (would help a lot with security  
> in this case also).

Yes, I think so. Actually, my activity around the sandboxing idea has been  
inspired by several recent security incidents with LiveJournal and its  
styling system which failed to filter out some patterns of dangerous HTML.

> Take HTML, for example, it's a markup language greatly appreciated by  
> many and despised by others. Even you said in one reply to this thread  
> "today's HTML sucks" - advocating for the need of allowing user-scripts  
> in pages, for having table sorting, popup menus, etc. A few minutes  
> later in another reply you say "we already have a great markup language,  
> which is HTML" - advocating for allowing users to write HTML, instead of  
> custom markup.

Yeah, really, I sound a bit contradictory. Actually, in my opinion, HTML  
is better than most ad-hoc markup languages, and HTML with scripts is
still better than just HTML.

And another thing: HTML 5 is about to make HTML pages more powerful; there
are going to be menus, datagrids and such, but most of these features are  
useless without scripting, aren't they? For example, a datagrid isn't  
really sortable at client side without a script, which makes it useless in  
blogs and CMS unless they allow some scripting.

> So, <sandbox> may be designed to help tighten up security on the web,
> but we should also think about its actual usage,
> side-effects, etc. It definitely solves problems, but will it cause  
> other problems? How important are they?

Of course, there is a lot more to think and talk about. I suppose there  
are going to be problems with particular buggy implementations of  
sandboxing and exploits specifically targeted at holes in such
implementations. I suspect that web application authors and site  
administrators will be hesitant to allow user scripting even in sandboxes  
because of the possible browser bugs. Though, because sandboxes can be  
useful even if scripting inside them is completely disallowed, I hope that  
the use of sandboxes becomes somewhat popular even before site  
administrators decide to allow scripting.

-- Opera M2 9.0 TP2 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station at SW-Soft, Inc. [ICQ: 115226275]  
<alexey at feldgendler.ru>