[whatwg] The problem of duplicate ID as a security issue

Thu Mar 16 04:33:30 PST 2006

On Thu, 16 Mar 2006 13:45:54 +0200, Alexey Feldgendler  
<alexey at feldgendler.ru> wrote:

<...>
> A DOMDocument interface has to be exposed to the contained scripts  
> anyway, why not also make it accessible from the outside?

Yes, but I'm afraid it's a technical challenge for implementors. Their  
browser engines might need some rewrites to properly support <sandbox>ing  
content. I fear that, instead of doing those rewrites, they'll hack  
<sandbox> support in, opening a wide variety of security holes competing  
for the crown of "the first web virus".

<...>
> I'm not speaking about enforcing ID uniqueness at the time of parsing  
> the page, but only at the time of calling getElementById(). I believe it  
> will break very few pages, if any.
>
> I know that many web applications have bugs like this: they have a CSS  
> rule like "#titlebar { font-weight: bold; }" and a single titlebar on  
> the page. After that, the requirements change, and they have more than  
> one titlebar on a page. To make the rule apply to all titlebars, they  
> give them all the same ID (instead of using class, not ID, in CSS  
> rules). While such documents are non-conforming, they should not, in my  
> opinion, cause parse errors even in standards mode. Here is why:  
> duplicate IDs are wrong, but it's obvious what the author means, and  
> it's easy to do "what the author intended".
>
> Usually in such applications the scripts don't call getElementById() for  
> those ID values which occur more than once. If they occasionally do,  
> it's really a programming bug. I don't believe that there are  
> applications that really rely on the particular behavior in this case,  
> though I admit that there are possibly some that have this bug unnoticed  
> and still work. I think that this case should trigger an exception in  
> standards mode because, for this bug, there is no obvious fix to apply,  
> and we don't know "what the author meant" -- does he want to do  
> something to the first element with the specified ID, the second, or  
> both.

Under no circumstances should this happen. It would add confusion for an  
already over-confused web author (that is, a web author who doesn't know  
much about web development).

So nothing needs to change in quirks mode, but in standards mode the  
options are:

1. Fail during parsing when a duplicate ID is found.
2. Throw in JS code that sets the id of a node to a duplicate ID.

Or simply leave it as it is: quirks mode behaviour.
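
To make the trade-off concrete, here is a minimal sketch in TypeScript of
option 2 (reject a duplicate id when a script sets it) next to the quoted
alternative (throw at lookup time when the id is ambiguous). It runs
against a toy document model, not the real DOM: ModelDocument, ModelNode,
DuplicateIdError, register() and getElementByIdStrict() are hypothetical
names made up for this illustration, and no browser exposes them. Which
node the lenient lookup returns (the first one here) is an arbitrary
choice for the sketch.

```typescript
// Hypothetical model, not the real DOM API: all names are illustrative.
type Mode = "quirks" | "standards";

class DuplicateIdError extends Error {}

class ModelNode {
  private currentId = "";
  private readonly doc: ModelDocument;

  constructor(doc: ModelDocument) {
    this.doc = doc;
  }

  get id(): string {
    return this.currentId;
  }

  set id(value: string) {
    // May throw in standards mode; the old id is kept in that case.
    this.doc.register(this, this.currentId, value);
    this.currentId = value;
  }
}

class ModelDocument {
  readonly mode: Mode;
  // id -> nodes carrying that id, in the order they claimed it
  private readonly byId = new Map<string, ModelNode[]>();

  constructor(mode: Mode) {
    this.mode = mode;
  }

  createNode(id: string = ""): ModelNode {
    const node = new ModelNode(this);
    if (id !== "") node.id = id;
    return node;
  }

  // Called by the ModelNode id setter; not meant for page scripts.
  register(node: ModelNode, oldId: string, newId: string): void {
    const holders = this.byId.get(newId) ?? [];
    // Option 2: in standards mode, refuse an id another node already holds.
    if (this.mode === "standards" && newId !== "" && holders.some((n) => n !== node)) {
      throw new DuplicateIdError(`id "${newId}" is already in use`);
    }
    if (oldId !== "") {
      const rest = (this.byId.get(oldId) ?? []).filter((n) => n !== node);
      if (rest.length > 0) this.byId.set(oldId, rest);
      else this.byId.delete(oldId);
    }
    if (newId !== "") {
      this.byId.set(newId, (this.byId.get(newId) ?? []).concat(node));
    }
  }

  // Lenient lookup: returns the first holder, or null if there is none.
  getElementById(id: string): ModelNode | null {
    const holders = this.byId.get(id) ?? [];
    return holders[0] ?? null;
  }

  // The quoted alternative: throw at lookup time when the id is ambiguous.
  getElementByIdStrict(id: string): ModelNode | null {
    const holders = this.byId.get(id) ?? [];
    if (holders.length > 1) throw new DuplicateIdError(`id "${id}" is ambiguous`);
    return holders[0] ?? null;
  }
}

// Usage: in standards mode the second claim on "titlebar" is rejected.
const page = new ModelDocument("standards");
const bar = page.createNode("titlebar");
const other = page.createNode();
try {
  other.id = "titlebar";
} catch (err) {
  console.log((err as Error).message);
}
console.log(page.getElementById("titlebar") === bar); // true
```

With option 2 enforced, a standards-mode document can never reach the
ambiguous state from script, so the strict lookup only matters for
duplicates that come from markup, which is what option 1 would catch.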

<...>
> Simply picking the last matching node is actually hiding a bug and  
> letting it go unnoticed. (Why the last one? Why not the first, for  
> example?)

That's true, but this happens in many, many other cases.

<...>
>> I did not see LiveJournal, so I don't know what kind of features they  
>> offer.
>>
>> <sandbox> would probably do "the trick" (would help a lot with security  
>> in this case also).
>
> Yes, I think so. Actually, my activity around the sandboxing idea has  
> been inspired by several recent security incidents with LiveJournal and  
> its styling system which failed to filter out some patterns of dangerous  
> HTML.

True. As you said, there are risks with buggy <sandbox> implementations,  
but that's actually an advantage: we rely on browser fixes instead of  
site-by-site fixes. I'd rather get a single patch from the implementor  
than wait for hundreds of sites to apply their own fixes. Yet this is an  
advantage for malicious users too: an exploit (the "virus") for a browser  
bug can spread widely and quickly.

<...>
> Yeah, really, I sound a bit contradictory. Actually, in my opinion, HTML  
> is better than most ad-hoc markup languages, and HTML with scripts is  
> still better than just HTML.

Exactly.

> And another thing: HTML 5 is about to make HTML pages more powerful,  
> there are going to be menus, datagrids and such, but most of these  
> features are useless without scripting, aren't they? For example, a  
> datagrid isn't really sortable at client side without a script, which  
> makes it useless in blogs and CMS unless they allow some scripting.

True.

>> So, <sandbox> may be designed to help tighten up security on the web,  
>> but we should also try to think about how it will actually be used,  
>> its side-effects, etc. It definitely solves problems, but will it cause  
>> other problems? How important are they?
>
> Of course, there is a lot more to think and talk about. I suppose there  
> are going to be problems with particular buggy implementations of  
> sandboxing and exploits specifically targeted at holes in such  
> implementations. I suspect that web application authors and site  
> administrators will be hesitant to allow user scripting even in  
> sandboxes because of the possible browser bugs. Though, because  
> sandboxes can be useful even if scripting inside them is completely  
> disallowed, I hope that the use of sandboxes becomes somewhat popular  
> even before site administrators decide to allow scripting.

True, but I'd test it. If it works the way I want in the major browsers,  
then why not? Especially for intranet web applications.

-- 
http://www.robodesign.ro
ROBO Design - We bring you the future