[whatwg] DOMTokenList is unordered but yet requires sorting

Mon Jul 27 20:52:17 PDT 2009

On Mon, Jul 27, 2009 at 8:24 PM, Sylvain Pasche<sylvain.pasche at gmail.com> wrote:
> On Tue, Jul 28, 2009 at 2:17 AM, Ian Hickson<ian at hixie.ch> wrote:
>> On Sun, 12 Jul 2009, Jonas Sicking wrote:
>>> >>
>>> >> Oh, I have forseen that. Is it really necessary to remove duplicates
>>> >> ? I imagine DOMTokenList to be similar to what can be achieved with a
>>> >> String.split(), but then it would be just more duplicate
>>> >> functionality.
>>> >
>>> > If we don't remove duplicates, then things like the .toggle() method
>>> > could have some quite weird effects.
>>>
>>> Such as?
>>
>> Such as .length changing by more than 1 after a call to .toggle().
>
> I guess that couldn't have happened, because .length counted only the
> unique tokens.
>
>>> I definitely think it'd be worth avoiding the code complexity and perf
>>> hit of having the implementation remove duplicates if they appear in the
>>> class attribute given how extremely rare duplicates are.
>>
>> Fair enough. I've made DOMTokenList not remove duplicates.
>
> ok, I realize now that this is about the duplicates in .length and item().
>
> By the way, preserving duplicates shouldn't be much code complexity if
> I'm not mistaken.

I take it you mean *removing* duplicates here, right?

> The only required code change would be to use a hashset when parsing
> the attribute in order to only insert unique tokens in the token
> vector. Then DOMTokenList.length would return the token vector length
> and .item() get the token by index. I don't think anything actually
> depends on keeping duplicate tokens in the token vector. Then there
> would be a small perf hit when parsing attributes with more than one
> token.

It's certainly doable to do this at the time when the token-list is
parsed. However given how extremely rare duplicated classnames are (I
can't recall ever seeing it in a real page), I think any code spent on
dealing with it is a waste.

>> On Mon, 13 Jul 2009, Jonas Sicking wrote:
>>>
>>> I do agree that the spec seems to go extraordinary far to not touch
>>> whitespace. Normalizing whitespace when parsing is a bad idea, but once
>>> the user modifies the DOMTokenList, I don't see a lot of value in
>>> maintaining whitespace exactly as it was.
>>>
>>> Ian: What is the reason for the fairly complicated code to deal with
>>> removals? At least in Gecko it would be much simpler to just regenerate
>>> the string completely. That way generating the string-value could just
>>> be dropped on modifications, and regenerated lazily when requested.
>>
>> In general, I try to be as conservative as possible in making changes to
>> the DOM. Are the algorithms really as complicated as you're making out?
>> They seem pretty trivial to me.
>
> The remove() algorithm is about 50 lines with whitespace and comments.
> After all, that's not a big cost and I guess that preserving
> whitespace may be closer to what DOMTokenList API consumers would
> expect.

The code would be 7 lines if we didn't need to preserve whitespace:

nsAttrValue newAttr(aAttr);
newAttr->ResetMiscAtomOrString();
nsCOMPtr<nsIAtom> atom = do_GetAtom(aToken);
while (newAttr->GetAtomArrayValue().RemoveElement(atom));
nsAutoString newValue;
newAttr.ToString(newValue);
mElement->SetAttr(...);

If you spent a few more lines of code you could even avoid serializing
the token-list and call SetAttrAndNotify instead of SetAttr.

/ Jonas