[whatwg] DOMTokenList is unordered but yet requires sorting

Mon Jul 13 06:24:49 PDT 2009

On 7/13/2009 7:26 AM, Jonas Sicking wrote:
> On Sun, Jul 12, 2009 at 9:00 PM, Ian Hickson<ian at hixie.ch>  wrote:
>> If we don't remove duplicates, then things like the .toggle() method could
>> have some quite weird effects.
>
> Such as? The only one I can think of is that calling .toggle() would
> remove multiple items. I don't see that as a problem?

I think Ian is referring to duplicates here as "duplicates of the token
being removed", not as "duplicates of any token in the underlying
attribute". In the current spec algorithm, removing or toggling a token
won't remove duplicates of other tokens.

> Define .remove() as removing all tokens with the given value, and .toggle() as:
>
> function toggle(token) {
>    if (this.contains(token))
>      this.remove(token);
>    else
>      this.add(token);
> }

That's what toggle() does right now (with the small difference that it
also returns a boolean indicating whether the token was removed or added).

> I definitely think it'd be worth avoiding the code complexity and perf
> hit of having the implementation remove duplicates if they appear in
> the class attribute given how extremely rare duplicates are.

This is a bit unrelated, but when looking at the DOMTokenList 
implementation, I had an idea about an alternative algorithm that could 
be easier to implement and could also be described more simply in the 
spec. The disadvantage is that the DOMTokenList methods mutating the 
underlying string wouldn't preserve existing whitespace (which the 
current algorithms try hard to do).

The idea is that any DOMTokenList method that mutates the underlying 
string would do:
  - split the attribute into unique tokens (preserving order).
  - add or remove the token according to the method called.
  - rebuild the attribute string by concatenating tokens together (with 
a single space).
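
The three steps above can be sketched in JavaScript roughly as follows.
This is only an illustration, not spec text: the names splitUniqueTokens
and mutateAttribute are made up for this example, and it assumes tokens
are separated by the HTML space characters (space, tab, LF, FF, CR).

```javascript
// Hypothetical helper: split an attribute value into unique tokens,
// keeping the position of the first occurrence of each token.
function splitUniqueTokens(value) {
  const seen = new Set();
  const tokens = [];
  for (const token of value.split(/[ \t\n\f\r]+/)) {
    // split() yields empty strings for leading/trailing whitespace; skip them.
    if (token !== "" && !seen.has(token)) {
      seen.add(token);
      tokens.push(token);
    }
  }
  return tokens;
}

// Hypothetical helper: apply one mutation ("add" or "remove") and
// rebuild the attribute string with single spaces between tokens.
function mutateAttribute(value, op, token) {
  const tokens = splitUniqueTokens(value);
  const index = tokens.indexOf(token);
  if (op === "add" && index === -1) {
    tokens.push(token);
  } else if (op === "remove" && index !== -1) {
    tokens.splice(index, 1);
  }
  return tokens.join(" ");
}
```

For instance, mutateAttribute("  a  b a ", "add", "c") gives "a b c": the
duplicate "a" and the extra whitespace are gone, which is the trade-off
mentioned above.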

At first, this may look inefficient (if implemented naively).
But I guess that implementations will usually keep both the attribute 
string and a list of tokens in memory, so they wouldn't have to tokenize 
the string on every mutation. There is a small performance hit during 
attribute tokenization: the list of tokens would need to keep only 
unique tokens. But after that, the DOMTokenList methods are very simple:
length/item() don't need to take care of duplicates, and add/remove/toggle
are simple list manipulations (the attribute string could be lazily
generated from the token list when needed).
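
To make the caching idea a bit more concrete, here is a rough sketch of
what I have in mind. TokenListSketch is a made-up class, not the real
DOMTokenList interface, and I'm assuming toggle() returns true when the
token was added and false when it was removed. In this sketch the
original string is kept untouched until the first real mutation.

```javascript
// Hypothetical class illustrating a cached unique-token list with a
// lazily rebuilt attribute string. Not the actual DOMTokenList interface.
class TokenListSketch {
  constructor(value) {
    // Tokenize once; a Set keeps only the first occurrence of each token
    // and preserves insertion order.
    const parts = value.split(/[ \t\n\f\r]+/).filter((t) => t !== "");
    this.tokens = [...new Set(parts)];
    // The original string stays valid until a mutation invalidates it.
    this.cachedString = value;
  }

  get length() {
    return this.tokens.length;
  }

  item(index) {
    return index >= 0 && index < this.tokens.length ? this.tokens[index] : null;
  }

  contains(token) {
    return this.tokens.includes(token);
  }

  add(token) {
    if (!this.contains(token)) {
      this.tokens.push(token);
      this.cachedString = null; // string must be rebuilt on next read
    }
  }

  remove(token) {
    const index = this.tokens.indexOf(token);
    if (index !== -1) {
      this.tokens.splice(index, 1);
      this.cachedString = null;
    }
  }

  toggle(token) {
    // Assumption: true means "added", false means "removed".
    if (this.contains(token)) {
      this.remove(token);
      return false;
    }
    this.add(token);
    return true;
  }

  toString() {
    // Lazily regenerate the attribute string from the token list.
    if (this.cachedString === null) {
      this.cachedString = this.tokens.join(" ");
    }
    return this.cachedString;
  }
}
```

Whether a no-op call (adding a token that is already present) should
also normalize the string is a choice I haven't thought through; the
sketch leaves the string alone in that case.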

To summarize:
pros: simpler spec algorithms, simpler implementation
cons: less whitespace preservation, small perf hit during tokenization

I don't know if I'm missing something. Does this sound reasonable?

Sylvain