[whatwg] DOMTokenList is unordered but yet requires sorting

Mon Jul 13 16:42:37 PDT 2009

On Mon, Jul 13, 2009 at 6:24 AM, news.gmane.org<sylvain.pasche at gmail.com> wrote:
> On 7/13/2009 7:26 AM, Jonas Sicking wrote:
>>
>> On Sun, Jul 12, 2009 at 9:00 PM, Ian Hickson<ian at hixie.ch>  wrote:
>>>
>>> If we don't remove duplicates, then things like the .toggle() method
>>> could have some quite weird effects.
>>
>> Such as? The only one I can think of is that calling .toggle() would
>> remove multiple items. I don't see that as a problem?
>
> I think Ian is referring to duplicates here as "duplicates of the token
> being removed", not as "duplicates of any token in the underlying
> attribute". In the current spec algorithm, removing or toggling a token
> won't remove duplicates in other tokens.

That's what I was referring to too. For example, if a DOMTokenList were
to contain two "hello" tokens, and the user called .toggle("hello"), that
would remove two tokens from the list (both with the value "hello").

>> Define .remove() as removing all tokens with the given value, and
>> .toggle() as:
>>
>> function toggle(token) {
>>   if (this.contains(token))
>>     this.remove(token);
>>   else
>>     this.add(token);
>> }
>
> That's what toggle() does right now. (With the small difference that it also
> returns a boolean to indicate if the token was removed or added).

Yup, so I don't really see a problem with allowing multiple tokens of
the same value.
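
To make the duplicate case concrete, here is a minimal sketch. The class
name DuplicateKeepingTokenList is hypothetical (it is not the spec's
algorithm or any browser's code), and the direction of the boolean
returned by toggle() (true when the token was added) is an assumption:

```javascript
// Hypothetical stand-in for a DOMTokenList that does NOT drop duplicates.
class DuplicateKeepingTokenList {
  constructor(value) {
    // Split on HTML space characters; discard the empty strings produced
    // by leading or trailing whitespace.
    this.tokens = value.split(/[ \t\n\f\r]+/).filter((t) => t !== "");
  }

  contains(token) {
    return this.tokens.includes(token);
  }

  add(token) {
    if (!this.contains(token)) this.tokens.push(token);
  }

  // remove() drops every token with the given value, duplicates included.
  remove(token) {
    this.tokens = this.tokens.filter((t) => t !== token);
  }

  // Assumption: returns true if the token was added, false if removed.
  toggle(token) {
    if (this.contains(token)) {
      this.remove(token);
      return false;
    }
    this.add(token);
    return true;
  }
}

const list = new DuplicateKeepingTokenList("hello world hello");
const added = list.toggle("hello"); // removes both "hello" tokens
console.log(added, list.tokens);
```

With remove() defined this way, a single toggle() call leaves no "hello"
behind, so a second toggle() adds exactly one back.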

> This is a bit unrelated, but when looking at the DOMTokenList
> implementation, I had an idea about an alternative algorithm that could be
> easier to implement and could also be described more simply in the spec. The
> disadvantage is that the DOMTokenList methods mutating the underlying string
> wouldn't preserve existing whitespace (which the current algorithms try hard
> to do).
>
> The idea is that any DOMTokenList method that mutates the underlying string
> would do:
>  - split the attribute into unique tokens (preserving order).
>  - add or remove the token according to the method called.
>  - rebuild the attribute string by concatenating tokens together (with a
> single space).
>
> At first, this may look inefficient (if implemented naively).
> But I guess that implementations will usually keep both the attribute string
> and a list of tokens in memory, so they wouldn't have to tokenize the string
> on every mutation. There is a small performance hit during attribute
> tokenization: the list of tokens would need to keep only unique tokens. But
> after that, the DOMTokenList methods are very simple: length/item() don't
> need to take care of duplicates, add/remove/toggle are simple list
> manipulation (the attribute string could be lazily generated from the token
> list when needed).
>
> To summarize:
> pros: simpler spec algorithms, simpler implementation
> cons: less whitespace preservation, small perf hit during tokenization
>
> I don't know if I'm missing something. Does this sound reasonable?
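
The three steps proposed above (split into unique tokens, mutate, rejoin
with single spaces) could be sketched roughly as follows. The function
names uniqueTokens, addToken and removeToken are made up for illustration,
and this is the naive form that re-tokenizes on every call:

```javascript
// Step 1: split the attribute value into unique tokens, preserving the
// order of first occurrence.
function uniqueTokens(value) {
  const seen = new Set();
  const tokens = [];
  for (const t of value.split(/[ \t\n\f\r]+/)) {
    if (t !== "" && !seen.has(t)) {
      seen.add(t);
      tokens.push(t);
    }
  }
  return tokens;
}

// Steps 2 and 3 for add(): append if missing, then rebuild the string.
function addToken(value, token) {
  const tokens = uniqueTokens(value);
  if (!tokens.includes(token)) tokens.push(token);
  return tokens.join(" ");
}

// Steps 2 and 3 for remove(): filter the token out, then rebuild.
function removeToken(value, token) {
  return uniqueTokens(value)
    .filter((t) => t !== token)
    .join(" ");
}

console.log(addToken("  a   b a ", "c"));
console.log(removeToken("a  b\tc b", "b"));
```

Note that the original whitespace is lost on any mutation, which is the
trade-off listed under "cons".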

I do agree that the spec seems to go extraordinarily far to not touch
whitespace. Normalizing whitespace when parsing is a bad idea, but
once the user modifies the DOMTokenList, I don't see a lot of value in
maintaining whitespace exactly as it was.

Ian: What is the reason for the fairly complicated code to deal with
removals? At least in Gecko it would be much simpler to just
regenerate the string completely. That way the string value could
just be dropped on modification, and regenerated lazily when
requested.
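
A rough sketch of that idea, to show the shape of it. This is not Gecko
code; LazyTokenList and its members are hypothetical names, and keeping
duplicates in the token array is an assumption made for brevity:

```javascript
// Hypothetical token list that drops its string value on modification
// and regenerates it only when someone asks for it.
class LazyTokenList {
  constructor(value) {
    this.tokens = value.split(/[ \t\n\f\r]+/).filter((t) => t !== "");
    // Parsing does not normalize: the original string is kept verbatim
    // until the first real modification.
    this.cachedValue = value;
  }

  add(token) {
    if (!this.tokens.includes(token)) {
      this.tokens.push(token);
      this.cachedValue = null; // drop the string value
    }
  }

  remove(token) {
    const kept = this.tokens.filter((t) => t !== token);
    if (kept.length !== this.tokens.length) {
      this.tokens = kept;
      this.cachedValue = null; // drop the string value
    }
  }

  // Regenerate lazily, joining tokens with a single space.
  get value() {
    if (this.cachedValue === null) {
      this.cachedValue = this.tokens.join(" ");
    }
    return this.cachedValue;
  }
}

const classes = new LazyTokenList("  a   b  ");
console.log(JSON.stringify(classes.value)); // untouched so far
classes.add("c");
console.log(JSON.stringify(classes.value)); // regenerated on request
```

A no-op call such as remove() of an absent token leaves the original
string, whitespace included, exactly as it was.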

/ Jonas