[whatwg] DOMTokenList is unordered but yet requires sorting

Mon Jul 27 20:33:21 PDT 2009

> On Mon, 13 Jul 2009, Sylvain wrote:
>>
>> This is a bit unrelated, but when looking at the DOMTokenList
>> implementation, I had an idea about an alternative algorithm that could
>> be easier to implement and could also be described more simply in the
>> spec. The disadvantage is that the DOMTokenList methods mutating the
>> underlying string wouldn't preserve existing whitespace (which the
>> current algorithms try hard to do).
>>
>> The idea is that any DOMTokenList method that mutates the underlying string
>> would do:
>>  - split the attribute in unique tokens (preserving order).
>>  - add or remove the token according to the method called.
>>  - rebuild the attribute string by concatenating tokens together (with a
>> single space).
>>
>> At first, this may look like inefficient (if implemented naively).
>> But I guess that implementations will usually keep both the attribute string
>> and a list of tokens in memory, so they wouldn't have to tokenize the string
>> on every mutation. There is a small performance hit during attribute
>> tokenization: the list of tokens would need to keep only unique tokens. But
>> after that, the DOMTokenList methods are very simple: length/item() don't need
>> to take care of duplicates, add/remove/toggle are simple list manipulation
>> (the attribute string could be lazily generated from the token list when
>> needed).
>>
>> To summarize:
>> pros: simpler spec algorithms, simpler implementation
>> cons: less whitespace preservation, small perf hit during tokenization
>>
>> I don't know if I'm missing something. Does this sound reasonable?
>
> It ends up being not much simpler since you still have to deal with direct
> changes to the underlying string, as far as I can tell.

On changes to the underlying string (using .setAttribute) you always
have to reparse from scratch anyway, so doesn't seem like that matters
here?

> On Mon, 13 Jul 2009, Jonas Sicking wrote:
>>
>> I do agree that the spec seems to go extraordinary far to not touch
>> whitespace. Normalizing whitespace when parsing is a bad idea, but once
>> the user modifies the DOMTokenList, I don't see a lot of value in
>> maintaining whitespace exactly as it was.
>>
>> Ian: What is the reason for the fairly complicated code to deal with
>> removals? At least in Gecko it would be much simpler to just regenerate
>> the string completely. That way generating the string-value could just
>> be dropped on modifications, and regenerated lazily when requested.
>
> In general, I try to be as conservative as possible in making changes to
> the DOM. Are the algorithms really as complicated as you're making out?
> They seem pretty trivial to me.

At least in the gecko implementation it's significantly more complex
than not normalizing whitespace. The way that the implementation works
in gecko is:

When a class attribute is set, (during parsing or using setAttribute)
we parse the classlist into a list of tokens. We additionally keep
around the original string in order to preserve a correct DOM
(actually, as an optimization, we only do this if needed).

This token-list is then used during Selector matching and during
getElementsByClassName.

So far I would expect most implementations to match this.

It would be very nice if DOMTokenList could be implemented as simply
exposing this internal token list. With the recent change to not
remove duplicates reading operations like .length and .item(n) maps
directly to reading from this token list. All very nice.

However writing operations such as .add and .remove requires operating
on the string rather than the internal token-list. The current spec
requires .remove to duplicate the tokenization process (granted, a
pretty simple task) and modify the string while tokenizing. It would
be significantly simpler if you could just modify the token-list as
needed and then regenerate the string from the token-list.

/ Jonas