[whatwg] DOMTokenList is unordered but yet requires sorting

Mon Jul 27 20:24:46 PDT 2009

On Tue, Jul 28, 2009 at 2:17 AM, Ian Hickson<ian at hixie.ch> wrote:
> On Sun, 12 Jul 2009, Jonas Sicking wrote:
>> >>
>> >> Oh, I have forseen that. Is it really necessary to remove duplicates
>> >> ? I imagine DOMTokenList to be similar to what can be achieved with a
>> >> String.split(), but then it would be just more duplicate
>> >> functionality.
>> >
>> > If we don't remove duplicates, then things like the .toggle() method
>> > could have some quite weird effects.
>>
>> Such as?
>
> Such as .length changing by more than 1 after a call to .toggle().

I guess that couldn't have happened, because .length counted only the
unique tokens.

>> I definitely think it'd be worth avoiding the code complexity and perf
>> hit of having the implementation remove duplicates if they appear in the
>> class attribute given how extremely rare duplicates are.
>
> Fair enough. I've made DOMTokenList not remove duplicates.

ok, I realize now that this is about the duplicates in .length and item().

By the way, preserving duplicates shouldn't be much code complexity if
I'm not mistaken.

The only required code change would be to use a hashset when parsing
the attribute in order to only insert unique tokens in the token
vector. Then DOMTokenList.length would return the token vector length
and .item() get the token by index. I don't think anything actually
depends on keeping duplicate tokens in the token vector. Then there
would be a small perf hit when parsing attributes with more than one
token.

>> To summarize:
>> pros: simpler spec algorithms, simpler implementation
>> cons: less whitespace preservation, small perf hit during tokenization
>>
>> I don't know if I'm missing something. Does this sound reasonable?
>
> It ends up being not much simpler since you still have to deal with direct
> changes to the underlying string, as far as I can tell.

I don't think changing the underlying string is related to that
algorithm (from an implementation point of view). On setting, the
tokens would be deleted and the attribute parsed again.

> On Mon, 13 Jul 2009, Jonas Sicking wrote:
>>
>> I do agree that the spec seems to go extraordinary far to not touch
>> whitespace. Normalizing whitespace when parsing is a bad idea, but once
>> the user modifies the DOMTokenList, I don't see a lot of value in
>> maintaining whitespace exactly as it was.
>>
>> Ian: What is the reason for the fairly complicated code to deal with
>> removals? At least in Gecko it would be much simpler to just regenerate
>> the string completely. That way generating the string-value could just
>> be dropped on modifications, and regenerated lazily when requested.
>
> In general, I try to be as conservative as possible in making changes to
> the DOM. Are the algorithms really as complicated as you're making out?
> They seem pretty trivial to me.

The remove() algorithm is about 50 lines with whitespace and comments.
After all, that's not a big cost and I guess that preserving
whitespace may be closer to what DOMTokenList API consumers would
expect.

Sylvain