[whatwg] Recursion and loops of Microdata items

Fri Jun 10 00:07:43 PDT 2011

On Thu, 09 Jun 2011 19:01:24 +0200, Ian Hickson <ian at hixie.ch> wrote:

> On Thu, 9 Jun 2011, Philip Jägenstedt wrote:
>>
>> As for the spec, I don't think it can or needs to define the algorithm  
>> on a
>> form suitable form implementation. Something along these lines would be  
>> much
>> clearer for reference:
>>
>> 1. create a (possibly disconnected) graph of all the items in the  
>> document (or
>> subtree)
>> 2. find the strongly connected components
>> 3. create a list of "loopy" items: those that are in the same component  
>> as any
>> other item
>>
>> The traversal would remain mostly as before, but whenever an item is
>> encountered, one checks if it is in the list of "loopy" items and if so
>> ignores it. Since "loopy" is a global property, you'll see the same  
>> properties
>> regardless of the path taken to reach it, which may or may not be the  
>> case
>> with the current spec. (In any case, it's a nice feature.)
>
> The main reason I didn't do something like this with the current spec was
> that I was trying to minimise the work needed when implementing the API  
> in
> a dynamic situation. The above would imply that any time anything in the
> document changed in a way that could affect microdata, you'd have to redo
> the whole document before the next time the API was invoked. That seems
> expensive. (Consider the WHATWG spec, which has microdata in it and is
> about 5MB. Do you really want to crawl the whole document looking for
> microdata each time the API is invoked?)
>
> What I had tried to do when implementing the spec is start at whatever
> point in the DOM the API call was related to, and then search for loops
> from that point, and drop anything loopy. That's still expensive, but it
> at least minimises the total amount of work.
>
> Does that make sense?
>
> If the expense isn't a big deal then I don't mind doing it the other way
> too, but this seems like an API that we're going to have enough trouble
> getting implemented in the first place without giving implementors a
> reason to avoid doing it at all.

I don't think the spec needs to be giving suggestions for efficient  
implementation for live collections, because we inevitable won't implement  
exactly that algorithm anyway. It is a problem if the algorithm doesn't  
clearly map to some simpler higher-level behavior, as we certainly don't  
want to emulate some edge-cases just to follow the letter of the spec.

But, let's disregard the exact algorithm for a moment and only consider  
the actual requirement we're suggesting: "any item which can reach itself  
by following itemrefs should be removed"

It seems to me that it's quite possible to check this criteria while  
traversing using an algorithm of similar structure to what is already in  
the spec. One issue is that one must first find all the loopy items and  
then remove them in one step, rather than interleaving the  
checking/removing. However, it seems that this could be solved by simply  
flagging them instead of actually removing them, so that they will still  
take part in later loop-checks.

Am I missing something significant about the spec'd algorithm that would  
make it more efficient than the above?

If we just go ahead and show an efficient (enough) implementation of loop  
removal using the suggested criteria, I assume that would be convincing  
enough? Or do you really think that itemref is useless and complicated  
enough that it would be better to throw it overboard?

-- 
Philip Jägenstedt
Core Developer
Opera Software