[whatwg] Recursion and loops of Microdata items

Wed Jun 8 12:51:57 PDT 2011

On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote:
> 
> I've been looking into the Microdata specification, and it struck me 
> that the crawling algorithm is very complex for the simple ideas it 
> expresses.  I think the specification should first explain what the 
> algorithm is supposed to do, before giving the exact steps to be 
> carried out.
>
> Let's see what the properties are of the Microdata item from the HTML 
> element with id=up in the following HTML:
> 
> <div itemscope id=up itemprop=prop0>
> 	<div itemscope id=down itemprop=prop1 itemref="up"></div>
> </div>

The microdata item with id=up has exactly one property, "prop1", whose 
value is an item.
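
As an illustration only (not spec text), the crawl can be modelled in a 
few lines of Python. The Element class, the crawl() function and the 
by_id map are hypothetical names invented for this sketch; document-order 
sorting and error counting are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based hashing, so elements can live in sets
class Element:
    id: str
    itemscope: bool = False
    itemprop: Optional[str] = None
    itemref: tuple = ()
    children: list = field(default_factory=list)


def crawl(root, by_id, memory=frozenset()):
    """Return the property elements of root, or None for FAIL."""
    if root in memory:
        return None  # root was already visited on this path: FAIL
    # Collect: start from root's children plus the elements named by itemref.
    pending = list(root.children) + [by_id[i] for i in root.itemref if i in by_id]
    results, seen = [], set()
    while pending:
        current = pending.pop(0)
        if current in seen:
            continue
        seen.add(current)
        if current.itemprop is not None:
            results.append(current)
        if not current.itemscope:
            # Only descend into elements that do not start a nested item.
            pending.extend(current.children)
    # Loop check: drop any nested item whose own crawl fails.
    new_memory = memory | {root}
    return [e for e in results
            if not e.itemscope or crawl(e, by_id, new_memory) is not None]


# <div itemscope id=up itemprop=prop0>
#   <div itemscope id=down itemprop=prop1 itemref="up"></div>
# </div>
down = Element("down", itemscope=True, itemprop="prop1", itemref=("up",))
up = Element("up", itemscope=True, itemprop="prop0", children=[down])
by_id = {"up": up, "down": down}

props_of_up = [e.itemprop for e in crawl(up, by_id)]
props_of_down = [e.itemprop for e in crawl(down, by_id)]
```

In this model, crawling from "up" yields the single property prop1. 
Crawling from "down" as the root yields prop0, so the two items still 
point at each other.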

> CRAWL
> root = up
> memory = {}
> 1. xxx
> 2. COLLECT
>   1. results = {}
>      pending = {}
>   3. pending = {down}
>   4. xxx
>   5. pending = {}
>      current = down
>   7. xxx
>   8. results = {down}
>   results = {down}
> 3. xxx
> 4. new_memory = {up}
> 5. element = down
>   CRAWL
>   0. memory2 = {up}
>      root2 = down
>   1. xxx
>   2. COLLECT
>      1. results2 = {}
>         pending2 = {}
>      3. xxx
>      4. pending2 = {up}
>      5. pending2 = {}
>         current2 = up
>      7. xxx
>      8. results2 = {up}
>      results2 = {up}
>   3. xxx
>   4. new_memory2 = {up, down}
>   5. element2 = up
>      CRAWL
>      0. memory3 = {up, down}
>         root3 = up
>      1. return FAIL
> !!!   results2 = results2 - up = {}
>   7. return results2 == {} (not FAIL).
> 7. return results == {down}
> 
> In the end, the properties of the Microdata item from the HTML element 
> with id=up have length=1.

Yup.

> The troubling part is in the line marked with triple exclamation marks.  
> It means that step 5. of the algorithm should be simplified to "For each 
> element in results that has an itemscope attribute specified, if the 
> element is equal to /root/, then remove the element from results [and 
> increment errors]".  Further recursive crawling is not needed.

Yeah, you're right, the way it's written is way more complex than needed 
to get the effect described. It goes beyond what you've described, in 
fact; for example, the "find the properties of an item" algorithm defined 
a few paragraphs earlier says what to do when the "crawl" algorithm fails, 
but it cannot ever fail as written in that context.
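
A rough sketch of the simplified step 5 quoted above, assuming elements 
are modelled as plain dicts keyed by id. The names collect(), 
properties_simplified() and the "self" element are made up for 
illustration; "self" stands for an item whose itemref points back at 
itself.

```python
def collect(root, elements):
    """Gather ids of elements with itemprop reachable from root through
    children and itemref, without descending into nested items."""
    pending = list(elements[root]["children"]) + list(elements[root]["itemref"])
    results, seen = [], set()
    while pending:
        current = pending.pop(0)
        if current in seen or current not in elements:
            continue
        seen.add(current)
        node = elements[current]
        if node["itemprop"] is not None:
            results.append(current)
        if not node["itemscope"]:
            pending.extend(node["children"])
    return results


def properties_simplified(root, elements):
    # Simplified step 5: no recursive crawl, just drop the root itself.
    return [e for e in collect(root, elements) if e != root]


elements = {
    "up": {"itemscope": True, "itemprop": "prop0",
           "itemref": [], "children": ["down"]},
    "down": {"itemscope": True, "itemprop": "prop1",
             "itemref": ["up"], "children": []},
    # Hypothetical extra element that references itself.
    "self": {"itemscope": True, "itemprop": "me",
             "itemref": ["self"], "children": []},
}

up_props = properties_simplified("up", elements)
down_props = properties_simplified("down", elements)
self_props = properties_simplified("self", elements)
```

Only the self-referencing element loses a property here. The up/down 
pair keeps both links, which is why the JSON conversion still has to 
deal with the loop.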

> But then there's a problem with infinite recursion when going through 
> the stringification algorithm of ["extract the microdata from those 
> nodes into a JSON form"] for the HTML given above.  We can proceed in 
> two ways:
> 
> a) allow loops of Microdata items and make JSONification of a Microdata 
> item behave just like JSONification of any JavaScript object, that is, 
> throw an exception when a loop is found.  Or
> 
> b) exclude loops of Microdata items (so in the above example the 
> Microdata item from the HTML element with id=up would have no Microdata 
> properties).  This would cripple the functionality of a quite nice HTML 
> API, but it would also produce consistent results in 
> HTMLPropertiesCollection and stringification.  A third solution:
> 
> c) cutting only the offending links, is not good, because in the case 
> of a graph of Microdata items with the paths "A->B->C->D->B" and "E->D", 
> stringification of item A would result in item D having no properties, 
> while stringification of E would result in D having property B, so the 
> presence of a property would depend on where the path starts.

The idea, IIRC, was that an element would never have properties that 
involved a loop. Clearly I failed at speccing that, but that was the 
intent.
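
The path dependence described in option (c) of the quoted text can be 
reproduced with a small sketch. It assumes the item graph is given as a 
dict from item name to the items its properties point at; expand() is a 
hypothetical helper, not anything from the spec.

```python
def expand(graph, start, path=()):
    """Expand the items reachable from start, cutting any link that
    points back to an item already on the current path."""
    path = path + (start,)
    return {n: expand(graph, n, path) for n in graph[start] if n not in path}


# Paths from the quoted example: A->B->C->D->B and E->D.
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["B"], "E": ["D"]}

from_a = expand(graph, "A")  # {"B": {"C": {"D": {}}}}
from_e = expand(graph, "E")  # {"D": {"B": {"C": {}}}}
```

Starting from A, item D ends up with no properties; starting from E, D 
keeps its link to B and it is C that loses its link to D.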

> 	I can imagine good usages of loops of Microdata items, for example
> "John knows Amy, Amy knows John":
> 
> <div itemscope id="john" itemprop>
> 	<div itemprop="friends" itemref="fred1 jenny2 amy1"></div>
> </div>
> <div itemscope id="amy1" itemprop>
> 	<div itemprop="friends" itemref="john"></div>
> </div>
> 
> There's a loop:  john->amy1->john->... .

There's no loop there. It's invalid. itemref="" only works on an element 
with an itemscope attribute. This use case is currently intentionally not 
explicitly supported; officially the way to handle this case is using 
itemid="" and URLs (much as in one example you had that I snipped here).

> The problem I'm addressing revolves around the meaning of the link 
> between the itemref and id attributes.  Is it meant to be part of the 
> Microdata data model?  Or is it introduced to cope with the fact that 
> the Microdata graph is defined on top of existing data (the HTML tree), 
> which is something completely different and is meant to be rendered to 
> the user?  The meaning of the itemref attribute should also guide how 
> the specification interprets it.

The goal of itemref="" was just to have a way to handle cases where you 
have an item's properties scattered around a document.

It's caused us more difficulties than it has solved, as far as I can 
tell. Has anyone implemented it or used it and liked it? I'd be fine with 
removing it if it's not a lot of trouble...

I haven't fixed the algorithm to be written more simply, nor fixed the 
loops in the JSON stuff, because if we remove itemref="" then those 
problems just go away. If we want to keep itemref="", though, I will fix 
them. Any opinions one way or the other?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'