[whatwg] Possible bugs : Microdata Itemscope on <link/> and <meta/>

Tim van Oostrom tim at depulz.nl
Sun Nov 29 09:57:39 PST 2009


Tim van Oostrom wrote:
> Philip Jägenstedt wrote:
>> On Sun, 29 Nov 2009 12:46:16 +0100, Tim van Oostrom <tim at depulz.nl> 
>> wrote:
>>
>>> Philip Jägenstedt wrote:
>>>> On Thu, 26 Nov 2009 22:30:41 +0100, Tim van Oostrom <tim at depulz.nl> 
>>>> wrote:
>>>>
>>>>> Hi, I made a forum post,
>>>>> http://forums.whatwg.org/viewtopic.php?t=4176, concerning a
>>>>> possible "microdata specification bug" and a bug in the
>>>>> james.html5.org microdata extractor.
>>>>>
>>>>> It comes down to <link/> and <meta/> elements possibly being unfit
>>>>> for use with the itemscope attribute.
>>>>>
>>>>> I made an example in the forum post with some nice UBB formatting.
>>>>>
>>>> There are some other issues with <link> and <meta> you might want 
>>>> to review first: [1]
>>> Ok
>>>> Your second example was:
>>>>
>>>> <div itemtype="http://url.to/geoVocab#country" itemscope>
>>>>    <span itemprop="http://xmlns.com/foaf/spec/index.rdf#name" lang="cn">中華人民共和國</span>
>>>>    <span itemprop="http://xmlns.com/foaf/spec/index.rdf#name" lang="en">China</span>
>>>>    <link itemprop="http://url.to/city" href="http://url.to/shanghai" itemscope itemref="city-shanghai" />
>>>>    <div id="city-shanghai">
>>>>       <span itemprop="http://xmlns.com/foaf/spec/index.rdf#name">Shanghai</span>
>>>>       <span itemprop="http://url.to/demoVocab#population">14.61 million people</span>
>>>>       <span itemprop="http://url.to/physicsVocab#time" datetime="2009-11-26 11:43">11:43 pm (CT)</span>
>>>>    </div>
>>>> </div>
>>>>
>>>> <link>, <meta> and any other void elements are usually the wrong 
>>>> choice for itemprop+itemscope because they don't have child 
>>>> elements, so itemref is the only way to add properties.
>>> Yes, see the forum post. Shouldn't this be noted in the spec, then?
>>
>> Yes, the spec certainly needs some notes on how to use <link> and 
>> <meta>.
>

And other void elements, such as: area, base, br, col, command, embed,
hr, img, input, link, meta, param, source
(http://dev.w3.org/html5/markup/syntax.html)

Basically, the microdata attributes can't really be used on all elements,
contrary to what the HTML5 spec states in 5.2.2 Items.

>>> According to this an "itemref" attribute can never be added to an 
>>> "item" within an itemscope of another "item" without the crawled 
>>> prop/val pairs also applying to the ancestor's itemscope.
>>
>> Ah, I think you've found the root of the problem. By allowing a 
>> property to be part of several items at once, we get different kinds 
>> of strange problems. Apart from messing up your example, it seems it
>> is the real cause for the infinite recursion bug I wrote about in 
>> [1]. Then I was so focused on the recursion that I suggested a rather 
>> complex solution to detect loops in the microdata, when it seems it 
>> could be solved simply by making sure that a property belongs to only
>> 1 item. Detailed suggestion below.
>> Now, back to the problem of one property, multiple items. The 
>> algorithm for finding the properties of an item [2] is an attempt at 
>> optimizing the search for properties starting at an item element. I 
>> think we should replace this algorithm with an algorithm for finding 
>> the item of a property. This was previously the case with the spec 
>> before the itemref mechanism. I would suggest something along these 
>> lines:
>>
>> 1. let current be the element with the itemprop attribute
>> 2. if current has an ID, for each element e in document order:
>> 2.1. if e has an itemref attribute:
>> 2.1.1. split the value of that itemref attribute on spaces. for each 
>> resulting token, ID:
>> 2.1.1.1. if ID equals the ID of current, return e
>> 3. reaching this step indicates that the item wasn't found via 
>> itemref on this element
>> 4. let parent be the parent element of current
>> 5. if parent is null, return null
>> 6. if parent has the itemscope attribute, return parent
>> 7. otherwise, let current be parent and jump to step 2.
>>
>> This algorithm will find the parent item of a property, if there is 
>> one. itemref'ing takes precedence over "parent-child linking", so in 
>> Tim's example the properties of Shanghai would be applied to only the 
>> Shanghai sub-item. I'm not convinced writing markup like that is a 
>> good idea, but at least this way it has sane processing.
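
For illustration only, here is a minimal JavaScript sketch of the seven steps above. It is not the MicrodataJS code: it runs on plain objects instead of a real DOM, and the names el, documentOrder and findItem are made up for this example. The small tree at the end mirrors the China/Shanghai markup, with shortened (hypothetical) property names.

```javascript
// Hypothetical stand-in for a DOM element: a plain object with a parent pointer.
function el(tag, attrs = {}, children = []) {
  const node = { tag, parent: null, children, ...attrs };
  for (const child of children) child.parent = node;
  return node;
}

// All elements under root in document (pre-order) order.
function documentOrder(root) {
  const out = [root];
  for (const child of root.children) out.push(...documentOrder(child));
  return out;
}

// Find the item that a property element belongs to, or null if there is none.
function findItem(property, root) {
  const all = documentOrder(root);
  let current = property;                          // step 1
  while (true) {
    if (current.id) {                              // step 2
      for (const e of all) {
        if (e.itemref !== undefined) {             // step 2.1
          const ids = e.itemref.split(/\s+/).filter(Boolean); // step 2.1.1
          if (ids.includes(current.id)) return e;  // step 2.1.1.1
        }
      }
    }
    const parent = current.parent;                 // step 4
    if (parent === null) return null;              // step 5
    if (parent.itemscope) return parent;           // step 6
    current = parent;                              // step 7
  }
}

// Simplified model of the example markup.
const shanghaiName = el("span", { itemprop: "name" });
const cityBox = el("div", { id: "city-shanghai" }, [shanghaiName]);
const cityLink = el("link", { itemprop: "city", itemscope: true, itemref: "city-shanghai" });
const chinaName = el("span", { itemprop: "name" });
const country = el("div", { itemscope: true }, [chinaName, cityLink, cityBox]);

console.log(findItem(shanghaiName, country) === cityLink); // true
console.log(findItem(chinaName, country) === country);     // true
```

With this lookup the Shanghai name resolves to the <link> item via itemref, while the country name and the <link> itself resolve to the enclosing country <div>.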

Which is important in the markup-souped web of non-linked-data :-)

>> HTMLPropertiesCollection on any given element would simply match all 
>> elements in the document for which the algorithm returns that
>> very element. It should be invalid for there to be any elements in 
>> the document with itemprop where this algorithm returns null or the 
>> element itself.
>>
>> I will try implementing this algorithm in MicrodataJS [3] and see if 
>> it works OK. While it may look less efficient than the current 
>> algorithm, consider that a browser won't implement either algorithm 
>> as written, only act as if they did. The expensive step of going
>> through all elements with itemref attributes is actually no more 
>> expensive than e.g. document.querySelector('.classname') if 
>> implemented natively.
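
The HTMLPropertiesCollection rule and the validity condition described above can be sketched in JavaScript roughly as follows. This is a hypothetical helper pair, not a spec or browser API: propertiesOf and invalidProperties are invented names, elements is assumed to be a flat list of all elements in document order, and findItem is assumed to be any function implementing the proposed lookup (it is passed in as an argument, and stubbed with a Map in the usage lines).

```javascript
// Properties of an item: every itemprop element whose lookup returns that item.
function propertiesOf(item, elements, findItem) {
  return elements.filter(
    (e) => e.itemprop !== undefined && findItem(e) === item
  );
}

// Elements that would be invalid under the proposal: itemprop elements
// whose lookup returns null (no item) or the element itself.
function invalidProperties(elements, findItem) {
  return elements.filter((e) => {
    if (e.itemprop === undefined) return false;
    const owner = findItem(e);
    return owner === null || owner === e;
  });
}

// Usage with a stubbed lookup (assumption: a Map stands in for the real algorithm).
const item = { name: "item" };
const a = { name: "a", itemprop: "x" };
const orphan = { name: "orphan", itemprop: "y" };
const selfRef = { name: "selfRef", itemprop: "z" };
const owners = new Map([[a, item], [orphan, null], [selfRef, selfRef]]);
const lookup = (e) => (owners.has(e) ? owners.get(e) : null);
const all = [item, a, orphan, selfRef];

console.log(propertiesOf(item, all, lookup).map((e) => e.name));  // [ 'a' ]
console.log(invalidProperties(all, lookup).map((e) => e.name));   // [ 'orphan', 'selfRef' ]
```

Filtering the whole element list once per item is the naive form of the rule; as noted above, a native implementation would only need to behave as if it did this.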

I did something like this in my experimental/unfinished/test/learn
microdata extractor based on jQuery, which is here:
http://www.depulz.nl/microdata/ (works at least in Firefox 3.5 and Opera 10.10).

>> [1] 
>> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-November/024095.html 
>>
>> [2] 
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-properties-of-an-item 
>>
>> [3] http://gitorious.org/microdatajs
>>
>



