[whatwg] Possible bugs : Microdata Itemscope on <link/> and <meta/>
Tim van Oostrom
tim at depulz.nl
Sun Nov 29 09:57:39 PST 2009
Tim van Oostrom wrote:
> Philip Jägenstedt wrote:
>> On Sun, 29 Nov 2009 12:46:16 +0100, Tim van Oostrom <tim at depulz.nl>
>> wrote:
>>
>>> Philip Jägenstedt wrote:
>>>> On Thu, 26 Nov 2009 22:30:41 +0100, Tim van Oostrom <tim at depulz.nl>
>>>> wrote:
>>>>
>>>>> Hi, I made a forum post:
>>>>> http://forums.whatwg.org/viewtopic.php?t=4176, concerning a
>>>>> possible "microdata specification bug" and a bug in the
>>>>> james.html5.org microdata extractor.
>>>>>
>>>>> It comes down to <link/> and <meta/> elements possibly being unfit
>>>>> for use with the itemscope attribute.
>>>>>
>>>>> I made an example in the forum post with some nice UBB formatting.
>>>>>
>>>> There are some other issues with <link> and <meta> you might want
>>>> to review first: [1]
>>> Ok
>>>> Your second example was:
>>>>
>>>> <div itemtype="http://url.to/geoVocab#country" itemscope>
>>>> <span itemprop="http://xmlns.com/foaf/spec/index.rdf#name"
>>>> lang="cn">中華人民共和國</span>
>>>> <span itemprop="http://xmlns.com/foaf/spec/index.rdf#name"
>>>> lang="en">China</span>
>>>> <link itemprop="http://url.to/city"
>>>> href="http://url.to/shanghai" itemscope itemref="city-shanghai" />
>>>> <div id="city-shanghai">
>>>> <span
>>>> itemprop="http://xmlns.com/foaf/spec/index.rdf#name">Shanghai</span>
>>>> <span itemprop="http://url.to/demoVocab#population">14.61
>>>> million people</span>
>>>> <span itemprop="http://url.to/physicsVocab#time"
>>>> datetime="2009-11-26 11:43">11:43 pm (CT)</span>
>>>> </div>
>>>> </div>
>>>>
>>>> <link>, <meta> and any other void elements are usually the wrong
>>>> choice for itemprop+itemscope because they don't have child
>>>> elements, so itemref is the only way to add properties.
>>> Yes, see the forum post. Shouldn't this be noted in the spec then?
>>
>> Yes, the spec certainly needs some notes on how to use <link> and
>> <meta>.
>
And other void elements such as: area, base, br, col, command, embed,
hr, img, input, link, meta, param, source
(http://dev.w3.org/html5/markup/syntax.html).
Basically, microdata can't really be used on all elements, even though
the HTML5 spec (5.2.2 Items) states that it can.
>>> According to this, an "itemref" attribute can never be added to an
>>> "item" within the itemscope of another "item" without the crawled
>>> prop/val pairs also applying to the ancestor's itemscope.
>>
>> Ah, I think you've found the root of the problem. By allowing a
>> property to be part of several items at once, we get different kinds
>> of strange problems. Apart from messing up your example, it seems it
>> is the real cause for the infinite recursion bug I wrote about in
>> [1]. Then I was so focused on the recursion that I suggested a rather
>> complex solution to detect loops in the microdata, when it seems it
>> could be solved simply by making sure that a property belongs to only
>> one item. Detailed suggestion below.
>>
>> Now, back to the problem of one property, multiple items. The
>> algorithm for finding the properties of an item [2] is an attempt at
>> optimizing the search for properties starting at an item element. I
>> think we should replace this algorithm with an algorithm for finding
>> the item of a property. This is how the spec worked before the itemref
>> mechanism was introduced. I would suggest something along these
>> lines:
>>
>> 1. let current be the element with the itemprop attribute
>> 2. if current has an ID, for each element e in document order:
>> 2.1. if e has an itemref attribute:
>> 2.1.1. split the value of that itemref attribute on spaces. for each
>> resulting token, ID:
>> 2.1.1.1. if ID equals the ID of current, return e
>> 3. reaching this step indicates that the item wasn't found via
>> itemref on this element
>> 4. let parent be the parent element of current
>> 5. if parent is null, return null
>> 6. if parent has the itemscope attribute, return parent
>> 7. otherwise, let current be parent and jump to step 2.
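
The steps above can be sketched in Python roughly as follows. This is only
an illustration under stated assumptions: the Element class is a
hypothetical stand-in for a DOM element (it is not a real DOM API or part
of MicrodataJS), and find_item is a made-up name for the proposed
algorithm. The sample tree is a trimmed version of the country/Shanghai
markup quoted earlier.

```python
class Element:
    """Hypothetical stand-in for a DOM element; not a real DOM API."""

    def __init__(self, tag, **attrs):
        self.tag = tag
        self.attrs = attrs      # attribute name -> value
        self.children = []
        self.parent = None

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def in_document_order(root):
    """Yield root and all of its descendants in document (pre-order) order."""
    yield root
    for child in root.children:
        yield from in_document_order(child)


def find_item(prop, root):
    """Return the element that owns the property `prop`, or None."""
    current = prop                                   # step 1
    while True:
        current_id = current.attrs.get("id")
        if current_id:                               # step 2
            for e in in_document_order(root):
                # steps 2.1 to 2.1.1.1: itemref tokens are space separated
                if current_id in e.attrs.get("itemref", "").split():
                    return e
        # step 3: not found via itemref on this element
        parent = current.parent                      # step 4
        if parent is None:                           # step 5
            return None
        if "itemscope" in parent.attrs:              # step 6
            return parent
        current = parent                             # step 7


# Trimmed version of the country/Shanghai example.
country = Element("div", itemscope="", itemtype="http://url.to/geoVocab#country")
link = country.append(Element("link", itemprop="http://url.to/city",
                              itemscope="", itemref="city-shanghai"))
city = country.append(Element("div", id="city-shanghai"))
city_name = city.append(
    Element("span", itemprop="http://xmlns.com/foaf/spec/index.rdf#name"))
```

With this tree, find_item(city_name, country) gives the <link> item rather
than the country, because the itemref lookup on the enclosing div runs
before the walk up reaches the country's itemscope.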
>>
>> This algorithm will find the parent item of a property, if there is
>> one. itemref'ing takes precedence over "parent-child linking", so in
>> Tim's example the properties of Shanghai would be applied to only the
>> Shanghai sub-item. I'm not convinced writing markup like that is a
>> good idea, but at least this way it has sane processing.
Which is important in the markup-souped web of non-linked-data :-)
>> HTMLPropertiesCollection on any given element would simply match all
>> elements in the document for which the algorithm returns that
>> very element. It should be invalid for there to be any elements in
>> the document with itemprop where this algorithm returns null or the
>> element itself.
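
A rough Python sketch of that matching and validity rule, under the same
kind of assumptions as any toy model: Element is a hypothetical stand-in
for a DOM element, and find_item, properties_of and invalid_properties are
illustrative names, not the HTMLPropertiesCollection API. The short
property names ("name", "city", ...) are placeholders as well.

```python
class Element:
    """Hypothetical stand-in for a DOM element; not a real DOM API."""

    def __init__(self, tag, **attrs):
        self.tag, self.attrs = tag, attrs
        self.children, self.parent = [], None

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def walk(root):
    """Yield root and its descendants in document order."""
    yield root
    for child in root.children:
        yield from walk(child)


def find_item(prop, root):
    """Proposed lookup: itemref match first, then the nearest itemscope ancestor."""
    current = prop
    while True:
        current_id = current.attrs.get("id")
        if current_id:
            for e in walk(root):
                if current_id in e.attrs.get("itemref", "").split():
                    return e
        if current.parent is None:
            return None
        if "itemscope" in current.parent.attrs:
            return current.parent
        current = current.parent


def properties_of(item, root):
    """All itemprop elements for which the lookup returns `item`."""
    return [e for e in walk(root)
            if "itemprop" in e.attrs and find_item(e, root) is item]


def invalid_properties(root):
    """itemprop elements whose lookup returns nothing or the element itself."""
    bad = []
    for e in walk(root):
        if "itemprop" in e.attrs:
            owner = find_item(e, root)
            if owner is None or owner is e:
                bad.append(e)
    return bad


body = Element("body")
country = body.append(Element("div", itemscope=""))
country_name = country.append(Element("span", itemprop="name"))
link = country.append(Element("link", itemprop="city", itemscope="",
                              itemref="city-shanghai"))
city = country.append(Element("div", id="city-shanghai"))
city_name = city.append(Element("span", itemprop="name"))
population = city.append(Element("span", itemprop="population"))
stray = body.append(Element("span", itemprop="stray"))  # belongs to no item
```

Here properties_of(country, body) contains only the country's own name and
the <link>, properties_of(link, body) contains the two Shanghai spans, and
invalid_properties(body) reports the stray span.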
>>
>> I will try implementing this algorithm in MicrodataJS [3] and see if
>> it works OK. While it may look less efficient than the current
>> algorithm, consider that a browser won't implement either algorithm
>> as written, only act as if it did. The expensive step of going
>> through all elements with itemref attributes is actually no more
>> expensive than e.g. document.querySelector('.classname') if
>> implemented natively.
I did something like this in my experimental/unfinished/test/learn
microdata extractor based on jQuery, which is here:
http://www.depulz.nl/microdata/ (works at least in FF 3.5 and Opera 10.10).
>> [1]
>> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-November/024095.html
>>
>> [2]
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-properties-of-an-item
>>
>> [3] http://gitorious.org/microdatajs
>>
>