[whatwg] Questions regarding microdata implementations.

Tue Jan 18 10:36:51 PST 2011

Hey, Emiliano!  I'm going to snip your actual questions, as they're rather long.

> 1) The specification does not define any mechanism for an application
> using the microdata to deal with possible misuses of data
> vocabularies.

The spec completely specifies how to extract the data.  What
applications do with the data afterwards is out of scope for HTML.  An
application may find it useful to keep all the data that was
extracted, even if it knows the vocabulary and sees unknown properties
(this can help with forward-compatibility, or allow custom extensions,
if either makes sense for the application).  Alternatively, it may
just throw away any extracted data that it doesn't recognize.

Both of these, and any other behavior, are perfectly fine, and it's up
to the application to decide what's most useful.
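
To make the two options concrete, here's a rough Python sketch.  The
vocabulary, the property names and the dict-of-lists shape for an
extracted item are all made up for illustration; nothing here comes
from the spec.

```python
# Hypothetical vocabulary: the property names this application understands.
KNOWN_PROPERTIES = {"name", "url"}

def keep_everything(item):
    """Policy A: keep all extracted properties, recognized or not."""
    return dict(item)

def keep_known_only(item):
    """Policy B: drop any property the application doesn't recognize."""
    return {name: values for name, values in item.items()
            if name in KNOWN_PROPERTIES}

# Assumed shape of one extracted item: property name -> list of values.
extracted = {
    "name": ["Emiliano"],
    "url": ["http://example.com/"],
    "shoe-size": ["42"],  # not part of the hypothetical vocabulary
}

kept_all = keep_everything(extracted)
kept_known = keep_known_only(extracted)
```

Here kept_all still carries "shoe-size", while kept_known only has
"name" and "url".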

> 2) The specs specify item types should be identified by URLs. It is
> not completely clear (or at least not clear to me) whether they
> represent the string of the URL as a URI for unambiguously
> representing the item type, a URL for a document that defines that
> item type or both. which is the case?

The former.  Since it's a URL, though, it can certainly play the role
of the latter as well.

> 3) The specification states that itemref references a node within the
> html tree, referencing it by it's id. However it specifies nothing
> regarding how the referenced node should be marked up. Since, the
> nodes referenced may exist before the itemrefs, an application
> discovering microdata may have to do multiple passes through the html
> tree to extract this information. I would like to know, if any thought
> has been given to using itemscope within the referenced node, ie:
>
> <div itemscope id="a">
>        <p itemprop="a1">value of a1</p>
>        <p itemprop="a2">value of a2</p>
> </div>
>
> <div itemscope id="b">
>        <p itemprop="b1">value of b1</p>
>        <div itemscope id="d" itemref="a"></div>
> </div>

Using @itemscope changes the meaning: it implies that the element
forms an independent (though possibly nested) Microdata item, rather
than a plain container for properties that another item pulls in via
@itemref.

You don't necessarily need to make multiple passes through the
document to resolve all the itemrefs, though.  For example, you could
keep a stack of the ids of the elements you're currently inside, and
associate each @itemprop you find with the current stack.  When you're
done extracting everything, you can resolve the @itemrefs by just
filtering your list of @itemprops down to those with a referenced id
in their stack.
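
A minimal Python sketch of that idea, using the standard library's
html.parser.  This is not the spec's extraction algorithm: it assumes
well-nested markup, ignores property values and nested-item scoping,
and the names PropCollector and props_for_itemref are made up.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class PropCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.id_stack = []  # one entry per open element: its id, or None
        self.props = []     # (property name, set of enclosing ids)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Ids of every open ancestor plus this element's own id.
        ids = self.id_stack + [attrs.get("id")]
        if "itemprop" in attrs:
            enclosing = frozenset(i for i in ids if i is not None)
            for name in (attrs["itemprop"] or "").split():
                self.props.append((name, enclosing))
        if tag not in VOID:
            self.id_stack = ids

    def handle_endtag(self, tag):
        if tag not in VOID and self.id_stack:
            self.id_stack.pop()

def props_for_itemref(props, itemref):
    """Filter the collected properties by the ids an itemref names."""
    wanted = set(itemref.split())
    return [name for name, enclosing in props if enclosing & wanted]

DOC = """
<div id="a">
  <p itemprop="a1">value of a1</p>
  <p itemprop="a2">value of a2</p>
</div>
<div itemscope id="b" itemref="a">
  <p itemprop="b1">value of b1</p>
</div>
"""

collector = PropCollector()
collector.feed(DOC)
referenced = props_for_itemref(collector.props, "a")
```

One pass fills collector.props; resolving itemref="a" afterwards is
just the filter, so it doesn't matter whether the referenced element
comes before or after the element carrying the @itemref.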

> 4) What is the intended behaviour of an application when encountering
> a loop within the itemref references? ie:

This is described in
<http://www.whatwg.org/specs/web-apps/current-work/complete/microdata.html#associating-names-with-items>.
I don't want to squint at the algorithm again to find out exactly
what happens, but it keeps track of the things it has seen before, and
cuts off recursion if an @itemref results in a loop.
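
The general "remember what you've seen" technique looks roughly like
this in Python.  It's only an illustration of the idea, not a
transcription of the spec's steps; the function name and the
id -> referenced-ids mapping are assumptions of mine.

```python
def reachable_ids(start, itemrefs):
    """Follow itemref links from `start`, visiting each id at most once."""
    seen = []
    pending = [start]
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # already visited: this is where a loop gets cut off
        seen.append(current)
        pending.extend(itemrefs.get(current, []))
    return seen

# Hypothetical loop: "a" references "b", and "b" references "a" back.
refs = {"a": ["b"], "b": ["a"]}
visited = reachable_ids("a", refs)
```

The walk terminates even though the references form a cycle, because
the second visit to "a" is skipped.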

> 5) The specification states:
>
> "The itemref attribute, if specified, must have a value that is an
> unordered set of unique space-separated tokens that are
> case-sensitive, consisting of IDs of elements in the same home
> subtree."
>
> (5.2.2 of http://www.whatwg.org/specs/web-apps/current-work/#microdata)
>
> I would like to know if there has been any thoughts given to
> referencing fragments on an outside document. For example, a document
> with URL http://www.personaldata.com/me.html might contain the
> following fragment:

That's more complex than appeared necessary for any of the (fairly
extensive) use-cases that were considered when Microdata was written.

Vocabularies can certainly define that some of their properties take
URLs which are intended to point to more data, but that doesn't affect
the Microdata data extraction algorithm itself, which only cares about
the single page it was run on.

~TJ