[whatwg] Issues with microdata and proposals for improvements

Ian Hickson ian at hixie.ch
Tue Dec 18 16:52:27 PST 2012

On Fri, 12 Oct 2012, pghj wrote:
> I am writing a set of tools to work with microdata, and ran into a 
> number of issues. Is there at this point still room for discussion, and 
> improvements to the specification?

There's always room for discussion and improvements; the only constraint 
is that we can't make changes that are incompatible with deployed content 
and processing software.

> == Usage of URLs that do not point to anything interesting ==
> I'm not sure whether this has been discussed at length, though it seems 
> that Philip Jägenstedt brought it up once [1]. For a variety of reasons, 
> I would much rather use <data> and <a> than <meta> and <link> for 
> microdata: less ugly, script has easy access to the user visible 
> representation of data, and CSS styling of that representation based on 
> microdata attributes (itemref complicates this - see below), etc.
> However, for enumerations like http://schema.org/InStock a clickable <a> 
> would not be desirable, yet the use of <data> would violate the 
> microdata specification, section "Values":
> "If a property's value, as defined by the property's definition, is an 
> absolute URL, the property must be specified using a URL property 
> element."
> I do not see much merit in this requirement: the URL is already 
> absolute, so it does not need resolving and it is already defined to be 
> a URL by the property's definition. Therefore storing it in a <data> 
> element would not do much harm. Because there are many benefits to being 
> able to wrap visible content in a microdata property, I would like to 
> propose that this requirement is dropped, so the <data> element may also 
> carry an absolute URL.

The main reason to use a URL property element is that you get syntax 
checking of the URL, which should reduce authoring mistakes.
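
To illustrate the kind of check this makes possible, here is a rough sketch 
in Python. It is not anything the spec defines: the function name is made 
up, and the test is deliberately crude (a scheme must be present and the 
value must contain no whitespace). A checker can only run this when it 
knows the value is meant to be a URL, which is what a URL property element 
tells it.

```python
from urllib.parse import urlsplit


def looks_like_absolute_url(value: str) -> bool:
    """Hypothetical helper: crude test that a value parses as an absolute URL."""
    value = value.strip()
    # Reject empty values and values with embedded whitespace.
    if not value or any(ch.isspace() for ch in value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    # An absolute URL needs a scheme such as "http".
    return bool(parts.scheme)
```

With this, "http://schema.org/InStock" passes, while a value missing its 
scheme, such as "schema.org/InStock", is flagged.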

> Nevertheless, I see how it would be useful to store a URL in such a
> way that it is clear it's a URL, and have it properly resolved. As
> far as I can tell, no HTML element combines the following three
> properties:
> 1. Stores a definite URL type value,
> 2. Can have phrasing content,
> 3. Has no side effects (clickable, etc).
> Therefore, as an alternative to dropping the requirement mentioned 
> above, I would also be in favor of allowing an additional attribute on 
> the <data> element (for example named 'url'), mutually exclusive with 
> the 'value' attribute, that is to be resolved the same way as the URLs 
> obtained from <a>, <link>, <img>, etc are.

This is an interesting idea.

Could you elaborate on how you'd use this element from script and CSS?

> == Incompatible property names when using itemrefs ==
> Consider the following piece of HTML:
> <div itemscope itemtype="http://schema.org/Book" itemref="a"> ... </div>
> <div itemscope itemtype="http://schema.org/LiteraryEvent" itemref="b">
> ... </div>
> <div id="a" itemprop="author" itemscope
> itemtype="http://schema.org/Person" itemref="c"></div>
> <div id="b" itemprop="performer" itemscope
> itemtype="http://schema.org/Person" itemref="c"></div>
> <div id="c">
> 	 Name: <span itemprop="name">Amanda</span>
> </div>
> Actually, the 'Book' item and the 'LiteraryEvent' item both want to 
> refer to the same person: the first as the author, the second as a 
> performer. Because the property names differ, I can't seem to find a 
> proper way to do this using itemrefs, without either polluting other 
> items, or creating two 'Person' items (as I did above). Both approaches 
> are undesirable.

Yeah, this isn't really possible in microdata. We considered this way 
back, but IIRC the mechanisms we considered ended up making simple cases 
more confusing than necessary.

You really want a way to just say "the value of this property is that 
object over there", as in:

   <div id="a" itemprop="author" itemvalue="c"></div>
   <div id="b" itemprop="performer" itemvalue="c"></div>
   <div id="c" itemscope itemtype="http://schema.org/Person"> ... </div>

Not sure what the attribute should be exactly, if we add this.

Not sure we should add this, either. How common is this sort of problem?

In theory, itemid="" could also be used:

  <div id="a" itemprop="author" itemscope
  <div id="b" itemprop="performer" itemscope
  <div itemscope
 	 Name: <span itemprop="name">Amanda</span>

This only works if the vocabulary definition says it should, and then the 
software implements this explicitly for this vocabulary. But given those 
conditions, the three items above could be the same item, with properties 
getting merged at processing time.
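
A minimal sketch of what that merging step might look like in a consumer, 
in Python. Everything here is an assumption for illustration: the dict 
shape used for an item, the function name, and the example itemid value 
are hypothetical and come from neither the spec nor any vocabulary.

```python
def merge_items_by_itemid(items):
    """Merge items that share the same (type, itemid) pair.

    Each item is assumed to be a dict with "type", "id" and "properties"
    (property name -> list of values). Items without an id are kept as-is.
    """
    merged = {}
    result = []
    for item in items:
        item_id = item.get("id")
        if item_id is None:
            # No global identifier, so there is nothing to merge on.
            result.append(item)
            continue
        key = (item.get("type"), item_id)
        if key not in merged:
            merged[key] = {"type": item.get("type"), "id": item_id, "properties": {}}
            result.append(merged[key])
        props = merged[key]["properties"]
        for name, values in item.get("properties", {}).items():
            bucket = props.setdefault(name, [])
            for value in values:
                if value not in bucket:  # skip values already merged in
                    bucket.append(value)
    return result


# Hypothetical identifier, used only for this example.
AMANDA = "http://example.com/people/amanda"
PERSON = "http://schema.org/Person"

items = [
    {"type": PERSON, "id": AMANDA, "properties": {}},                    # the "author" value
    {"type": PERSON, "id": AMANDA, "properties": {}},                    # the "performer" value
    {"type": PERSON, "id": AMANDA, "properties": {"name": ["Amanda"]}},  # the item carrying the name
]
people = merge_items_by_itemid(items)
```

Here the three Person items collapse into a single item carrying the name 
property. Keying on the type as well as the itemid is a design choice of 
the sketch; a vocabulary could just as well define identity differently.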

I haven't yet attempted to solve this problem with new syntax in the spec, 
because I'm not sure how common this problem is.

> An alternative way of using the itemref attribute, which makes much
> more sense to me, would lead to this:
> <div itemscope itemtype="http://schema.org/Book">
> 	Author: <a itemprop="author" itemref href="#a">Amanda</a>
> 	...
> </div>
> <div id="b" itemscope itemtype="http://schema.org/LiteraryEvent">
> 	Speaking: <a itemprop="performer" itemref href="#a">Amanda</a>
> 	...
> </div>
> <div id="a" itemscope itemtype="http://schema.org/Person">
> 	Name: <span itemprop="name">Amanda</span>
> 	Near you: <a itemprop="performerIn" itemref href="#b">reading from
> her new book</a>
> </div>
> Formally:
> If an element has both the attributes itemprop and itemref, but not 
> itemscope, and itemref is empty, then it should have a URL type value 
> that points to another element that is an item. This item, if it exists 
> in the same document, will be the property's value. If not, the URL will 
> be used.

Yeah, this would be similar to the itemvalue="" idea above. I'd rather not 
reuse itemref=""; overloading attributes is an easy source of confusion.

> It opens the door to pointing to microdata in other documents. Although 
> a browser probably shouldn't try to fetch it, this can be useful for 
> search engines.

I'd rather only do something like this if we had confirmation from 
implementors that they were definitely going to implement it.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
