[whatwg] Microdata

Philip Jägenstedt philipj at opera.com
Tue Aug 25 00:43:58 PDT 2009

On Tue, 25 Aug 2009 00:29:06 +0200, Ian Hickson <ian at hixie.ch> wrote:

> On Mon, 24 Aug 2009, Philip Jägenstedt wrote:
>> I've found two related things that are a bit problematic. First, because
>> itemprops are only associated with ancestor item elements or via the
>> subject attribute, it's always necessary to find or create a separate
>> element for the item. This leads to more convoluted markup for small
>> items, so it would be nice if the first item and itemprop could be on
>> the same element when it makes sense:
>> <p item="vevent" itemprop="description">
>>   Concert at <span itemprop="dtstart">19:00</span> at
>>   <span itemprop="location">the beach</span>.
>> </p>
>> rather than
>> <p item="vevent">
>>   <span itemprop="description">
>>     Concert at <span itemprop="dtstart">19:00</span> at
>>     <span itemprop="location">the beach</span>.
>>   </span>
>> </p>
> As specced now, having itemprop="" and item="" on the same element
> implies that the value of the property is an item rooted at this element.
> Not supporting the above was intentional, to keep the mental model of the
> markup very simple, rather than having shortcuts. (RDFa has lots of
> shortcuts and it ended up being very difficult to keep the mental model
> straight.)

There's something like an inverse relationship between the simplicity of
the syntax and the complexity of the resulting markup, and the best balance
point isn't clear (to me at least). Perhaps option 3 is better: never
allowing item+itemprop on the same element.

>> Second, because composite items can only be made by adding item and
>> itemprop to the same element, the embedded item has to know that it has
>> a parent and what itemprop it should use to describe itself. James gave
>> the example of "something like planet where each article could be a
>> com.example.blog item and within each article there could be any
>> arbitrary author-supplied microdata" [1]. I also feel that the
>> item+itemprop syntax for composite items is one of the least intuitive
>> parts of the current spec. It's easy to get confused about what the type
>> of the item vs the itemprop should be and which item the itemprop
>> actually belongs to.
> Fair points.
>> Given that flat items like vcard/vevent are likely to be the most common
>> use case I think we should optimize for that. Child items can be created
>> by using a predefined item property: itemprop="com.example.childtype
>> item".
> Ok...
>> The value of that property would then be the first item in tree-order
>> (or all items in the subtree, not sure). This way, items would have
>> better copy-paste resilience as the whole item element could be made
>> into a top-level item simply by moving it, without meddling with the
>> itemprop.
> That sounds kinda confusing...

More confusing than item+itemprop on the same element? In many cases the
property value is the contained text; having it be the contained item
node(s) doesn't seem much stranger.

>> If the parent-item (com.example.blog) doesn't know what the child-items
>> are, it would simply use itemprop="item".
> I don't understand this at all.

This was an attempt to have anonymous sub-items. Re-thinking this, perhaps  
a better solution would be to have each item behave in much the same way  
that the document itself does. That is, simply add items in the subtree  
without using itemprop and access them with .getItems(itemType) on the  
outer item.

Comparing the current model with a DOM tree, it seems odd that a
property could be an item. It would be like an element attribute being
another element: <outer foo="<inner/>"/>. That kind of thing could just as  
well be <outer><foo><inner/></foo></outer>, <outer><inner  
type="foo"/></outer> or even <outer><inner/></outer> if the relationship  
between the elements is clear just from the fact that they have a  
parent-child relationship (usually the case).

All examples of nested items in the spec are of the form

<p itemprop="subtype" item>

These would be replaced with

<p item="subtype">

It's only in the case where both itemprop and item have a type that an  
extra level of nesting will be needed and I expect that to be the  
exception. Changing the model to something more DOM-tree-like is probably  
going to be easier to understand for many web developers. It would also  
fix the problem in my other mail where it's a bit tricky to determine via  
the DOM API whether a property is a string or an item. While on the topic
of the DOM API, document.getItems("outer")[0].getItems("inner")[0] would
be much clearer than what we currently have.
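
The nested lookup above can be sketched without a real DOM. This is only an illustration under stated assumptions: items are modelled as plain JavaScript objects, the makeItem helper and the object shape are hypothetical, and getItems is assumed to return direct child items only. Just the getItems(type) name comes from the proposal itself.

```javascript
// Hypothetical model: an item has a type, string properties, and child items.
// Only the getItems(type) name comes from the proposal; the rest is assumed.
function makeItem(type, properties = {}, children = []) {
  return {
    type,
    properties,
    children,
    // Return the direct child items of the given type, in tree order.
    // With no argument, return every direct child item.
    getItems(wanted) {
      return this.children.filter(
        (child) => wanted === undefined || child.type === wanted
      );
    },
  };
}

// Stand-in for the document: a typeless container for top-level items.
const doc = makeItem("", {}, [
  makeItem("outer", { name: ["example"] }, [
    makeItem("inner", { text: ["nested value"] }),
  ]),
]);

// Same shape as document.getItems("outer")[0].getItems("inner")[0].
const inner = doc.getItems("outer")[0].getItems("inner")[0];
console.log(inner.properties.text[0]); // prints "nested value"
```

In this model a property value is always a string and a child item is always reached through getItems, so a caller never has to test whether a property holds a string or an item.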

>> Example:
>> <p item="vcard" itemprop="n item">
>>   My name is <span itemprop="given-name">Philip</span>
>>   <span itemprop="family-name">Jägenstedt</span>.
>> </p>
> I don't understand what this maps to at all.

The same as

<p item="vcard">
   <span itemprop="n" item>
     My name is <span itemprop="given-name">Philip</span>
     <span itemprop="family-name">Jägenstedt</span>.
   </span>
</p>

Unless I've misunderstood the "n" in vcard (there's no example in the  
spec). But let's move on.

>> I'll admit that my examples are a bit simple, but the main point in my
>> opinion is to make item+itemprop less confusing. There are basically
>> only 3 options:
>> 1. for compositing items (like now)
>> 2. as shorthand on the top-level item (my suggestion)
>> 3. disallow
>> I'd primarily like for 1 and 2 to be tested, but 3 is a real option too.
>> [1] http://krijnhoetmer.nl/irc-logs/whatwg/20090824#l-375
> We can't disallow nesting items as values of properties, there are a
> whole bunch of use cases that depend on it.

Option 3 is not a suggestion to disallow nesting, but to change the syntax
for it.

> Could you show how your syntax proposals would look when marking up the
> following data?
> // document URL: http://www.example.org/sample/test.html
> {
>   "items": [
>     {
>       "type": "com.example.product",
>       "properties": {
>         "about": [ "http://example.com/products/bt200x" ],
>         "image": [ "http://www.example.org/sample/bt200x.jpeg" ], // please keep this one outside the item in the DOM
>         "name": [ "GPS Receiver BT 200X" ],
>         "reldate": [ "2009-01-22" ],
>         "review": [
>           {
>             "type": "",
>             "properties": {
>               "reviewer": [ "http://ln.hixie.ch/" ],
>             "text": [ "Lots of memory, not much battery, very little accuracy." ]
>             }
>           }
>         ],
>       }
>     },
>     {
>       "type": "work",
>       "properties": {
>         "about": [ "http://www.example.org/sample/image.jpeg" ],
>         "license": [ "http://www.opensource.org/licenses/mit-license.php" ],
>         "title": [ "My Pond" ],
>       }
>     }
>   ]
> }
> Here's how it would be marked up today:
> <section id="bt200x" item=com.example.product>
>  <link itemprop=about href="http://example.com/products/bt200x">
>  <h1 itemprop=name>GPS Receiver BT 200X</h1>
>  <p>Rating: &#x22C6;&#x22C6;&#x22C6;&#x2729;&#x2729;
>   <meta itemprop=rating content="2"></p>
>  <p>Release Date:
>   <time itemprop="reldate" datetime="2009-01-22">January 22</time></p>
>  <p itemprop=review item>
>   <a itemprop=reviewer href="http://ln.hixie.ch/">Ian</a>:
>   "<span itemprop=text>Lots of memory, not much battery, very little
>   accuracy.</span>"</p>
> </section>
> <figure item=work>
>  <img itemprop=about src="image.jpeg">
>  <legend>
>   <p><cite itemprop="title">My Pond</cite></p>
>   <p><small>Licensed under the <a itemprop="license"
>   href="http://www.opensource.org/licenses/mit-license.php">MIT
>   license</a>.</small>
>  </legend>
> </figure>
> <p><img subject="bt200x" itemprop="image" src="bt200x.jpeg" alt="..."></p>

To be clear, I'm now suggesting that item+itemprop never be allowed on the
same element (option 3). Nesting items is accomplished simply by nesting them.

In your example, the only change would be this line:

  <p item=review><a itemprop=reviewer href="http://ln.hixie.ch/">Ian</a>:

(Of course, some tokens may need to be renamed to make sense as item names
rather than itemprops.)
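
How extraction might work under option 3 can be sketched as follows. This is a rough illustration, not a proposed algorithm: the node shape ({ tag, attrs, children }), the el, textOf and extract helpers, and the choice to throw on item+itemprop are all assumptions. Property values are taken from href when present and from the text content otherwise, and the subject attribute is ignored.

```javascript
// Hypothetical node shape: { tag, attrs, children }, where children are
// strings (text) or further nodes. This is not a real DOM.
function el(tag, attrs, ...children) {
  return { tag, attrs, children };
}

// Concatenated text content of a node and its descendants.
function textOf(node) {
  if (typeof node === "string") return node;
  return node.children.map(textOf).join("");
}

// Option 3: an element with `item` starts a new item nested in the nearest
// ancestor item; `itemprop` always belongs to the nearest ancestor item.
function extract(node, current, topLevel) {
  if (typeof node === "string") return;
  const attrs = node.attrs;
  if ("item" in attrs && "itemprop" in attrs) {
    // Assumption: treat the disallowed combination as a hard error.
    throw new Error("item and itemprop on the same element");
  }
  let scope = current;
  if ("item" in attrs) {
    scope = { type: attrs.item, properties: {}, items: [] };
    (current ? current.items : topLevel).push(scope);
  } else if ("itemprop" in attrs && current) {
    const value = "href" in attrs ? attrs.href : textOf(node);
    const values = current.properties[attrs.itemprop] || [];
    values.push(value);
    current.properties[attrs.itemprop] = values;
  }
  for (const child of node.children) extract(child, scope, topLevel);
}

// A cut-down version of the product example, with the review as a nested item.
const tree = el("section", { item: "com.example.product" },
  el("h1", { itemprop: "name" }, "GPS Receiver BT 200X"),
  el("p", { item: "review" },
    el("a", { itemprop: "reviewer", href: "http://ln.hixie.ch/" }, "Ian"),
    ': "',
    el("span", { itemprop: "text" },
      "Lots of memory, not much battery, very little accuracy."),
    '"'));

const items = [];
extract(tree, null, items);
console.log(JSON.stringify(items, null, 2));
```

Because the review item carries no itemprop, moving its element out of the section would make it a top-level item without any change to its markup.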

As an aside, subject should also be allowed to associate an item with a
parent item, just as it does for itemprop.

IMHO, this syntax is more copy-paste robust, favors the common cases over  
the complex cases and makes the model more intuitive to those who  
understand XML and/or DOM.

Philip Jägenstedt
Opera Software
