[whatwg] Microdata

Philip Jägenstedt philipj at opera.com
Tue Aug 25 11:47:33 PDT 2009


On Tue, 25 Aug 2009 09:43:58 +0200, Philip Jägenstedt <philipj at opera.com>  
wrote:

> On Tue, 25 Aug 2009 00:29:06 +0200, Ian Hickson <ian at hixie.ch> wrote:
>
>> On Mon, 24 Aug 2009, Philip Jägenstedt wrote:
>>>
>>> I've found two related things that are a bit problematic. First,  
>>> because
>>> itemprops are only associated with ancestor item elements or via the
>>> subject attribute, it's always necessary to find or create a separate
>>> element for the item. This leads to more convoluted markup for small
>>> items, so it would be nice if the first item and itemprop could be on
>>> the same element when it makes sense:
>>>
>>> <p item="vevent" itemprop="description">
>>>   Concert at <span itemprop="dtstart">19:00</span> at <span
>>> itemprop="location">the beach</span>.
>>> </p>
>>>
>>> rather than
>>>
>>> <p item="vevent">
>>>   <span itemprop="description">
>>>     Concert at <span itemprop="dtstart">19:00</span> at <span
>>> itemprop="location">the beach</span>.
>>>   </span>
>>> </p>
>>
>> As specced now, having itemprop="" and item="" on the same element  
>> implies
>> that the value of the property is an item rooted at this element.
>>
>> Not supporting the above was intentional, to keep the mental model of  
>> the
>> markup very simple, rather than having shortcuts. (RDFa has lots of
>> shortcuts and it ended up being very difficult to keep the mental model
>> straight.)
>
> There's something like an inverse relationship between simplicity of the
> syntax and complexity of the resulting markup, and the best balance point
> isn't clear (to me at least). Perhaps option 3 is better, never allowing
> item+itemprop on the same element.
>
>>> Second, because composite items can only be made by adding item and
>>> itemprop to the same element, the embedded item has to know that it has
>>> a parent and what itemprop it should use to describe itself. James gave
>>> the example of "something like planet where each article could be a
>>> com.example.blog item and within each article there could be any
>>> arbitrary author-supplied microdata" [1]. I also feel that the
>>> item+itemprop syntax for composite items is one of the least intuitive
>>> parts of the current spec. It's easy to get confused about what the  
>>> type
>>> of the item vs the itemprop should be and which item the itemprop
>>> actually belongs to.
>>
>> Fair points.
>>
>>
>>> Given that flat items like vcard/vevent are likely to be the most  
>>> common
>>> use case I think we should optimize for that. Child items can be  
>>> created
>>> by using a predefined item property: itemprop="com.example.childtype
>>> item".
>>
>> Ok...
>>
>>
>>> The value of that property would then be the first item in tree-order
>>> (or all items in the subtree, not sure). This way, items would have
>>> better copy-paste resilience as the whole item element could be made
>>> into a top-level item simply by moving it, without meddling with the
>>> itemprop.
>>
>> That sounds kinda confusing...
>
> More confusing than item+itemprop on the same element? In many cases the
> property value is the contained text; having it be the contained item
> node(s) doesn't seem much stranger.
>
>>> If the parent-item (com.example.blog) doesn't know what the child-items
>>> are, it would simply use itemprop="item".
>>
>> I don't understand this at all.
>
> This was an attempt to have anonymous sub-items. Re-thinking this,  
> perhaps a better solution would be to have each item behave in much the  
> same way that the document itself does. That is, simply add items in the  
> subtree without using itemprop and access them with .getItems(itemType)  
> on the outer item.
>
> Comparing the current model with a DOM tree, it seems odd that a
> property could be an item. It would be like an element attribute being
> another element: <outer foo="<inner/>"/>. That kind of thing could just  
> as well be <outer><foo><inner/></foo></outer>, <outer><inner  
> type="foo"/></outer> or even <outer><inner/></outer> if the relationship  
> between the elements is clear just from the fact that they have a  
> parent-child relationship (usually the case).
>
> All examples of nested items in the spec are of the form
>
> <p itemprop="subtype" item>
>
> These would be replaced with
>
> <p item="subtype">
>
> It's only in the case where both itemprop and item have a type that an  
> extra level of nesting will be needed and I expect that to be the  
> exception. Changing the model to something more DOM-tree-like is  
> probably going to be easier to understand for many web developers. It  
> would also fix the problem in my other mail where it's a bit tricky to  
> determine via the DOM API whether a property is a string or an item.  
> When on the topic of the DOM API,  
> document.getItems("outer")[0].getItems("inner")[0] would be so much  
> clearer than what we currently have.
>
>>> Example:
>>>
>>> <p item="vcard" itemprop="n item">
>>>   My name is <span itemprop="given-name">Philip</span>
>>>   <span itemprop="family-name">Jägenstedt</span>.
>>> </p>
>>
>> I don't understand what this maps to at all.
>
> The same as
>
> <p item="vcard">
>    <span itemprop="n" item>
>      My name is <span itemprop="given-name">Philip</span>
>      <span itemprop="family-name">Jägenstedt</span>.
>    </span>
> </p>
>
> Unless I've misunderstood the "n" in vcard (there's no example in the  
> spec). But let's move on.
>
>>> I'll admit that my examples are a bit simple, but the main point in my
>>> opinion is to make item+itemprop less confusing. There are basically
>>> only 3 options:
>>>
>>> 1. for compositing items (like now)
>>> 2. as shorthand on the top-level item (my suggestion)
>>> 3. disallow
>>>
>>> I'd primarily like for 1 and 2 to be tested, but 3 is a real option  
>>> too.
>>>
>>> [1] http://krijnhoetmer.nl/irc-logs/whatwg/20090824#l-375
>>
>> We can't disallow nesting items as values of properties, there are a  
>> whole
>> bunch of use cases that depend on it.
>
> 3 is not a suggestion to disallow nesting, but to change the syntax for  
> it.
>
>> Could you show how your syntax proposals would look when marking up the
>> following data?
>>
>> // JSON DESCRIPTION OF MARKED UP DATA
>> // document URL: http://www.example.org/sample/test.html
>> {
>>   "items": [
>>     {
>>       "type": "com.example.product",
>>       "properties": {
>>         "about": [ "http://example.com/products/bt200x" ],
>>         // please keep this one outside the item in the DOM
>>         "image": [ "http://www.example.org/sample/bt200x.jpeg" ],
>>         "name": [ "GPS Receiver BT 200X" ],
>>         "reldate": [ "2009-01-22" ],
>>         "review": [
>>           {
>>             "type": "",
>>             "properties": {
>>               "reviewer": [ "http://ln.hixie.ch/" ],
>>               "text": [ "Lots of memory, not much battery, very little accuracy." ]
>>             }
>>           }
>>         ]
>>       }
>>     },
>>     {
>>       "type": "work",
>>       "properties": {
>>         "about": [ "http://www.example.org/sample/image.jpeg" ],
>>         "license": [ "http://www.opensource.org/licenses/mit-license.php" ],
>>         "title": [ "My Pond" ]
>>       }
>>     }
>>   ]
>> }
>>
>>
>> Here's how it would be marked up today:
>>
>> <section id="bt200x" item=com.example.product>
>>  <link itemprop=about href="http://example.com/products/bt200x">
>>  <h1 itemprop=name>GPS Receiver BT 200X</h1>
>>  <p>Rating: &#x22C6;&#x22C6;&#x22C6;&#x2729;&#x2729; <meta  
>> itemprop=rating content="2"></p>
>>  <p>Release Date: <time itemprop="reldate"  
>> datetime="2009-01-22">January 22</time></p>
>>  <p itemprop=review item><a itemprop=reviewer  
>> href="http://ln.hixie.ch/">Ian</a>:
>>  "<span itemprop=text>Lots of memory, not much battery, very little  
>> accuracy.</span>"</p>
>> </section>
>> <figure item=work>
>>  <img itemprop=about src="image.jpeg">
>>  <legend>
>>   <p><cite itemprop="title">My Pond</cite></p>
>>   <p><small>Licensed under the <a itemprop="license"
>>   href="http://www.opensource.org/licenses/mit-license.php">MIT
>>   license</a>.</small>
>>  </legend>
>> </figure>
>> <p><img subject="bt200x" itemprop="image" src="bt200x.jpeg"  
>> alt="..."></p>
>
> To be clear, I'm now suggesting that item+itemprop never be allowed on
> the same element (option 3). Nesting items is accomplished simply by
> nesting them.
>
> In your example, the only change would be this line:
>
>   <p item=review><a itemprop=reviewer href="http://ln.hixie.ch/">Ian</a>:
>
> (Of course some tokens may need to be renamed to make sense as
> item names rather than itemprops.)
>
> As an aside, subject should also be allowed to associate an item with its
> parent item, just like for itemprop.
>
> IMHO, this syntax is more copy-paste robust, favors the common cases  
> over the complex cases and makes the model more intuitive to those who  
> understand XML and/or DOM.

http://krijnhoetmer.nl/irc-logs/whatwg/20090825#l-469

After this discussion it is (even more) clear that at least option 3 is  
not just a syntax change but rather a change to the underlying model from  
nested name-value groups to a tree of unnamed (but not untyped) nodes  
which each have name-value groups, somewhat like DOM.
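
To make the difference between the two models concrete, here is a rough
Python sketch. The class names (NestedItem, TreeItem) and the get_items
helper are made up for illustration; they are not from the spec, and
get_items is only a guess at what item.getItems(itemType) might look like.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


# Current model: nested name-value groups. A property value is either a
# string or another item, so a sub-item is always reached through a name.
@dataclass
class NestedItem:
    type: str
    properties: dict[str, list[Union[str, NestedItem]]] = field(default_factory=dict)


# Option 3: a tree of unnamed (but typed) nodes. Property values are
# strings only, and sub-items are plain children looked up by type.
@dataclass
class TreeItem:
    type: str
    properties: dict[str, list[str]] = field(default_factory=dict)
    children: list[TreeItem] = field(default_factory=list)

    def get_items(self, item_type: str) -> list[TreeItem]:
        """Hypothetical analogue of item.getItems(itemType)."""
        return [child for child in self.children if child.type == item_type]


# The review from Ian's example, once per model.
review_nested = NestedItem("", {"reviewer": ["http://ln.hixie.ch/"]})
product_nested = NestedItem(
    "com.example.product",
    {"name": ["GPS Receiver BT 200X"], "review": [review_nested]},
)

review_tree = TreeItem("review", {"reviewer": ["http://ln.hixie.ch/"]})
product_tree = TreeItem(
    "com.example.product",
    {"name": ["GPS Receiver BT 200X"]},
    [review_tree],
)

# Nested model: the relationship lives in the property name "review".
print(product_nested.properties["review"][0].type == "")
# Tree model: the relationship lives only in the child's type.
print([child.type for child in product_tree.get_items("review")])
```

Note how the word "review" moves from a property name on the outer item to
the type of the inner item, which is the renaming mentioned in the quoted
mail.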

Pros:

* No itemprop+item syntax.

But the simpler syntax might be offset by more nesting... If
itemprop+item sticks, then some examples in the spec that use both an
itemprop name and an item type would help. Something like <span
item="book"><span itemprop="author" item="vcard">...</span></span>

It will be very interesting to see from usability testing whether
itemprop+item actually confuses authors.

* Items don't need to "know" that they are part of a bigger item.

But if the two items don't know of each other then they probably don't
belong together. The real issue is that item elements that happen to be
children of another item element (like in James's blog planet example)
aren't going to be top-level items and are simply ignored. Workarounds are
possible, but ensuring that items are in different subtrees is impossible
if you only control a document fragment that is included in a larger
document. I'd suggest simply letting any item element that doesn't have an
itemprop attribute be a top-level item. Other solutions are possible.
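
The suggested rule is easy to sketch. The following Python is only an
illustration using the standard library's html.parser; the ItemScanner
class and the sample markup are invented here, and the parser does no
HTML error recovery, so it assumes every non-void element is explicitly
closed.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemScanner(HTMLParser):
    """Collects item types under the suggested top-level rule."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one bool per open element: is it an item?
        self.proposed = []  # items with no itemprop: top-level under the suggestion
        self.ignored = []   # the subset nested in another item (ignored today, per this mail)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_item = "item" in attributes
        if is_item and "itemprop" not in attributes:
            item_type = attributes["item"] or ""  # bare `item` has value None
            self.proposed.append(item_type)
            if any(self.stack):
                self.ignored.append(item_type)
        if tag not in VOID:
            self.stack.append(is_item)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


# A planet-style page: the inner vevent was written without knowing
# that it would end up inside a com.example.blog item.
planet = """
<div item="com.example.blog">
  <h1 itemprop="title">Planet</h1>
  <article item="vevent">
    <span itemprop="description">Concert</span>
  </article>
</div>
"""

scanner = ItemScanner()
scanner.feed(planet)
print(scanner.proposed)  # ['com.example.blog', 'vevent']
print(scanner.ignored)   # ['vevent']
```

The vevent shows up in both lists: it is the kind of item that gets
dropped now and that the suggested rule would expose as a top-level item.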

Cons:

* It would make converting microdata into a JavaScript object awkward  
because there's no such thing as unnamed properties. On the other hand, no  
matter the syntax you'll probably end up with vocabulary-specific mappings  
to JS(ON).

* It assumes that the type of the subitem is enough to determine how it
relates to the item, not property name + subitem type as now. This might
be an even worse source of confusion than itemprop+item.
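
The first con can be seen in a small Python sketch of such a conversion.
The dict layout ("type", "properties", "children") and the to_json_object
function are assumptions made for this example, and falling back to the
child's type as the key is just one possible workaround, not something
the spec defines.

```python
import json


def to_json_object(item):
    """Flatten a tree-model item into a JSON-style object.

    `item` is a dict with "type", "properties" and "children" keys
    (an assumed layout, not anything from the spec).
    """
    obj = {name: list(values) for name, values in item["properties"].items()}
    for child in item["children"]:
        # The child has no property name, so a key has to be invented.
        # Using its type works until a type collides with a property name,
        # at which point strings and objects end up mixed in one list.
        obj.setdefault(child["type"], []).append(to_json_object(child))
    return obj


product = {
    "type": "com.example.product",
    "properties": {"name": ["GPS Receiver BT 200X"]},
    "children": [
        {
            "type": "review",
            "properties": {"reviewer": ["http://ln.hixie.ch/"]},
            "children": [],
        }
    ],
}

print(json.dumps(to_json_object(product), indent=2))
```

The second con shows up in the same place: the only thing tying the
review to the product is the child's type, so a vocabulary-specific
mapping has to decide what that type means.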

Looking at the pros/cons I can only conclude that I dislike all options  
equally. Several of these issues leak over to the DOM API, but I've  
already sent feedback on that separately. I hope there will be more  
suggestions to consider.

-- 
Philip Jägenstedt
Opera Software


