[whatwg] Microdata

Brian Campbell Brian.P.Campbell at dartmouth.edu
Wed Aug 26 10:11:05 PDT 2009


On Aug 22, 2009, at 5:51 PM, Ian Hickson wrote:

> Based on some of the feedback on Microdata recently, e.g.:
>
>   http://www.jenitennison.com/blog/node/124
>
> ...and a number of e-mails sent to this list and the W3C lists, I am going
> to try some tweaks to the Microdata syntax. Google has kindly offered to
> provide usability testing resources so that we can try a variety of
> different syntaxes and see which one is easiest for authors to understand.
>
> If anyone has any concrete syntax ideas that they would like me to
> consider, please let me know. There's a (pretty low) limit to how many
> syntaxes we can perform usability tests on, though, so I won't be able to
> test every idea.

Here's an idea I've been mulling over. I think it would simplify the
syntax and semantic model considerably.

Why do we need separate items and item properties? They seem to
confuse people when something can be both an item and an itemprop at
the same time. They also duplicate a certain amount of information:
items can have "types", while itemprops can have "names", but both
serve about the same role, which is to indicate how to interpret them
in the context of the page or a larger item.

What if we just had "item", filling both of the roles? The value of
the item would be either an associative array of its descendant items
(plus any associated with it using "subject"), if any exist, or the
text content of the item (or URL, depending on the tag) if it has no
items within it.

Here's an example used elsewhere in the thread, marked up as I suggest:

<section id="bt200x" item=com.example.product>
   <link item=about href="http://example.com/products/bt200x">
   <h1 item=name>GPS Receiver BT 200X</h1>
   <p>Rating: &#x22C6;&#x22C6;&#x22C6;&#x2729;&#x2729;
     <meta item=rating content="2"></p>
   <p>Release Date:
     <time item="reldate" datetime="2009-01-22">January 22</time></p>
   <p item=review><a item=reviewer href="http://ln.hixie.ch/">Ian
     </a>:
     "<span item=text>Lots of memory, not much battery, very little
        accuracy.</span>"</p>
</section>
<figure item=work>
   <img item=about src="image.jpeg">
   <legend>
     <p><cite item="title">My Pond</cite></p>
     <p><small>Licensed under the <a item="license"
         href="http://www.opensource.org/licenses/mit-license.php">MIT
       license</a>.</small></p>
   </legend>
</figure>
<p><img subject="bt200x" item="image" src="bt200x.jpeg" alt="..."></p>

This would translate into the following JSON. Note that this is a  
simpler structure than the existing one proposed for microdata; it is  
a lot closer to how people generally use JSON natively, rather than  
using an extra level of nesting to distinguish types and properties:

// JSON DESCRIPTION OF MARKED UP DATA
// document URL: http://www.example.org/sample/test.html
{
  "com.example.product": [
    {
      "about": [ "http://example.com/products/bt200x" ],
      "image": [ "http://www.example.org/sample/bt200x.jpeg" ],
      "name": [ "GPS Receiver BT 200X" ],
      "rating": [ "2" ],
      "reldate": [ "2009-01-22" ],
      "review": [
        {
          "reviewer": [ "http://ln.hixie.ch/" ],
          "text": [ "Lots of memory, not much battery, very little accuracy." ]
        }
      ]
    }
  ],
  "work": [
    {
      "about": [ "http://www.example.org/sample/image.jpeg" ],
      "license": [ "http://www.opensource.org/licenses/mit-license.php" ],
      "title": [ "My Pond" ]
    }
  ]
}

This has the slightly surprising property of making something like this:

   <section item=foo>Some text. <a href="somewhere">A link</a>. Some more text</section>

result in:

   // http://example.org/sample/test
   { "foo": [ "Some text. A link. Some more text" ] }

While simply making the link an item:

   <section item=foo>Some text. <a item=link href="somewhere">A link</a>. Some more text</section>

Gives you:

   // http://example.org/sample/test
   { "foo": [ { "link": [ "http://example.org/sample/somewhere" ] } ] }

However, I think that people will generally expect "item" to be used  
for its text/URL content only on leaf nodes or nodes without much  
nested within them, while they would expect "item" to return  
structured, nested data when the DOM is nested deeply with items  
inside it, so I don't think people would be surprised by this behavior  
very often.
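
As a rough illustration of the rule described above, the following
Python sketch computes item values from markup: an element with nested
items yields an associative array of them, and otherwise yields its URL
or text. It is only a sketch under assumptions, not a worked-out
algorithm. The names (extract, value_of, collect, URL_ATTR) are
hypothetical, the handling of <meta content> and <time datetime> is
inferred from the example JSON, whitespace collapsing is an added
convenience, and cross-references via "subject" are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: elements without end tags, and the attribute holding each tag's URL.
VOID = {"link", "meta", "img", "br", "hr", "input"}
URL_ATTR = {"a": "href", "link": "href", "img": "src"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = []  # mix of Node objects and text strings


class TreeBuilder(HTMLParser):
    """Builds a very small element tree; not a full HTML5 parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def text_of(node):
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)


def collect(node, base, out):
    """Gather the nearest descendant items of node into the dict out."""
    for child in node.children:
        if isinstance(child, str):
            continue
        if "item" in child.attrs:
            out.setdefault(child.attrs["item"], []).append(value_of(child, base))
        else:
            collect(child, base, out)
    return out


def value_of(node, base):
    nested = collect(node, base, {})
    if nested:  # has items inside: structured value
        return nested
    if node.tag in URL_ATTR:  # leaf with a URL
        return urljoin(base, node.attrs.get(URL_ATTR[node.tag]) or "")
    if node.tag == "meta":
        return node.attrs.get("content") or ""
    if node.tag == "time" and node.attrs.get("datetime"):
        return node.attrs["datetime"]
    return " ".join(text_of(node).split())  # leaf with text


def extract(html, base):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return collect(builder.root, base, {})


base = "http://example.org/sample/test"
plain = '<section item=foo>Some text. <a href="somewhere">A link</a>. Some more text</section>'
linked = '<section item=foo>Some text. <a item=link href="somewhere">A link</a>. Some more text</section>'
print(extract(plain, base))
print(extract(linked, base))
```

Run on the two <section> snippets above, the first call yields the
flattened text under "foo" and the second yields a nested "link" entry
holding the resolved URL, which is the switch in behavior being
discussed.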

I haven't yet looked at every use case proposed so far to see how well  
this idea works for them, nor have I worked out the API differences  
(which should be simpler than the existing API). If there seem to be  
no serious problems with this idea, I can write up a more detailed  
justification and examples.

-- Brian


