[whatwg] inverse property mechanism for Microdata?

Wed Mar 19 11:06:22 PDT 2014

On 17 March 2014 21:15, Ian Hickson <ian at hixie.ch> wrote:
> On Mon, 17 Mar 2014, Dan Brickley wrote:
>>
>> We discussed this (and the -inv suggestion) at schema.org again, and the
>> consensus there was that we'd like to have the search engines proceed
>> with accepting an experimental/proposed 'inverse itemprop' attribute,
>> rather than work around its absence.
>
> So the idea here that the itemprop-up (or whatever -- it would be good to
> get a more intuitive name, not sure what to call it though) would have to
> be specified in conjunction with the itemscope="" attribute on a top-level
> microdata item whose element had an ancestor that itself creates an item,
> and would actually specify a property on the inner item, whose value was
> the outer item?
>
> This is what the example would look like if I'm understanding this right:
>
>   <div itemscope itemtype="http://schema.org/LocalBusiness">
>     <h1><span itemprop="name">(Entity A) Beachwalk Beachwear &
>     Giftware</span></h1>
>     <span itemprop="description"> A superb collection of fine gifts and clothing
>     to accent your stay in Mexico Beach.</span>
>     Phone: <span itemprop="telephone">850-648-4200</span>
>
>     <div itemscope itemtype="http://schema.org/LocalBusiness"
>          itemprop-up="containedIn">
>       <h2><span itemprop="name">(Entity B) The tiny store within a
>       store</span></h2>
>       <span itemprop="description"> A superb collection of tiny clothes,
>       from the store within the store.</span>
>       Phone: <span itemprop="telephone">123-456-7890</span>
>     </div>
>
>   </div>
>
> It's not too bad, I guess.

Yes. I notice that the words we were playing with at schema.org relate
to the underlying graph data model (itemprop-inverse, -reverse etc.),
whereas your draft name, itemprop-up, is about the markup hierarchy.

>     My main concern is that this seems to solve a
> very narrow use case for non-tree structures, but doesn't take into
> account the many, many other non-tree structures.

Yup, there are some cases where this can be addressed through the
rigorous use of entity IDs in itemid, as you sketch below. That would
be relatively new territory for schema.org and for publishers. Perhaps
there is an attribute name we can find that would leave the door open
to more use cases, e.g. "itemprop-backwards" rather than
"itemprop-up". It seems reasonable to try to address relationships
between sibling elements too.

Something like (trying out -backwards instead of -up, to allow for
non-hierarchical usage):

<div itemid="bigshop" itemscope itemtype="http://schema.org/LocalBusiness">
    <h1><span itemprop="name">(Entity A) Beachwalk Beachwear &
Giftware</span></h1>
</div>
<div itemscope itemtype="http://schema.org/Pharmacy">
      <meta itemprop-backwards="containedIn" itemid="bigshop" />
      <h2><span itemprop="name">Tiny pharmacy store within a store</span></h2>
</div>

?

Can we use itemid in that way, to give a property value too? I don't
see itemid used much in the wild, and the spec only mentions its use
on the item that has the property, not as a way of supplying the
value of a property.

> For example, consider
> the case of a TV Episode with an Actor:
>
>    <div itemscope itemtype="http://schema.org/Episode">
>     ...
>     <div itemprop="actor"
>          itemscope itemtype="http://schema.org/Person">
>      ...
>     </div>
>    </div>
>
> ...now suppose it's marked up the other way around:
>
>    <div itemscope itemtype="http://schema.org/Person">
>     ...
>     <div itemprop-up="actor"
>          itemscope itemtype="http://schema.org/Episode">
>      ...
>     </div>
>    </div>
>
> So far so good. But what if there's two episodes with two actors, and the
> page just lists both episodes and both actors, and wants to
> cross-reference both episodes to both actors?
>
> itemprop-up (or whatever we call it) can't help there. itemref="" can help
> in some simple cases, but as you pointed out, it soon gets out of hand.
>
> Microdata actually already has a solution to this. The vocabulary can
> define an ID for each item using itemid="", and can define multiple items
> having the same ID as being the same conceptual item. Thus:
>
>    <!-- first episode -->
>    <div itemscope itemtype="http://schema.org/Episode">
>     ...
>     <div itemprop="actor"
>          itemscope itemtype="http://schema.org/Person"
>          itemid="http://.../person/123"></div>
>     <div itemprop="actor"
>          itemscope itemtype="http://schema.org/Person"
>          itemid="http://.../person/456"></div>
>    </div>
>
>    <!-- second episode -->
>    <div itemscope itemtype="http://schema.org/Episode">
>     ...
>     <div itemprop="actor"
>          itemscope itemtype="http://schema.org/Person"
>          itemid="http://.../person/123"></div>
>     <div itemprop="actor"
>          itemscope itemtype="http://schema.org/Person"
>          itemid="http://.../person/456"></div>
>    </div>
>
>    <!-- actors -->
>    <div itemscope itemtype="http://schema.org/Person"
>         itemid="http://.../person/123">
>     ...
>    </div>
>    <div itemscope itemtype="http://schema.org/Person"
>         itemid="http://.../person/456">
>     ...
>    </div>
>
> This also enables the data to be spread across multiple pages without
> confusion. (This is similar to how RDF uses identifiers for everything --
> essentially, in RDF terms, this turns the microdata item from a bnode into
> a node with a global identifier.)

Yes, it succeeds or fails to the extent people agree on these global
identifiers.
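
To make that dependence concrete, here is a minimal sketch of the merge
step a consumer would need, assuming the vocabulary really does define
items sharing an itemid as the same conceptual item. The function name,
the dict layout and the urn:example identifiers are all made up for
illustration; none of them come from the spec or from schema.org.

```python
def merge_by_itemid(items):
    """Merge the properties of items that share an itemid.

    Assumption: same itemid means same conceptual item. Items with no
    itemid cannot be matched up, so they are returned separately.
    """
    merged = {}
    anonymous = []
    for item in items:
        item_id = item.get("itemid")
        if item_id is None:
            anonymous.append(item)
            continue
        target = merged.setdefault(item_id, {})
        for prop, values in item.get("properties", {}).items():
            bucket = target.setdefault(prop, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged, anonymous


# Hypothetical parsed items: an empty stub that only carries an ID,
# the full description of the same person, and an item with no ID.
items = [
    {"itemid": "urn:example:person:123", "properties": {}},
    {"itemid": "urn:example:person:123",
     "properties": {"name": ["Actor One"]}},
    {"itemid": None, "properties": {"name": ["Episode 1"]}},
]

merged, anonymous = merge_by_itemid(items)
```

If two publishers pick different identifiers for the same person, the
two descriptions simply never meet in `merged`, which is the failure
mode I have in mind.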

> Your example would become:
>
>   <div itemscope itemtype="http://schema.org/LocalBusiness"
>        itemid="...">
>     <h1><span itemprop="name">(Entity A) Beachwalk Beachwear &
>     Giftware</span></h1>
>     <span itemprop="description"> A superb collection of fine gifts and clothing
>     to accent your stay in Mexico Beach.</span>
>     Phone: <span itemprop="telephone">850-648-4200</span>
>
>     <div itemscope itemtype="http://schema.org/LocalBusiness">
>       <div itemprop="containedIn"
>            itemscope itemtype="http://schema.org/LocalBusiness"
>            itemid="..."></div>
>       <h2><span itemprop="name">(Entity B) The tiny store within a
>       store</span></h2>
>       <span itemprop="description"> A superb collection of tiny clothes,
>       from the store within the store.</span>
>       Phone: <span itemprop="telephone">123-456-7890</span>
>     </div>
>
>   </div>
>
> Is this not suitable for schema.org? Or is it just too much markup?

It's in the clever-but-fragile category, I'd say. So yes, a bit too
much markup. In English this is saying something like

"We're describing a LocalBusiness whose global ID is [xyz]; it has
certain name, description, telephone properties given here.
There is also a LocalBusiness that is containedIn a LocalBusiness
whose global ID is [xyz]; this [other] LocalBusiness has the following
name, description, telephone etc properties...."

Just as in the English, it is rather easy to lose track of which
LocalBusiness we're talking about.

>> > That is another option, similar to the parenthetical itemid="" note
>> > above -- you could just have the vocabulary define that for every
>> > property whose value is an item, the item type that that property can
>> > point to has another property with the same name plus a fixed suffix,
>> > like "-inv", that inverses the relationship. [...]
>>
>> This is easier to understand than itemref, but still involves creating
>> 100s of additional properties instead of just one new piece of syntax.
>
> What do you mean by "creating additional properties" here? It's relatively
> trivial to define these with one sentence, you don't need to actually list
> them or anything. Implementing support is similarly easy, as far as I can
> tell -- you just check for the suffix or prefix and handle it accordingly.

Re "you don't need to actually list them", this effectively creates
two classes of property. Real ones, and fake/pseudo properties which
we're pretending exist so that we can re-use a piece of syntax that
expects a property name. Once these pseudo properties are released
into the wild, they'll show up as if they were real.

What we want to avoid is saying things like:

"You can use itemprop='containedIn-rev' to indicate a property that
means the inverse of containedIn. However this is not a first class
schema.org property, and should not be used in other syntaxes (JSON etc),
data dumps, APIs etc. You should canonicalize x containedIn-rev y
into: y containedIn x. ..."

This burdens all users of schema.org data with a distinction that we
can hopefully avoid them having to think much about. Adding a new
attribute is also a burden of course; for parser writers, and for
people using or encountering it. I think the difference is that there
is less 'leakage'. Once the data has been parsed, it's perfectly
normal schema.org / microdata. Whereas with -rev, the distinction is
needed wherever the data shows up.
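
As a rough illustration of that canonicalization step, here is a
minimal sketch assuming the data has already been reduced to
(subject, property, value) tuples. The "-rev" suffix is just the
spelling used in the example above, and the function name and item
labels are hypothetical.

```python
# Hypothetical suffix; the thread has also floated "-inv".
REV_SUFFIX = "-rev"


def canonicalize(triples):
    """Rewrite (x, 'p-rev', y) as (y, 'p', x); pass other triples through."""
    out = []
    for subject, prop, value in triples:
        if prop.endswith(REV_SUFFIX):
            # Strip the suffix and swap subject and value.
            out.append((value, prop[:-len(REV_SUFFIX)], subject))
        else:
            out.append((subject, prop, value))
    return out


triples = [
    ("bigshop", "name", "Beachwalk Beachwear & Giftware"),
    ("bigshop", "containedIn-rev", "tinyshop"),
]
canonical = canonicalize(triples)
```

The code itself is trivial. The concern is that every consumer of the
data, in every syntax, has to know to run something like it, whereas
with a dedicated attribute the swap happens once, inside the parser.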

> I don't really understand why 'itemprop-foo="bar"' would be any better
> than 'itemprop="bar-foo"'. Can you elaborate on this? It seems like it
> would essentially be identical in practice. There are significant costs to
> introducing a new language feature here, I think, so we should definitely
> consider this alternative carefully before dismissing it.

It feels like extra conceptual baggage to introduce a new category of
microdata / schema.org property for things like containedIn-rev, which
is a semi-pretend property that only exists as a syntax-related
workaround. Whereas doubling the size of schema.org's list of 'real'
properties adds other mental strains. On the one side of this
tradeoff, containedIn-rev is a 2nd class citizen and not a full
property. On the other side, it is a full property, one that within
schema.org's own site might be baked into the documentation system but
which other systems would treat as if it were any other property.

>> Would a data- attribute be an appropriate way to indicate an
>> experimental/proposed attribute? And then if it works out well perhaps a
>> real microdata attribute could be added later? e.g.
>> data-itemprop-inverse="alumni" ...
>
> If you do want to go with a new property, just use the name you would want
> in the spec. I weakly recommend "itemprop-up", which is the most intuitive
> name I've seen so far for this, but if you find a better name just use
> that. I guarantee that I won't make the spec conflict with whatever you
> use, as long as you tell me what it is. :-) Assuming that it works well,
> then we would just update the spec to use that term directly,
> retroactively making the experimental content conforming.

Thanks! I'll discuss this thread with the schema.org team. My guess is
that there's still a strong preference for a new property, and we'd be
happy to avoid using data-*.

If I understand right, the outstanding area of discussion/confusion is
whether there are cases beyond simple DOM element containment where we
might want to use an inverse itemprop construction (even though we can
see how itemid everywhere might also be used). If we're only using
element hierarchy then itemprop-up could work.
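
For the hierarchy-only reading, a consumer's handling could be sketched
roughly as below. This is an illustration built on Python's standard
html.parser, not a conforming microdata parser: it ignores itemref,
itemid, literal property values and multi-token itemprop values, and it
assumes well-nested markup. The class name and the short list of void
elements are assumptions of the sketch, and "itemprop-up" is only the
draft attribute name.

```python
from html.parser import HTMLParser

# Elements that never get an end tag (abbreviated list for this sketch).
VOID = {"meta", "link", "img", "br", "hr", "input"}


class InverseItempropSketch(HTMLParser):
    """Collects (subject, property, object) links between nested items."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: item index or None
        self.types = []   # itemtype of each item, by index
        self.links = []   # (subject index, property name, object index)

    def _enclosing_item(self):
        for entry in reversed(self.stack):
            if entry is not None:
                return entry
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        entry = None
        if "itemscope" in attrs:
            outer = self._enclosing_item()
            entry = len(self.types)
            self.types.append(attrs.get("itemtype"))
            if outer is not None:
                if "itemprop" in attrs:
                    # Ordinary direction: outer item has the property.
                    self.links.append((outer, attrs["itemprop"], entry))
                if "itemprop-up" in attrs:
                    # Inverse direction: inner item has the property,
                    # and its value is the outer item.
                    self.links.append((entry, attrs["itemprop-up"], outer))
        if tag not in VOID:
            self.stack.append(entry)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


html = """
<div itemscope itemtype="http://schema.org/Person">
  <div itemprop-up="actor" itemscope itemtype="http://schema.org/Episode">
  </div>
</div>
"""
parser = InverseItempropSketch()
parser.feed(html)
```

Here the Person is item 0 and the Episode is item 1, so the single link
recorded reads "item 1 actor item 0", i.e. the same statement as the
conventional Episode-contains-actor markup.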

cheers,

Dan

> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'