[whatwg] Annotating structured data that HTML has no semantics for

Eduard Pascual herenvardo at gmail.com
Tue May 12 03:55:42 PDT 2009

I don't really like to be harsh, but I have some criticism to this,
and it's going to be quite hard. However, my goal by pointing out what
I consider so big mistakes is to help HTML5 becoming as good as it
could be.

First issue: it solves a (major) subset of what RDFa would solve.
However, it has been taken as a requirement to avoid
clashes/incompatibilities with RDFa. In other words, as things stand,
authors will face two options: either use RDFa in HTML5, which would
forsake validation but actually work; or take a less powerful, less
supported (at least for now: many RDFa-aware agents vs. zero HTML5's
microdata -aware agents) that validates but provides no pragmatic
IMO, an approach that forces authors to choose between
validity/conformance which doesn't *yet* works vs. invalid solutions
that actually work is a horrible idea: it encourages authors to
forsake validity if they want things to work.
Wouldn't the RDFa + @prefix solution suggested many times work better
and require less effort (for spec writters, for implementors, and for
content authors)? Keep in mind that I don't think RDFa + @prefix is
the solution we need; I'm just trying to point out that the current
approach is even worse than that.

Second issue: as the "decaffeinated RDFa" it is, the HTML5 Microdata
approach tends to fail where RDFa itself fails. It's nice that, thanks
to the <time> element, the problem with trying to reuse human-readable
dates as machine-readable is dodged; but there are other cases where
separate values might be needed: for example using a street address
for the human-readable representation of a location and the exact
geographic coordinates as the machine-readable (since not all
micro-data parsers can rely on Google Maps's database to resolve
street addresses, you know); or using a colored name (such as "lime
green" displayed on lime green color) as the human-readable
representation of a color, and the hexcode (like #00FF00) as the
machine-readable representation. These are just the cases from the top
of my head, and this can't be considered in any way a complete list.
While *favoring* the reuse of human-readable values for the
machine-readable ones is appropiate, because it's the widely most
common case, *forcing* that reuse is a quite bad idea, because it is
*not* the *only* case.

Third issue: also a flaw inherited from RDFa, it can be summarized as
completelly ignoring the requirement I submitted to this list on April
28th, in reply to Ian asking us to review the use cases [1]. I'll try
to illustrate it with a example, inspired by the original use-case:
Let's say someone's marking up a collection of iguanas (or cats, or
even CDs, doesn't really make a difference when illustrating this
issue), making a page for each iguana (or whatever) with all the
details for it; and then making an "index" page listing the maybe 20
iguanas with their name, picture, and link to the corresponding page.
Adding micro-data to that "index", either with RDFa or with Ian's
microdata proposal, would involve stating 20 times in the markup
something like "this is the iguana's picture; this is the iguana's
name; and this is the iguana's URL". It would be preferable to be able
to state something like "each (row) <tr> in the <table> describes an
iguana: the <img>s are each iguana's picture, the contents of the
<a>'s are the names, and the @href of the <a>'s are the URLs to their
main pages" just once. If I only need to state the table headings once
for the users to understand this concept, why should a micro-data
consumer require me to state it 20 times, once for each row?
Please note how such a page would be quite painful to maintain: any
mistake in the micro-data mark-up would generate invalid data and
require a manual harvest of the data on the page, thus killing the
whole purpose of micro-data. And repeating something 20 (or more)
times brings a lot of chances to put a typo in, or to miss an
attribute, or any minor but devastating mistake like these.

Last, but not least, I'm not sure if it was wise to start defining a
solution while some of the requirements seem to be still under
discussion. Actually, I had a possible solution in mind, but I was
holding it while reviewing it against the requiremetns being
discussed, so I could adapt it to any requirements I might had
initially missed. Seeing that solutions are already being discussed
here, I'm trying to put the ideas into a human-readable document that
I plan to submit to this list either late today or early tomorrow for
your review and consideration.

Eduard Pascual

[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019487.html

More information about the whatwg mailing list