[whatwg] Annotating structured data that HTML has no semantics for

Tue Jun 9 01:40:54 PDT 2009

On Mon, 11 May 2009, Simon Pieters wrote:
> On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson <ian at hixie.ch> wrote:
> 
> >    Page 3:
> >    <h2>My Cats</h2>
> >    <dl>
> >     <dt>Schr&ouml;dinger
> >     <dd item="com.damowmow.cat">
> >      <meta property="com.damowmow.name" content="Schr&ouml;dinger">
> >      <meta property="com.damowmow.age" content="9">
> >      <p property="com.damowmow.desc">Orange male.
> >     <dt>Erwin
> >     <dd item="com.damowmow.cat">
> >      <meta property="com.damowmow.name" content="Lord Erwin">
> >      <meta property="com.damowmow.age" content="3">
> >      <p property="com.damowmow.desc">Siamese color-point.
> >      <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
> >    </dl>
> 
> Given the microdata solution and this example, there is now a reason other
> than styling to introduce <di>, since here you duplicate the <dt> information
> in <meta>.
> 
>   <dl>
>    <di item="com.damowmow.cat">
>     <dt property="com.damowmow.name">Schr&ouml;dinger
>     <dd>
>      <meta property="com.damowmow.age" content="9">
>      <p property="com.damowmow.desc">Orange male.
>    </di>
>    ...
> 
> The styling problem is discussed at
> http://forums.whatwg.org/viewtopic.php?t=47

Yeah, I noticed that. I agree that if it turns out that this is a common 
authoring pattern (and assuming we can work around the difficulties in 
adjusting the parser to handle this), we should probably introduce <di> 
after all. I intend to wait and see what happens first though.

On Mon, 11 May 2009, Giovanni Gentili wrote:
> Ian Hickson:
> >   USE CASE: Annotate structured data that HTML has no semantics for, and
> >   which nobody has annotated before, and may never again, for private use or
> >   use in a small self-contained community.
> > (..)
> >   SCENARIOS:
> 
> Between the scenarios should be considered also this case:
> 
> * a user (or groups of users) wants to annotate
> items present on a generic web page with
> additional properties in a certain vocabulary.
> for example Joe wants to gather in a blog
> a series of personal annotation to movies
> (or other type of items) present in imdb.com.

This isn't really a use case, it's a solution. What is the end-user 
scenario that the author is trying to address? For example, what kind of 
software will collect this information? What problem are we solving?

> a) In the case of properties specified for element without ancestor with 
> an item attribute specified the corresponding item should be the 
> document? (element body with implicit item attribute).

We already have mechanisms for providing name-value pairs for a document; 
namely, <meta name> and <link rel>.

> b) Do we need to require UA to offer a standard way to visualize (at 
> least as an option left to the user) the structured information carried 
> in microdata ?

Not as far as I can tell; what use case would this be for?

> And copy&paste?

The spec already requires user agents to include microdata in copy and 
paste.

On Tue, 12 May 2009, Tim Tepaße wrote:
> > 
> > (Note the <meta>s in the last example -- since sometimes the 
> > information isn't visible, rather than requiring that people put it in 
> > and hide it with display:none, which has a rather poor accessibility 
> > story, I figured we could just allow <meta> anywhere, if it has a 
> > property="" attribute.)
> 
> That seems to be a solution optimised for extremely invisible metadata 
> but not for metadata which differs from the human visible data.

It handles both -- instead of:

   <span itemprop="x">y</span>

...you can do:

   <span><meta itemprop="x" content="y">z</span>
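
As a rough illustration of why both forms carry the same machine-readable
value, here is a minimal sketch of a consumer. It assumes the markup has
been made well-formed enough for an XML parser (hence the self-closing
<meta/>), and the function name property_values is hypothetical, not
anything defined by the spec.

```python
import xml.etree.ElementTree as ET


def property_values(fragment: str) -> dict:
    """Collect itemprop name/value pairs from a well-formed fragment."""
    root = ET.fromstring(fragment)
    values = {}
    for el in root.iter():
        name = el.get("itemprop")
        if name is None:
            continue
        if el.tag == "meta":
            # Invisible value: taken from the content attribute.
            values[name] = el.get("content", "")
        else:
            # Visible value: taken from the element's text content.
            values[name] = "".join(el.itertext())
    return values


visible = property_values('<span itemprop="x">y</span>')
hidden = property_values('<span><meta itemprop="x" content="y"/>z</span>')
print(visible, hidden)
```

In the second case the reader sees "z" while the consumer still gets "y".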

> Imagine as an example the simple act of marking up a number (and 
> ignoring what the number denotes). For human consumption a thousands 
> separator is often used; the type of separator differs by language, 
> locale and context. Just in my little world I see on a regular basis the 
> point, the comma, the space, the thin space and sometimes the 
> apostrophe. Parsing different representations of numbers would be a 
> chore. The value of textContent of the element <span 
> itemprop="com.example.price">€&nbsp;1&thinsp;000&thinsp;000,&mdash;</span> 
> is clearly unusable, demanding an additional invisible <meta 
> property="com.example.price" content="1000000">.

Right.

> My irritation lies in the element proliferation, requiring one element/ 
> attribute combination for machines, one element/text content combination 
> for humans. Of course, any sane author would arrange both elements in a 
> close relation, as parent/child or sibling but there would be still two 
> different elements to maintain, leading to a higher cognitive load. Not 
> just for authors but also for programmers: a fluctuating price would have 
> to be updated on two different elements; tree-walking DOM scripts would 
> have to take meta elements into account. Furthermore it clashes with the 
> familiar habit of other elements in HTML. A hyperlink is one element with 
> a machine-readable attribute and human-readable text content. A citation 
> is one element with a machine-readable reference and human-readable text 
> content. The same model is used in <meter>, <progress>, <time>, <abbr> 
> ... but not in user-defined objects. I'd prefer an additional 
> @content-like attribute which supersedes the text content and maybe even 
> the default values of the other value-bearing elements, reducing two 
> different elements to maintain or change to just one.

I don't really understand what you are proposing. How would you reduce the 
number of places where the value is represented?

I think in the long run if we find particular data types (such as numbers) 
are commonly used in these scenarios, we'll just introduce a new element 
like we did for dates and times (<time>), which can then just render the 
number in a locale-specific manner automatically.
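
To make the parsing problem concrete, here is a small sketch of reading a
human-formatted number. It is only an illustration: parse_number is a
hypothetical helper, and the caller has to supply the grouping and decimal
characters, which is exactly the information the text content alone does
not carry.

```python
def parse_number(text: str, grouping: str, decimal: str) -> float:
    """Parse a displayed number, given its grouping and decimal characters."""
    # Keep digits and the two separators; drop currency signs, dashes, etc.
    kept = [ch for ch in text if ch.isdigit() or ch in (grouping, decimal)]
    cleaned = "".join(kept).replace(grouping, "").replace(decimal, ".")
    # A trailing decimal mark (as in "1 000 000,") carries no digits.
    return float(cleaned.rstrip("."))


# The price from the quoted message: no-break space, thin spaces, comma, dash.
price_text = "\u20ac\u00a01\u2009000\u2009000,\u2014"
print(parse_number(price_text, grouping="\u2009", decimal=","))

# The same characters mean different numbers under different conventions.
print(parse_number("1.000", grouping=".", decimal=","))
print(parse_number("1.000", grouping=",", decimal="."))
```

The last two calls read the same string as one thousand and as one, which
is why a separate machine-readable value is needed for now.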

> > Instead, let us try using the regular "IDREF" functionality that HTML 
> > uses in a variety of other places, like <label for="">. For this we'll 
> > need a new attribute, but unfortunately we can't use about="" (which 
> > would be the obvious name to use), because that would conflict with 
> > RDFa, so instead we'll use subject="":
> 
> I'm slightly irritated by the implied change from active, possessive 
> formulating (“The cat has the name Hedral.”) to something more passive-y 
> (“Hedral is a name owned by that cat.“). My mental model for property 
> relationships orients itself more on the former wording; link 
> relationships are similar in that regard. @about/@subject are like @rev; 
> a @resource alias @rel would feel more natural. There are practical 
> limitations caused by the missing @resource, I think. Imagine a document 
> documenting a household and a household vocabulary which allows triples 
> of <human>s which are in an <owner> relationship to a <cat>. Given a 
> household of two humans and one cat, how does one mark up the assumption 
> that the cat has two owners?

I agree that there are use cases for both subject="" like "reverse" links 
back to an item, and "inclusion"-style links that embed other data into an 
item. I have a rough proposal in the form of a <ref> element that could be 
used to do inclusions (based somewhat on the Microformats include 
pattern), but I think we should wait to see how well microdata works in 
the wild before we start adding more features. If it turns out that 
microdata doesn't work, then there's no point worrying about inclusions!

On Tue, 12 May 2009, Eduard Pascual wrote:
>
> First issue: it solves a (major) subset of what RDFa would solve. 
> However, it has been taken as a requirement to avoid 
> clashes/incompatibilities with RDFa. In other words, as things stand, 
> authors will face two options: either use RDFa in HTML5, which would 
> forsake validation but actually work; or take a less powerful, less 
> supported (at least for now: many RDFa-aware agents vs. zero HTML5 
> microdata-aware agents) option that validates but provides no pragmatic 
> advantages.
>
> IMO, an approach that forces authors to choose between 
> validity/conformance which doesn't *yet* work vs. invalid solutions 
> that actually work is a horrible idea: it encourages authors to forsake 
> validity if they want things to work.
>
> Wouldn't the RDFa + @prefix solution suggested many times work better 
> and require less effort (for spec writers, for implementors, and for 
> content authors)? Keep in mind that I don't think RDFa + @prefix is the 
> solution we need; I'm just trying to point out that the current approach 
> is even worse than that.

I covered the problems with RDFa in the e-mail introducing microdata:

   http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

I agree that introducing yet another proposal for this is unfortunate, but 
I couldn't find a way to address RDFa's issues (or Microformats' issues, 
or the issues with a number of other technologies I looked at) without 
changing RDFa in an incompatible manner. I believe the problems with 
introducing yet another technology to do this are minor compared to the 
problems with the existing technologies.

> Second issue: as the "decaffeinated RDFa" it is, the HTML5 Microdata 
> approach tends to fail where RDFa itself fails. It's nice that, thanks 
> to the <time> element, the problem with trying to reuse human-readable 
> dates as machine-readable is dodged; but there are other cases where 
> separate values might be needed: for example using a street address for 
> the human-readable representation of a location and the exact geographic 
> coordinates as the machine-readable (since not all micro-data parsers 
> can rely on Google Maps's database to resolve street addresses, you 
> know); or using a colored name (such as "lime green" displayed on lime 
> green color) as the human-readable representation of a color, and the 
> hexcode (like #00FF00) as the machine-readable representation. These are 
> just the cases from the top of my head, and this can't be considered in 
> any way a complete list. While *favoring* the reuse of human-readable 
> values for the machine-readable ones is appropriate, because it's by 
> far the most common case, *forcing* that reuse is a quite bad idea, 
> because it is *not* the *only* case.

Microdata doesn't force that reuse; you can use <meta> instead.

> Third issue: also a flaw inherited from RDFa, it can be summarized as 
> completely ignoring the requirement I submitted to this list on April 
> 28th, in reply to Ian asking us to review the use cases [1]. I'll try to 
> illustrate it with an example, inspired by the original use-case: Let's 
> say someone's marking up a collection of iguanas (or cats, or even CDs, 
> doesn't really make a difference when illustrating this issue), making a 
> page for each iguana (or whatever) with all the details for it; and then 
> making an "index" page listing the maybe 20 iguanas with their name, 
> picture, and link to the corresponding page. Adding micro-data to that 
> "index", either with RDFa or with Ian's microdata proposal, would 
> involve stating 20 times in the markup something like "this is the 
> iguana's picture; this is the iguana's name; and this is the iguana's 
> URL". It would be preferable to be able to state something like "each 
> (row) <tr> in the <table> describes an iguana: the <img>s are each 
> iguana's picture, the contents of the <a>'s are the names, and the @href 
> of the <a>'s are the URLs to their main pages" just once. If I only need 
> to state the table headings once for the users to understand this 
> concept, why should a micro-data consumer require me to state it 20 
> times, once for each row? Please note how such a page would be quite 
> painful to maintain: any mistake in the micro-data mark-up would 
> generate invalid data and require a manual harvest of the data on the 
> page, thus killing the whole purpose of micro-data. And repeating 
> something 20 (or more) times brings a lot of chances to put a typo in, 
> or to miss an attribute, or any minor but devastating mistake like 
> these.

I agree entirely. I actually tried to find a workable solution to address 
this but unfortunately the only general solutions I could come up with 
that would allow this were selector-based, and in practice authors are 
still having trouble understanding how to use Selectors even with CSS. 
There's also the problem with separating the data from the rules that say 
how to interpret the data, which would likely lead to more problems than 
the typos one would get from repeating the itemprop=""s.

I'll probably look at this again in more detail when responding to the 
thread specifically on this topic.
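
For readers trying to picture the "state it once" idea, here is a rough
sketch. It is not a proposal: it uses ElementTree path expressions as a
stand-in for Selectors, and the table, the RULES mapping and the
extract_rows function are all hypothetical. Note that the rules live
apart from the data, which is the separation problem mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical index page: one row per iguana, no per-row annotations.
TABLE = """
<table>
 <tr><td><img src="/img/iggy.jpeg" alt=""/></td>
     <td><a href="/iguanas/iggy">Iggy</a></td></tr>
 <tr><td><img src="/img/rex.jpeg" alt=""/></td>
     <td><a href="/iguanas/rex">Rex</a></td></tr>
</table>
"""

# Stated once for the whole table:
# property name -> (path inside a row, attribute to read or None for text).
RULES = {
    "picture": (".//img", "src"),
    "name": (".//a", None),
    "url": (".//a", "href"),
}


def extract_rows(table_markup: str, rules: dict) -> list:
    """Apply the same rules to every <tr> and return one dict per row."""
    table = ET.fromstring(table_markup.strip())
    items = []
    for row in table.iter("tr"):
        item = {}
        for prop, (path, attr) in rules.items():
            el = row.find(path)
            if el is None:
                continue  # the rule silently matches nothing in this row
            item[prop] = el.get(attr) if attr else "".join(el.itertext())
        items.append(item)
    return items


iguanas = extract_rows(TABLE, RULES)
print(iguanas)
```

A change to the table layout breaks the rules without any visible error,
which is the brittleness that makes this approach hard to recommend.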

> Last, but not least, I'm not sure if it was wise to start defining a 
> solution while some of the requirements seem to be still under 
> discussion.

Well, they were under discussion for about a year; at some point we have 
to just do something, or else this feature would have missed the HTML5 
train altogether.

On Wed, 13 May 2009, Eduard Pascual wrote:
> On Sun, May 10, 2009 at 12:32 PM, Ian Hickson <ian at hixie.ch> wrote:
> > [...]
> >     * Any additional markup or data used to allow the machine to understand
> >       the actual information shouldn't be redundantly repeated (e.g. on each
> >       cell of a table, when setting it on the column is possible).
> >
> > This isn't met at all with the current proposal. Unfortunately the only
> > general solutions I could come up with that would allow this were
> > selector-based, and in practice authors are still having trouble
> > understanding how to use Selectors even with CSS.
> 
> First, I'd like to ask for a clarification from Ian: what do you mean by 
> "authors are still having trouble understanding how to use Selectors"? 
> If you mean that they have trouble when trying to select something like 
> "the second cell of the first row that has a 'foo' attribute different 
> from 'bar' within tables that have four or more rows" or even more 
> obscure stuff, then I should agree: most authors will definitely have 
> trouble dealing with so complex cases, and I bet many will always have 
> such trouble. However, if you mean that authors can't deal with simple 
> class, id, and/or children/descendant selectors, then I think you are 
> seriously underestimating authors.

I was referring to the kind of selectors one would need to make good use 
of a semantic-extraction mechanism using selectors.

> Actually, I was thinking about the cost of deploying implementations, 
> rather than writing them, since RDFa consumers are already out there 
> and working. This, however, strays a bit out of the original idea: it's 
> not really a matter of how big the cost is on its own, but of what do 
> you get for that cost. This is probably my own fault, but I still fail 
> to see what Ian's suggestion offers that RDFa doesn't; so my impression 
> is that these costs, even if they are small, are buying nothing, so they 
> are not worth it. If someone is willing to highlight what makes this 
> proposal worth the costs (ie: what makes it better than RDFa), I'm 
> willing to listen.

The main benefit as I see it is simplicity. I've seen even RDFa 
advocates stumble over exactly what triples get generated from some 
particularly gnarly RDFa snippets. With microdata, there's really no way 
to form a gnarly snippet. Each group is introduced by an item="", each 
name/value pair is introduced by an itemprop="", and there's subject="" to 
handle non-nested cases. No prefixes, no CURIEs, no rel vs rev, no data 
types, no hanging triples, no resource="" vs src="", etc.
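
The three attributes described above can be sketched as a small
extractor. This is a simplified illustration under stated assumptions,
not the spec's algorithm: it expects well-formed markup, ignores nested
items used as property values, and the names extract_items and MARKUP
are hypothetical.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<div>
 <div item="com.damowmow.cat" id="erwin">
  <meta itemprop="com.damowmow.name" content="Lord Erwin"/>
  <p itemprop="com.damowmow.desc">Siamese color-point.</p>
 </div>
 <p subject="erwin" itemprop="com.damowmow.age">3</p>
</div>
"""


def extract_items(markup: str) -> list:
    """Group itemprop values under the item they belong to."""
    root = ET.fromstring(markup.strip())
    records = {}  # element -> item record
    by_id = {}    # id -> item record, for subject="" lookups
    order = []
    # First pass: every element with item="" starts a group.
    for el in root.iter():
        if el.get("item") is not None:
            record = {"type": el.get("item"), "props": {}}
            records[el] = record
            order.append(record)
            if el.get("id"):
                by_id[el.get("id")] = record

    # Second pass: each itemprop="" goes to the nearest enclosing item,
    # unless subject="" points it at another item by id.
    def walk(el, current):
        name = el.get("itemprop")
        if name is not None:
            subject = el.get("subject")
            owner = by_id.get(subject) if subject else current
            if owner is not None:
                if el.tag == "meta":
                    value = el.get("content", "")
                else:
                    value = "".join(el.itertext()).strip()
                owner["props"].setdefault(name, []).append(value)
        if el in records:
            current = records[el]
        for child in el:
            walk(child, current)

    walk(root, None)
    return order


cats = extract_items(MARKUP)
print(cats)
```

The age paragraph sits outside the cat's element and is attached through
subject="", which is the non-nested case mentioned above.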

On Tue, 12 May 2009, Shelley Powers wrote:
> > 
> > If we can come up with a way of using the string "foaf:name" without 
> > having to declare "foaf" in each document, I'm totally in agreement. 
> > I've considered maybe registering the "foaf" URL scheme, or using some 
> > other punctuation character and having people register prefixes, but I 
> > don't know what punctuation character to use (':' and '.' are both 
> > taken).
>
> But then we would lose the extensibility, which is the power behind all 
> of this.

I don't see why registering particular prefixes involves losing 
extensibility... aren't URIs extensible?

> But regardless, the majority of people will include metadata markup by 
> installing a plug-in or module, and making a couple of choices. And if 
> you put together a good ten-minute tutorial for the average developer, 
> they'll have no problem with "foaf:name". Training and clarity of 
> communication is much more important than form, it always has been with 
> technology.

I think you significantly underestimate the difficulty of getting Web 
authors interested in doing the right thing.

On Wed, 13 May 2009, Leif Halvard Silli wrote:
>
> CSS and selectors appear to be one of the best understood technologies 
> of the web.

That certainly hasn't been my experience, at least not with anything 
beyond the simplest of selectors.

On Wed, 13 May 2009, Giovanni Gentili wrote:
> >
> > If we can come up with a way of using the string "foaf:name" without 
> > having to declare "foaf" in each document, I'm totally in agreement. 
> > I've considered maybe registering the "foaf" URL scheme, or using some 
> > other punctuation character and having people register prefixes, but I 
> > don't know what punctuation character to use (':' and '.' are both 
> > taken).
> 
> put in HTML5 some predefined prefixes for @itemprop:
> 
> dc = http://purl.org/dc/terms/
> foaf = http://xmlns.com/foaf/0.1/
> vcard = http://www.w3.org/2001/vcard-rdf/3.0#
> owl = http://www.w3.org/2002/07/owl#
> rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
> rdfs = http://www.w3.org/2000/01/rdf-schema#
> sioc = http://rdfs.org/sioc/ns#
> skos = http://www.w3.org/2004/02/skos/core#
> xsd = http://www.w3.org/2001/XMLSchema#

If we're going to predefine things, we're better off predefining all the 
terms so that we don't need prefixes at all.

> also, instead of @item @itemprop @subject
> is better @item @prop @subj

Why is that better?

> or @rdf-typeof @rdf-property @rdf-about (and @rdf-rel)

This isn't really about RDF, so I think that would be confusing.

On Thu, 14 May 2009, Shelley Powers wrote:
>
> Actually, I believe there are other differences, as others have pointed 
> out.
> 
> http://www.jenitennison.com/blog/node/103
> 
> http://realtech.burningbird.net/semantic-web/semantic-web-issues-and-practices/holding-on-html5
> 
> Some of the differences have resulted in more modifications to the 
> underlying HTML5 spec, which is curious, because Ian has stated in 
> comments that support for RDF is only a side interest and not the main 
> purpose behind the microdata section.

Even side interests need some love.

On Thu, 14 May 2009, Philip Taylor wrote:
> 
> If I understand RDF correctly, the idea is that everything can be
> URIs, subjects and objects can instead be blank nodes, and objects can
> instead be literals. If we restrict literals to strings (optionally
> with languages), then I think all triples must follow one of these
> eight patterns:
> 
>   <urn:subject> <urn:predicate> <urn:object> .
>   <urn:subject> <urn:predicate> "object" .
>   <urn:subject> <urn:predicate> "object"@lang .
>   <urn:subject> <urn:predicate> _:X .
>   _:X <urn:predicate> <urn:object> .
>   _:X <urn:predicate> "object" .
>   _:X <urn:predicate> "object"@lang .
>   _:X <urn:predicate> _:Y .
> 
> These cases can be trivially mapped into HTML5 microdata as:
> 
>   <div item>
>     <link itemprop="about" href="urn:subject">
>     <link itemprop="urn:predicate" href="urn:object">
>   </div>
> 
>   <div item>
>     <link itemprop="about" href="urn:subject">
>     <meta itemprop="urn:predicate" content="object">
>   </div>
> 
>   <div item>
>     <link itemprop="about" href="urn:subject">
>     <meta itemprop="urn:predicate" content="object" lang="lang">
>   </div>
> 
>   <div item>
>     <link itemprop="about" href="urn:subject">
>     <meta itemprop="urn:predicate" item id="X">
>   </div>
> 
>   <link subject="X" itemprop="urn:predicate" href="urn:object">
> 
>   <meta subject="X" itemprop="urn:predicate" content="object">
> 
>   <meta subject="X" itemprop="urn:predicate" content="object" lang="lang">
> 
>   <meta subject="X" itemprop="urn:predicate" item id="Y">

On Fri, 15 May 2009, Philip Taylor wrote:
> 
> Hmm, I think I'm wrong here. 'id' has to be unique, which means this 
> pattern won't work if _:X is the object for triples with two different 
> subjects.

Right; this would require the "include pattern" idea from Microformats, 
which I have some ideas on (using a <ref itemprop="" href=""> element), 
but I think we should wait to see how microdata fares as is first (none of 
the use cases for microdata actually needed this).
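
Philip's eight patterns can be written down as a small generator, which
may help when checking the mapping by hand. This is a sketch that follows
the quoted markup only; the tuple encoding of terms and the function names
are hypothetical, and it inherits the caveat above: a blank node that is
the object of two different subjects would need the same id twice.

```python
from html import escape

# Terms are encoded as tuples (an assumption made for this sketch):
#   ("uri", "urn:x"), ("bnode", "X"), ("literal", "text"),
#   ("literal", "text", "lang")


def property_element(predicate: str, obj: tuple, subject_attr: str = "") -> str:
    """Render the element that carries one predicate/object pair."""
    prop = escape(predicate)
    kind = obj[0]
    if kind == "uri":
        return f'<link{subject_attr} itemprop="{prop}" href="{escape(obj[1])}">'
    if kind == "literal":
        lang = f' lang="{escape(obj[2])}"' if len(obj) > 2 and obj[2] else ""
        return (f'<meta{subject_attr} itemprop="{prop}" '
                f'content="{escape(obj[1])}"{lang}>')
    if kind == "bnode":
        return f'<meta{subject_attr} itemprop="{prop}" item id="{escape(obj[1])}">'
    raise ValueError(f"unknown object kind: {kind}")


def triple_to_microdata(subject: tuple, predicate: str, obj: tuple) -> str:
    """Map one triple to markup, following the patterns quoted above."""
    if subject[0] == "bnode":
        # Blank-node subjects are referenced by id through subject="".
        return property_element(predicate, obj, f' subject="{escape(subject[1])}"')
    return ("<div item>\n"
            f'  <link itemprop="about" href="{escape(subject[1])}">\n'
            f"  {property_element(predicate, obj)}\n"
            "</div>")


print(triple_to_microdata(("uri", "urn:subject"), "urn:predicate",
                          ("literal", "object", "lang")))
print(triple_to_microdata(("bnode", "X"), "urn:predicate", ("bnode", "Y")))
```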

On Mon, 18 May 2009, Eduard Pascual wrote:
>
> Ian's initial message goes step by step through the creation of this new 
> syntax; but does *not* mention at all *why* it was being created in the 
> first place. The insight into the choices taken is indeed a good thing, 
> and I thank Ian for it; but he omitted to provide insight into the first 
> choice taken: discarding the multiple options already available (not 
> only Microformats and RDFa, but also other less discussed ones such as 
> eRDF, EASE, etc). Sure, there has been a lot of discussion on this 
> topic; and it's possible that the choice was taken as part of such 
> discussions. In any case, I think Ian should have clearly stated the 
> reasons to build a brand new solution when many others have been out for 
> a while and users have been able to try and test them.

I didn't list every solution I considered (such as eRDF and EASE) because 
those solutions were already so widely criticised by the supporters of 
technologies like RDFa and Microformats that I didn't really think there 
was any point going into more detail about those.

eRDF's main problem is the use of class=""; like Microformats, this causes 
confusion with authors.

RDF EASE separates the semantics from the data, which in general I believe 
would lead to a very brittle system.

Both also suffer from many of the indirection problems inherent in any 
prefix-based system.

However, like Microformats, nothing prevents either of these systems from 
being used with HTML5 today, as far as I can tell. Unlike RDFa, they do 
not require any changes to the underlying markup language. If they are 
better than microdata, then they will see adoption, and we can drop 
microdata from the HTML5 draft.

> Ok, the syntax is simpler for a subset of the use cases; but it leaves 
> entirely out the rest of use cases.

As far as I'm aware, microdata handles all the use cases that were 
mentioned that are also solved by RDFa and Microformats (as well as 
solving a number of other use cases that the other two don't solve, such 
as the drag-and-drop cases). If there are use cases that should be 
addressed that I did not address, please raise them.

> *extensibility*. And, with over two decades between versions of the 
> specs, this is a strong requirement: if a problem is noticed after HTML5 
> becomes "the standard", it's essential to be able to solve it without 
> waiting 10 or 20 years for HTML6 to come out. In addition, your alleged 
> "simplified" data model is actually an over-complication, as it is 
> defined in the form of restrictions and/or limitations over RDF's model. 
> Try to explain what can be represented in RDF, and what can be 
> represented with microdata, and you'll see what's simpler.

The only reason there was a long wait between HTML4 and HTML5 is that HTML 
was abandoned by the W3C. As soon as someone wanted to maintain HTML 
again, development resumed. There's no reason to believe it would take 20 
years to add a new feature -- it could be done overnight if there was the 
will and the need.

> Henri:
> > I consider it an advantage that reverse domains don't suggest that you 
> > should try dereferencing identifiers as if they were addresses.
>
> But they still try to look address-like enough. Lots of users will try 
> to "reorder" the domain and recover the address behind it (without 
> knowing that there isn't a real address behind it), with the obvious 
> confusion when they fail at all attempts (since something that doesn't 
> exist can't be recovered).

I don't see any reason to believe this is the case. Do people do this with 
Java identifiers? I've never heard of anyone trying to do that.

On Fri, 15 May 2009, Eduard Pascual wrote:
> On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak <mjs at apple.com> wrote:
> >
> > From my cursory study, I think microdata could subsume many of the use 
> > cases of both microformats and RDFa.
>
> Maybe. But microformats and RDFa can handle *all* of these cases.

Actually, this isn't true. For example, neither Microformats nor RDFa has 
a DOM or integration with the drag-and-drop API in HTML5.

> All right, an API may be a benefit. Most probably it is. However, a 
> similar API could have been built from RDFa, or eRDF, or EASE, or any 
> other already existing or new solution

As has been discussed, I did consider RDFa, but IMHO it has some fatal 
problems that make it inappropriate for text/html. eRDF's use of "class" 
is suboptimal, and it hasn't been found especially popular amongst the RDF 
community. EASE would be significantly more complicated for user agents to 
implement as a DOM API than the other three.

> Now microdata comes out, some drawbacks are highlighted in comparison 
> with RDFa (lack of typing,

As far as I can tell, per-instance typing (as opposed to per-class typing, 
where all properties of a particular name are implicitly typed) is not a 
requirement for any of the use cases that were listed and that I tried to 
address. If there are use cases that require per-instance typing, please 
do bring them up so that they can be considered.

> inability to depict the full RDF model, 

Depicting the RDF model isn't necessary to resolve the use cases that were 
brought up; that there is any RDF support at all is opportunistic rather 
than a design goal. (That is, adding RDF support was simple enough to do 
that I figured I might as well, rather than something that's key to the 
use cases listed.)

> Reversed domains are as ugly as CURIEs (but at least CURIEs resolve to 
> something useful, while reversed domains often don't resolve at all)

As Henri has pointed out, not resolving is actually a benefit; there has 
been ample evidence that making identifiers resolve causes more problems 
than it solves. For example:

   http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

> what does microdata provide to make up for its drawbacks?

A simpler, more usable syntax; a DOM API; and integration with the HTML5 
drag-and-drop model, amongst other things that others have listed.

> Of course. I'm willing to experiment. I experimented with RDFa some
> time ago and found it unsuitable for my needs. I have taken a look at
> microdata and it isn't any more suitable.

I will look carefully at your proposal shortly; hopefully that thread will 
clarify what your needs are.

> Technical problems have already been cited. Here's a summary, just 
> review the thread for further details on each:
>
> - Microdata can't handle non-string objects: even if I use the <time> 
> element to mark up a date with microdata, it will be taken as a string 
> rather than as a date. Since tools may be relying on explicit typing for 
> some tasks, this limitation renders microdata less usable than RDFa.

What are the use cases for which this matters? As far as I can tell, 
typing was not necessary for any of the use cases that were brought up, 
which are listed in the e-mails cited here:

   http://lists.w3.org/Archives/Public/public-html/2009May/0207.html

For example, consider the calendar use cases. The output format is vEvent, 
and the typing information doesn't affect the conversion.

> - Microdata is an entirely new syntax. It requires implementation and 
> deployment of consumers before it can be used at all; while RDFa has 
> already gone through this step. (This is not a very serious problem, but 
> is a cost to be considered.)

I think that relative to the long term goals here, no format has really 
gotten significant traction. RDFa and Microformats have the most usage, 
but even they see virtually no usage when examined at the scale of the Web 
as a whole.

This isn't a criticism of RDFa or Microformats; but I do think it is naive 
to claim that RDFa has "gone through the implementation and deployment 
stage". Note that implementation of microdata features has been shown to 
be basically trivial -- two people implemented basically the entire 
microdata model within 24 hours of hearing of the proposal (in fact one of 
them did it between when I checked in the spec text and went to bed, and 
when I woke up the next morning).

> - Microdata can't represent the full RDF data model (while RDFa can): 
> some complex structures are just not expressable with microdata.

As far as I can tell none of the use cases actually need RDF at all, let 
alone the more complex features of RDF, so this doesn't seem to be a 
problem in practice.

(The only exception might be the validation use case, for which one 
possible solution is reusing RDFS/OWL, but personally my stance on 
validation is that schemas aren't a good solution and hard-coded tools are 
better, so I'm not really convinced of even this exception.)

> - Microdata inherits all of the flaws from RDFa (except for the use of 
> CURIEs, which some people here consider an inconvenience). For example, 
> marking up a list of items requires lots of redundant code.

Yes, it would be nice to find a better solution for this that doesn't 
involve external resources or Selectors.

> - Microdata relies on reversed domains. While some people argue these to 
> be better than CURIEs, they are equally horrendous for the average user, 
> and have the additional disadvantage that they don't map to anything 
> useful (if they map to something at all), while CURIEs map to the 
> descriptions and/or definitions of what they represent.

Not mapping to anything is an intentional design decision; I do not 
believe that having identifiers be resolvable as URIs is good design. I do 
not think that reversed-DNS identifiers (which are basically just opaque 
strings with dots in them) are anywhere near as horrendous as prefix-based 
solutions or even straight URIs, though straight URIs are supported in 
microdata if that is preferred.

On Fri, 15 May 2009, Shelley Powers wrote:
> 
> You don't have to take my word for it -- check out Philip's testing demo 
> for microdata. You get triples with the following:
> 
> http://www.w3.org/1999/xhtml/custom#com.damowmow.cat
> 
> http://philip.html5.org/demos/microdata/demo.html#output_ntriples
> 
> Not only do you face problems with link rot, you also face a significant 
> amount of confusion, as people look at that and go, "What the hell is 
> that?"

If anyone has a better suggestion for how to map reversed DNS names to RDF 
identifiers, I'd be glad to update the spec. (The only requirement is that 
it not be possible for two different conforming microdata identifiers to 
result in the same RDF identifier -- to this end, for instance, the 
"http://www.w3.org/1999/xhtml/custom#" prefix is not conforming in any 
itemprop="" names in HTML5.)

The current mapping is just there because I couldn't find a better 
solution. RDF doesn't seem to support non-URI identifiers.
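
For reference, the current mapping amounts to something like the sketch
below. The prefix and the rejection of names that already start with it
come from the text above; treating any name containing ':' as a straight
URI that passes through unchanged is an assumption made for this sketch,
and the function name is hypothetical.

```python
CUSTOM_PREFIX = "http://www.w3.org/1999/xhtml/custom#"


def to_rdf_identifier(name: str) -> str:
    """Map a microdata name to an RDF identifier (sketch, see assumptions)."""
    if name.startswith(CUSTOM_PREFIX):
        # Non-conforming, so that two different conforming names can
        # never end up as the same RDF identifier.
        raise ValueError("names under the custom# prefix are not conforming")
    if ":" in name:
        # Assumption: the name is already a straight URI; keep it as is.
        return name
    # Reversed DNS name: an opaque string placed under the fixed prefix.
    return CUSTOM_PREFIX + name


print(to_rdf_identifier("com.damowmow.cat"))
print(to_rdf_identifier("http://xmlns.com/foaf/0.1/name"))
```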

> But hey, you've given me another idea. I think I'll create my own 
> vocabulary items, with the reversed DNS 
> http://www.w3.org/1999/xhtml/custom#com.sun.*. No, maybe 
> http://www.w3.org/1999/xhtml/custom#com.opera.*. Nah, how about 
> http://www.w3.org/1999/xhtml/custom#com.microsoft.*. Yeah, that's cool. 
> And there is no mechanism in place to prevent this, because unlike 
> "regular" URIs, where the domain is actually controlled by specific 
> entity, you've created the world famous W3C fudge pot. Anything goes.

What stops you from creating http://sun.com/* identifiers?

Reversed DNS names use the exact same registration mechanism as URIs.

> But Foobar takes a dive in the dot com pool, and foobar.com gets taken 
> over by a porn establishment. Yeah, I can't wait for people to explain 
> that one to the boss. Just because it doesn't link, won't mean it won't 
> end up on Twitter as a big, huge joke.

This seems to be a problem for URIs far more than reversed DNS names...

On Fri, 15 May 2009, Manu Sporny wrote:
> Kristof Zelechovski wrote:
> > 
> > (WHATWG wants HTML documents to be readable 1000 years from now.)
> 
> Is that really a requirement?

I believe Kristof is referring to my personal desire to write documentation 
that describes the Web infrastructure in enough detail that a software 
archeologist 1000 years from now, armed only with the spec and an archive 
of today's Web content, would be able to write a user agent to render the 
Web content as today's browsers do, without having to reverse engineer the 
expected behaviour from the content.

> Also, why 1000 years, that seems a bit arbitrary? =P

The number is arbitrary; I just mean a sufficiently long time that it is 
plausible that there would be no way to actually run today's software.

On Thu, 14 May 2009, Eduard Pascual wrote:
>
> But *why* restrict literals to strings?? Being unable to state that 
> "2009-05-14" is a date makes that value completely useless

This is demonstrably false -- vEvent blobs in HTML5 microdata have a 
defined mapping to raw iCalendar data, including roundtripping of dates, 
despite their being treated as strings.

> it would only be useful on contexts where a date is expected (basically, 
> because it is a date), but it can't be used on such contexts because the 
> tool retrieving the value has no hint about it being a date. Same is 
> true for integers, prices (a.k.a. decimals plus a currency symbol), 
> geographic coordinates, iguana descriptions, and so on.

You can distinguish all of those by inspection.

In any case, why would you put a price in the same field as a date? If you 
use strongly typed _fields_, rather than _values_, you can sidestep the 
entire issue cleanly.
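
A minimal sketch of what typing the fields rather than the values could
look like on the consumer side. The vocabulary names under com.example
and the FIELD_PARSERS table are hypothetical; the point is only that the
consumer decides the type from the property name, while the markup
carries plain strings.

```python
from datetime import date
from decimal import Decimal

# The vocabulary, not the markup, says what each field holds.
FIELD_PARSERS = {
    "com.example.date": date.fromisoformat,
    "com.example.price": Decimal,
    "com.example.age": int,
}


def typed(properties: dict) -> dict:
    """Convert string values using the parser registered for each field."""
    return {
        name: FIELD_PARSERS.get(name, str)(value)
        for name, value in properties.items()
    }


record = typed({
    "com.example.date": "2009-05-14",
    "com.example.price": "1000000",
    "com.example.desc": "Orange male.",
})
print(record)
```

Fields without a registered parser simply stay strings.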

On Thu, 14 May 2009, Jonas Sicking wrote:
> 
> Some of the improvement suggestions that I have heard that sound 
> interesting, though possibly for the next version of microdata.
> 
> * Support for specifying a machine-readable value, such as for dates, 
> colors, numbers, etc.

I expect we will add support for these based on demand, the same way we 
added <time> in the first place.

> I even wonder if it would allow replacing the <time> element with a 
> standardized microformat, such as:
> 
> Christmas is going down on <span item="w3c.time"
> itemvalue="12-25-2009">The 25th day of December</span>!

I don't really understand how that would be better than dedicated 
elements.

> * Support for tabular data.

This would be nice if we can find a way to do it that doesn't put undue 
burdens on simple implementations. (e.g. I would imagine that while a 
microdata implementation today can be a few hundred lines total, adding 
support for the table model could easily double that.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'