[whatwg] Annotating structured data that HTML has no semantics for

Mon May 18 01:38:59 PDT 2009

On May 14, 2009, at 23:52, Eduard Pascual wrote:

> On Thu, May 14, 2009 at 3:54 PM, Philip Taylor <excors+whatwg at gmail.com>
> wrote:
> It doesn't matter one syntax or another. But if a syntax already
> exists (RDFa), building a new syntax should be properly justified.

It was at the start of this thread:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

> As
> of now, the only supposed benefit I have heard of for this syntax is
> that it avoids CURIEs... yet it replaces them with reversed domains??
> Is that a benefit?

There's no indirection. A decade of Namespaces in XML shows that both  
authors and implementors have trouble getting prefix-based indirection  
right.

(If we were limited to reasoning about something that we don't have  
experience with yet, I might believe that people can't be too inept to  
use prefix-based indirection. However, a decade of actual evidence  
shows that actual behavior defies reasoning here and prefix-based  
indirection is something that both authors and implementors get wrong  
over and over again.)
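
To make the indirection concrete, here is a minimal sketch in Python. The helper expand_curie, the two prefix tables and the example identifiers are made up for illustration; they are not taken from the RDFa or microdata specs. It only shows that the same CURIE text means different things depending on the prefix bindings in scope, whereas a reverse DNS name is compared as-is.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' using the prefix bindings currently in scope.

    Hypothetical helper: real CURIE processing has more rules than this.
    """
    prefix, _, reference = curie.partition(":")
    # Raises KeyError if the prefix was never declared in this scope.
    return prefixes[prefix] + reference


# Two documents (or two subtrees) that bind the same prefix differently.
scope_a = {"dc": "http://purl.org/dc/elements/1.1/"}
scope_b = {"dc": "http://example.org/other#"}  # assumed, illustrative URI

same_text = "dc:title"
print(expand_curie(same_text, scope_a))
print(expand_curie(same_text, scope_b))

# A reverse DNS name needs no lookup table: the string is the identifier.
name = "org.example.title"  # assumed, illustrative name
print(name == "org.example.title")
```

Copying "dc:title" from one scope to the other silently changes its meaning, and copying it somewhere with no binding fails outright. The reverse DNS string has neither failure mode.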

> I have been a Java programmer for some years, and
> still find that convention absurd, horrible, and annoying. I'll agree
> that CURIEs are ugly, and maybe hard to understand, but reversed
> domains are equally ugly and hard to understand.

Problems shared by CURIEs, URIs and reverse DNS names:
  * Long.
  * Identifiers outlive organization charts.

Problems that reverse DNS names don't have but CURIEs and URIs do have:
  * "http://" adds 7 more characters of length.
  * Affordance of dereferenceability when mere identifier semantics are
meant.

Problems that reverse DNS names and URIs don't have but CURIEs have:
  * Prefix-based indirection.
  * Violation of the DOM Consistency Design Principle if xmlns:foo is used.

>> (I understand that if the microdata syntax offered no advantages  
>> over RDFa,
>> then it would be a wasted effort to diverge.
> Which are the advantages it offers?

The syntax is simpler for the use cases it was designed for. It uses a  
simpler conceptual model (trees as opposed to graphs). It allows short  
token identifiers. It doesn't use prefix-based indirection. It doesn't  
violate the DOM Consistency Design Principle.

On May 15, 2009, at 14:11, Eduard Pascual wrote:

> On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak <mjs at apple.com>  
> wrote:
>> [...]
>> From my cursory study, I think microdata could subsume many of the  
>> use cases
>> of both microformats and RDFa.
> Maybe. But microformats and RDFa can handle *all* of these cases.
> Again, which are the benefits of creating something entirely new to
> replace what already exists while it can't even handle all the cases
> of what it is replacing?

Compared to microformats, microdata defines a processing model and
conformance criteria. The microformats community has failed to provide
a processing model and conformance criteria at a similar level of detail.
The processing model side is perceived to be such a serious issue that
the lack of a unified microformats parsing spec is cited as a
motivation to use RDFa instead of microformats.

>> It seems to me that it avoids much of what microformats advocates  
>> find objectionable
> Could you specify, please? Do you mean anything else than WHATWG's
> almost irrational hate toward CURIEs and everything that involves
> prefixes?

RDFa uses a data model that is overkill for the use cases.

>> but at the same time it seems it can represent a full RDF data
>> model.
> No, it *can't* represent a full RDF model: it has already been shown
> several times on this thread.

That's a feature.

> Wait. Are you referring to microdata as an incremental improvement over
> RDFa?? IMO, it's rather a decremental enworsement.

That depends on the point of view. I'm sensing two major points of view:

1) Graphs are more general than trees. Hence, being able to serialize  
graphs is better.

2) Graphs are more general than trees. Hence, graphs are harder to  
design UIs for, harder to traverse and harder for authors to grasp.  
Hence, if trees are enough to address use cases, we should only enable  
trees to be serialized.

I subscribe to view #2, and it seems that trees are indeed enough for  
the use cases (that were stipulated by the pro-graph people!).
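
As a rough illustration of the "harder to traverse" part of view #2, here is a sketch in Python. The data shapes (nested dicts for the tree, an adjacency dict for the graph) and all names are assumptions chosen for the example, not anything defined by microdata or RDF.

```python
def walk_tree(node):
    """Yield every name in a tree. Plain recursion is enough because a
    tree has no cycles and each node is reachable by exactly one path."""
    yield node["name"]
    for child in node.get("children", []):
        yield from walk_tree(child)


def walk_graph(graph, start):
    """Yield every node reachable from start. A general graph may contain
    cycles, so the walker has to remember what it has already visited."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # without this check a cycle would loop forever
        seen.add(node)
        yield node
        stack.extend(graph.get(node, []))


# Illustrative data only.
tree = {
    "name": "event",
    "children": [
        {"name": "summary"},
        {"name": "location", "children": [{"name": "city"}]},
    ],
}
graph = {"a": ["b"], "b": ["c"], "c": ["a"]}  # a -> b -> c -> a

print(list(walk_tree(tree)))
print(list(walk_graph(graph, "a")))
```

The tree walker is a few lines with no bookkeeping. The graph walker needs extra state just to terminate, and every consumer of graph-shaped data inherits that burden.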

> - Microdata can't represent the full RDF data model (while RDFa can):
> some complex structures are just not expressible with microdata.

That's not a use case. That's "theoretical purity".

> - Microdata relies on reversed domains. While some people argue these
> to be better than CURIEs, they are equally horrendous for the average
> user, and have the additional disadvantage that they don't map to
> anything useful (if they map to something at all), while CURIEs map to
> the descriptions and/or definitions of what they represent.

I consider it an advantage that reverse domains don't suggest that you  
should try dereferencing identifiers as if they were addresses.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/