[whatwg] Annotating structured data that HTML has no semantics for

Mon May 18 04:14:32 PDT 2009

On May 18, 2009, at 12:18, Julian Reschke wrote:

> Henri Sivonen wrote:
>> There's no indirection. A decade of Namespaces in XML shows that  
>> both authors and implementors have trouble getting prefix-based  
>> indirection right.
>
> It's true that people get this wrong again and again. But it's also  
> true that lots of developers understand it once and for all, and then
> consistently get it right.
>
> The interesting question here is whether there's a better system.

  1) Centralized allocation of short names.
  2) Prefixing a short name by (an abbreviation of) the name of the  
vocabulary, which makes the probability of collision negligible once  
the designer has googled to check the probable absence of public  
collisions at minting time (e.g. "openid.delegate").

>>> I have been a Java programmer for some years, and still find that
>>> convention absurd, horrible, and annoying. I'll agree that CURIEs
>>> are ugly, and maybe hard to understand, but reversed domains are
>>> equally ugly and hard to understand.
>> Problems shared by CURIEs, URIs and reverse DNS names:
>> * Long.
>> * Identifiers outlive organization charts.
>
> That depends on the choice of the URI scheme.

I guess one could use e.g. a "data:,foo" URI as a namespace URI, but  
why not just use "foo"?

>> Problems that reverse DNS names and URIs don't have but CURIEs have:
>> * Prefix-based indirection.
>
> HTML developers regularly have to deal with a much more complicated  
> indirection mechanism (CSS).

This would be a persuasive argument if we were reasoning about a  
feature we don't have experience with yet. However, experience shows  
prefix-based indirection is too hard. If at the same time CSS isn't  
too hard, I just have to accept the evidence from the real world even  
if it defies reasoning.

>> The syntax is simpler for the use cases it was designed for. It  
>> uses a simpler conceptual model (trees as opposed to graphs). It  
>> allows short token identifiers. It doesn't use prefix-based  
>> indirection. It doesn't violate the DOM Consistency Design Principle.
>
> (devil's advocate argument) - so how does the syntax behave for  
> those use cases it *hasn't* been designed for?

That's hard to test, because the use case search has been exhausted  
for the moment. It seems we'd need to wait for new use cases to pop up.

>> RDFa uses a data model that is overkill for the use cases.
>
> It would be interesting to understand which use cases that RDFa can  
> do are not supported by "microdata" (I don't understand enough about  
> the subject to try myself), and whether the potential advantage of  
> having a simpler model outweighs the disadvantage of not using  
> network effects and creating a competing syntax.

Are there use cases of RDFa that are currently known but that the call  
for use cases didn't turn up?

Either @prefix or RDFa-profiles would break the network effects of the  
deployment of outside-of-REC RDFa-in-XHTML-as-text/html, so if  
breaking network effects is on the table in the form of @prefix and  
RDFa-profiles, I don't see why microdata wouldn't be on the table as  
far as network effects go.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/