[whatwg] RDFa

Fri Aug 22 14:18:34 PDT 2008

On Fri, 22 Aug 2008, Ben Adida wrote:
> >
> > It would be helpful if you could send a separate message that is 
> > specifically asking for the changes you desire
> 
> Perfectly reasonable: we'll put together a precise proposal regarding: 
> (1) what would need to validate, (2) what would browsers be expected to 
> do, and (3) why we think this is useful.

Just to emphasise: I don't know if you trimmed the quote above just to 
make your e-mail shorter or for some other reason, but the rest of the 
paragraph was actually the most important part. To be explicit, the most
important details in any proposal like this are:

 * What the problem being addressed is.

 * What research shows that it is an important enough problem that it 
   should be addressed.

 * What the requirements are.

For more details on the process here, please see our FAQ:

   http://wiki.whatwg.org/wiki/FAQ#The_WHATWG_Process

> > Indeed, we have design principles that make addressing the needs of 
> > small communities an explicit non-goal.
> 
> How about adding one feature that will help make many small communities 
> happy, each in their own way? That's the power of RDF, and the idea 
> behind RDFa is to enable that distributed innovation within HTML.

Sure, we already have extension mechanisms in HTML for exactly this.

> > Use a unique name, e.g. include a domain name in the name, as in 
> > "license.creativecommons.org" or "home.foaf.w3.org", or use a name you 
> > know isn't used because it's an unusual name, e.g. "cc:license".
> 
> That doesn't scale (unless you expect people to actually use GUIDs with 
> timestamps)

Why would it scale any less than URIs? That's basically all URIs are.

Instead of a name of the form:

   "http://" + domain + "/" + vocabulary-name + "#" + name

...or a namespace-name combination of the form:

   "http://" + domain + "/" + vocabulary-name
   name

...you just use a name of the form:

   name + "." + vocabulary-name + "." + domain

The same varying information is in there, so it's exactly as unique.
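
To make that concrete, here is a minimal Python sketch of the two naming 
schemes. The function names and the "example.org" / "foaf" / "knows" values 
are made up for illustration, and it assumes the vocabulary name and the 
name contain no "." or "/" characters.

```python
from urllib.parse import urlsplit


def uri_name(domain: str, vocabulary: str, name: str) -> str:
    """URI form: "http://" + domain + "/" + vocabulary-name + "#" + name."""
    return f"http://{domain}/{vocabulary}#{name}"


def dotted_name(domain: str, vocabulary: str, name: str) -> str:
    """Keyword form: name + "." + vocabulary-name + "." + domain."""
    return f"{name}.{vocabulary}.{domain}"


def uri_to_dotted(uri: str) -> str:
    """Rewrite a URI-form name as the equivalent dotted keyword."""
    parts = urlsplit(uri)
    # netloc is the domain, the path holds the vocabulary name, and the
    # fragment holds the name itself.
    return dotted_name(parts.netloc, parts.path.strip("/"), parts.fragment)


# Hypothetical example values, not taken from any real vocabulary.
print(uri_name("example.org", "foaf", "knows"))
print(dotted_name("example.org", "foaf", "knows"))
print(uri_to_dotted("http://example.org/foaf#knows"))
```

Both forms are built from the same three pieces, so two names collide in 
one scheme only if they would also collide in the other.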

> and it's extremely web-unfriendly, since you can't look up a concept to 
> figure out what it might mean.

Sure you can. Just search for it on a search engine. For example:

   http://www.google.com/search?q=hcard
   http://www.google.com/search?q=dc:title
   http://www.google.com/search?q=pingback

In fact, it works better than a URL:

   http://www.google.com/search?q=http://www.w3.org/1999/xhtml

Notice how the last search doesn't really give you a broad view of what 
XHTML is, whereas the first three give multiple pages about each name, 
from the official spec, to the Wikipedia articles, to tutorials, etc.

> > But in any case HTML5 already has extension mechanisms, so the 
> > discussion should not be over whether RDFa is worth it or not, the 
> > discussion should be over what extension mechanisms RDF needs that 
> > HTML5 doesn't provide.
> 
> Some problems with existing extension mechanisms:
>
> - no way to make statements about another document (a PDF), etc... in a 
> way that is *consistent* across different data types.

Surely including metadata about another document is even worse than 
including metadata in a <script> block in terms of separating the metadata 
from the data?

However, even if you did want to do this, you could certainly come up with 
a consistent mechanism for it. It's not entirely clear to me what the 
requirements are here, but if one wanted to be able to give verb-object 
pairs for a remote page, then one could do something like this:

  <span class="annotation.example.org">
   <a href="ball.html">My Favorite Ball</a>
   by <span class="author.example.com">Dewey</span>,
   published by <span class="publisher.example.com">Dog Books</span>
  </span>

The keywords minted here are "annotation.example.org", which indicates (in 
this hypothetical vocabulary) that the children of the element with that 
class are factoids about the page indicated by the <a> element child; and 
"author.example.com" and "publisher.example.com", which are keywords in an 
unrelated fictional vocabulary giving the author and publisher 
respectively.
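
As a rough illustration of how a tool might consume such markup, here is a 
minimal sketch using Python's standard html.parser module. The 
AnnotationParser class is hypothetical, and it assumes that the <a> 
element comes before the keyword spans and that each fact is a <span> 
directly inside the annotation element.

```python
from html.parser import HTMLParser

MARKUP = """
<span class="annotation.example.org">
 <a href="ball.html">My Favorite Ball</a>
 by <span class="author.example.com">Dewey</span>,
 published by <span class="publisher.example.com">Dog Books</span>
</span>
"""


class AnnotationParser(HTMLParser):
    """Collect (subject, keyword, value) facts from annotation blocks."""

    ANNOTATION = "annotation.example.org"  # keyword from the example markup

    def __init__(self):
        super().__init__()
        self.facts = []      # list of (subject href, keyword, text)
        self.subject = None  # href of the <a> child of the annotation
        self.depth = 0       # <span> nesting depth inside the annotation
        self.keyword = None  # class of the fact span being read, if any
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span":
            if self.depth:
                self.depth += 1
                # A span directly inside the annotation names a fact.
                if self.depth == 2 and classes:
                    self.keyword, self.text = classes[0], []
            elif self.ANNOTATION in classes:
                self.depth = 1
                self.subject = None
        elif tag == "a" and self.depth == 1 and self.subject is None:
            self.subject = attrs.get("href")

    def handle_data(self, data):
        if self.keyword is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self.depth:
            return
        if self.depth == 2 and self.keyword is not None:
            value = "".join(self.text).strip()
            self.facts.append((self.subject, self.keyword, value))
            self.keyword = None
        self.depth -= 1


parser = AnnotationParser()
parser.feed(MARKUP)
for fact in parser.facts:
    print(fact)
```

Each fact comes out as a (page, keyword, value) triple, with the page taken 
from the href of the <a> element.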

This markup doesn't hide any metadata, so it's likely to remain more 
accurate than data hidden in attributes, too.

> - no way to relate two chunks of data within a page, e.g. my friend 
> Alice is the second cousin of my friend Bob.

Why can't you use id="bob" on one element and href="#bob" on a link to it, 
along with relevant rel or class values?
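
A minimal sketch of that idea, again with Python's html.parser: the markup, 
the "second-cousin.example.org" keyword, and the RelationParser class are 
all invented for illustration. It assumes every element has an explicit 
end tag, and that the subject of a relation is the nearest enclosing 
element with an id.

```python
from html.parser import HTMLParser

MARKUP = """
<ul>
 <li id="alice">Alice
  (<a rel="second-cousin.example.org" href="#bob">second cousin of Bob</a>)
 </li>
 <li id="bob">Bob</li>
</ul>
"""


class RelationParser(HTMLParser):
    """Collect (subject id, rel keyword, target id) from same-page links."""

    def __init__(self):
        super().__init__()
        self.ids = set()     # every id seen in the document
        self.open_ids = []   # id (or None) of each currently open element
        self.relations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#"):
            # The subject is the nearest enclosing element that has an id.
            subject = next((i for i in reversed(self.open_ids) if i), None)
            for rel in (attrs.get("rel") or "").split():
                self.relations.append((subject, rel, href[1:]))
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        self.open_ids.append(attrs.get("id"))

    def handle_endtag(self, tag):
        if self.open_ids:
            self.open_ids.pop()


parser = RelationParser()
parser.feed(MARKUP)
# Keep only relations whose target id actually exists on the page.
resolved = [r for r in parser.relations if r[2] in parser.ids]
print(resolved)
```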

> - no way to build reusable vocabularies.

What barrier is there to building reusable vocabularies?

> > The failures of the past have had little to do with the syntax or 
> > expression mechanisms. They have to do with users simply not caring.
> 
> They don't care because there are no useful tools for them to care 
> about

"The tools will save us" is about as big a warning sign as you can get.

> > It's a verifiable fact! Just look at metadata like lang="", character 
> > encoding information, Content-Type headers, etc. It's so unreliable 
> > that any serious system that processes large amounts of data from 
> > multiple Web authors always ends up ignoring the metadata (or at best 
> > using it as a hint) and using heuristics to determine the real 
> > information.
> 
> Your assumption is untrue when you get to the Creative Commons 
> community, where lots of organizations and folks care about stating how 
> to give them attribution.

Lots do, but more don't, and thus the data will only be usable within 
certain walled gardens, just as with the examples I gave above.

Consider: Do you not think that lots of organizations and folks care about 
giving correct encoding information and Content-Type data?

> HTML5 should be able to serve smaller communities than "the whole web." 
> We're asking for a solution that is relevant to *lots* of small 
> communities, each in their own way.

Sure, like I said, we have lots of very versatile extension mechanisms 
already.

> > But as soon as this kind of thing is applied to people outside the 
> > tight-knit community, the metadata becomes an utter mess, misused, 
> > wrong, missing, syntactically incorrect, semantically incorrect, 
> > unusable. We have shown time and time again that when metadata 
> > mechanisms face the wider Web community, they fail. Ignoring this 
> > doesn't make it go away.
> 
> You're looking at this in a fundamentally broken way.

I was going to say the same thing. :-)

> Henri:
> > For example, in PDF, do people *really* need all this cruft:
> 
> People don't need it, machines do.

No they don't. Again, consider the RDF-blobs-in-HTML-comments stuff. The 
machines don't need the RDF cruft around the metadata, they just need the 
license URI. Tools that process those license statements at scale don't do 
any RDF processing at all.

> If we had an attribute-value-only way of defining prefixes, would that 
> make you happier?

The problem is the prefixes, not the syntax used to declare them.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'