[whatwg] RDFa

Fri Aug 22 16:43:04 PDT 2008

Ian Hickson wrote:
> Just to emphasise: I don't know if you trimmed the quote above just to 
> make your e-mail shorter or for some other reason, but the rest of the 
> paragraph was actually the most important part. To be explicit, the most
> important details in any proposal like this are:
> 
>  * What the problem being addressed is.
> 
>  * What research shows that it is an important enough problem that it 
>    should be addressed.
> 
>  * What the requirements are.

I only meant to make the email shorter, thanks for emphasizing the
important parts.

I worry, though, given your statements below, that we're starting off on
less than an even footing in terms of receptiveness to our proposal. I'm
particularly worried about aspects that are inherently subjective, and
on which you're already quite negatively pre-disposed.

>> How about adding one feature that will help make many small communities 
>> happy, each in their own way? That's the power of RDF, and the idea 
>> behind RDFa is to enable that distributed innovation within HTML.
> 
> Sure, we already have extension mechanisms in HTML for exactly this.

Insufficient, as I've explained four times now :)

> Why would it scale any less than URIs? That's basically all URIs are.

Why would you reinvent URIs in a way that they can't be de-referenced?
Is that really a good design, in your opinion?

>> and it's extremely web-unfriendly, since you can't look up a concept to 
>> figure out what it might mean.
> 
> Sure you can. Just search for it on a search engine.

That's sort of good for humans, and that's assuming there's no bug in
the search engine algorithm where you get, say, Google-bombed. I'm not
sure a web design should be predicated on the existence of Google,
especially when it's not clear that Google will always be able to index
the entire web (it's not clear Google indexes the entire web even today).

In particular, this is terrible for machines. If I want the
human-readable version of dc:title, or cc:attributionName, I can
automatically fetch them using the follow-your-nose principle of RDF.
Not so with your ad-hoc proposal.
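
To make the follow-your-nose point concrete, here is a minimal sketch in
Python. The prefix table and the expand() helper are illustrative
assumptions of mine, not text from any spec; the two namespace URIs are
the usual Dublin Core and Creative Commons ones.

```python
# Illustrative prefix table (an assumption for this sketch).
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://creativecommons.org/ns#",
}


def expand(curie, prefixes=PREFIXES):
    """Turn a prefix:name pair into the full URI it abbreviates."""
    prefix, sep, name = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + name


for term in ("dc:title", "cc:attributionName"):
    # Each expanded term is an ordinary URI that a machine can fetch.
    print(term, "->", expand(term))
```

The result is an ordinary URI, so a program can hand it to any HTTP
client and look for a description of the term. A bare class token gives
a program no comparable lookup step.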

> Notice how the latter doesn't really give you a broad view of what XHTML 
> is, whereas the former three examples give multiple pages about each name, 
> from the official spec, to the wikipedia articles, to tutorials, etc.

Sure, but again you're mixing up the use case of humans and machines.

> Surely including metadata about another document is even worse than 
> including metadata in a <script> block in terms of separating the metadata 
> from the data?

No, it's not. If I have an inline image or video on my web page, the
best compromise (with respect to usability) for where to put the
metadata is the web page itself.

> However, even if you did want to do this, you could certainly come up with 
> a consistent mechanism for it.

Microformats tried to do this, and the result is inconsistent parsing
across microformats.

> It's not entirely clear to me what the 
> requirements are here, but if one wanted to be able to give verb-object 
> pairs for a remote page, then one could do something like this:
> 
>   <span class="annotation.example.org">
>    <a href="ball.html">My Favorite Ball</a>
>    by <span class="author.example.com">Dewey</span>,
>    published by <span class="publisher.example.com">Dog Books</span>
>   </span>

This is quite contrived, precludes naturally written HTML, and is good
only for a small subset of use cases.

How would you point to the creator of an image? Likely again with some
ad-hoc structure that wouldn't be the same parsing model as the example
you give above.

The idea is to have a consistent syntax from which generic metadata can
be extracted. The solution you propose is *exactly* what got previous
solutions in trouble, with parsers having to be upgraded for every new
use case.

We want one parser, with variability and innovation in the vocabulary
definition only.
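
As a rough illustration of "one parser", here is a toy extractor in
Python. It is a sketch under loud assumptions: it is NOT the RDFa
processing rules, it only looks at the about, property, rel and href
attributes, and the sample page, the TripleSketch class and the
extract() helper are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not touch the stack.
VOID = {"img", "br", "hr", "meta", "link", "input"}


class TripleSketch(HTMLParser):
    """Toy extractor: the general shape only, not a conforming RDFa parser."""

    def __init__(self, base):
        super().__init__()
        # One (subject, pending property) entry per open element.
        self.stack = [(base, None)]
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # "about" changes the subject; otherwise inherit it from the parent.
        subject = a.get("about", self.stack[-1][0])
        if "rel" in a and "href" in a:
            self.triples.append((subject, a["rel"], a["href"]))
        if tag not in VOID:
            self.stack.append((subject, a.get("property")))

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        subject, prop = self.stack[-1]
        if prop and data.strip():
            self.triples.append((subject, prop, data.strip()))


def extract(html, base):
    parser = TripleSketch(base)
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical page mixing two vocabularies and two subjects.
page = """
<div about="ball.html">
  <span property="dc:title">My Favorite Ball</span>
  by <span property="dc:creator">Dewey</span>
  <a rel="cc:license" href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>
</div>
<div about="photo.jpg">
  <span property="cc:attributionName">Dewey</span>
</div>
"""

for triple in extract(page, "http://example.org/page"):
    print(triple)
```

Nothing in the extractor mentions dc: or cc:. A page using some other
vocabulary would go through the same code unchanged, and the image case
is handled by the same attribute as the remote-page case.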

> This markup doesn't hide any metadata, so it's likely to remain more 
> accurate than data hidden in attributes, too.

RDFa doesn't hide metadata any more than your example does.

> Why can't you use id="bob"s and href="#bob", along with relevant rel or 
> class values?

You can with RDFa using the @about attribute, but with your ad-hoc
approach, you'd have to come up with yet another parsing model and
contrived element structure for explicitly identified chunks.

Believe me, we looked at a *lot* of solutions.

> What barrier is there to building reusable vocabularies?

The follow-your-nose principle is missing, which is fairly essential for
discovering the meaning of vocabularies (partially automatically, not by
doing a Google search.)

>>> The failures of the past have had little to do with the syntax or 
>>> expression mechanisms. They have to do with users simply not caring.
>> They don't care because there are no useful tools for them to care 
>> about
> 
> "The tools will save us" is about as big a warning sign as you can get.

I didn't say that. I said that when you preclude good tools, then you're
doomed. Tools are not sufficient, but they often play an important role.

>> Your assumption is untrue when you get to the Creative Commons 
>> community, where lots of organizations and folks care about stating how 
>> to give them attribution.
> 
> Lots do, but more don't, and thus the data will only be usable within 
> certain walled gardens, just as with the examples I gave above.

You know this about the Creative Commons community... how? We know that
a large portion of our users want to express this. We've known it since
the first version of the license when they begged us for a mechanism. We
also know we have 150M+ CC-licensed items on the web. And we are by no
means the complete RDFa community.

> Consider: Do you not think that lots of organizations and folks care about 
> giving correct encoding information and Content-Type data?

Why would they care when it looks okay to them in IE? I think you and I
agree that if the user sees no benefit, they won't do anything.

The difference is that I believe we *can* absolutely build an ecosystem
where the user sees significant benefit, but only if we have a
consistent, long-lived syntax. The shelf life of a microformat parser is
pretty short, because it needs to be updated with every new microformat.
The lifetime of an RDFa parser is quite long.

We're interested in publishers who *want* to put out machine readable
data. There are quite a number of them, though of course it's still a
small percentage of the web (but is it less than the # who want to use a
Progress Bar? I don't know....)

With a new class of browsers (or browser extensions at first),
publishers will have a clear motivation for publishing this
machine-readable data. They'll be able to check "whether or not it
works" in tools like Operator, or Fuzzbot, etc...

That's why I say the tools are necessary. Not sufficient, but necessary.
And the tools need the proper standards to survive.

Will this be useful to everyone on the web? No. But it will be useful to
a significant portion, IMO.

> Sure, like I said, we have lots of very versatile extension mechanisms 
> already.

And as I keep pointing out, the syntax enabled by the existing extension
mechanisms is not generic enough for the classes of data folks need to
express. With RDFa, it would be generic enough.

>>> For example, in PDF, do people *really* need all this cruft:
>> People don't need it, machines do.
> 
> No they don't. Again, consider the RDF-blobs-in-HTML-comments stuff. The 
> machines don't need the RDF cruft around the metadata, they just need the 
> license URI. Tools that process those license statements at scale don't do 
> any RDF processing at all.

But... I already told you that we're trying to do more than just the
license statement. You can disagree with the goal, but you can't then
act like my solution is inconsistent with my goal. My solution is
inconsistent with what *you think my goal should be*. But my goal is
different. We need to express a *LOT* more than just a license statement.

The RDF-in-HTML-comments approach was a horrible solution, as we all
agree, for many reasons. That's why we're changing it.

> The problem is the prefixes, not the syntax used to declare them.

Like I said, I don't see our proposal getting much of a fair shot when
such broad, subjective statements are made. These statements dismiss
core concepts of sound design. You're also taking the opposite extreme
of the behavior you criticize, as if *no* web pages are created using
tools. But in fact, many web pages are. Facebook, MySpace, Flickr,
Google Base, all of the Yahoo content properties, etc... All of these
are produced using templates. Adding RDFa to them would be trivial.

If you are the sole estimator of what the average user will understand
or need, and if you already dismiss the needs we articulate in ccREL,
then it may not be worth our time to actually write up the proposal.

I'll have to think about it.

-Ben