[whatwg] RDFa

Ian Hickson ian at hixie.ch
Fri Aug 22 22:50:10 PDT 2008

On Fri, 22 Aug 2008, Ben Adida wrote:
> > Why would it scale any less than URIs? That's basically all URIs are.
> Why would you reinvent URIs in a way that they can't be de-referenced? 
> Is that really a good design, in your opinion?

It seems to work well for the Java community.

I'd ask the opposite question. Why would you reinvent keywords in a way 
that they seem to be resource addresses? Is that really a good design, in 
your opinion? URIs are long, typically difficult to remember, certainly 
difficult to type, cause people to dereference them when not necessary 
(witness the problems the W3C has had with load on their schema and 
namespace URIs), and, as shown in my last message, are actually more 
difficult to obtain information on than keywords. Indeed they are so 
long and hard to use that people usually try to work around them by using 
prefixes to shorten them ("...and now they have two problems").

> >> and it's extremely web-unfriendly, since you can't look up a concept 
> >> to figure out what it might mean.
> > 
> > Sure you can. Just search for it on a search engine.
> That's sort of good for humans

Well, yes, I was responding to your comment regarding looking up a concept 
to figure out what it might mean.

> and that's assuming there's no bug in the search engine algorithm where 
> you get, say, Google-bombed.

That's no more likely (indeed it's arguably less likely) than a URI 
pointing to a server that's down, or pointing to a server that's changed 
ownership so the page is now 404, or any number of other similar problems 
that can occur with URIs.

> I'm not sure a web design should be predicated on the existence of 
> Google, especially when it's not clear that Google will always be able 
> to index the entire web (it's not clear Google indexes the entire web 
> even today.)

It doesn't have to index the entire Web for this; it only has to index the 
documentation of the identifier. Also, Google is far from the only search 
engine. I think it would be ridiculous to think that search engines are 
going to go away, just as we can probably assume URIs aren't going to go 
away any time soon either.

> In particular, this is terrible for machines. If I want the 
> human-readable version of dc:title, or cc:attributionName, I can 
> automatically fetch them using the follow-your-nose principle of RDF. 
> Not so with your ad-hoc proposal.

Could you elaborate on this?

> > Notice how the latter doesn't really give you a broad view of what 
> > XHTML is, whereas the former three examples give multiple pages about 
> > each name, from the official spec, to the wikipedia articles, to 
> > tutorials, etc.
> Sure, but again you're mixing up the use case of humans and machines.

I was specifically responding to your request for a way to get information 
about a keyword, which, at least to me, seems like a problem faced by 
humans, not computers. After all, computers today don't understand 
anything regardless of how much information we give them.

> > However, even if you did want to do this, you could certainly come up 
> > with a consistent mechanism for it.
> Microformats tried to do this, and the result is inconsistent parsing 
> across microformats.

So Microformats didn't do a good job. There's no reason why one couldn't 
come up with a set of rules that was consistent and well-defined.

> >   <span class="annotation.example.org">
> >    <a href="ball.html">My Favorite Ball</a>
> >    by <span class="author.example.com">Dewey</span>,
> >    published by <span class="publisher.example.com">Dog Books</span>
> >   </span>
> This is quite contrived, precludes naturally written HTML, and is good
> only for a small subset of use cases.

To me that seems natural, not contrived. It could be defined to cater for 
all use cases. The key is just having a definition on which you can layer 
any data you want, just like RDFa but without having to use new markup.
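
As a rough illustration of what one generic rule over markup like the 
above could look like, here is a minimal Python sketch. It is not part of 
any proposal in this thread. It assumes, purely for illustration, that any 
class value containing a "." is a scoped keyword and that the keyword's 
value is the element's text content; the names KeywordExtractor and 
extract_keywords are hypothetical.

```python
from html.parser import HTMLParser


class KeywordExtractor(HTMLParser):
    """Collect (keyword, text) pairs for elements whose class contains a dot."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # one entry per open element: (tag, keywords, text chunks)
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Assumption for this sketch: a class value with a "." is a scoped keyword.
        keywords = [c for c in classes if "." in c]
        self.open_elements.append((tag, keywords, []))

    def handle_data(self, data):
        # Text counts towards every element that is currently open.
        for _, _, chunks in self.open_elements:
            chunks.append(data)

    def handle_endtag(self, tag):
        if all(open_tag != tag for open_tag, _, _ in self.open_elements):
            return  # stray end tag with no matching start tag: ignore it
        # Close elements until the matching start tag has been closed.
        while self.open_elements:
            open_tag, keywords, chunks = self.open_elements.pop()
            text = " ".join("".join(chunks).split())  # normalise whitespace
            self.pairs.extend((keyword, text) for keyword in keywords)
            if open_tag == tag:
                break


def extract_keywords(html):
    """Return (keyword, text) pairs in the order the elements are closed."""
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


EXAMPLE_HTML = """<span class="annotation.example.org">
 <a href="ball.html">My Favorite Ball</a>
 by <span class="author.example.com">Dewey</span>,
 published by <span class="publisher.example.com">Dog Books</span>
</span>"""

if __name__ == "__main__":
    for keyword, text in extract_keywords(EXAMPLE_HTML):
        print(keyword, "=", text)
```

The same rule applies to every keyword, so new vocabularies would not need 
a new parser under these assumptions.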

> How would you point to the creator of an image? Likely again with some 
> ad-hoc structure that wouldn't be the same parsing model as the example 
> you give above.

The above is just a small example. However, if a community wanted to 
define a single, simple set of rules to mark up data in a consistent way, 
I posit that it would be quite possible to do so within the confines of 
HTML's extension mechanism. Indeed, HTML has a number of features that 
make it even better than RDFa:

 * All the data would be visible, not hidden in attributes
 * One could use mechanisms in HTML like <a> and <time> to mark up
   typed data in a way that the user agent could easily understand and 
   allow user interaction with (e.g. following links)
 * One could use mechanisms in HTML like <ins> and <del>, or <q> and 
   <blockquote>, to add even more expressiveness in a manner compatible 
   with HTML itself, e.g. marking up edits or attribution for data.
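
To make the second point concrete, here is a small Python sketch of reading 
typed values from <a> and <time>. It assumes, for illustration only, that a 
class value containing a "." names the property, that a link's value is its 
href, and that a time's value is its datetime attribute. The class names 
and the TypedValueExtractor and extract_typed_values names are 
hypothetical.

```python
from html.parser import HTMLParser


class TypedValueExtractor(HTMLParser):
    """Collect (keyword, type, value) triples from <a href> and <time datetime>."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Assumption for this sketch: a class value with a "." is a scoped keyword.
        keywords = [c for c in (attributes.get("class") or "").split() if "." in c]
        if tag == "a" and attributes.get("href"):
            typed = ("link", attributes["href"])
        elif tag == "time" and attributes.get("datetime"):
            typed = ("time", attributes["datetime"])
        else:
            return  # other elements carry no typed value in this sketch
        for keyword in keywords:
            self.values.append((keyword, typed[0], typed[1]))


def extract_typed_values(html):
    parser = TypedValueExtractor()
    parser.feed(html)
    parser.close()
    return parser.values


TYPED_EXAMPLE = (
    '<a class="homepage.example.org" href="http://example.org/dewey">Dewey</a> '
    'wrote this on <time class="published.example.org" '
    'datetime="2008-08-22">22 August</time>'
)

if __name__ == "__main__":
    for triple in extract_typed_values(TYPED_EXAMPLE):
        print(triple)
```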

> The idea is to have a consistent syntax from which generic metadata can 
> be extracted. The solution you propose is *exactly* what got previous 
> solutions in trouble, with parsers having to be upgraded for every new 
> use case.

So define it in a generic fashion, just using the HTML extension mechanism.

> > Why can't you use id="bob"s and href="#bob", along with relevant rel 
> > or class values?
> You can with RDFa using the @about attribute, but with your ad-hoc 
> approach, you'd have to come up with yet another parsing model and 
> contrived element structure for explicitly identified chunks.

I'm sorry, but I really don't see why HTML is more contrived than an extra 
set of layered attributes on top of HTML. Using the language's built-in 
semantics where possible instead of building a separate redundant set of 
semantics on top of it seems much less contrived to me.
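
A minimal Python sketch of the id="bob" and href="#bob" idea follows. It 
assumes, for illustration only, that each rel value on a same-document link 
names a relationship to the element whose id matches the fragment; the 
names FragmentLinkCollector and resolve_fragment_links are hypothetical.

```python
from html.parser import HTMLParser


class FragmentLinkCollector(HTMLParser):
    """Record element ids and same-document links that carry rel values."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.links = []  # (rel value, target id)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        href = attributes.get("href") or ""
        if tag == "a" and href.startswith("#"):
            for rel in (attributes.get("rel") or "").split():
                self.links.append((rel, href[1:]))


def resolve_fragment_links(html):
    """Return (rel, id) pairs whose target id exists in the same document."""
    collector = FragmentLinkCollector()
    collector.feed(html)
    collector.close()
    return [(rel, target) for rel, target in collector.links
            if target in collector.ids]


FRAGMENT_EXAMPLE = (
    '<p id="bob">Bob</p>'
    '<a rel="author" href="#bob">written by Bob</a>'
    '<a rel="author" href="#alice">dangling reference</a>'
)

if __name__ == "__main__":
    print(resolve_fragment_links(FRAGMENT_EXAMPLE))
```

Links are resolved only after the whole document has been read, so a 
reference may appear before the element it points to.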

> >> Your assumption is untrue when you get to the Creative Commons 
> >> community, where lots of organizations and folks care about stating 
> >> how to give them attribution.
> > 
> > Lots do, but more don't, and thus the data will only be usable within 
> > certain walled gardens, just as with the examples I gave above.
> You know this about the Creative Commons community... how?

I'm talking about when you get outside the "community" and into the wider 
world where people are using Creative Commons without being part of the 
community and sharing the values and ideals you espouse.

> > Consider: Do you not think that lots of organizations and folks care 
> > about giving correct encoding information and Content-Type data?
> Why would they care when it looks okay to them in IE? I think you and I 
> agree that if the user sees no benefit, they won't do anything.

You may be the first person in the W3C I've ever spoken to who doesn't 
think that giving correct encoding information and Content-Type data has 
user-visible benefits!

On Fri, 22 Aug 2008, Ben Adida wrote:
> Silvia Pfeiffer wrote:
> > I would like to understand exactly what changes to the existing HTML5 
> > spec would be required to support RDFa. Ben - can you clarify? Maybe 
> > the extension mechanism that Ian refers to already covers all the 
> > needs, but it has not been clarified.
> Thanks for the straight-forward question Silvia :) This is a very rough 
> answer, not a detailed proposal.

Again, this is skipping three critical steps:

 * What the problem being addressed is.

 * What research shows that it is an important enough problem that it 
   should be addressed.

 * What the requirements are.

Without detailing the above, it's impossible to evaluate the proposal.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
