[whatwg] RDFa

Fri Aug 22 12:53:49 PDT 2008

[Response to Ian and Henri in one email... but then I saw the other
responses and am breaking out the remainder responses in separate emails.]

Ian Hickson wrote:
> I've whitelisted your e-mail address so that you can post to the WHATWG 
> list without subscribing.

Thanks Ian, I think I unsubscribed a while back when I was busy with
other things, but really I should subscribe at this point, there's no
reason for me to have special status.

> However, if the e-mails on this thread were 
> intended to be a request that the RDFa attributes be considered for HTML5, 
> I must admit to having misunderstood the request.

Though I do think you should consider RDFa attributes in HTML5, I didn't
mean to start this thread just yet (we're in the middle of our
transition to Proposed Rec at W3C for RDFa in XHTML 1.1). I believe it
started when Matt mentioned ccREL and wondered what it would take to
support it in HTML5.

So, heck, not ideal timing on my end, but since the discussion has
begun, let's go for it :)

> I've addressed RDFa only in this message (as opposed to creative commons 
> markup). It would be helpful if you could send a separate message that is 
> specifically asking for the changes you desire

Perfectly reasonable: we'll put together a precise proposal regarding:
(1) what would need to validate, (2) what would browsers be expected to
do, and (3) why we think this is useful.

> That's weird. I wonder why these people are asking Creative Commons for 
> these tools and not asking other communities (e.g. the WHATWG community).

My guess is that people prefer to ask their own, smaller, community for
"best practices" and solutions, rather than try to find the underlying
standard that is the limiting factor. That's one of the reasons we
believe in RDFa: the underlying standard implements a small handful of
attributes, and individual communities get to manage their own
extensions by building, reusing, and extending vocabularies.

> Usually I find that when there is a need, the people with that need 
> approach multiple different groups trying to get their need met.

I'm not sure that applies for folks who are a bit less technical: they
wouldn't approach WHATWG when all they're thinking about is CC +
genomics, for example.

> I'm also curious as to why, if this is so commonly requested, similar 
> features such as hCard and hCalendar have seen limited uptake.

I suspect one of the reasons is insufficient tools to make use of hCard
and hCalendar in ways that are not already served by publishers adding a
vCard or an iCal link. And I would venture to say that this is because

(1) tools have to be custom-built for every microformat, because the
syntax varies, and
(2) there's no reusability of data fields across microformats, which
means having each one as a separate XML/CSV download is just fine.
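To make point (1) concrete, here is a minimal sketch in Python (standard library only). A single extractor that reads a generic `property`/`content` attribute convention works unchanged on a license fragment and on an event fragment that use different vocabularies. Microformat tools, by contrast, need a separate parser for each format. This is a heavily simplified subset of RDFa processing, not a conforming parser, and the fragments are made up for illustration.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from any element with a property
    attribute, whatever vocabulary the property name comes from.

    Simplified sketch: the value is @content if present, otherwise the
    element's text; property elements are assumed to contain only text.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # property name waiting for its text content
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value given explicitly in the attribute.
            self.pairs.append((prop, attrs["content"]))
        else:
            # Value is the element's text, collected until its end tag.
            self._open, self._text = prop, []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            self.pairs.append((self._open, "".join(self._text).strip()))
            self._open = None


def extract(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Two fragments using different vocabularies; the extractor is unchanged.
license_snippet = '<p>Photo by <span property="cc:attributionName">Ben</span></p>'
event_snippet = ('<p><span property="dc:title">Example Event</span> '
                 '<meta property="dc:date" content="2008-08-22"></p>')

print(extract(license_snippet))  # [('cc:attributionName', 'Ben')]
print(extract(event_snippet))    # [('dc:title', 'Example Event'), ('dc:date', '2008-08-22')]
```

A new vocabulary needs no new parsing code here, only new property names. That is the reuse that per-format microformat parsers don't get.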

Also, there's no doubt that the data web will remain *much* smaller than
the human web for a while. That doesn't mean it's irrelevant. Even 0.1%
of the web is very big and potentially very useful.

> Indeed, we have design principles that make addressing the needs of 
> small communities an explicit non-goal.

How about adding one feature that will help make many small communities
happy, each in their own way? That's the power of RDF, and the idea
behind RDFa is to enable that distributed innovation within HTML.

> But I haven't seen the level of interest that, say, video or 
> offline Web applications have had. I haven't even seen the level of 
> interest that random HTML elements like <abbr> have received.

Not really comparable. We're trying to enable lots of applications in
the long run, while video and offline are obviously very immediate,
short-term features. Also, <abbr> is an existing element; naturally, if
you try to kill existing features, you're going to get vocal protests.

> The interest in technologies like RDF seems to be almost exclusively from people in the 
> metadata processing space.

Until we have the syntax and then the tools that build on that syntax to
make this more useful to end-users, that statement will remain true,
indeed. But we have to have some foresight into what cool applications
could be built if we just enable a few things, especially given the
interest we're already seeing. Tapping into the power of RDF from within
HTML is, in my opinion, one of those enabling approaches.

> Use a unique name, e.g. include a domain name in the name, as in 
> "license.creativecommons.org" or "home.foaf.w3.org", or use a name you 
> know isn't used because it's an unusual name, e.g. "cc:license".

That doesn't scale (unless you expect people to actually use GUIDs with
timestamps), and it's extremely web-unfriendly, since you can't look up
a concept to figure out what it might mean. The RDF folks figured out
how to do this a while ago. Why not tap into that expertise a bit?
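To illustrate, here is a minimal sketch that assumes a simple prefix table. A CURIE like `cc:license` is only shorthand for a full URI. Uniqueness therefore comes from whoever controls the domain, and the expanded name is something you can actually look up. The CC and FOAF namespace URIs are the published ones. The function and table are hypothetical, and real CURIE processing has more rules than this.

```python
# Hypothetical prefix table; the namespace URIs are the published CC and FOAF ones.
PREFIXES = {
    "cc": "http://creativecommons.org/ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a compact URI such as 'cc:license' into a full URI.

    Simplified sketch: real CURIE processing has additional rules
    (for example, bracketed "safe CURIEs") that are not handled here.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        return prefixes[prefix] + reference
    except KeyError:
        raise ValueError(f"unknown prefix {prefix!r} in {curie!r}") from None


print(expand_curie("cc:license"))  # http://creativecommons.org/ns#license
print(expand_curie("foaf:name"))   # http://xmlns.com/foaf/0.1/name
```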

> I honestly don't see significant interest in computer-readable metadata. 

But a lot of folks do, and it would cost HTML5 very little to let us all
co-exist happily :)

> But in any case HTML5 already has extension mechanisms, so the discussion 
> should not be over whether RDFa is worth it or not, the discussion should 
> be over what extension mechanisms RDF needs that HTML5 doesn't provide.

Some problems with existing extension mechanisms:

- no way to make statements about another document (e.g. a PDF) in a
way that is *consistent* across different data types.

- no way to relate two chunks of data within a page, e.g. my friend
Alice is the second cousin of my friend Bob.

- no way to build reusable vocabularies.
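Here is a rough sketch of why a triple model covers these cases. It assumes the data has already been extracted as (subject, predicate, object) triples. The subject of a statement can be a PDF rather than the page itself. Two separately described people can be linked through their identifiers. The same vocabulary terms (here `cc:license` and `foaf:name`) are reused across statements. All example.org URIs and the `secondCousinOf` property are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


CC = "http://creativecommons.org/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/vocab#"  # hypothetical vocabulary

PAGE = "http://example.org/paper.html"  # hypothetical page...
PDF = "http://example.org/paper.pdf"    # ...and the PDF it links to
ALICE = "http://example.org/people#alice"
BOB = "http://example.org/people#bob"

graph = [
    # A statement about another document: the subject is the PDF, not the page.
    Triple(PDF, CC + "license", "http://creativecommons.org/licenses/by/3.0/"),
    # Two people described separately, reusing the same FOAF term...
    Triple(ALICE, FOAF + "name", "Alice"),
    Triple(BOB, FOAF + "name", "Bob"),
    # ...and related to each other by their identifiers (hypothetical property).
    Triple(ALICE, EX + "secondCousinOf", BOB),
]


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


print(objects(graph, PDF, CC + "license"))           # ['http://creativecommons.org/licenses/by/3.0/']
print(objects(graph, PAGE, CC + "license"))          # [] (the page itself is not licensed here)
print(objects(graph, ALICE, EX + "secondCousinOf"))  # ['http://example.org/people#bob']
```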

> The failures of the past have had little to do with the syntax or 
> expression mechanisms. They have to do with users simply not caring.

They don't care because there are no useful tools for them to care
about, because the tools are too difficult to write when you don't have
a standard syntax that's generic enough.

>>> With things like licensing metadata, where the person who benefits the 
>>> most isn't the person who writes the data, users simply aren't going 
>>> to bother doing a good job.
>> That's an incorrect assumption.
> 
> It's a verifiable fact! Just look at metadata like lang="", character 
> encoding information, Content-Type headers, etc. It's so unreliable that 
> any serious system that processes large amounts of data from multiple Web 
> authors always ends up ignoring the metadata (or at best using it as a 
> hint) and using heuristics to determine the real information.

Your assumption is untrue when you get to the Creative Commons
community, where lots of organizations and folks care about stating how
to give them attribution. And certainly we're not the only area where
people want ways to express this data (I'll mention the UK National
Archives again, and folks like Manu Sporny working on audio markup.)

HTML5 should be able to serve smaller communities than "the whole web."
We're asking for a solution that is relevant to *lots* of small
communities, each in their own way.

> But as soon as this kind of thing is applied to people outside the 
> tight-knit community, the metadata becomes an utter mess, misused, wrong,
> missing, syntactically incorrect, semantically incorrect, unusable. We 
> have shown time and time again that when metadata mechanisms face the 
> wider Web community, they fail. Ignoring this doesn't make it go away.

You're looking at this in a fundamentally broken way.

We don't need it to be perfect, and we don't need everyone to do the
right thing. We just need to *enable* people to do the right thing. And
we don't need the whole web to do it, either. In other words, maybe this
will never show up on Google's radar for the web as a whole. But it
would be a mistake to conclude that it's not useful to a large number of
folks.

Do you think that everyone will use the Progress Bar the way you intend
them to? No, of course not. But the ones who do use it in the proper way
will get the benefits. Same goes for RDFa.

> Note: I did read the ccREL paper before I wrote the previous message.

Thanks for taking the time to do that, I sincerely appreciate it. I'm
confused as to why you simplified our goal to "making a license
statement", but I'm glad you read the paper :)

Henri Sivonen writes:
> It really isn't HTML5-friendly, since it depends on the namespace mapping context at a node.

Well, we can discuss that part. But that's 10% of the syntax. The rest
is all simple attributes with clear meaning. No change to the elements,
no change to the structure of the HTML document. (And HTML already
ignores extra attributes.) That's pretty close to HTML5-friendly, I think.
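Here is a small sketch of the "no change to the structure" point. It uses Python's `html.parser` as a stand-in for a consumer that knows nothing about RDFa. Once attributes are dropped, the annotated fragment has exactly the same elements and text as the plain one. The fragments and `photo.jpg` are illustrative.

```python
from html.parser import HTMLParser


class Skeleton(HTMLParser):
    """Record element names and text only; every attribute is dropped,
    roughly what a consumer unaware of RDFa attributes cares about."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # attributes deliberately ignored

    def handle_data(self, data):
        self.text.append(data)


def skeleton(html):
    parser = Skeleton()
    parser.feed(html)
    parser.close()
    return parser.tags, "".join(parser.text)


plain = ('<p>Photo by <span>Ben</span>, licensed under '
         '<a href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>.</p>')
annotated = ('<p xmlns:cc="http://creativecommons.org/ns#" about="photo.jpg">'
             'Photo by <span property="cc:attributionName">Ben</span>, licensed under '
             '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>.</p>')

print(skeleton(plain) == skeleton(annotated))  # True
```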

Regarding the long discussion of "XML Namespaces": we don't use XML
namespaces. We use CURIEs (Compact URIs). We've chosen to bind them to
xmlns for now, but they are *not* XML namespaces. I strongly disagree
with you on indirection, though I don't think this sub-discussion is
all that productive.

> If Hixie made a proposal about HTML syntax citing Google's needs, but
> there was something else going on at Google making the syntax moot, I
> think it would be relevant. (I guess metadata aiding
> translate.google.com is the recent example.)

You're claiming that because one of our videos doesn't contain the URL
in its actual content (though it does in its surrounding HTML, which is
all that's needed), then we're contradicting ourselves? That's silly.

I speak for CC in terms of metadata. Let me know where we are
inconsistent, and I'll be sure to fix it. So far, though, you're making
some incorrect assumptions.

> This doesn't allow you to say things about *another* resource, but
> that's OK, because out-of-band metadata and data often travel their
> separate ways.

It's not okay for us. There are no good ways to embed metadata in media
files that the average user can understand. So we need it in the
enclosing HTML. With our approach, someone can take a chunk of HTML we
give them, and paste it right in their page. We need that chunk of HTML
to carry metadata with it.

> For example, in PDF, do people *really* need all this cruft:

People don't need it, machines do. You keep confusing the two, as if
we're asking people to manually write out the chunk of RDF/XML to which
you refer. That's highly misleading.

Machines need this so the PDF can be traced to an origin URL that
reverse-references it for consistency and some trust in where it's
coming from. Then, the license statement can be automatically parsed by
a tool that can then tell you "hey, make sure to give credit to 'Henri'."
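The tracing step might look roughly like this. It is a sketch under stated assumptions. The tool has already read an origin URL, the file's own URL, and a license out of the PDF. It has also already fetched the origin page and parsed it into triples (fetching and parsing are not shown). `EmbeddedMetadata`, `is_confirmed`, and `credit_reminder` are hypothetical names. `cc:license` and `cc:attributionName` are ccREL terms.

```python
from dataclasses import dataclass

CC = "http://creativecommons.org/ns#"
CC_LICENSE = CC + "license"
CC_ATTRIBUTION_NAME = CC + "attributionName"


@dataclass
class EmbeddedMetadata:
    """Hypothetical summary of what a tool reads out of the PDF itself."""
    origin_page: str  # URL the file claims to come from (the page to fetch)
    file_url: str     # URL the file claims for itself
    license: str      # license URI the file claims


def is_confirmed(meta, page_triples):
    """True if the origin page makes the same license statement about the
    same file, i.e. the page reverse-references the embedded claim."""
    return (meta.file_url, CC_LICENSE, meta.license) in page_triples


def credit_reminder(meta, page_triples):
    """Return an attribution reminder, but only for a confirmed claim."""
    if not is_confirmed(meta, page_triples):
        return None
    names = sorted(o for s, p, o in page_triples
                   if s == meta.file_url and p == CC_ATTRIBUTION_NAME)
    return f"Make sure to give credit to {names[0]!r}." if names else None


meta = EmbeddedMetadata(
    origin_page="http://example.org/paper.html",  # hypothetical URLs
    file_url="http://example.org/paper.pdf",
    license="http://creativecommons.org/licenses/by/3.0/",
)
# Triples assumed to have been extracted from meta.origin_page.
page = {
    ("http://example.org/paper.pdf", CC_LICENSE,
     "http://creativecommons.org/licenses/by/3.0/"),
    ("http://example.org/paper.pdf", CC_ATTRIBUTION_NAME, "Henri"),
}

print(credit_reminder(meta, page))   # Make sure to give credit to 'Henri'.
print(credit_reminder(meta, set()))  # None: the page doesn't back the claim
```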

> Copyright is hard. Sprinkling URIs and angle brackets doesn't make
> people grok copyright. RDF adds even more hardness that normal people
> don't grok.

Misleading and irrelevant.

We're not trying to simplify copyright with RDF. We're trying to
simplify copyright with programs that can help users through the process
of licensing their content, reusing content, etc... Our programs use RDF
to do so. People don't need to grok RDF, however.

> I think trying to break complex licenses [...]

Appreciate the feedback, but that's irrelevant to this conversation.
We've got a whole bunch of lawyers and techies who've chosen a direction
for how to help people with copyright, and we think overall we're doing
a reasonable job. We're happy to get your feedback in a more appropriate
forum, of course.

However, I don't think a technical standards group should be discussing
business model / marketing issues as part of its evaluation process.

> If RDFa is considered immutable at this point, I guess HTML5 is put
> in a "take it or leave it" situation. :-/ I'd choose leaving it if
> taking it comes with the qnames-in-content and Namespaces in XML
> baggage.

RDFa in XHTML 1.1 is immutable. RDFa in HTML5 is not immutable, though it
would make little sense to change @property to, e.g., @data-property
(and it would make implementation that much harder across versions of
HTML). It might make sense to consider an additional alternative to
@xmlns, which is something we're considering for non-XML HTML.

If we had an attribute-value-only way of defining prefixes, would that
make you happier?
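Purely as an illustration, here is one shape this could take: a single attribute, hypothetically named `prefix`, whose value lists prefix/URI pairs, such as `<div prefix="cc: http://creativecommons.org/ns#">`. Parsing it needs no XML namespace machinery at all. The attribute name and syntax here are assumptions, not part of any spec.

```python
def parse_prefix_attr(value):
    """Parse a hypothetical prefix attribute value of the form
    'p1: uri1 p2: uri2 ...' into {'p1': 'uri1', 'p2': 'uri2', ...}."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected pairs of 'prefix:' and URI")
    mapping = {}
    for name, uri in zip(tokens[::2], tokens[1::2]):
        if len(name) < 2 or not name.endswith(":"):
            raise ValueError(f"bad prefix token {name!r}")
        mapping[name[:-1]] = uri  # strip the trailing colon
    return mapping


print(parse_prefix_attr("cc: http://creativecommons.org/ns# dc: http://purl.org/dc/terms/"))
# {'cc': 'http://creativecommons.org/ns#', 'dc': 'http://purl.org/dc/terms/'}
```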

-Ben