[whatwg] Creative Commons Rights Expression Language

Thu Aug 21 11:53:18 PDT 2008

Hi folks,

Dan, thanks for looping me in on this thread. This is a rather long
email, but I'm trying to address all comments so far at once.

I appreciate all of the comments. I encourage folks who are commenting
on these issues to read (or at least skim) the ccREL document:

  http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/

It explains exactly our thinking, why we went with RDFa, and why a
number of the alternatives proposed are insufficient for our needs (and
the needs of a number of other web publishers).

Tab Atkins Jr. wrote:
> The whole thing would be best expressed as a microformat, as the
> entire thing can be made just as machine- and human-readable without
> having to introduce an entire new addition to html.  I think someone
> is a little confused about the importance of CC...

Unfortunately, microformats do not give us the modularity we need. We
want to build tools that can answer the question "is this *item*,
entitled 'Sunset in Hawaii', usable for commercial purposes?" The
*item*, identified by a URL or present inline as a chunk of HTML, may be
a video, a photo, a song, a Word document, geo-location information, or
any other type of data we haven't thought of yet.
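
As a rough illustration of the kind of question such a tool answers, here
is a minimal Python sketch. It assumes the statements about an item and its
license have already been extracted into (subject, predicate, object)
triples. The example.org URLs, the sample data, and the helper names are
hypothetical; the cc: terms are assumed to follow the ccREL vocabulary.

```python
# Minimal sketch: answer "is this item usable for commercial purposes?"
# from a set of triples. All data below is hypothetical sample data.
CC = "http://creativecommons.org/ns#"  # assumed ccREL namespace

TRIPLES = {
    ("http://example.org/sunset.jpg", CC + "license",
     "http://creativecommons.org/licenses/by-nc/3.0/"),
    ("http://example.org/song.mp3", CC + "license",
     "http://creativecommons.org/licenses/by/3.0/"),
    # Statement about the license itself, not about any one item.
    ("http://creativecommons.org/licenses/by-nc/3.0/", CC + "prohibits",
     CC + "CommercialUse"),
}


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def commercial_use_allowed(triples, item):
    """True/False if a license is stated for the item, None if none is."""
    licenses = objects(triples, item, CC + "license")
    if not licenses:
        return None  # no license statement found, so the tool cannot answer
    return all(
        CC + "CommercialUse" not in objects(triples, lic, CC + "prohibits")
        for lic in licenses
    )
```

The same function works whether the item is a photo, a song, or a data
set, because it only looks at the triples and never at the media type.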

Microformats do not offer the ability to build such generically
extensible vocabularies, because the parsing rules differ from one
microformat to the next. So there is no way to define robust fields,
e.g. attribution_name, that work across microformats.

We're also concerned about how to mark up scientific data (genomes,
proteins, etc.) with licensing and other information, and it would be
unworkable to push that vocabulary standardization through the uF
community: you need the bioinformatics experts to do that, while we, CC,
define some of the legal options.

Modularity is key. Microformats are super interesting for simple data
sets, but they don't scale to our needs, and they're not very friendly
to distributed innovation, which is the very nature of the web, IMO.

RDFa is heavily inspired by microformats, and we give the uF community
significant credit for pushing forward a number of very important design
principles, e.g. DRY. We needed more, though: modularity, consistent
parsing across *all* vocabularies, the ability to refer to external
documents, and a few more interesting details.

It's worth noting that, when Creative Commons first started the ccREL
work (in 2004), we privately approached the microformats community to
see if there was openness towards this increased modularity and more
web-like, decentralized design. The answer was an emphatic no.

So we found folks who had needs similar to ours, coalesced at the W3C to
create a standard, kept an open mailing list and accepted all comments,
and produced RDFa through a wide-ranging collaboration.

Ian Hickson wrote:
> I don't think anyone is suggesting that all such ideas should go through 
> the Microformats community.

As far as I understand, that is in fact exactly what the microformat
community requests, in order to prevent vocabulary collisions and
enforce some minimal consistency across vocabularies.

> What is being suggested is that instead of 
> adding more features to HTML, the people who want to annotate their HTML 
> documents with metadata, like Creative Commons, merely use some of the 
> many existing HTML extension mechanisms, like class="", rel="", etc.

Not sufficient, and not for lack of trying either. Check out the ccREL
paper for more details. And note that we helped create a standard that
would serve *everyone*, not just Creative Commons, with very few
additional HTML attributes.

> Microformats.org has shown several things; one is that it is important to 
> actually make sure the problem you are solving is one that needs solving, 
> another is that it is possible to use the existing HTML extension 
> mechanisms to mark up very rich semantic data.

There are a number of folks who aren't served by the direction of
microformats (and there are plenty who are, of course). Who gets to
decide which problems need solving?

We had a problem we needed solved, and the microformats community didn't
see it as relevant to their scope. Perfectly reasonable, of course, but
that doesn't mean there was no problem to be solved. In fact, now that
we have solved it for our "founding" community of users, we're seeing
interest from *lots* of web publishers who find the expressiveness of
RDFa more adequate for their needs.

That doesn't mean microformats are invalidated. It just means some
problems need a different solution.

I'm pretty sure the same driving force is behind your efforts on HTML5.
On a lot of issues, a small group of folks, sometimes just you, have
decided that a certain feature is necessary, and you've added it to the
HTML5 definition with public debate and discussion. I'm pretty sure a
lot of folks disagree with some of the features you've added, and I'm
also sure the features you're adding are important to a large set of users.

How is RDFa any different? It may be a smaller community, but our needs
are still relevant and important. Not to mention that our design
approach was specifically tailored to be HTML5-friendly. (You could
argue we didn't do a 100% perfect job on that front, but we certainly
tried and succeeded relatively well, IMO.)

Henri Sivonen wrote:
> The RDFa spec doesn't make any additions to HTML. It only specifies
> additions to XHTML,

Yes, because XHTML is extensible while HTML is not. We tried to specify
a syntax that would be quite easily added to HTML5.

> and those additions use a Namespace-dependent
> anti-pattern, so they aren't portable to HTML.

Namespaces are an anti-pattern, really? Says who? The web is inherently
namespaced. Everything you go to is scoped to a URL prefix. There isn't
one "Paris" or one "New York," there is wikipedia/paris, and
nyc.gov/NewYork. So is it the ":" that bothers you? Is that really relevant?

Just look at what microformats are forced to do, which is effectively
re-inventing ad-hoc namespaces with "-" separators.
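
To make the scoping concrete, here is a small Python sketch of prefix
expansion: a short prefixed name such as cc:attributionName stands for a
full URL, so two vocabularies can both define "title" or "license" without
colliding. The prefix table and the function name are illustrative
assumptions, not part of any specification text quoted here.

```python
# Minimal sketch: expand "prefix:name" into a full URL.
# The prefix table is a hypothetical example of what a page might declare.
PREFIXES = {
    "cc": "http://creativecommons.org/ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(name, prefixes=PREFIXES):
    """Return the full URL for a prefixed name, or raise ValueError."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local
```

With this table, expand("cc:attributionName") and expand("dc:title")
resolve to different URL prefixes, so neither vocabulary needs to know
about the other.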

The "namespaces are bad" argument is the most mind-boggling web-tech
meme I've seen in a while.

> It seems to me that the Creative Commons community has more pressing
> needs that aren't related to RDF syntax. Specifically: Making people
> to refer to license URI at all,

We don't have that problem. We have 150M+ pages that link to the
Creative Commons URIs. We did have the issue of how people could specify
attribution information, e.g. please say "Ben Adida", not "Benjamin
Adida." Now, with ccREL, we have a solution.

> making them to identify which CC
> license they mean, making them understand what permissions they are
> giving irrevocably to others upon granting a license and making them
> understand what licenses used by others mean (NonCommercial,
> anyone?). Syntax doesn't solve any of these.

I appreciate the strategy advice, but let's stick to the tech. I don't
think it would be relevant to question Google's business plan when Ian
makes a tech proposal :)

> People don't know what they are doing when they flip those Flickr
> settings: 
> http://diveintomark.org/archives/2008/02/05/writing-with-ease#comment-11272

People make mistakes. Even people who use Creative Commons. We are not
attempting to achieve world peace. And we work very hard to learn from
these fairly rare cases.

> At least in a non-RDF context, pointing to the license by URI seems
> too hard. See 
> http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202660852

So, interestingly, one of the important goals of ccREL is to make it
*easier* to point to the right attribution URL and license URL by
letting tools do it for you automatically, e.g. your blogging tool can
figure it all out on its own, with *very* little code and no change to
the general model of publishing a simple, pre-baked HTML doc.
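
As a sketch of what "very little code" could look like on the tool side,
the following Python function builds an attribution line from values a
tool is assumed to have already extracted from the source page (title,
attribution name, attribution URL, license URL). The function name and
the output wording are hypothetical; a real tool would choose its own.

```python
from html import escape


def attribution_html(title, name, attribution_url, license_url):
    """Build a small HTML attribution snippet from already-extracted values."""
    return (
        f'"{escape(title)}" by '
        f'<a href="{escape(attribution_url)}">{escape(name)}</a>, '
        f'<a rel="license" href="{escape(license_url)}">license</a>'
    )
```

The output is ordinary static HTML, so it fits the model of publishing a
simple, pre-baked document.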

> Also note that even CC leadership omits the license URI.

So you want a URI in the video content itself? What good would that do?

With ccREL (and specifically RDFa), the surrounding HTML can easily say
"*this* video is licensed under *that* license."

Again, the problems you cite are the ones we're trying to solve with
ccREL, by enabling machines to be more helpful.
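
A simplified Python sketch of that idea follows. It reads rel/href links
and attaches each one to the nearest enclosing "about" URL, falling back
to the page URL. This is only an illustration under stated assumptions:
it expects well-nested markup, handles only rel and about, leaves the rel
token unexpanded, and is not a conforming RDFa processor. The class name,
function name, and sample URLs are hypothetical.

```python
from html.parser import HTMLParser


class LicenseStatements(HTMLParser):
    """Collect (subject, rel, href) triples, scoped by `about` attributes."""

    VOID = {"img", "br", "hr", "meta", "link", "input"}  # no end tag expected

    def __init__(self, page_url):
        super().__init__()
        self.subjects = [page_url]  # stack of subjects; bottom is the page
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # An `about` attribute changes what the enclosed statements refer to.
        subject = a.get("about") or self.subjects[-1]
        if a.get("rel") and a.get("href"):
            for rel in a["rel"].split():
                self.triples.append((subject, rel, a["href"]))
        if tag not in self.VOID:
            self.subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in self.VOID and len(self.subjects) > 1:
            self.subjects.pop()


def extract(html_text, page_url):
    """Parse the markup and return the collected triples in document order."""
    parser = LicenseStatements(page_url)
    parser.feed(html_text)
    parser.close()
    return parser.triples


SAMPLE = """
<div about="http://example.org/videos/sunset.ogv">
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>
</div>
<p>Text under
  <a rel="license" href="http://creativecommons.org/licenses/by-nc/3.0/">BY-NC</a>.
</p>
"""
```

Calling extract(SAMPLE, "http://example.org/page") yields one license
statement about the video and a separate one about the page itself.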

> Getting back to the comment thread on intertwingly.net, a later
> comment contained this gem: 
> http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202810109 
> My sarcasm detector isn't quite working, so I can't tell if the
> comment was *meant* to mock RDF, but the follow-up comment is spot
> on: 
> http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202870522

I think your argument is "copyright is hard, so RDF sucks."

Lots of things about RDF are complicated, and lots of things about
copyright are complicated. I'd say that Creative Commons has helped make
copyright *easier* to understand, not harder, though of course there are
cases where we have failed and where we're trying to improve.

Now, what does that have to do with expressing user intent in
machine-readable language, exactly? Is it harder to understand copyright
*because* of RDF and RDFa? I don't think so. I don't think those two
things are even related.

The point of ccREL and RDFa is to help express, in a machine-readable
way, the act of copyright licensing, attribution, and such. It's meant
to make machines helpful in expressing and interpreting these statements.

So, we educate folks about what their CC choices mean, and then we make
it easy for them to generate lawyer-readable documents and
machine-readable documents. But a user doesn't need to understand RDFa,
just like they don't need to understand the deep legal contract. All
they do is copy-and-paste some HTML, or use their blogging tools.

[.. a number of comments regarding the specifics of the RDFa syntax ...]

We discussed the syntax in a public group, and we came to consensus. I
don't see that you raised any issues or comments until 2 weeks ago,
which was long past our deadline for comments.

You mentioned the TAG: they raised issues, which we answered to their
satisfaction. Check out our issue tracker:

  http://www.w3.org/2006/07/SWD/track/

(By the way, if you care about the TAG, then there are a handful of
serious problems with HTML5 regarding "follow your nose" with URIs...
but I digress.)

There could always be an alternate syntax, but the one we have was
obtained through an open process of consensus. I suspect the same holds
true for HTML5: lots of options, pick one that works and is relatively
clean, and form consensus.

-Ben