[whatwg] Helping people searching for content filtered by license

Tab Atkins Jr. jackalmage at gmail.com
Wed Jun 10 06:19:55 PDT 2009


On Wed, Jun 10, 2009 at 3:46 AM, Eduard Pascual<herenvardo at gmail.com> wrote:
> On Fri, May 8, 2009 at 9:57 PM, Ian Hickson <ian at hixie.ch> wrote:
>> [...]
>> This has some implications:
>>
>>  - Each unit of content (recipe in this case) must have its own
>>   independent page at a distinct URL. This is actually good practice
>>   anyway today for making content discoverable from search engines, and
>>   it is compatible with what people already do, so this seems fine.
>
> This is, in a wide range of cases, entirely impossible: while it might
> work, and maybe it's even good practice, for contents that can be
> represented on the web as a HTML document, it is not achievable for
> many other formats. Here are some obvious cases:
>
> Pictures (and other media) used on a page: An author might want to
> have protected content, but to allow re-use of some media under
> certain licenses. A good example of this are online media libraries,
> which have a good deal of media available for reuse but obviously
> protect the resources that inherently belong to the site (such as the
> site's own logo and design elements): Having a separate page to
> describe each resource's licensing is not easily achievable, and may
> be completely out of reach for small sites that handle all their
> content by hand (most prominently, designers' portfolio sites that
> offer many of their contents under some "attribution" license to
> promote their work).

Even on small sites, though, if they have a picture gallery they
almost certainly have the ability to view each picture individually as
well, usually by clicking on the picture itself.  That's the page
you'd put the license information on.

I think it's fundamentally rare to have a bunch of resources that (a)
*only* exist grouped together on a single page, and (b) need different
licenses.

> Software: I have stated this previously, but here it goes again: just
> like with media, it's impossible to simply put a "<link
> rel=license..." on a msi package or a tarball. Sure, the package
> itself will normally include a file with the text of the corresponding
> license(s), but this doesn't help on making the licensing discoverable
> by search engines and other forms of web crawlers. It looks like I
> should make a page for each of the products (or even each of the
> releases), so I can put the <link> tag there and everybody's happy...
> actually, this makes so much sense that I actually already have such
> pages for each of my releases (even if there aren't many as of now);
> but I *can't* put the <link> on them, because my software is under
> more liberal licenses (mostly GPL) than other elements of the page
> (such as the site's logo, appearing everywhere on the page, which is
> CC-BY-NC-ND), and I obviously don't want such contents to appear on
> searches for "images that I can modify and use commercially", for
> example.

As Ian stated, <link rel="license"> does *not* mean "This entire page
is covered under the linked license", but rather "The primary content
of this page is covered under the linked license".  This is different
from preliminary definitions of rel="license", but it's how it is
overwhelmingly used in practice, and so HTML5 redefined it to match.

So, since you already create separate pages for each release, you're
completely fine.  ^_^

> Until now, the best way to approach this need I have seen would be
> RDF's "triple" concept: instead of saying "licensed under Y", I'm
> trying to say "X is licensed under Y", and maybe also "and X2 is
> licensed under Y2", and this is inherently a triple. I am, however,
> open to alternatives (at least on this aspect), as long as they
> provide any benefit other than mere validation (which I don't even
> care about anymore, btw) over currently deployed and available
> solutions. I am not sure whether Microdata can handle this case or not
> (after all, it is capable of expressing some RDF triples), but the
> fact is that I can make my content discoverable by google and yahoo
> using CCREL (quite suboptimal, and wouldn't validate on HTML5, but
> would still work), but I can't do so using Microdata (which is also
> suboptimal, would validate on HTML5, but doesn't work anywhere yet).

Of course Microdata can handle it.  Assuming a theoretical Microdata
vocabulary for Creative Commons, you can do it with:

<div item>
  <div itemprop="cc.work">
    foo...
  </div>
  <a itemprop="cc.license"
     href="http://creativecommons.org/license/cc-gpl">This work is
    licensed under the GNU GPL, version 3 or later</a>
</div>

(You can also separate the license markup from your work by slapping
an id on your work and using @subject on the license link.)

Remember, Microdata and RDF are essentially equivalent in nearly all
realistic cases.  The main difference is that Microdata forms a tree
structure rather than a more general graph.  That's rarely relevant,
however, and nearly all common metadata annotations can be done just
fine as a tree.
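
To make the tree point concrete, here is a minimal sketch of how a
nested item flattens into subject/predicate/object triples.  This is
not the extraction algorithm from any spec: the dict representation,
the function name item_to_triples, and the "_:bN" blank-node naming
are all assumptions made for illustration.

```python
from itertools import count


def item_to_triples(item, counter=None, triples=None):
    """Flatten a nested item (a dict of property -> value) into triples.

    Hypothetical helper: each dict becomes a blank node named "_:bN",
    and a nested dict becomes a child node linked from its parent.
    Returns (node_id, triples).
    """
    if counter is None:
        counter = count(1)
    if triples is None:
        triples = []
    node = f"_:b{next(counter)}"
    for prop, value in item.items():
        if isinstance(value, dict):
            # A nested item is a subtree: emit its triples, then link to it.
            child, _ = item_to_triples(value, counter, triples)
            triples.append((node, prop, child))
        else:
            triples.append((node, prop, value))
    return node, triples


# Placeholder values; the property names mirror the theoretical vocabulary.
node, triples = item_to_triples({
    "cc.work": "http://example.org/foo.tar.gz",
    "cc.license": "http://creativecommons.org/licenses/GPL/2.0/",
})
for triple in triples:
    print(triple)
```

Every node in a tree has exactly one parent, so the only thing lost
relative to a general graph is the ability for two items to point at
the same node.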

Though, of course, as long as your work is the primary content of the
page, you can skip Microdata entirely and just use @rel=license.
That's what CC does as well - their machine-readable form for the
previously linked license is merely:

<a rel="license" href="http://creativecommons.org/licenses/GPL/2.0/">
  <img alt="Creative Commons License" style="border-width:0"
       src="http://i.creativecommons.org/l/GPL/2.0/88x62.png" />
</a>
<br />
This work is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/GPL/2.0/">Creative
Commons GNU General Public License License</a>.

As long as the concept is nice and simple, we don't have to invoke any
sort of metadata language at all.
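
For what it's worth, consuming this is about as simple as producing
it.  A rough sketch of what a crawler might do, using only Python's
standard html.parser module (the class and function names here are
made up for illustration, and a real crawler would also need to
resolve relative URLs and decide what counts as "primary content"):

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a>/<link> elements whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_tokens and href and href not in self.licenses:
            self.licenses.append(href)


def find_licenses(html_text):
    """Return the distinct license URLs declared in html_text, in order."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.licenses


sample = '''<a rel="license"
href="http://creativecommons.org/licenses/GPL/2.0/"><img alt="Creative
Commons License" src="http://i.creativecommons.org/l/GPL/2.0/88x62.png" /></a>
This work is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/GPL/2.0/">Creative Commons
GNU General Public License License</a>.'''

print(find_licenses(sample))
```

Run against the CC snippet, this yields the single license URL, since
the duplicate link is collapsed.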

~TJ


