[whatwg] Allowing authors to keep track of where content originates

Wed May 6 16:07:38 PDT 2009

One of the use cases I collected from the e-mails sent in over the past 
few months was the following:

   USE CASE: Allow authors to keep track of where content originates.

   SCENARIOS:
     * A blog, say htmlfive.net, copies content wholesale from another, say
       blog.whatwg.org (as permitted and encouraged by the license). The
       author of the original content would like the reader of the reproduced
       content to know the provenance of the content. The reader would like
       to find the original blog post so he can leave comments for the
       original author.
     * Chaals could improve the Opera intranet if he had a mechanism for
       identifying the original source of various parts of a page, as that
       would let him contact the original author quickly to report problems
       or request changes.

   REQUIREMENTS:
     * Parsing rules should be unambiguous.
     * Should not require changes to HTML5 parsing rules.

The two scenarios are subtly different, so I'm going to handle them 
separately.

First, the blog syndication scenario:

     * A blog, say htmlfive.net, copies content wholesale from another, say
       blog.whatwg.org (as permitted and encouraged by the license). The
       author of the original content would like the reader of the reproduced
       content to know the provenance of the content. The reader would like
       to find the original blog post so he can leave comments for the
       original author.

This case is relatively easy: the original author need only ask the 
editor of the syndicating site to include a link to the original content. 
If the editor isn't willing to do this, then there's nothing we can do at 
the HTML language level to force him. In practice, with htmlfive.net 
syndicating blog.whatwg.org content, the editor of the former happily 
agreed to include a link to the original blog, and does so. The current 
setup doesn't link to each individual original article, but the titles 
aren't changed, so a reader can find the original content relatively 
easily.

Similarly, "Planet"-style syndicators include links to the original 
entries, so this is already possible.

The odds of syndicators including these links can be improved a little by 
putting the link explicitly in the post markup in the feed, since 
syndicators typically just display feed entries verbatim.
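To make "putting the link explicitly in the post markup" concrete, here is 
a rough Python sketch (an illustration only, not part of any spec) that 
builds an Atom entry whose HTML content ends with a visible link back to 
the original post. The function names and the example post URL are 
hypothetical, and the <id> and <updated> elements that Atom requires are 
omitted for brevity. Note that ElementTree's namespace registration is 
global to the process.

```python
import html
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def with_source_link(body_html: str, original_url: str) -> str:
    """Hypothetical helper: append a visible link to the original post."""
    href = html.escape(original_url, quote=True)
    return f'{body_html}\n<p>Originally posted at <a href="{href}">{href}</a>.</p>'


def atom_entry(title: str, original_url: str, body_html: str) -> str:
    """Hypothetical helper: build a minimal Atom <entry> as a string.

    The <id> and <updated> elements required by Atom are omitted for brevity.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # Machine-readable pointer, used by feed readers.
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", rel="alternate", href=original_url)
    # Human-visible pointer inside the content itself. ElementTree escapes
    # the markup when serializing, as Atom's type="html" expects.
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="html")
    content.text = with_source_link(body_html, original_url)
    return ET.tostring(entry, encoding="unicode")


# Hypothetical post URL, for illustration only.
print(atom_entry("Example post", "https://blog.whatwg.org/example-post",
                 "<p>Post body.</p>"))
```

The rel="alternate" link is what feed readers use; the copy inside 
<content> is the one that survives when a syndicator displays the entry 
body verbatim.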

This doesn't require any new parsing at all, so both requirements are 
met.

Next, the mashup page:

     * Chaals could improve the Opera intranet if he had a mechanism for
       identifying the original source of various parts of a page, as that
       would let him contact the original author quickly to report problems
       or request changes.

Since this is an intranet, I again assume that we can rely on the authors 
and editors to cooperate.

HTML4 had a solution to this: the cite="" attribute on <blockquote> and 
<q>. Within a controlled environment, this can be used quite well, as Mark 
showed in late 2002. However, using <blockquote> for mashups is a bit 
weird, and not really in the spirit of the <blockquote> element (though 
probably within the letter of its definition, admittedly). I've therefore 
added cite="" to the <section> and <article> elements, so that mashup 
authors can more easily keep track of where the sections come from.
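As a sketch of how Chaals might use this on the intranet, the following 
Python script (standard library html.parser only) walks a mashup page and 
reports, for each run of text, the cite="" URL of the nearest enclosing 
<blockquote>, <q>, <section>, or <article> that has one. The page, its 
URLs, and the helper names are hypothetical. Since HTMLParser only 
tokenizes and doesn't build a tree, the script assumes well-formed markup 
in which every such element is explicitly closed.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements whose cite="" attribute records where their content came from:
# <blockquote> and <q> (HTML4), plus <section> and <article> as proposed here.
CITING_ELEMENTS = {"blockquote", "q", "section", "article"}


class SourceTracker(HTMLParser):
    """Pair each run of text with the cite="" URL of its nearest enclosing
    citing element that has one (None if there is no such element).

    Assumes well-formed markup; HTMLParser does not build a DOM tree.
    """

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Optional[str]] = []  # cite value per open citing element
        self.parts: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag in CITING_ELEMENTS:
            self.stack.append(dict(attrs).get("cite"))

    def handle_endtag(self, tag):
        if tag in CITING_ELEMENTS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            # The innermost element with a cite="" wins; outer ones are fallbacks.
            source = next((c for c in reversed(self.stack) if c), None)
            self.parts.append((text, source))


def collect_sources(markup: str) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: return (text, source URL) pairs for a page."""
    tracker = SourceTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.parts


# Hypothetical intranet mashup page.
PAGE = """
<article cite="http://intranet.example/news/42">
  <h1>Office move</h1>
  <section cite="http://intranet.example/facilities/floor-plan">
    <p>New floor plan</p>
  </section>
  <p>Boxes arrive Monday.</p>
</article>
<p>Local footer</p>
"""

for text, source in collect_sources(PAGE):
    print(f"{source or '(local)'}: {text}")
```

Here "New floor plan" is attributed to the floor-plan URL, the other 
article text to the news URL, and the footer to no source at all. Note 
that HTMLParser knows nothing about cite="" on <section> or <article>; it 
reports the attribute like any other, which is why no parser changes are 
needed to consume it.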

The requirements collected as part of this effort for these scenarios are:

     * Parsing rules should be unambiguous.

The parsing rules here are the same as for <blockquote cite="">, which is 
very well-defined at this point.

     * Should not require changes to HTML5 parsing rules.

This doesn't affect any of the parsing rules.

In conclusion, this use case can be addressed with a combination of 
discussion with editors, explicit links using <a href="">, and the new 
cite="" attribute on <section> and <article>.

A number of further use cases remain to be examined. I will send further 
e-mail as I address them, hopefully this week.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'