[whatwg] Annotating structured data that HTML has no semanticsfor

Fri May 15 07:17:43 PDT 2009

On Fri, May 15, 2009 at 1:44 PM, Kristof Zelechovski
<giecrilj at stegny.2a.pl> wrote:
> I do not think anybody in WHATWG hates the CURIE tool; however, the
> following problems have been put forward:
>
> Copy-Paste
>        The CURIE mechanism is considered inconvenient because is not
> copy-paste-resilient, and the associated risk is that semantic elements
> would randomly change their meaning.
Copy-paste issues with RDFa and similar syntaxes can take two forms:
The first is horfaned prefixes: when metadata with a given prefix is
copied, but then it's pasted in a context where the prefix is not
defined. If the user that is copy-pasting this stuff really cares
about metadata, s/he would review the code and make the relevant fixes
and/or copy the prefix declarations; the same way when an author is
copy-pasting content and wants to preserve formatting s/he would copy
the CSS stuff. If the user doesn't actually care about the metadata,
then there is no harm, because properties relying on an unmapped
prefix should yield no RDF output at all.
The second form is prefix clashes: this is actually extremely rare.
For example, someone copies code with FOAF metadata, and then pastes
it on another page: which are the chances that user will be using a
foaf: prefix for something else than FOAF? Sure, there are cases where
a clash might happen but, again, these are only likely to appear on
pages by authors who have some idea about metadata, and hence the
author is more than likely to review the code being pasted to prevent
these and other clashes (such as classes that would mean something
completelly different under the new page's CSS code, element id
clashes, etc). A last possibility is that the author doesn't have any
idea about metadata at all, but is using a CMS that relies on
metadata. In such case, it would be wise on the CMS's part to
pre-process code fragments and either map the prefix to what they mean
(if it's obvious) or remove the invalid data (if the CMS can't figure
out what it should mean).

>
> Link rot
>        CURIE definitions can only be looked up while the CURIE server is
> providing them; the chance of the URL becoming broken is high for
> home-brewed vocabularies.  While the vocabularies can be moved elsewhere, it
> will not always be possible to create a redirect.

Oh, and do reversed domains help at all with this? Ok, with CURIEs
there is a (relatively small) chance for the CURIE to not be
resolvable at a given time; reversed domains have a 100% chance to not
be resolvable at any time: there is always, at least, ambiguity: does
org.example.foo map to foo.example.org, example.org/foo, or
example.org#foo? Even better: what if, under example.org we find a
vocabulary at example.org/foo and another at foo.example.org? (Ok,
that'd be quite unwise, although it might be a legitimate way to keep
"deployed" and "test" versions of a vocabulary online at a time; but
anyway CURIEs can cope with it, while reversed domains can't).
Wherever there are links, there is a chance for broken links: that's
part of the nature of links, and the evolving nature of the web. But,
just because the chance of links being broken, would you deny the
utility of elements such as <a> and <link>? Reversed domains don't
face broken links because they are simply uncapable to link to
anything.

Now, please, I'd appreciate if you reviewed your "arguments" before
posting them: while the copy-paste issue is a legitimate argument, and
now we can consider whether this copy-paste-resilience is worth the
costs of microdata, that link rot argument is just a waste of
everyone's time, including yours. Anyway, thanks for that first
argument: that's exactly what I was asking for in the hope of letting
this discussion advance somewhere.

So, before we start comparing benefits against costs, can someone post
anymore benefits or does the "copy-paste-resilience" point stand alone
against all the costs and possible issues?

Regards,
Eduard Pascual