[whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

Sat May 23 14:23:59 PDT 2009

On Fri, May 22, 2009 at 5:26 AM, Eduard Pascual <herenvardo at gmail.com> wrote:
> On Thu, May 21, 2009 at 5:19 PM, Toby Inkster <mail at tobyinkster.co.uk> wrote:
>> [... some stuff about how English will change in a thousand years ...]
>>
>> A great help in clarifying your usage of terms is the inclusion of a
>> glossary. For example, I could write:
>>
>> <dl>
>>  <dt>name</dt>
>>  <dd>
>>    A name is a label for a noun, (human or animal,
>>    thing, place, product [as in a brand name] and even an
>>    idea or concept), normally used to distinguish one from
>>    another.
>>    (<a href="http://en.wikipedia.org/wiki/Name">source</a>)
>>  </dd>
>> </dl>
>>
>> With RDFa, the idea of a glossary can be used to reduce our reliance on
>> external vocabularies:
>>
>> <dl xmlns:foaf="http://xmlns.com/foaf/0.1/"
>>    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
>>  <dt about="[foaf:name]" property="rdfs:label">name</dt>
>>  <dd about="[foaf:name]" property="rdfs:comment" datatype="">
>>    A name is a label for a noun, (human or animal,
>>    thing, place, product [as in a brand name] and even an
>>    idea or concept), normally used to distinguish one from
>>    another.
>>    (<a rel="rdfs:seeAlso"
>>    href="http://en.wikipedia.org/wiki/Name">source</a>)
>>  </dd>
>> </dl>
>>
>> This doesn't completely eliminate the risk, but goes a long way to
>> mitigating it.
> Agreed. But CRDF would also allow that kind of glossary. What's your
> point with it?

To be more specific, this *sounds* like you're just generally
advocating for a referenceable external vocabulary.  CRDF serializes
out to normal RDF without any magic, same as RDFa, and it uses
prefixes in essentially the same manner (though in a way that I
believe is slightly more compatible with the concerns raised by
Anne/Henri/others).

This is perhaps an argument against Microdata, but not CRDF.
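
To make the prefix point concrete, here is a rough sketch in Python (not
CRDF or RDFa syntax) of what "no magic" means: a prefixed name just
expands to a full URI before the triple is emitted.  The prefix map is
copied from the xmlns declarations in Toby's example; the expand()
helper and the sample triple are purely illustrative.

```python
# Prefix map copied from the xmlns declarations in the quoted example.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full URI.

    Hypothetical helper for illustration; raises ValueError when the
    prefix is missing or not declared.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


# Illustrative triple: (subject, predicate, literal object).
triple = (expand("foaf:name"), expand("rdfs:label"), "name")
print(triple)
```

Whichever syntax carries the prefixes, what comes out the other end is
the same plain set of URIs.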

> Again, let me insist that external file CRDF is only one of its
> possible usages. Actually, it only makes sense when it holds rules
> that apply to multiple documents (otherwise, <script> or inline uses
> would work better). If an author is already caring about keeping
> several documents live, then keeping one extra .crdf file live as well
> shouldn't be too difficult.
> Please, don't be misguided by Tab's "favoritism" towards external
> .crdf files. While they are a useful tool for some of the cases, they
> do not cover all the cases. <script> and inline uses are equally
> important and, IMO, one of the strongest points of CRDF is that it
> provides a unified syntax for all three usages, rather than having to
> rely on different formats for each thing (for example, using RDFa for
> inline stuff and EASE for external stuff would be, in the best case,
> messy).

Heh, guilty as charged.  While I strongly support keeping data in the
content, I also strongly support keeping the rules for *extracting*
that data out of the content.  My reasoning is basically identical to
the reasons given for avoiding @style and inline event handlers when
possible - it bloats the markup and is error-prone if used more than
once.  At minimum, I'd prefer just @class-ing my content and then
putting the CRDF in a <script> tag, but I'd typically prefer <link>ing
in a separate .crdf file, the same way I prefer <link>ing in a
separate .css file in my pages, even if it's only applying styles
relevant to that specific page.  It just keeps the page itself lighter
and easier to read.
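
As a rough illustration of the split I mean, here is a small Python
sketch (again, not CRDF syntax) where the markup carries only @class
values and the extraction rules live in a separate table.  The class
names, the RULES table, and the sample page are all made up for the
example, and it assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Hypothetical rule table: class name -> property.  In practice this would
# live in a <script> block or a <link>ed file, not in the page markup.
RULES = {
    "fn": "foaf:name",
    "nick": "foaf:nick",
}

# Elements with no end tag; skipped so the open-element stack stays balanced.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClassExtractor(HTMLParser):
    """Collect (property, text) pairs for elements whose class matches a rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.stack = []    # one (property-or-None, text buffer) per open element
        self.results = []  # (property, text) pairs, appended as elements close

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((self.rules[c] for c in classes if c in self.rules), None)
        self.stack.append((prop, []))

    def handle_data(self, data):
        # Text belongs to every open element that matched a rule.
        for prop, buf in self.stack:
            if prop is not None:
                buf.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        prop, buf = self.stack.pop()
        if prop is not None:
            self.results.append((prop, "".join(buf).strip()))


def extract(html, rules=RULES):
    parser = ClassExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.results


PAGE = ('<p class="person"><span class="fn">Alice Example</span> '
        '(<span class="nick">alice</span>)</p>')
print(extract(PAGE))
```

The page stays as plain classed content, and changing what gets
extracted means editing the rule table, not every element.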

The argument against putting the data extraction rules outside of the
document because the rule document may disappear is a weak one, I
think.  Ultimately, the page itself contains the data, and this can
always be extracted heuristically given sufficient intelligence (this
is, of course, a very large 'given').  Like Eduard said, just as CSS
takes the basic content of the page and formats it to be easier for
humans to read, CRDF takes the basic content of the page and formats
it to be easier for machines to read.  Without the CRDF, the data is
unharmed and still extractable; extraction is just more difficult, just
as your average page is more difficult, but not impossible, to read with
its CSS missing.  Worst-case, you just need a human to interpret the data
on the page, which is what is necessary for legacy or noncompliant UAs
anyway.  Again, any data extraction proposal is ultimately
unnecessary; it merely makes things more convenient for humans by giving
some hints to their machines.  That's the whole reason we've focused
on data extraction languages, rather than just linking/embedding the
metadata itself into the document, just like we've focused on CSS to
provide layout/styling to a sensical-by-itself document, rather than
building the layout/styling directly into the document language.

~TJ