[whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

Wed May 20 15:10:28 PDT 2009

On Wed, May 20, 2009 at 11:56 AM, Toby Inkster <mail at tobyinkster.co.uk> wrote:
> Given that one of the objections people cite with RDFa is complexity,
> I'm not sure how this resolves things. It seems twice as complicated to
> me. It creates fewer new attributes, true, but number of attributes
> themselves don't create much confusion.
>
> e.g. which is a simpler syntax:
>
> <a href="http://foo.example.com/"
>   ping="http://tracker.example.com/">Foo</a>
>
> or:
>
> <a href="primary:url('http://foo.example.com/');
>         secondary:url('http://tracker.example.com/');">Foo</a>

I'm not sure how this example is relevant.  Links do one thing and do
it visibly; they benefit from a simple, straightforward syntax and a
proliferation of attributes that have direct meaning.  Any metadata
proposal, on the other hand, has attributes which acquire meaning only
through their values and the vocab being used, and there is a
necessary degree of indirection which makes things more difficult.

It would have actually been useful had the comparison been between
pseudo-CRDF and pseudo-RDFa, or better yet, actual CRDF and RDFa.
That way we have two things which can actually be compared.

> Stuffing multiple discrete pieces of information makes things harder for
> parsing, harder for authoring tools and harder for authors. In RDFa,
> each attribute performs a simple role - e.g. @rel specifies the
> relationship between two resources; @rev specifies the relationship in
> the reverse direction; @content allows you to override the
> human-readable text of an element. Combining these into a single
> attribute would not make things simpler.

You're leaving out @about, @property, @resource, @datatype, @typeof,
and numerous implicit uses of @href or @src, along with implicit
chaining with contained nodes.  Please don't misrepresent the
simplicity of RDFa - it's a generic metadata extraction method, and is
rather complex.  So is CRDF, of course, but that's not disputed.

(Also, the argument against @rev is still going strong - in the RDFa
in XHTML document, section 6.3.2.2, the foaf:img relation is misused
in @rev, causing the RDF to state that Mark is an image of the <img>
resource!  @rev really is too confusing for standard use - just add
inverted @rel values when necessary.)
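
To make the direction problem concrete, here's a rough sketch.  This
is made-up markup, not the example from the spec; "#mark" and
"mark.jpg" are invented, and the triples in the comments are what I'd
expect an RDFa processor to produce for @about/@resource pairs.

```html
<div xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- @rel: the @about is the subject, the @resource is the object.
       Triple: <#mark> foaf:img <mark.jpg>   (Mark has this image) -->
  <span about="#mark" rel="foaf:img" resource="mark.jpg"></span>

  <!-- @rev: identical attributes otherwise, but the direction flips.
       Triple: <mark.jpg> foaf:img <#mark>   (Mark is an image of the picture) -->
  <span about="#mark" rev="foaf:img" resource="mark.jpg"></span>
</div>
```

One changed attribute name silently inverts the statement, which is
the kind of mistake that's easy to make and hard to spot.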

> Looking at the comparison given in section 4.2, CRDF appears to suffer
> from several disadvantages compared to RDFa:
>
> 1. It's pretty ugly.

We are going to have to massively disagree on this point.  ^_^  I love
CSS syntax.  It's small, elegant, and simple.  CRDF benefits from all
of this.  Inline CRDF isn't ideal, but it benefits from being
identical to standard CRDF syntax, as well as resembling inline CSS in
@style.

RDFa (and Microdata, to a lesser extent), on the other hand, look like
you invented a half-dozen versions of @style which all do something
different but all have to be used together to style your document.  To
me it looks like the unedited HTML that Microsoft products will spew
out if you let them.

So, I guess beauty is in the eye of the beholder.  ^_^

> 2. It's more verbose - though only by eleven bytes by my reckoning, so
> this isn't a major issue.

When used inline, it may be.  It's not *intended* to be used inline,
though - that's just there for the occasional case when you absolutely
need to do so, just as @style is available but discouraged in favor of
external CSS.

When used as intended, as a separate CRDF file, you see immediate
savings as soon as you have two things with the same data structure.
I think I'm reasonable in assuming that most users of any metadata
solution will be doing so in medium-to-large quantities, not
individual isolated instances with unique structure.  They can deploy
a single CRDF file across their entire site, automatically allowing
metadata extraction from their content with no further effort.  At
worst, they have to add a few classes, perhaps some <span>s.

> 3. It divorces the CURIE prefix definitions from the use of CURIEs in
> the markup. This makes it more vulnerable to copy-paste problems. (As I
> understand <link rel="metadata"> in the proposal, CURIE prefix
> definitions can even be separated out into an external file. This
> obscures them greatly and will certainly be a cause of copy-paste
> issues!)

If you're using inline CRDF, then yeah, the prefix definitions may be
far from the content.  The prefixes are defined globally for the
document, and may appear anywhere.  In practice, inline CRDF should be
rare, and the prefixes should appear at the top of the .crdf file
where they can be easily seen.

> Apart from the fact that *sometimes* RDFa involves a bit of repetition,
> I don't see what problems this proposal is actually supposed to solve.

You're being disingenuous.  RDFa *always* requires *large* amounts of
verbose repetition whenever you're indicating the same metadata
structure multiple times.  I expect that this type of use will be by
far the most common if metadata embedding takes off as hoped for.
One-shot uses like on bios will be relatively rare (I expect most
metadata-enhanced bios to be those on social networks, where there are
large numbers of them with identical structure, perfect for a
Selector-based approach).
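
As a rough illustration of that repetition, here's a hypothetical
member list in RDFa (the names, URLs, and choice of FOAF terms are
invented for the example).  Every entry carries the same
@typeof/@property/@rel scaffolding:

```html
<ul xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- Entry 1: metadata attributes written out in full -->
  <li about="#alice" typeof="foaf:Person">
    <span property="foaf:name">Alice</span>
    <a rel="foaf:homepage" href="http://alice.example.com/">homepage</a>
  </li>
  <!-- Entry 2: the same attributes again, only the values differ -->
  <li about="#bob" typeof="foaf:Person">
    <span property="foaf:name">Bob</span>
    <a rel="foaf:homepage" href="http://bob.example.com/">homepage</a>
  </li>
  <!-- ...and so on for every further member -->
</ul>
```

With a Selector-based approach the structure would be stated once for
every matching <li>, rather than once per entry.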

When you do have a one-off use, RDFa and CRDF are very similar in
verbosity.  I find the foo:bar syntax of CRDF easier to use personally
than the @property=foo @content=bar syntax of RDFa (and of course,
when using the elements' default contents, you can shorten the CRDF to
just "foo;" and can omit the @content attribute in RDFa).

> Repetition in practise seems to be something that page authors can deal
> with. We don't provide a mechanism for setting the src or alt attributes
> of multiple <img> elements which need to load the external image; or
> setting the class attribute of the third cell in every row of a table.

It is rarely, if ever, necessary to set multiple <img> elements to the
same @src or @alt.  When it is so, it's virtually always because the
<img> is part of the page template, which can be written once and then
trivially put into each page with any server-side language.  Metadata
structure, on the other hand, when repeated will usually be in the
content of the page, which needs to be individually authored.  If
you're lucky the data is stored in such a way that you can
automatically generate the HTML and thus the metadata structure as
well, but this is not always the case.

And while we don't ask for any way of setting the class of every third
cell in each row of a table, we *do* ask for a way of setting the
*style* of every third cell in each row of a table, without having to
specify that style on each cell individually.  Even specifying a class
on each cell individually, and then using CSS to target all of the
classed cells at once, is often too onerous, which is why we have
various ways of navigating the structure of an HTML document with
Selectors, such as "tr td:nth-child(3)".
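
A minimal sketch of that, using the standard :nth-child()
pseudo-class (the table contents and the chosen style are just
placeholders):

```html
<style>
  /* Matches the third cell of every row; no per-cell class or
     attribute is needed in the markup. */
  tr > td:nth-child(3) { text-align: right; }
</style>

<table>
  <tr><td>Widget</td><td>blue</td><td>4.00</td></tr>
  <tr><td>Gadget</td><td>red</td><td>12.50</td></tr>
</table>
```

The rule is written once and keeps applying as rows are added, which
is the property a Selector-based metadata mechanism would inherit.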

> So again, while I can see that this proposal would "work", in what way
> is it supposed to be preferable to RDFa?

To reiterate the points above, CRDF is roughly as verbose as
RDFa when written inline (though with an imo simpler and easier
syntax), but allows large savings in both bandwidth for users and
cognitive load for authors by reusing the existing familiar syntax and
operation of CSS to apply metadata structure to your documents.

~TJ