[whatwg] Microdata

Sat Aug 22 16:27:11 PDT 2009

On Sat, Aug 22, 2009 at 11:51 PM, Ian Hickson<ian at hixie.ch> wrote:
>
> Based on some of the feedback on Microdata recently, e.g.:
>
>   http://www.jenitennison.com/blog/node/124
>
> ...and a number of e-mails sent to this list and the W3C lists, I am going
> to try some tweaks to the Microdata syntax. Google has kindly offered to
> provide usability testing resources so that we can try a variety of
> different syntaxes and see which one is easiest for authors to understand.
>
> If anyone has any concrete syntax ideas that they would like me to
> consider, please let me know. There's a (pretty low) limit to how many
> syntaxes we can perform usability tests on, though, so I won't be able to
> test every idea.
>

This would be more than just tweaking the syntax, but I think
appropriate to bring forth my CRDF proposal as a suggestion for an
alternative to Microdata. For reference, the latest version of the
document can be found at [1], and the discussion that has happenned
about it can be found at [2].

Rather than just saying "use that syntax", I'm including here what IMO
are the most prominent advantages (and potential issues) of that
proposal, in no particular order:

+ Optional use of selectors: while the ability to use selectors seems
quite useful, specially to handle "list" or "collection" cases, it has
been argued that users may have problems with elaborated selectors.
Since the last update of the CRDF document, this is addressed with the
expanded inline content model: it should possible to express with only
inline CRDF, and without using selectors at all, any semantics that
can be represented with RDFa, Microdata, EASE, or eRDF. In other
words: while CRDF can take full benefit of selectors to make better
and/or clearer documents, it can still handle most cases (those
actually handled by existing solutions) without them.

+ Microformats mapping: for good data (specifically, all content that
doesn't duplicate any "singular" property), CRDF allows trivially
mapping Microformat-marked data to an arbitrary RDF vocabulary (or
even to multiple, overlapping vocabularies), thus allowing its re-use
with RDF-related tools and/or combining it with RDF data from other
sources and/or marked with other syntaxes. In order to achieve 100%
compatibility with Microformats.org' processing model (including any
form of bad data), a minor addition to Selectors is suggested in the
document, although no substantial feedback has been given on it
(neither against nor in favor).

+ Microformats-like but decentralized: the main issue with
Microformats, at least with non-widespread vocabularies, is
centralization: it requires a criticall mass of use-cases to get the
Microformats community to engage in the process of creating a new
vocabulary. With CRDF, any author may build their own vocabulary
(implementing it as a CRDF mapping to RDF) and use it on their pages.
If a vocabulary later gains momentum and is adopted by a wide enough
set of authors, it'd be up to the Microformats community to decide
whether "standarize" it or not.

+ Prefix declarations go out of HTML: After so many discussions,
namespace prefixes has been the main source of criticism against RDFa.
One of these criticism is the range of technicall issues that arise
from the "xmlns:" syntax for defining namespace prefixes (in
"tag-soup" syntax). CRDF handles this case by taking away the
responsibility of prefix declarations from HTML: having a CSS-based
syntax, CRDF takes the obvious step and uses CSS's own syntax for
namespace declarations.

+ Entirely RDF based: while this might seem a purely theoretical
advantage, there is also a practical benefit: once extracted from the
webpage, CRDF data can be easily combined with any already existing
RDF data; and can be used with RDF-related tools.

- Copy-paste brittleness: IMO, the only serious drawback from CRDF;
but there are some points worth making:
  1) When used inline, CRDF can achieve the same resilience than RDFa,
which is quite close to Microdata's.
  2) I have noticed that some browsers can manage to copy-paste
CSS-styled content preserving (most of) format. It shouldn't be hard
for implementors to extend such functionality to CRDF. Of course, the
support for this is not consistent among browsers, and also seems to
vary for different paste targets. If there is some real interest, I
might do some testing with multiple browsers and paste targets (for
now, I have noticed that both IE and FF preserve most CSS formatting
(but not layout) when pasting to Word, but pasting to OOo Writter gets
rendered with the "default" formatting for the tags). It would be
interesting, on this aspect, to hear about browser vendors: would they
be willing to extend the CSS copy-paste capabilities to CRDF if it got
adopted?

- Prefix-based indirection: I'd bet that there are people on this list
ready to argue that namespace prefixes are a good thing; but it seems
that it raises some issues, so I'll include them and share my PoV on
the topic:
  1) For those who care about the use of widespread prefixes (like
"foaf" or "dc") being used for something weird, or the use of
different prefixes for these vocabularies, I wouldn't mind adding some
"default" prefix mappings for CRDF to address this.
  2) The "Follow your nose" topic is a bit more complex: IMO, a RDF
application that needs to successfully FYN to work is insane; OTOH, an
application that works just fine, but can FYN and provide *additional*
information *when available* is quite a good thing. This is an
implementation design issue, and CRDF can't do too much here: the best
thing I can think of is to state that applications should *attempt* to
FYN, and use or show to the user the extra info *when successful*, but
must still be able to use a document's information when FYN fails.
Actually, CRDF is already moving in such direction: the type inference
rules (still under construction) try to infer the properties' types
from the vocabularies first, but take the basic type of the value if
that fails (unless, of course, the CRDF code uses explicit typing).

- Entirely new: this is a minor disadvantage agains RDFa and
Microformats, but not when compared to Microdata. It is a disadvantage
because for already existing formats there are already existing
implementations, and it is minor because it shouldn't be hard for
browsers (and some other forms of UA's that also handle CSS) to
implement it reusing most of their CSS-related code. Again, I'd
appreciate some vendors' feedback on this.

That's what I can think of now. Of course, CRDF has some issues: it's
still work in progress, and it lacks implementations and
implementation feedback, but it also provides significant advantages
that, IMO, far outweigth the drawbacks.

Regards,
Eduard Pascual

[1] http://crdf.dragon-tech.org/crdf.pdf
[2] (multiple links: the threads got split by some reason, and the
archives also break threads at months' boundaries):
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019733.html
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019857.html
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-June/020284.html
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/020877.html