[whatwg] Microdata

Sat Aug 22 22:13:45 PDT 2009

On Saturday, August 22, 2009, Eduard Pascual <herenvardo at gmail.com> wrote:
> On Sat, Aug 22, 2009 at 11:51 PM, Ian Hickson<ian at hixie.ch> wrote:
>>
>> Based on some of the feedback on Microdata recently, e.g.:
>>
>>   http://www.jenitennison.com/blog/node/124
>>
>> ...and a number of e-mails sent to this list and the W3C lists, I am going
>> to try some tweaks to the Microdata syntax. Google has kindly offered to
>> provide usability testing resources so that we can try a variety of
>> different syntaxes and see which one is easiest for authors to understand.
>>
>> If anyone has any concrete syntax ideas that they would like me to
>> consider, please let me know. There's a (pretty low) limit to how many
>> syntaxes we can perform usability tests on, though, so I won't be able to
>> test every idea.
>>
>
> This would be more than just tweaking the syntax, but I think it is
> appropriate to bring forth my CRDF proposal as a suggestion for an
> alternative to Microdata. For reference, the latest version of the
> document can be found at [1], and the discussion that has happened
> about it can be found at [2].
>
> Rather than just saying "use that syntax", I'm including here what IMO
> are the most prominent advantages (and potential issues) of that
> proposal, in no particular order:
>
> + Optional use of selectors: while the ability to use selectors seems
> quite useful, especially to handle "list" or "collection" cases, it has
> been argued that users may have problems with elaborate selectors.
> Since the last update of the CRDF document, this is addressed with the
> expanded inline content model: it should be possible to express with
> only inline CRDF, and without using selectors at all, any semantics
> that can be represented with RDFa, Microdata, EASE, or eRDF. In other
> words: while CRDF can take full benefit of selectors to make better
> and/or clearer documents, it can still handle most cases (those
> actually handled by existing solutions) without them.
>
> + Microformats mapping: for good data (specifically, all content that
> doesn't duplicate any "singular" property), CRDF allows trivially
> mapping Microformat-marked data to an arbitrary RDF vocabulary (or
> even to multiple, overlapping vocabularies), thus allowing its re-use
> with RDF-related tools and/or combining it with RDF data from other
> sources and/or marked with other syntaxes. In order to achieve 100%
> compatibility with Microformats.org's processing model (including any
> form of bad data), a minor addition to Selectors is suggested in the
> document, although no substantial feedback has been given on it
> (neither against nor in favor).
>
> + Microformats-like but decentralized: the main issue with
> Microformats, at least with non-widespread vocabularies, is
> centralization: it requires a critical mass of use-cases to get the
> Microformats community to engage in the process of creating a new
> vocabulary. With CRDF, any author may build their own vocabulary
> (implementing it as a CRDF mapping to RDF) and use it on their pages.
> If a vocabulary later gains momentum and is adopted by a wide enough
> set of authors, it'd be up to the Microformats community to decide
> whether to "standardize" it or not.
>
> + Prefix declarations go out of HTML: After so many discussions,
> namespace prefixes have been the main source of criticism against RDFa.
> One of these criticisms is the range of technical issues that arise
> from the "xmlns:" syntax for defining namespace prefixes (in
> "tag-soup" syntax). CRDF handles this case by taking away the
> responsibility of prefix declarations from HTML: having a CSS-based
> syntax, CRDF takes the obvious step and uses CSS's own syntax for
> namespace declarations.
>
> + Entirely RDF based: while this might seem a purely theoretical
> advantage, there is also a practical benefit: once extracted from the
> webpage, CRDF data can be easily combined with any already existing
> RDF data; and can be used with RDF-related tools.
>
> - Copy-paste brittleness: IMO, the only serious drawback of CRDF;
> but there are some points worth making:
>   1) When used inline, CRDF can achieve the same resilience as RDFa,
> which is quite close to Microdata's.
>   2) I have noticed that some browsers can manage to copy-paste
> CSS-styled content preserving (most of) format. It shouldn't be hard
> for implementors to extend such functionality to CRDF. Of course, the
> support for this is not consistent among browsers, and also seems to
> vary for different paste targets. If there is some real interest, I
> might do some testing with multiple browsers and paste targets (for
> now, I have noticed that both IE and FF preserve most CSS formatting
> (but not layout) when pasting to Word, but pasting to OOo Writer gets
> rendered with the "default" formatting for the tags). It would be
> interesting, on this aspect, to hear from browser vendors: would they
> be willing to extend the CSS copy-paste capabilities to CRDF if it got
> adopted?
>
> - Prefix-based indirection: I'd bet that there are people on this list
> ready to argue that namespace prefixes are a good thing; but it seems
> that it raises some issues, so I'll include them and share my PoV on
> the topic:
>   1) For those who care about the use of widespread prefixes (like
> "foaf" or "dc") being used for something weird, or the use of
> different prefixes for these vocabularies, I wouldn't mind adding some
> "default" prefix mappings for CRDF to address this.
>   2) The "Follow your nose" (FYN) topic is a bit more complex: IMO, an
> RDF application that needs to successfully FYN to work is insane; OTOH, an
> application that works just fine, but can FYN and provide *additional*
> information *when available* is quite a good thing. This is an
> implementation design issue, and CRDF can't do too much here: the best
> thing I can think of is to state that applications should *attempt* to
> FYN, and use or show to the user the extra info *when successful*, but
> must still be able to use a document's information when FYN fails.
> Actually, CRDF is already moving in that direction: the type inference
> rules (still under construction) try to infer the properties' types
> from the vocabularies first, but take the basic type of the value if
> that fails (unless, of course, the CRDF code uses explicit typing).
>
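
The fallback order described in point 2 above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not CRDF's
actual inference rules (which the message says are still under
construction): the names infer_type, basic_type, and lookup_vocabulary,
and the type labels "integer", "decimal", and "string", are all
hypothetical.

```python
from typing import Callable, Optional


def basic_type(value: str) -> str:
    """Last-resort fallback: guess a basic type from the literal value itself."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"


def infer_type(
    prop: str,
    value: str,
    explicit_type: Optional[str] = None,
    lookup_vocabulary: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Pick a type for a property value without ever requiring FYN to succeed.

    lookup_vocabulary is a hypothetical callback that tries to fetch the
    declared type of `prop` from its vocabulary; it may return None or raise.
    """
    # 1. Explicit typing in the markup always wins.
    if explicit_type is not None:
        return explicit_type
    # 2. Attempt to follow the vocabulary; any failure is non-fatal.
    if lookup_vocabulary is not None:
        try:
            declared = lookup_vocabulary(prop)
        except Exception:
            declared = None
        if declared is not None:
            return declared
    # 3. Fall back to the basic type of the value.
    return basic_type(value)
```

The point of the design is that a failed vocabulary lookup only loses the
extra information; the document's own data remains usable.
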
> - Entirely new: this is a minor disadvantage against RDFa and
> Microformats, but not when compared to Microdata. It is a disadvantage
> because for already existing formats there are already existing
> implementations, and it is minor because it shouldn't be hard for
> browsers (and some other forms of UAs that also handle CSS) to
> implement it reusing most of their CSS-related code. Again, I'd
> appreciate some vendors' feedback on this.
>
> That's what I can think of now. Of course, CRDF has some issues: it's
> still a work in progress, and it lacks implementations and
> implementation feedback, but it also provides significant advantages
> that, IMO, far outweigh the drawbacks.
>
> Regards,
> Eduard Pascual
>
> [1] http://crdf.dragon-tech.org/crdf.pdf
> [2] (multiple links: the threads got split for some reason, and the
> archives also break threads at months' boundaries):
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019733.html
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019857.html
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-June/020284.html
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/020877.html
>

-- 
Edward O'Connor