[whatwg] "Just create a Microformat for it" - thoughts on micro-data topic

Wed May 6 13:26:28 PDT 2009

On Wed, 6 May 2009, Manu Sporny wrote:
> 
> What I am saying is that the amount of due diligence that goes into a 
> particular vocabulary should be determined by the community that will 
> use the vocabulary.

This seems clear, yes, but surely any community wishing to create a 
vocabulary would want to go through the steps you outlined.

> Your position, that the vocabulary author decides the proper amount of 
> due diligence, is rejected in the Microformats community. In the 
> Microformats community, every vocabulary has the same amount of due 
> diligence applied to it.

In the Microformats community, the community is the author.

> So, maybe this requirement should be added to the micro-data 
> requirements list:
> 
> If micro-data is going to succeed, it needs to support a mechanism that 
> provides easy, distributed vocabulary development, publishing and 
> re-use.

Noted as:

     * Creating a custom vocabulary should be relatively easy.
     * Distributed vocabulary development should be possible; it
       should not require coordination through a centralised system.
     * It should be possible to publish and re-use custom
       vocabularies.

> > Surely all of the above apply equally to any RDFa vocabulary just as 
> > they would to _any_ vocabulary, regardless of the underlying syntax?
> 
> Not necessarily...

I am dismayed that anyone would consider developing a language or 
vocabulary without following the steps you outlined. They seem to me to be 
a fundamental cornerstone of any development process.

> > 6: Justifying your design is a key part of any language design effort 
> > also. Not doing this would lead to a language or vocabulary with 
> > unnecessary parts, making it harder to use.
> 
> What happens when the people you're justifying your design to are the
> gatekeepers?

The people to whom you should justify your design are your users. They 
are the only real gatekeepers.

> In the Microformats community, this stage, especially if one of the 
> Microformat founders disagrees with your stance, can kill a vocabulary.

It can kill getting "Microformats.org" branding on your vocabulary, just 
like if Tim disagrees with something it can kill W3C branding on your 
language. Why does that matter? Just define your vocabulary elsewhere. 
Microformats just use the "class" attribute; there's no reason that can't 
be done outside Microformats.org (indeed it happens every day as people 
make up random class names for their style sheets).

HTML5 is a poster child for this (W3C said no, so we did it elsewhere). 

> > 7: With any language, part of designing the vocabulary is defining how 
> > to process content that uses it.
> 
> Not if there are clear parsing rules and it's easy to separate the
> vocabulary from the parsing rules.

So XML and RDF vocabulary designers don't have to define schemas?

Parsing rules are a tiny fraction of what you have to define as part of a 
language. Just look at the size of XML vocabulary definitions like SVG or 
XHTML2 or XForms or MathML.

> This should be a requirement for the micro-data solution:
> 
> Separation of concerns between the markup used to express the micro-data 
> (the HTML markup) and the vocabularies used to express the semantics 
> (the micro-data vocabularies).

Noted (Microdata solutions shouldn't change HTML parsing rules).

However, I don't think that's really relevant to point 7 here. Whatever 
mechanism we end up with, it'll be important to define the semantics of 
vocabularies. For example, if something is a "person", it can have a 
"gender", and it can have a "foot", but the "foot" can't have a "gender". 
Vocabularies also need to define how to handle errors (like when someone 
says a foot is female).
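
To make the person/foot example concrete, here is a minimal sketch of what 
"defining the semantics" and "defining error handling" could look like. 
Everything in it is an assumption for illustration: the VOCABULARY table, 
the validate() helper, and in particular the policy of dropping a 
disallowed property and reporting it. A real vocabulary might choose a 
different error policy.

```python
# Hypothetical vocabulary definition: for each item type, the set of
# property names it may carry. Not taken from any real specification.
VOCABULARY = {
    "person": {"gender", "foot"},
    "foot": set(),  # a foot has no properties of its own here
}


def validate(item_type, properties, vocabulary=VOCABULARY):
    """Return (accepted, errors) for one item.

    Assumed error policy: a property the vocabulary does not allow on
    this type is dropped and reported, rather than aborting processing.
    """
    if item_type not in vocabulary:
        return {}, [f"unknown type: {item_type}"]

    allowed = vocabulary[item_type]
    accepted = {}
    errors = []
    for name, value in properties.items():
        if name in allowed:
            accepted[name] = value
        else:
            errors.append(f"'{item_type}' cannot have '{name}'")
    return accepted, errors


# A person with a gender is fine; a foot with a gender is an error.
person_result = validate("person", {"gender": "female"})
foot_result = validate("foot", {"gender": "female"})
```

The point is only that the table and the error policy belong to the 
vocabulary, whatever syntax ends up carrying the data.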

> > 9: The most important practical test of a language is the test of 
> > deployment. Getting feedback and writing code is naturally part of 
> > writing a format.
> 
> This statement is vague, so I'm elaborating a bit to cover the possible 
> readings of this statement:
> 
> Writing markup code (ie: HTML) should be a natural part of writing a 
> semantic vocabulary meant to be embedded in HTML.
> 
> Writing parser code (ie: Python, Perl, Ruby, C, etc.) should not be a 
> natural part of writing a semantic vocabulary - they are wholly different 
> disciplines. Microformats require you to write both markup code and 
> parser code by design.

A vocabulary that is never processed by anyone is not a useful vocabulary. 
So someone eventually is going to have to process the vocabulary, if it's 
useful. The only sure way to know if processing a vocabulary is possible 
in a sane way is to try to do so, and that means writing code.

Now this code might be just a bunch of queries against an RDF quad store, 
or it could be some complicated C++ app that does something awesome. But 
without writing _something_, you can't really know for sure if the 
language or vocabulary is sane.

You really want to know if your vocabulary is sane long before people 
start investing money in using it, otherwise you'll have wasted their time 
when it turns out that it's not possible to make good use of the data due 
to some fundamental flaw in the vocabulary.
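
As a rough illustration of the "bunch of queries against a quad store" 
case mentioned above, the sketch below uses a plain Python list as a 
stand-in for a real store. The sample data, the match() helper and the 
gendered_feet() query are all hypothetical. Even a query this small is 
enough to exercise the vocabulary and to surface data that makes no sense.

```python
# Hypothetical sample data: (subject, predicate, object, graph) quads.
QUADS = [
    ("alice", "type", "person", "page1"),
    ("alice", "gender", "female", "page1"),
    ("alice", "foot", "foot1", "page1"),
    ("foot1", "type", "foot", "page1"),
    ("foot1", "gender", "female", "page2"),  # nonsense a query should find
]


def match(quads, s=None, p=None, o=None, g=None):
    """Return the quads matching a pattern; None acts as a wildcard."""
    pattern = (s, p, o, g)
    return [
        quad
        for quad in quads
        if all(want is None or want == got for want, got in zip(pattern, quad))
    ]


def gendered_feet(quads):
    """Find 'gender' statements whose subject is typed as a foot."""
    feet = {quad[0] for quad in match(quads, p="type", o="foot")}
    return [quad for quad in match(quads, p="gender") if quad[0] in feet]


suspicious = gendered_feet(QUADS)
```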

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'