[whatwg] Writing authoring tools and validators for custom microdata vocabularies
Henri Sivonen
hsivonen at iki.fi
Wed May 20 00:27:10 PDT 2009
On May 20, 2009, at 04:36, Ian Hickson wrote:
> REQUIREMENTS:
> * There should be a definitive location for vocabularies.
If this means that vocabulary schemas should live in a predestined URI
subspace, I'm inclined to disagree with this requirement, because
1) for non-predefined vocabularies it would leave vocabulary
definition as decentralized but would make schemas centralized, which
doesn't make sense
2) for predefined vocabularies it would create a single point of
failure by elevating a given dereferenceable URI to a special status.
> * It should be possible for vocabularies to describe other
> vocabularies.
I disagree with this requirement. Being able to define a schema
language in microdata is sufficiently different from other microdata
use cases that addressing this requirement could have adverse
complicating effects on other use cases. Furthermore, it is completely
unclear why schemas would need to be embedded in HTML pages.
> * Originating vocabulary documents should be discoverable.
Does this mean something like xsi:schemaLocation? I thought that the
RELAX NG community had debunked this as an anti-pattern for all other
cases except for use cases analogous to the Emacs modeline (i.e.
giving a generic XML editor a path to a *local* schema file in order
to choose autocompletion rules on a per-document basis). See http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html
> * Machine-readable vocabulary information shouldn't be on a separate
> page than the human-readable explanation.
Why is this a requirement? It seems like a radical departure from the
practice of having DTD / XSD / RELAX NG schemas in addition to spec
prose in HTML or PDF.
> * There must not be restrictions on the possible ways vocabularies can
> be expressed (e.g. the way DTDs restricted possible grammars in SGML).
This seems to preclude any generic schema language as the One True
schema language.
> For other vocabularies, I recommend using RDFS and OWL, and having the
> tools support microdata as a serialisation of RDF.
I'm inclined to think this recommendation may not be the best one.
RDFS and OWL are obviously applicable to the result of a
microdata-to-RDF conversion. However, RDFS and OWL are designed for
the RDF model, which is more general than the microdata model. Since
the microdata model is an array of trees (which may be considered one
big tree with the root being of a different type than the other
nodes), it would make sense--on the high level--to apply the same
techniques one would apply with XML trees: tree automata (like RELAX
NG), assertions on trees (like Schematron) and custom code operating
on trees.
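As a rough illustration of the "assertions on trees" and "custom code
operating on trees" options, here is a minimal Python sketch. The
in-memory model (a list of items, each a dict mapping a property name to
a list of values, where a value is a string or a nested item), the names
walk, check and ASSERTIONS, and the sample rule are all assumptions made
for the example; none of them comes from the microdata draft.

```python
# Hypothetical model: a document is a list of items; each item maps a
# property name to a list of values; a value is a str or a nested item.

def walk(item, path=()):
    """Yield (path, item) for an item and every item nested inside it."""
    yield path, item
    for name, values in item.items():
        for value in values:
            if isinstance(value, dict):
                yield from walk(value, path + (name,))


def check(items, assertions):
    """Apply every assertion to every item; return failure messages."""
    failures = []
    for index, root in enumerate(items):
        for path, item in walk(root, (str(index),)):
            for message, predicate in assertions:
                if not predicate(item):
                    failures.append("/".join(path) + ": " + message)
    return failures


# Illustrative rule only; it does not belong to any real vocabulary.
ASSERTIONS = [
    ("an item may have at most one 'name'",
     lambda item: len(item.get("name", [])) <= 1),
]

items = [
    {"name": ["Alice"], "friend": [{"name": ["Bob", "Robert"]}]},
]
failures = check(items, ASSERTIONS)
```

Here the nested "friend" item carries two "name" values, so check reports
one failure for it and none for the top-level item. Like a Schematron
rule, each assertion looks at one node of the tree at a time.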
While it would be possible to make new schema languages for microdata
applying the ideas from RELAX NG and Schematron, it would be easier to
use off-the-shelf RELAX NG and Schematron tools and to map microdata
to an XML infoset for validation. However, in order to usefully apply
RELAX NG or Schematron to a microdata-based infoset, the infoset
conversion should turn property names into element names. Since XML
places arbitrary limitations on element names (and element content),
this mapping would have exactly the same complications as mapping
microdata to RDF/XML.
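To make the element-name problem concrete, here is a minimal Python
sketch of such a mapping, assuming the same kind of hypothetical
in-memory model as a list of dicts (property name to list of values).
The wrapper element names "microdata" and "item", the function names,
and the ASCII-only name check are assumptions for illustration; the real
XML Name production allows many more characters, and the sketch does not
deal with values containing characters that XML content cannot carry.

```python
import re
import xml.etree.ElementTree as ET

# Conservative ASCII-only approximation of a colon-free XML name.
# This is an assumption for the sketch, not the full XML Name production.
XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def item_to_element(item, tag="item"):
    """Convert one item into an element; property names become tags."""
    element = ET.Element(tag)
    for name, values in item.items():
        if not XML_NAME.match(name):
            raise ValueError(
                "property name is not usable as an XML element name: %r"
                % name)
        for value in values:
            if isinstance(value, dict):
                # A nested item becomes a child element with children.
                element.append(item_to_element(value, name))
            else:
                child = ET.SubElement(element, name)
                child.text = value
    return element


def to_infoset(items):
    """Wrap all top-level items in a single hypothetical root element."""
    root = ET.Element("microdata")
    for item in items:
        root.append(item_to_element(item))
    return root


doc = to_infoset([{"name": ["Alice"], "friend": [{"name": ["Bob"]}]}])
xml_text = ET.tostring(doc, encoding="unicode")
```

The resulting tree could be handed to off-the-shelf RELAX NG or
Schematron tooling. A property name such as "http://example.org/name"
makes item_to_element raise ValueError, which is the complication
described above: some escaping or renaming scheme would be needed before
every property name could serve as an element name.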
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/