[whatwg] Writing authoring tools and validators for custom microdata vocabularies
ian at hixie.ch
Tue May 19 18:36:33 PDT 2009
One of the use cases I collected from the e-mails sent in over the past
few months was the following:
USE CASE: It should be possible to write generalized validators and
authoring tools for the annotations described in the previous use case.
* Mary would like to write a generalized software tool to help page
authors express micro-data. One of the features that she would like to
include is one that displays authoring information, such as vocabulary
term description, type information, range information, and other
vocabulary term attributes in-line so that authors have a better
understanding of the vocabularies that they're using.
* John would like to ensure that his indexing software only stores
type-valid data. Part of the mechanism that he uses to check the
incoming micro-data stream is type information that is embedded in the
vocabularies that he uses.
* Steve would like to provide warnings to the authors that use his
vocabulary that certain vocabulary terms are experimental and may
never become stable.
* There should be a definitive location for vocabularies.
* It should be possible for vocabularies to describe other vocabularies.
* Originating vocabulary documents should be discoverable.
* Machine-readable vocabulary information shouldn't be on a separate
page from the human-readable explanation.
* There must not be restrictions on the possible ways vocabularies can
be expressed (e.g. the way DTDs restricted possible grammars in SGML).
* Parsing rules should be unambiguous.
* Should not require changes to HTML5 parsing rules.
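To make the requirements above concrete, here is a minimal sketch of the kind of checking Mary, John and Steve describe. It assumes the vocabulary's machine-readable description has already been loaded into a table. The `Term` record, the `check_item()` and `describe()` functions, and the property names are all hypothetical, not a proposed format. Values are treated as plain strings, which is what microdata text values are before any typing is applied. Nested items are ignored for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Tuple


def is_text(value: str) -> bool:
    """Any string is acceptable text."""
    return True


def is_iso_date(value: str) -> bool:
    """True if the value parses as an ISO 8601 calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


@dataclass(frozen=True)
class Term:
    description: str               # shown in-line to authors (Mary's tool)
    range_name: str                # human-readable name of the expected value type
    check: Callable[[str], bool]   # type test applied to each value (John's indexer)
    experimental: bool = False     # triggers a warning (Steve's vocabulary)


# Hypothetical vocabulary description; not any real vocabulary.
PERSON_VOCAB: Dict[str, Term] = {
    "name": Term("Full name", "text", is_text),
    "birthday": Term("Date of birth", "date", is_iso_date),
    "mood": Term("Current mood", "text", is_text, experimental=True),
}


def check_item(props: Dict[str, List[str]],
               vocab: Dict[str, Term]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one item's property name -> values mapping."""
    errors: List[str] = []
    warnings: List[str] = []
    for name, values in props.items():
        term = vocab.get(name)
        if term is None:
            errors.append(f"unknown property {name!r}")
            continue
        if term.experimental:
            warnings.append(f"{name!r} is experimental and may never become stable")
        for value in values:
            if not term.check(value):
                errors.append(f"{name!r}: {value!r} is not a valid {term.range_name}")
    return errors, warnings


def describe(name: str, vocab: Dict[str, Term]) -> str:
    """One-line help text an editor could show next to the property."""
    term = vocab[name]
    flag = ", experimental" if term.experimental else ""
    return f"{name}: {term.description} (range: {term.range_name}{flag})"
```

For example, `check_item({"mood": ["happy"]}, PERSON_VOCAB)` yields no errors and one warning. Whether such a table can be derived from an existing schema language is exactly the open question discussed below.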
I couldn't find a good solution to this problem.
The obvious solution is to use a schema language, such as RDFS or OWL.
Indeed, that's probably the only solution that I can recommend. However,
as we discovered with HTML5, schema languages aren't expressive enough. I
wouldn't be surprised to find that no existing schema could accurately
describe the complete set of requirements that apply to the vCard, vEvent,
and BibTeX vocabularies (though I haven't checked if this is the case).
For any widely used vocabulary, I think the best solution will be
hard-coded constraints and context-sensitive help systems, as we have for
HTML5 validators and HTML editors.
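As an illustration of what "hard-coded constraints" can mean in practice, here is a minimal sketch of rules that relate two properties to each other, which a per-term description (a domain and a range) cannot state. The rules are modelled loosely on iCalendar's DTSTART, DTEND and DURATION properties. The lowercase property names and the `check_event()` function are assumptions for illustration, not any vocabulary's actual definition.

```python
from datetime import datetime
from typing import Dict, List


def check_event(props: Dict[str, List[str]]) -> List[str]:
    """Cross-property rules for a hypothetical event vocabulary.

    Each rule looks at more than one property at once, which is what a
    per-term schema (description, domain, range) cannot express.
    Only the first value of each property is checked.
    """
    errors: List[str] = []
    # Rule 1: 'dtend' and 'duration' are alternative ways to give the end.
    if "dtend" in props and "duration" in props:
        errors.append("use either 'dtend' or 'duration', not both")
    # Rule 2: when both start and end are given, the end must come later.
    if "dtstart" in props and "dtend" in props:
        try:
            start = datetime.fromisoformat(props["dtstart"][0])
            end = datetime.fromisoformat(props["dtend"][0])
            ends_too_early = end <= start
        except (ValueError, TypeError):
            # ValueError: not parseable; TypeError: one value has a UTC
            # offset and the other does not, so they cannot be compared.
            errors.append("'dtstart' and 'dtend' must be comparable ISO 8601 date-times")
        else:
            if ends_too_early:
                errors.append("'dtend' must be later than 'dtstart'")
    return errors
```

A real validator for a widely used vocabulary would accumulate many such rules, which is why they end up written as code rather than as schema declarations.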
For other vocabularies, I recommend using RDFS and OWL, and having the
tools support microdata as a serialisation of RDF. Microdata itself could
probably be used to express the constraints, though possibly not directly
in RDFS and OWL if these use features that microdata doesn't currently
expose (like typed properties).
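A minimal sketch of the RDFS route follows. It assumes the microdata has already been mapped to RDF triples, represented here as plain tuples of strings rather than through any RDF library. The `http://example.com/vocab#` terms and the `check_triples()` function are hypothetical. Note that RDFS itself treats `rdfs:domain` and `rdfs:range` as grounds for inferring types, not as constraints. A validator like this one deliberately reads them closed-world, and it does no `rdfs:subClassOf` reasoning.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"

# Hypothetical vocabulary, written as RDFS triples.
EX = "http://example.com/vocab#"
SCHEMA: List[Triple] = [
    (EX + "author", RDFS_DOMAIN, EX + "Book"),
    (EX + "author", RDFS_RANGE, EX + "Person"),
]


def _index(triples: Iterable[Triple], predicate: str) -> Dict[str, Set[str]]:
    """Map subject -> set of objects for every triple with the given predicate."""
    result: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            result.setdefault(s, set()).add(o)
    return result


def check_triples(data: List[Triple], schema: List[Triple]) -> List[str]:
    """Report nodes lacking the rdf:type that rdfs:domain/rdfs:range call for.

    Closed-world reading: a missing type is reported instead of inferred.
    """
    domains = _index(schema, RDFS_DOMAIN)
    ranges = _index(schema, RDFS_RANGE)
    types = _index(data, RDF_TYPE)
    problems: List[str] = []
    for s, p, o in data:
        for cls in sorted(domains.get(p, set())):
            if cls not in types.get(s, set()):
                problems.append(f"{s} uses {p} but is not typed {cls}")
        for cls in sorted(ranges.get(p, set())):
            if cls not in types.get(o, set()):
                problems.append(f"{o} is the value of {p} but is not typed {cls}")
    return problems
```

The closed-world choice is what turns RDFS declarations into something a validator can report on; under the standard open-world reading the same data would simply imply that the untyped node is a Person.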
Regarding some of the requirements, I actually disagree that they are
desirable. For example, having a definitive location for vocabularies has
been shown to be a bad idea for scalability, with the W3C experiencing
huge download volume for certain schemas. Similarly, I don't think that
the "turtles all the way down" approach of describing vocabularies in the
same syntax that is used to express data in those vocabularies (self-hosted schemas) is
necessary or, frankly, particularly useful to the end-user (though it may
have nice theoretical properties).
In conclusion: I recommend using an existing RDF-based schema language in
conjunction with the mapping of microdata to RDF. Implementation
experience with how this actually works in practice in end-user scenarios
would be very useful in determining if something more is needed here.
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.