[whatwg] Writing authoring tools and validators for custom microdata vocabularies
Henri Sivonen
hsivonen at iki.fi
Wed May 20 03:50:02 PDT 2009
On May 20, 2009, at 10:27, Henri Sivonen wrote:
> However, in order to usefully apply RELAX NG or Schematron to a
> microdata-based infoset, the infoset conversion should turn property
> names into element names. Since XML places arbitrary limitations on
> element names (and element content), this mapping would have exactly
> the same complications as mapping microdata to RDF/XML.
Here's an attempt at mapping microdata to XML:
* Have a root element (it doesn't matter what it's called) with
attribute xml:lang that has the language of the root element of the
HTML document.
* Have a child of root with local name 'title', namespace 'http://purl.org/dc/terms/'
and content that is the content of the HTML <title> element.
* For each link relation in the document, have a child of root that
has as its local name the ASCII-lowercased rel token (or
ALTERNATE-STYLESHEET for alternate stylesheet), namespace
http://www.w3.org/1999/xhtml/vocab# and a no-namespace attribute 'url'
that contains the absolutized href of the link relation.
* For each <meta name content>, have a child of root with the value
of the name attribute of the <meta> as local name, namespace http://www.w3.org/1999/xhtml/vocab#
and the value of the content attribute as element content. If the
language of the <meta> differs from root, have xml:lang with the
different language.
* For cites, do the link thing analogously to how cites are handled
in the RDF conversion.
* For items and properties:
  - Map the property name to an XML (namespace, local name) pair as
    follows and use the result as the element name for the 'property
    element':
      * If itemprop contains a colon: Locate the last '#' or '/',
        whichever comes later, ignoring the final character of the URI.
        The part up to and including that character is the namespace
        URI, and the part after it is the local name.
      * Otherwise: The namespace is http://www.w3.org/1999/xhtml/custom#
        and the itemprop token is the local name.
  - If the value is a URL, put the URL value in an attribute called
    'url' on the property element.
  - If the value is itself an item, put the value of its item attribute
    in a no-namespace attribute called 'type' on the property element.
  - Otherwise, put the string value in the content of the property
    element, and put the language of the property in the xml:lang
    attribute of the property element if it differs from the nearest
    ancestor xml:lang.
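A minimal sketch of the property-name mapping above, in Python. The function names, the 'kind' parameter and the fallback for a URI with no usable '#' or '/' are hypothetical (the rules above don't cover that last case), and xml:lang handling is left out.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

CUSTOM_NS = "http://www.w3.org/1999/xhtml/custom#"


def split_property_name(itemprop: str) -> Tuple[str, str]:
    """Map one itemprop token to a (namespace, local name) pair."""
    if ":" not in itemprop:
        return CUSTOM_NS, itemprop
    # Search everything except the final character, so that a trailing
    # '#' or '/' is never picked as the split point.
    end = len(itemprop) - 1
    cut = max(itemprop.rfind("#", 0, end), itemprop.rfind("/", 0, end))
    if cut == -1:
        # Assumption: not covered by the rules above; use no namespace.
        return "", itemprop
    return itemprop[:cut + 1], itemprop[cut + 1:]


def property_element(itemprop: str, value: str, kind: str) -> ET.Element:
    """Build the 'property element'; kind is 'url', 'item' or 'text'."""
    ns, local = split_property_name(itemprop)
    elem = ET.Element("{%s}%s" % (ns, local) if ns else local)
    if kind == "url":
        elem.set("url", value)    # URL values go in a 'url' attribute
    elif kind == "item":
        elem.set("type", value)   # value of the nested item's item attribute
    else:
        elem.text = value         # plain string value as element content
    return elem
```

For instance, split_property_name("http://purl.org/dc/terms/title") gives the namespace "http://purl.org/dc/terms/" and the local name "title". Note that ElementTree doesn't check that the local name is an NCName.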
I haven't actually tried it, but on the face of it, this kind of
mapping seems tractable for RELAX NG schemas.
And, as mentioned before, this breaks when:
1) The local name is not an NCName.
2) textContent in HTML contains characters that XML doesn't allow.
Use the infoset coercion rules for those. However, the Uhhhhhh
notation can produce collisions, because microdata property names
aren't lowercased: a property name can itself contain an uppercase U
followed by six hex digits.
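To illustrate the collision, here is a rough sketch of the Uhhhhhh coercion in Python. The function name is hypothetical, and the character test is a simplified ASCII-only approximation of NCName (real NCNames allow many non-ASCII characters), so treat it as an illustration rather than a conforming implementation.

```python
import re

# Assumption: ASCII-only stand-ins for the NCName start/continue classes.
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")


def coerce_local_name(name: str) -> str:
    """Replace each disallowed character with 'U' plus six hex digits."""
    out = []
    for i, ch in enumerate(name):
        allowed = (_NAME_START if i == 0 else _NAME_CHAR).fullmatch(ch)
        out.append(ch if allowed else "U%06X" % ord(ch))
    return "".join(out)


# Two distinct property names that end up with the same local name:
a = coerce_local_name("foo<bar")        # '<' gets escaped
b = coerce_local_name("fooU00003Cbar")  # already a valid name, left as-is
collides = (a == b)
```

Here both names come out as "fooU00003Cbar", so a schema can't tell the two properties apart after coercion.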
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/