[whatwg] Considering a lang- attribute prefix for machine translation and intelligibility
Charles Pritchard
chuck at jumis.com
Wed May 2 09:59:36 PDT 2012
There has been some discussion on the w3c/whatwg mailing lists about how
far we can mark up content with linguistic tags, such as marking word
and/or sentence boundaries.
When authoring web apps, I often write a short manual into a hidden
div so that my application's vocabulary can be processed by
translation services such as Google Translate. Having the content in the DOM
seems the most appropriate way to handle translation.
I'd like the group to consider the costs, benefits and alternatives of a
"lang-" attribute prefix.
For example: <span lang-role="sentence">This is a sentence.</span>
The data- and aria- attributes have worked out well. We may want to make
room for one more.
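As a rough illustration of how an authoring tool might emit such markup, here is a minimal sketch that wraps each sentence of plain text in lang-role="sentence". The function name markSentences is hypothetical, and the sentence splitter is a naive assumption (it ends a sentence at ".", "!" or "?", so it will mis-split abbreviations such as "e.g."). It is not part of the proposal itself.

```typescript
// Hypothetical helper: wrap each sentence in the proposed lang-role markup.
// The splitting rule below is a naive assumption for illustration only.

// Escape text so it can be placed safely inside an HTML element.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function markSentences(text: string): string {
  // Each match is a run of non-terminators followed by terminators,
  // or a trailing fragment that has no terminator.
  const pieces: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return pieces
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => `<span lang-role="sentence">${escapeHtml(p)}</span>`)
    .join(" ");
}

console.log(markSentences("This is a sentence. Is this one too?"));
// → <span lang-role="sentence">This is a sentence.</span> <span lang-role="sentence">Is this one too?</span>
```

A real implementation would want a language-aware segmenter; the point here is only that the markup is cheap to generate.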
Such a structure could be used to mark up typical subject/object/verb and
clause sections; it could also be used to mark up poetic texts and the
defined meanings of content.
http://www.omegawiki.org/Expression:orange
This is an <span lang-meaning="DefinedMeaning:orange_(5821)">orange</span>.
Now this, this is <span lang-meaning="DefinedMeaning:orange_(5822)">orange</span>.
In most cases there's no need to define sentence boundaries, meanings or
anything else. But it'd sure be nice to have the ability to do so in a
standard manner.
I'd recommend role, meaning and prosody/pronunciation as the primary
targets. Character markup may also be worth considering, as it has come
up before in SVG (the rotate attribute) and in CSS. Wrapping each
character in its own span is not practical, so we'd want a shorthand,
much as SVG's rotate attribute accepts a list of per-character values.
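Here is a minimal sketch of how such a shorthand could be expanded, borrowing the SVG 1.1 rules for rotate: each value applies to the next character, extra values are ignored, and if there are more characters than values the last value repeats. Applying these rules to a lang- attribute is purely hypothetical, as is the name expandPerCharacter.

```typescript
// Hypothetical expansion of a per-character shorthand, following the
// SVG 1.1 rotate rules: extra values are ignored, and the last value
// is reused for any remaining characters.
function expandPerCharacter<T>(text: string, values: T[]): Array<[string, T]> {
  if (values.length === 0) return []; // nothing to annotate
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  return chars.map((ch, i): [string, T] => [
    ch,
    values[Math.min(i, values.length - 1)],
  ]);
}

console.log(JSON.stringify(expandPerCharacter("Hello", [0, 15, 30])));
// → [["H",0],["e",15],["l",30],["l",30],["o",30]]
```

Three values thus cover a five-character word, which is the kind of compactness a span-per-character approach lacks.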
-Charles