[whatwg] Annotating structured data that HTML has no semantics for

Mon May 11 20:06:14 PDT 2009

A cursory glance at the new section 5 raises two questions about
indirection:

> (Note the <meta>s in the last example -- since sometimes the information
> isn't visible, rather than requiring that people put it in and hide it
> with display:none, which has a rather poor accessibility story, I figured
> we could just allow <meta> anywhere, if it has a property="" attribute.)

That seems to be a solution optimised for completely invisible metadata,
but not for metadata which differs from the human-visible data.
Imagine, as an example, the simple act of marking up a number (ignoring
what the number denotes). For human consumption a thousands separator is
often used, and the type of separator differs by language, locale and
context. Just in my little world I regularly see the point, the comma,
the space, the thin space and sometimes the apostrophe. Parsing all
these representations of numbers would be a chore. The textContent of
the element
<span itemprop="com.example.price">€ 1&thinsp;000&thinsp;000,—</span>
is clearly unusable, demanding an additional invisible
<meta property="com.example.price" content="1000000">.
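
To illustrate why the visible text is a poor source for the machine
value, here is a minimal Python sketch. The list of renderings and the
naive_parse() helper are assumptions made up for this illustration, not
anything from the draft; the heuristic simply keeps the digits.

```python
# Hypothetical human-visible renderings of one and the same price.
RENDERINGS = [
    "€ 1.000.000",                      # point
    "€ 1,000,000",                      # comma
    "€ 1 000 000",                      # space
    "€ 1\u2009000\u2009000,\u2014",     # thin space, as in the example above
    "€ 1'000'000",                      # apostrophe
]


def naive_parse(text):
    """Keep only the digits. This is a guess, not a real number parser."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits)


for text in RENDERINGS:
    print(repr(text), "->", naive_parse(text))

# The same heuristic silently misreads a price that has decimals:
# the comma is dropped and the cents are glued onto the integer part.
misread = naive_parse("€ 1.000,50")
print(misread)
```

The heuristic only works as long as every separator can be thrown away.
As soon as one of them is a decimal mark, the consumer has to know the
locale, which is exactly the information the text content does not carry.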

My irritation lies in the element proliferation: one element/attribute
combination is required for machines and one element/text content
combination for humans. Of course, any sane author would keep both
elements close together, as parent/child or as siblings, but there
would still be two different elements to maintain, leading to a higher
cognitive load. Not just for authors but also for programmers: a
fluctuating price would have to be updated on two different elements,
and tree-walking DOM scripts would have to take <meta> elements into
account. Furthermore, it clashes with the familiar behaviour of other
elements in HTML. A hyperlink is one element with a machine-readable
attribute and human-readable text content. A citation is one element
with a machine-readable reference and human-readable text content. The
same model is used in <meter>, <progress>, <time>, <abbr> ... but not in
user-defined objects. I'd prefer an additional @content-like attribute
which supersedes the text content, and maybe even the default values of
the other value-bearing elements, reducing the two elements to maintain
or change to just one.
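
A rough sketch of how a consumer could treat such an attribute, written
in Python with the standard html.parser module. The content="" attribute
on <span> is the hypothetical attribute proposed above, not something in
the draft, and the PropertyExtractor class and the com.example.name
property are likewise invented for this illustration.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect itemprop values, letting a hypothetical content=""
    attribute override the element's text content.

    Simplification: property elements are assumed to contain text only,
    with no nested tags.
    """

    def __init__(self):
        super().__init__()
        self.values = {}     # property name -> extracted value
        self._open = None    # property whose text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            # The machine-readable value wins; the text is for humans only.
            self.values[name] = attrs["content"]
        else:
            self._open, self._text = name, []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            self.values[self._open] = "".join(self._text)
            self._open = None


parser = PropertyExtractor()
parser.feed(
    '<span itemprop="com.example.price" content="1000000">'
    "€ 1&thinsp;000&thinsp;000</span>"
    '<span itemprop="com.example.name">Hedral</span>'
)
parser.close()
print(parser.values)
```

With this model a changing price is updated on a single element, and a
script walking the tree needs no special case for <meta>.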

> Instead, let us try using the regular "IDREF" functionality that HTML uses
> in a variety of other places, like <label for="">. For this we'll need a
> new attribute, but unfortunately we can't use about="" (which would be the
> obvious name to use), because that would conflict with RDFa, so instead
> we'll use subject="":

I'm slightly irritated by the implied change from an active, possessive
formulation (“The cat has the name Hedral.”) to something more passive-y
(“Hedral is a name owned by that cat.”). My mental model for property
relationships follows the former wording; link relationships are similar
in that regard. @about/@subject are like @rev; a @resource, akin to
@rel, would feel more natural. I think the missing @resource also has
practical consequences. Imagine a document describing a household, and a
household vocabulary which allows triples of <human>s which are in an
<owner> relationship to a <cat>. Given a household of two humans and one
cat, how does one mark up the assumption that the cat has two owners?