[whatwg] Extensible microdata attributes

Tue Apr 26 19:54:57 PDT 2011

  On 4/26/2011 9:55 PM, Benjamin Hawkes-Lewis wrote:
> On Tue, Apr 26, 2011 at 2:32 PM, Brett Zamir<brettz9 at yahoo.com>  wrote:
>> That's kind of my purpose though. Sometimes, one does not wish to embed the
>> text itself, but one still wishes the data encoded so it can be retrieved by
>> other means. Why should extensible semantics be restricted to visible
>> information?
> http://microformats.org/wiki/principles
>
> http://tantek.com/log/2005/06.html#d03t2359
>
Thanks for the references. While this may be relevant for the likes of 
blogs and other documents whose requirements for semantic density are 
limited enough to allow such reshaping for practical effect, and whose 
content can be reshaped by the content creator (as opposed to the 
republishing of already completed books), for more semantically dense 
content, such as the types of classical documents marked up by TEI, it 
is simply not possible to expose text for each bit of semantic 
information or to generate new text to meet that need. And of course, 
even with microformats/microdata as they are now, the semantic content 
itself is not necessarily exposed just because text is visible on the page.

The issue of discoverability is, I think, more related to how the data 
will or may be consumed. And even if some pieces of information are 
less discoverable, that does not mean they have no value. For such 
rich documents, a lot of attention is being paid to these texts, since 
they are themselves considered important enough to be worth the time.

If the Declaration of Independence of the United States were marked up 
with hidden information about prior emendations and their likely reasons, 
or about the suspected authors of particular passages, or if the United 
Nations Declaration of Human Rights were marked up to indicate which 
countries have expressed reservations (qualifications) about which 
rights, then while a browsing application or query tool ought to be able 
to (optionally) expose this hidden information, there is no inherent need 
for the markup to be polluted with extra (hidden) tags, especially 
URI-based or other non-textual ones, when an attribute would suffice.

For things that are truly important, a great deal of care may be put 
into building up many layers which are meant to be peeled away, and it 
is worth allowing some of that information (particularly non-textual 
information such as the conditions of authorship, the publisher, etc., 
and especially information that the original publication did not 
expose) to still be selectively revealed to queries and deeper browsing.

If a site like Wikisource (Wikipedia's sister project, an online 
library) were able to offer such officially sanctioned semantic 
attributes, classic texts could be enhanced in this way over time, 
with the wiki exposing the hidden semantic information. That information 
may indeed not be as important as the visible text, but through queries 
by interested users, any problems in the encoding could be discovered 
just as well.

While I know most hip web authors and developers are minimalists, can't 
we all just get along? Can't those of us who are interested in such 
richness, and who have a view to progressively enhancing documents into 
the far future, also be welcomed into the web?

Brett