[whatwg] Trying to work out the problems solved by RDFa

Calogero Alex Baldacchino alex.baldacchino at email.it
Fri Jan 9 12:07:47 PST 2009

Julian Reschke wrote:
> Calogero Alex Baldacchino wrote:
>> ...
>> This is why I was thinking about somewhat "data-rdfa-about", 
>> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the 
>> purposes of an RDFa processor working on top of HTML5 UAs (perhaps in 
>> a test phase, if needed at all, of course), an element dataset would 
>> give access to "rdfa-about", instead of just "about", that is using 
>> the prefix "rdfa-" as acting as a namespace prefix in xml (hence, as 
>> if there were "rdfa:about" instead of "data-rdfa-about" in the markup).
>> ...
> That clashed with the documented purpose of data-*.

Hmm, I'm not sure there is a clash, since I was suggesting a *custom*
and essentially *private* mechanism for experimenting with RDFa in the
HTML serialization. It would serve the *small-scale* needs of
organizations that want to embed RDFa metadata in text/html documents
and exchange them with each other, using a convention that is likely to
avoid name clashes with other private metadata. I think it's unlikely
that data-rdfa-* would be used with different semantics in the very same
page, and in a small-scale scenario involving a few *selected* sources
of RDFa-modelled information, the parties would likely know in advance
that the others follow the same convention. A document modelled this way
could be handled by an external RDFa processor, thus avoiding any direct
support in the browser.
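
To give an idea of what such an external processor would start from,
here is a minimal sketch in Python. It only collects data-rdfa-*
attributes and strips the prefix; it does not implement RDFa processing.
The names DataRdfaCollector and collect_data_rdfa, and the sample
markup, are assumptions made up for this illustration.

```python
from html.parser import HTMLParser

PREFIX = "data-rdfa-"  # the private convention discussed above


class DataRdfaCollector(HTMLParser):
    """Collect data-rdfa-* attributes per element, stripping the prefix."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {name without prefix: value})

    def handle_starttag(self, tag, attrs):
        rdfa = {
            name[len(PREFIX):]: value
            for name, value in attrs
            if name.startswith(PREFIX)
        }
        if rdfa:
            self.found.append((tag, rdfa))


def collect_data_rdfa(markup):
    """Hypothetical helper: return the data-rdfa-* attributes in markup."""
    parser = DataRdfaCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


# Sample markup invented for this sketch.
sample = (
    '<p>I\'m holding <span data-rdfa-property="cal:summary">'
    "one last summer Barbecue</span> on "
    '<span data-rdfa-property="cal:dtstart" '
    'data-rdfa-content="2007-09-16T16:00:00-05:00">'
    "September 16th at 4pm</span>.</p>"
)

for tag, attributes in collect_data_rdfa(sample):
    print(tag, attributes)
```

Attributes without the prefix are ignored, so other private data-*
metadata in the same page would not interfere with the collection.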

However, such a convention might be "clash-free" enough to work on a
wider scale. It might then become widespread and provide evidence that
the web /needs/, or at least /has chosen/, to use RDFa as (one of) the
most common ways to embed metadata in a document. That might be enough
to justify native support for the whole range of "RDFa" attributes,
possibly along with support for the earlier experimental ones (such as
"data-rdfa-*" and "rdfa:*", for backward compatibility). And actually I
can't see much of a problem if a privately born feature became the basis
of a widespread and widely accepted convention. (I'm not saying the spec
should name data-rdfa-* as a means to implement RDFa. Instead, I think
that if no general agreement can be found on whether and how RDFa must
be specified and implemented, such an experiment might be proposed to
the semantic web industry, and we could wait for the results, given that
a lack of support might prevent any interested party from using RDFa and
HTML5.)

> *If* we want to support RDFa, why not add the attributes the way they 
> are already named???

For instance, to find out whether it is worth changing the "if we want"
into "we do want", without requiring an early implementation and
specification, and without relying on whether and what a certain browser
vendor might want to experiment with differently from the others. Such a
convention would only require support for HTML5 datasets and a script or
plugin capable of handling them as RDFa metadata. The point here is
that, once data-* attributes have been introduced as a means to support
custom attributes, any browser vendor might decide to drop support for
other kinds of custom attributes in the HTML serialization (that is, for
attributes that are neither part of the language nor data-* ones).
Therefore, if vendors (or any one of them) decided not to support RDFa
attributes until they were introduced in a specification, there might be
no means to experiment with them in general, that is cross-browser,
without resorting either to data-* or to "rdfa:*" (the latter in XHTML).

Anyway, /in general/, what should a browser do with RDFa metadata on a
*wide scale*, other than classifying a portion of the open web (e.g. in
its local history), possibly allowing users to select trusted sources?

Actually, I don't think that would bring enough benefit to *average*
users, compared to the risk of getting a lot of spam metadata from
/heterogeneous/ sources. I really don't expect average users to
understand how to filter sites based on metadata reliability (and only
for the sake of a metadata-based query interface, since a site with
wrong metadata might still contain useful information). Instead, they
might just use a query interface the same way they use a default search
bar, get wrong results (once spam metadata became widespread), decide
the mechanism doesn't work well, and possibly complain about it. Some
kind of antispam filter might help, but I think that working out whether
metadata are reliable, that is whether they really correspond to the
content of a web page, is a hard problem for a bot to solve without a
good degree of Artificial Intelligence (filtering emails by looking for
suspicious patterns is far easier than implementing a filter capable of
/understanding/ metadata, /understanding/ natural language, and
comparing their /semantics/).

Likewise, I don't expect the great majority of web pages to contain
"valid" metadata: most people would not care about them, and a
potentially growing number might copy and paste code containing metadata
from other sites as a kind of template, then edit the content and ignore
the metadata, thus breaking reliability. I do think wide-scale use of
metadata coming from heterogeneous sources can be more harmful than
useful. *If* we do agree that small-scale needs are the main context
where RDFa can bring benefits, perhaps a custom mechanism and external
plugins are all we need; otherwise, it should be shown that /misused/
and /abused/ metadata can be filtered out *easily* and *automatically*,
without requiring average users to understand the problem and without
affecting overall efficiency. IMHO.

>> ...
>> However, AIUI, actual xml serialization (xhtml5) allows the use of 
>> namespaces and prefixed attributes, thus couldn't a proper namespace 
>> be introduced for RDFa attributes, so they can be used, if needed, in 
>> xhtml5 documents? I think such might be a valuable choice, because it 
>> seems to me RDFa attributes can be used to address such cases where 
>> metadata must stay as close as possible to correspondent data, but a 
>> mistake in a piece of markup may trigger the adoption agency or 
>> foster parenting algorithms, eventually causing a separation between 
>> metadata and content, thus possibly breaking reliability of gathered 
>> informations. From this perspective, a parser stopping on the very 
>> first error might give a quicker feedback than one rearranging 
>> misnested elements as far as it is reasonably possible (not 
>> affecting, and instead improving, content presentation and users' 
>> "direct" experience, but possibly causing side-effects with metadata).
>> ...
> That would make RDFa as used in XHTML 1.* and RDFa used in HTML 5 
> incompatible. What for?
> > ...
> BR, Julian

Because I'm not sure RDFa can work well with the HTML serialization. To
clarify, let me take and modify an example from the W3C Recommendation
(without claiming that it makes a good worst-case scenario, but just to
give an idea):

   I'm holding
   <span property="cal:summary">
     one last summer Barbecue
   </span>, to meet friends and have a party before the end of holidays
   <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00">
     September 16th at 4pm
   </span>.

Now consider it written as:

  I'm holding
  <span property="cal:summary">
    one last summer Barbecue
  <!-- now the </span> close tag is missing here -->,
  to meet friends and have a party before the end of holidays
  <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00">
    September 16th at 4pm
  </span>.

The above would result in a parse error in an XML-serialized document,
since the document isn't well-formed. As part of an HTML-serialized
document, instead, the fragment would be processed anyway. This improves
the users' experience (compared to a page that stops rendering at a
missing close tag), but it can cause metadata to be bound imprecisely to
the data, and so can harm automated data extraction. Therefore, using
such metadata only inside XML-serialized pages might give quick feedback
on this kind of problem as soon as the author checks the page's
appearance (which I think would be the very first check, whereas I think
almost no one would check the _whole_ range of possible queries people
might make over a document to look for errors).
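
The difference can be sketched in Python with the standard library. The
XML parser rejects the fragment with the missing close tag, while a
lenient, stack-based reader accepts it and binds too much text to
cal:summary. LenientBinder is a deliberately simplified stand-in made up
for this illustration (it is not the HTML5 tree construction algorithm
and does not handle void elements), and the <p> wrapper is added here
only so that the fragment has a single root element.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

WELL_FORMED = (
    "<p>I'm holding "
    '<span property="cal:summary">one last summer Barbecue</span>, '
    "to meet friends and have a party before the end of holidays "
    '<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00">'
    "September 16th at 4pm</span>.</p>"
)
# Same fragment with the first </span> removed.
BROKEN = WELL_FORMED.replace("Barbecue</span>", "Barbecue", 1)


def xml_accepts(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class LenientBinder(HTMLParser):
    """Bind text to every open element carrying a 'property' attribute."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of [tag, property or None, chunks]
        self.bound = {}          # property name -> bound text

    def handle_starttag(self, tag, attrs):
        self.open_elements.append([tag, dict(attrs).get("property"), []])

    def handle_data(self, data):
        for entry in self.open_elements:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.open_elements):
            return  # stray end tag: ignore it
        while self.open_elements:
            entry = self.open_elements.pop()
            self._finish(entry)  # implicitly closes unclosed descendants
            if entry[0] == tag:
                break

    def close(self):
        super().close()
        while self.open_elements:
            self._finish(self.open_elements.pop())

    def _finish(self, entry):
        if entry[1] is not None:
            # Collapse runs of whitespace in the collected text.
            self.bound[entry[1]] = " ".join("".join(entry[2]).split())


def bind(markup):
    binder = LenientBinder()
    binder.feed(markup)
    binder.close()
    return binder.bound


for label, markup in (("well-formed", WELL_FORMED), ("broken", BROKEN)):
    print(label, "| XML accepts:", xml_accepts(markup))
    print("  cal:summary ->", bind(markup)["cal:summary"])
```

In the broken case the summary swallows the rest of the sentence,
including the date text, and nothing in the rendered page would reveal
it.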

*If* this is meaningful, supporting RDFa attributes as "rdfa:*" might
ensure that the XML serialization is preferred by people who really need
this kind of metadata (while leaving a chance to experiment with RDFa in
the HTML serialization, because no one can be prevented from using
data-<prefix>-* for this purpose alongside a suitable script or plugin).
Introducing "about", "property", "content", "datatype" and so on
directly in the HTML namespace, as attributes shared by all elements,
would instead make the choice between the two serializations
indifferent, and so expose RDFa to every possible side effect of the
HTML serialization.
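
As a rough illustration of the "rdfa:*" idea, the following Python
sketch reads prefixed attributes from an XML-serialized fragment. The
namespace URI is a placeholder invented for this example (I'm not aware
of any namespace defined for RDFa attributes), and rdfa_attributes is a
hypothetical helper, not part of any existing processor.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, invented for this sketch only.
RDFA_NS = "http://example.org/hypothetical-rdfa-ns"

DOC = (
    '<p xmlns:rdfa="' + RDFA_NS + '">I\'m holding '
    '<span rdfa:property="cal:summary">one last summer Barbecue</span> on '
    '<span rdfa:property="cal:dtstart" '
    'rdfa:content="2007-09-16T16:00:00-05:00">September 16th at 4pm</span>.'
    "</p>"
)


def rdfa_attributes(markup):
    """Return (element tag, local attribute name, value) triples."""
    root = ET.fromstring(markup)  # raises ET.ParseError if not well-formed
    qualified = "{" + RDFA_NS + "}"
    triples = []
    for element in root.iter():
        for name, value in element.attrib.items():
            # ElementTree exposes namespaced attributes as "{uri}local".
            if name.startswith(qualified):
                triples.append((element.tag, name[len(qualified):], value))
    return triples


for triple in rdfa_attributes(DOC):
    print(triple)
```

Because the fragment goes through an XML parser, a missing close tag
would surface as a ParseError before any metadata is read.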

As a side note, it seems that people at the W3C are considering
extensibility as a way to introduce RDFa attributes into XML-serialized
HTML documents, and they also have some doubts about whether to allow
RDFa attributes in the HTML serialization:

"The HTML WG is encouraged to provide a mechanism to permit 
independently developed vocabularies such as Internationalization Tag 
Set (ITS), Ruby, and RDFa to be mixed into HTML documents. /Whether this 
occurs through the extensibility mechanism of XML, *whether it is also 
allowed in the classic HTML serialization*, and whether it uses the DTD 
and Schema modularization techniques/, is for the HTML WG to determine."
(from <http://www.w3.org/2007/03/HTML-WG-charter#deliverables>)

WBR, Alex