[whatwg] Fuzzbot (Firefox RDFa semantics processor)

Sun Jan 11 19:22:28 PST 2009

Toby A Inkster wrote:
> Calogero Alex Baldacchino wrote:
>
>> The concern is about every kind of metadata with respect to their
>> possible uses; but, while it's been stated that Microformats (for
>> instance) don't require any particular support by UAs (thus they're
>> backward compatible), RDFa would be a completely new feature, thus html5
>> specification should say what UAs are expected to do with such new
>> attributes.
>
> RDFa doesn't require any special support beyond the special support 
> that is required for Microformats. i.e. nothing. User agents are free 
> to ignore the RDFa attributes. In that sense, RDFa already "works" in 
> pretty much every existing browser, even going back to dinosaurs like 
> Mosaic and UdiWWW.
>
> Agents are of course free to offer more than that. Look at what they 
> do with Microformats: Firefox for instance offers an API to handle 
> Microformats embedded on a page; Internet Explorer offers its "Web 
> Slices" feature.
>

Well, at the beginning of this thread, the possible need to interchange
RDF metadata and merge triples from different vocabularies was suggested
as a use case for the RDFa serialization of RDF, and this would hint at a
requirement to support an RDFa processor in every conforming UA. This
also raises the question of what else might be needed besides collecting
triples. Is an API for building custom query applications enough, or
should browsers provide some query feature themselves? Are there possible
problems involved (such as spam through fake metadata in cached ads), and
are there possible solutions to prevent or moderate them?

If, on the other hand, browsers need not do anything special with RDFa
attributes, and their main use is instead for scripts, plugins,
server-side computation, or optional ("free") support by UAs, then they
would be no different from any other kind of custom attribute, data-*
included (so should the validation requirement be "let's accept every
attribute"?). The only difference would be the /intended use/, which may
matter but is something only a human can understand and no validator can
check. From this point of view, validating RDFa attributes, any other
attribute, or just html5 attributes plus custom data-* ones would be the
same thing, since validation would not be a concern, just as it isn't
for proprietary CSS extensions.

>> For what concerns html serialization, in particular, I'd consider 
>> some code like [...] which is rendered properly
>
>
> Is it though? Try adding the following CSS:
>
>     span[property="cal:summary"] { font-weight: bold; }
>
> And you'll see that CSS doesn't cope with a missing ending tag in that 
> situation either.
>
> If you miss out a non-optional end tag, then funny things will happen 
> - RDFa isn't immune to that problem, but neither is the DOM model, 
> CSS, microformats, or anything else that relies on knowing where 
> elements end. A better comparison would be a missing </p> tag, which 
> is actually allowed in HTML, and HTML-aware RDFa processors can 
> generally handle just fine.

That's definitely *not* the same issue. As I replied in a previous
mail, people *do not* need proper styling to understand prose; they just
need to understand the language it is written in, and their /brains/ will
cope with the rest. So the above example degrades gracefully and
acceptably: it may or may not be the wanted presentation, depending on
where the closing </span> was meant to be (in this case it wouldn't be
the right presentation), but it is not too harmful anyway. Bots based on
metadata, instead, *do need* reliable metadata to work properly, unless
they're made smart enough to debug the code they're fed (should
Artificial Intelligence be a requirement? - no sarcasm here).
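
To make the difference concrete, here is a minimal sketch in Python. It
is only a toy extractor built on the standard html.parser module, not a
real RDFa processor and not the HTML5 tree builder, and both the markup
and the names (PropertyCollector, extract) are made up for illustration;
they are not the example quoted above. It simply pairs each "property"
attribute with the text inside its element, and closes an element
implicitly when an ancestor's end tag arrives.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Toy extractor: pairs each `property` attribute with its element's text."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # stack of [tag, property-or-None, text chunks]
        self.literals = []        # finished (property, text) pairs

    def handle_starttag(self, tag, attrs):
        self.open_elements.append([tag, dict(attrs).get("property"), []])

    def handle_data(self, data):
        # Text belongs to every element that is still open.
        for entry in self.open_elements:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.open_elements):
            return  # stray end tag: ignore it
        # Close everything up to the matching start tag, so an element whose
        # own end tag is missing gets closed implicitly by its ancestor.
        while self.open_elements:
            name, prop, chunks = self.open_elements.pop()
            if prop is not None:
                self.literals.append((prop, "".join(chunks).strip()))
            if name == tag:
                break


def extract(markup):
    collector = PropertyCollector()
    collector.feed(markup)
    collector.close()
    return collector.literals


# Hypothetical markup, with and without the closing </span>.
well_formed = '<p><span property="cal:summary">Team meeting</span> on Monday.</p>'
missing_end = '<p><span property="cal:summary">Team meeting on Monday.</p>'

print(extract(well_formed))
print(extract(missing_end))
```

Under these assumptions the first document yields the literal "Team
meeting", while the second yields "Team meeting on Monday.": a human
reader sees the same sentence either way, but the bot gets a different
value for cal:summary and has no way to know it is wrong.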

If broken or wrong presentation caused by a missing end tag had ever
been a real issue, html-serialization would have been deprecated in
favour of the xml one. If something really "problematic" happened,
authors would notice it on their very first test, just by opening the
page in a browser, whereas an extensive and complete debug of the
triples can be an awkward task in a large document.

In contrast, any break in metadata semantics caused by
html-serialization can only be a severe issue for a metadata-based bot,
because it needs accurate metadata, while a not-very-accurate
presentation is not a great concern for human beings in most cases.
Moreover, if no particular presentation is attached to those spans, and
they're used only to add semantics through metadata (as happens when
embedding RDF through RDFa attributes), a side-effect may arise
unnoticed. From this point of view, html-serialization may be more prone
to side-effects than xml-serialization, which stops on validation
errors, these being in turn a possible cause of side-effects with
metadata.

That is, since RDFa semantics is more reliable in a more well-formed
document, xml-serialization might help to debug some errors. Such
strictness is not a requirement for content presentation, where finding
more or less emboldened words is better for users than finding a page
which is not rendered at all; hence the differences between xhtml and
html.

But if it is, or will be, agreed that inaccurate metadata are reliable,
or that uncertain reliability is not an issue for wide-scale semantic web
applications, well, I really don't know what to say other than that I
just have a different opinion.

However, that was just the first example I was able to come up with to
give an idea; better examples can surely be thought of. What if, for
instance, foster parenting or the adoption agency algorithm caused
metadata to be put far from (part of) their corresponding data? Style is
inherited, but a wrong triple is a wrong triple (from this perspective, a
parse error /might/ highlight some misplaced metadata more quickly than a
raw debug of triples).

My point is that html-serialization is robust enough with respect to
presentational issues in most cases (the same holds for non-screen
media), but this might not be true for RDFa-modelled metadata, which
require greater "well-formedness" than content presentation does in
order to be reliable enough. RDFa was conceived in the first place to
allow RDF serialization into xml documents, without the possible
validation problems arising from direct use of xml-serialized RDF, and as
an alternative to RELAX NG (since strict xml parsers, as for xhtml, are
more widespread) -- it's in the first chapter of the RDFa specification,
"1. Motivation".

That is, RDFa was born first and foremost as an xml-related feature, so
I think it is legitimate to ask whether it can work as well in another
kind of document (not whether it can work at all, but whether it works
equally well in different kinds of documents, or better in some than in
others) -- of course the same concern may apply to eRDF as well as to
other kinds of metadata.

>
>> considering RDFa relies on namespaces (thus,
>> adding RDFa attributes to HTML5 spec would require some features from
>> xml extensibility to be added to html serialization).
>
>
> RDFa *does not* rely on XML namespaces. RDFa relies on eight 
> attributes: about, rel, rev, property, datatype, content, resource and 
> typeof. It also relies on a CURIE prefix binding mechanism. In XHTML 
> and SVG, RDFa happens to use XML namespaces as this mechanism, because 
> they already existed and they were convenient. In non-XML markup 
> languages, the route to define CURIE prefixes is still to be decided, 
> though discussions tend to be leaning towards something like:
>
> <html prefix="dc=http://purl.org/dc/terms/ 
> foaf=http://xmlns.com/foaf/0.1/">
> <address rel="foaf:maker" rev="foaf:made">This document was made by <a 
> href="http://joe.example.com" typeof="foaf:Person" rel="foaf:homepage" 
> property="foaf:name">Joe Bloggs</a>.</address>
> </html>
>

Well, yes, that's a possible solution to be considered. Anyway, it
would require (at least) one more new attribute to be specced out, with
possible new concerns. For instance, a missing space between prefix/URI
pairs might compromise correct parsing (whereas space-separated CURIEs,
being shorter than absolute URIs, can draw more attention to typing
errors in hand-written code, but this is a subtlety), so a separate
attribute for each URI might be more robust: for instance something like
xmlns-* or just ns-* in the <html> tag, similar to xmlns:* but not
clashing with the xml namespace mechanism, along the same lines as data-*
but with a different "scope". Also, something like the eRDF use of <link>
elements inside the head element to declare namespaces (or CURIE prefix
mappings, to be more consistent with RDFa conventions) might work,
because an html document is likely to present such declarations once at
the beginning. However, each solution would have its own pros and cons,
while xml namespaces fit the purpose perfectly, not least because (one
of) their main uses is to represent prefixed attribute or element names
taken from an RDF vocabulary, which is in turn an XML 'format', and to
embed them in another kind of document, that is, to represent something
coming from a different namespace.
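
As a rough illustration of the missing-space concern, here is a small
Python sketch. It assumes the prefix attribute syntax from the quoted
example (whitespace-separated prefix=URI pairs), which is only a proposal
under discussion, and an ns-* attribute family that is purely
hypothetical; the function names are mine and come from no specification.

```python
def parse_prefix_attribute(value):
    """Split a whitespace-separated list of prefix=URI pairs into a dict."""
    mappings = {}
    for token in value.split():
        # Split on the first "=" only, since URIs may contain "=" themselves.
        prefix, sep, uri = token.partition("=")
        if not sep or not prefix or not uri:
            raise ValueError(f"malformed prefix declaration: {token!r}")
        mappings[prefix] = uri
    return mappings


def parse_ns_attributes(attributes):
    """Hypothetical alternative: one ns-<prefix> attribute per mapping."""
    return {
        name[len("ns-"):]: uri
        for name, uri in attributes.items()
        if name.startswith("ns-") and len(name) > len("ns-")
    }


good = "dc=http://purl.org/dc/terms/ foaf=http://xmlns.com/foaf/0.1/"
bad = "dc=http://purl.org/dc/terms/foaf=http://xmlns.com/foaf/0.1/"  # space lost

print(parse_prefix_attribute(good))
print(parse_prefix_attribute(bad))   # no error is raised; 'foaf' just disappears
print(parse_ns_attributes({
    "ns-dc": "http://purl.org/dc/terms/",
    "ns-foaf": "http://xmlns.com/foaf/0.1/",
    "lang": "en",
}))
```

With the space lost, this parser still returns a syntactically plausible
mapping: dc points to a wrong URI and foaf is undeclared, so every
foaf:* CURIE in the page would silently fail to resolve. With one
attribute per prefix, a typing error can damage only the mapping it
occurs in.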

> This discussion seems to be about "should/can RDFa work in HTML5?" 
> when in fact, RDFa already can and does work in HTML5 - there are 
> approaching a dozen interoperable implementations of RDFa, the 
> majority of which seem to handle non-XHTML HTML. Assuming that people 
> see value in RDFa, and assuming that the same people see value in 
> using HTML5, then these people will use RDFa in HTML5. The question we 
> should be discussing is not "should it work?" (because it already 
> does), but rather, "should it validate?"
>

There must also be people who see value in eRDF, at least enough of
them for eRDF to be supported by SearchMonkey. Surely these people see
value in using eRDF within html documents, since eRDF was conceived to
work with HTML "natively", that is, without any need to change HTML (by
introducing new attributes or using unrecognized ones); nevertheless,
eRDF can't be valid HTML5 because of the "profile" attribute, which has
been dropped. Should eRDF validate instead? Should we prefer eRDF to
RDFa, or vice versa? Should we treat them in exactly the same way? Or
should we just wait and see which one works better for people, to avoid
specifying too early something that might later prove less useful than
originally thought, for instance because most people decided to use
something else?

WBR, Alex
