[whatwg] Fuzzbot (Firefox RDFa semantics processor)
Calogero Alex Baldacchino
alex.baldacchino at email.it
Sun Jan 11 19:22:28 PST 2009
Toby A Inkster wrote:
> Calogero Alex Baldacchino wrote:
>
>> The concern is about every kind of metadata with respect to their
>> possible uses; but, while it's been stated that Microformats (for
>> instance) don't require any particular support by UAs (thus they're
>> backward compatible), RDFa would be a completely new feature, thus html5
>> specification should say what UAs are expected to do with such new
>> attributes.
>
> RDFa doesn't require any special support beyond the special support
> that is required for Microformats. i.e. nothing. User agents are free
> to ignore the RDFa attributes. In that sense, RDFa already "works" in
> pretty much every existing browser, even going back to dinosaurs like
> Mosaic and UdiWWW.
>
> Agents are of course free to offer more than that. Look at what they
> do with Microformats: Firefox for instance offers an API to handle
> Microformats embedded on a page; Internet Explorer offers its "Web
> Slices" feature.
>
Well, at the beginning of this thread the possible need to interchange
RDF metadata and merge triples from different vocabularies was suggested
as a use case for the RDFa serialization of RDF, and this would hint at a
requirement to support an RDFa processor in every conforming UA.
This also raises the question of what else might be needed besides
collecting triples: is an API to build custom query applications enough,
or should browsers provide some query feature themselves? Are there
problems involved (such as spam through fake metadata in cached ads),
and are there ways to prevent or moderate them?
If, on the other hand, browsers need do nothing special with RDFa
attributes, and their main use is for scripts, plugins or server-side
computation, or for optional support by UAs, then they would be in no
way different from any other kind of custom attribute, data-* included
(so should the validation requirement be "accept every attribute"?).
The only difference would be the /intended use/, which may matter but is
something only a human can understand and no validator can check. From
this point of view, validating RDFa attributes, any other attribute, or
just html5 attributes and custom data-* ones would be the same, since
validation would not be a concern, just as it isn't for proprietary CSS
extensions.
>> For what concerns html serialization, in particular, I'd consider
>> some code like [...] which is rendered properly
>
>
> Is it though? Try adding the following CSS:
>
> span[property="cal:summary"] { font-weight: bold; }
>
> And you'll see that CSS doesn't cope with a missing ending tag in that
> situation either.
>
> If you miss out a non-optional end tag, then funny things will happen
> - RDFa isn't immune to that problem, but neither is the DOM model,
> CSS, microformats, or anything else that relies on knowing where
> elements end. A better comparison would be a missing </p> tag, which
> is actually allowed in HTML, and HTML-aware RDFa processors can
> generally handle just fine.
That's definitely *not* the same issue. As I've replied in a previous
mail, people *do not* need proper styling to understand prose; they just
need to understand the language it is written in, and their /brains/
will cope with the rest. So the above example degrades gracefully enough
(it may or may not be the wanted presentation, depending on where the
closing </span> was meant to go, and in this case it wouldn't be, but it
is not too harmful anyway). Bots based on metadata, instead, *do need*
reliable metadata to work properly, unless they're made smart enough to
debug the code they're fed (should Artificial Intelligence be a
requirement? No sarcasm here).
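To make the point concrete, here is a minimal sketch in Python. It is not
an RDFa processor: it only gathers the text inside elements carrying a
property attribute, and it closes any still-open descendants when an
ancestor's end tag arrives, roughly as a browser would. The cal:location
property and the sample markup are my own assumptions for illustration;
cal:summary is the property from the CSS example quoted above.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collects (property, text) pairs; a toy, not a real RDFa processor."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements: (tag, property or None, text parts)
        self.literals = []  # finished (property, text) pairs

    def handle_starttag(self, tag, attrs):
        # Note: void elements (<br>, <img>, ...) are not handled in this sketch.
        self.stack.append((tag, dict(attrs).get("property"), []))

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for _, prop, parts in self.stack:
            if prop is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _, _ in self.stack):
            return  # stray end tag, ignore it
        # Implicitly close unclosed descendants, then the element itself.
        while self.stack:
            open_tag, prop, parts = self.stack.pop()
            if prop is not None:
                self.literals.append((prop, "".join(parts).strip()))
            if open_tag == tag:
                break


def collect(html):
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.literals)


# Hypothetical sample markup; the second string lacks the first </span>.
well_formed = ('<p><span property="cal:summary">Team meeting</span> at '
               '<span property="cal:location">Room 12</span></p>')
broken = ('<p><span property="cal:summary">Team meeting at '
          '<span property="cal:location">Room 12</span></p>')

print(collect(well_formed))
print(collect(broken))
```

With the well-formed markup the summary is "Team meeting"; with the
missing </span> it silently becomes "Team meeting at Room 12", although
nothing on screen has to look different.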
If broken/wrong presentation caused by a missing end tag had ever been
an issue, html-serialization would have been deprecated in favour of the
xml one (if something really "problematic" happened, authors would
notice it on their very first test by opening the page in a browser,
whereas extensively debugging triples might be an awkward problem in a
large document). In contrast, any break in metadata semantics caused by
html-serialization can only be a severe issue for a metadata-based bot,
because it needs accurate metadata, while a not-very-accurate
presentation is not a great concern for human beings in most cases. And
if no particular presentation is attached to those spans, and they're
used only to add semantics through metadata, as happens when embedding
RDF through RDFa attributes, a side-effect may arise.
From this point of view, html-serialization may be more prone to
side-effects than xml-serialization (which stops on parse errors, which
are in turn a possible cause of side-effects with metadata). That is,
since RDFa semantics is more reliable in a better-formed document,
xml-serialization might help to debug some errors. It is not a strict
requirement for content presentation, where finding more or less
emboldened words is better for users than finding a page which is not
rendered at all; hence the differences between xhtml and html.
But if it is, or will be, agreed that inaccurate metadata are reliable,
or that uncertain reliability is not an issue for wide-scale semantic
web applications, well, I really don't know what to say other than that
I just have a different opinion.
However, that was just the first example I was able to produce to give
an idea; better examples can surely be thought of. What if, for
instance, foster parenting or the adoption agency algorithm caused
metadata to be placed far from (part of) the corresponding data? Style
is inherited, but a wrong triple is a wrong triple (from this
perspective, a parse error /might/ highlight some misplaced metadata
more quickly than a raw debug of triples).
My point is that html-serialization is robust enough with respect to
presentational issues in most cases (it's the same for non-screen
media), but that might not hold for metadata modelled with RDFa, which
require greater "well-formedness" than content presentation to be
reliable enough. RDFa was conceived in the first place to allow RDF to
be serialized into xml documents without the validation problems that
can arise from direct use of xml-serialized RDF, and as an alternative
to RELAX NG (since strict xml parsers, as for xhtml, are more
widespread) -- it's in the first chapter of the RDFa specification,
"1. Motivation".
That is, RDFa was born first and foremost as an xml-related feature, so
I think it is legitimate to ask whether it can work as well in another
kind of document (not whether it can work at all, but whether it works
equally well in different kinds of document, or better in some than in
others). Of course the same concern may apply to eRDF as well as to
other kinds of metadata.
>
>> considering RDFa relies on namespaces (thus,
>> adding RDFa attributes to HTML5 spec would require some features from
>> xml extensibility to be added to html serialization).
>
>
> RDFa *does not* rely on XML namespaces. RDFa relies on eight
> attributes: about, rel, rev, property, datatype, content, resource and
> typeof. It also relies on a CURIE prefix binding mechanism. In XHTML
> and SVG, RDFa happens to use XML namespaces as this mechanism, because
> they already existed and they were convenient. In non-XML markup
> languages, the route to define CURIE prefixes is still to be decided,
> though discussions tend to be leaning towards something like:
>
> <html prefix="dc=http://purl.org/dc/terms/
> foaf=http://xmlns.com/foaf/0.1/">
> <address rel="foaf:maker" rev="foaf:made">This document was made by <a
> href="http://joe.example.com" typeof="foaf:Person" rel="foaf:homepage"
> property="foaf:name">Joe Bloggs</a>.</address>
> </html>
>
Well, yes, that's a possible solution to be considered. Anyway, it
would require (at least) one more new attribute to be specced out, with
possible new concerns. For instance, a missing space between prefix/URI
pairs might compromise its parsing (whereas space-separated CURIEs,
being shorter than absolute URIs, make typing errors in hand-written
code easier to spot, but this is a subtlety), so a separate attribute
for each URI might be more robust (for instance something like xmlns-*
or just ns-* on the <html> tag, similar to xmlns:* but not clashing with
the xml namespace mechanism, along the same lines as data-* but with a
different "scope"). Also, something like the eRDF use of <link> elements
to declare namespaces (or mappings from prefixes to URIs, to be more
consistent with RDFa conventions) inside the head element might work,
because an html document is likely to present such declarations once at
the beginning. However, each solution would have its own pros and cons,
while xml namespaces fit the purpose perfectly, not least because (one
of) their main uses is to represent prefixed attribute or element names
taken from an RDF vocabulary, which is in turn an XML 'format', and to
embed them in another kind of document, that is, to represent something
coming from a different namespace.
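As a rough illustration of the missing-space concern, here is a sketch
in Python of a naive parser for the prefix attribute proposed above. The
attribute syntax is still undecided, so the parsing rule
(whitespace-separated prefix=URI pairs, split at the first "=") and the
function name parse_prefixes are assumptions of mine, not anything
specified.

```python
def parse_prefixes(value):
    """Split whitespace-separated prefix=URI pairs into a dict (assumed syntax)."""
    mapping = {}
    for pair in value.split():
        # Split at the first "=" only; URIs may themselves contain "=".
        prefix, sep, uri = pair.partition("=")
        if not sep or not prefix or not uri:
            raise ValueError("malformed prefix binding: %r" % pair)
        mapping[prefix] = uri
    return mapping


# The value from the proposal, and the same value with the space missing.
good = parse_prefixes(
    "dc=http://purl.org/dc/terms/ foaf=http://xmlns.com/foaf/0.1/")
bad = parse_prefixes(
    "dc=http://purl.org/dc/terms/foaf=http://xmlns.com/foaf/0.1/")

print(good)
print(bad)
```

In the second case no error is raised: the foaf binding simply
disappears and dc ends up bound to the concatenation of the two URIs, so
every foaf: CURIE in the page would fail to resolve.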
> This discussion seems to be about "should/can RDFa work in HTML5?"
> when in fact, RDFa already can and does work in HTML5 - there are
> approaching a dozen interoperable implementations of RDFa, the
> majority of which seem to handle non-XHTML HTML. Assuming that people
> see value in RDFa, and assuming that the same people see value in
> using HTML5, then these people will use RDFa in HTML5. The question we
> should be discussing is not "should it work?" (because it already
> does), but rather, "should it validate?"
>
There should also be people who see value in eRDF, at least enough for
eRDF to be supported by SearchMonkey. Surely these people see value in
using eRDF within html documents, since eRDF was conceived to work with
HTML "natively", that is, without any need to change HTML (by
introducing new attributes or using unrecognized ones); nevertheless,
eRDF can't be valid HTML5 because it relies on the "profile" attribute,
which has been dropped. Should eRDF validate too? Should we prefer eRDF
to RDFa or vice versa? Should we treat them the very same way? Or should
we just wait and see which one works better for people, to avoid
specifying early something that might later prove less useful than
originally thought, for instance because most people decided to use
something else?
WBR, Alex