[whatwg] Trying to work out the problems solved by RDFa
Calogero Alex Baldacchino
alex.baldacchino at email.it
Fri Jan 9 20:35:27 PST 2009
Ben Adida wrote:
> Ian Hickson wrote:
>
>> We have to make sure that whatever we specify in HTML5 actually is going
>> to be useful for the purpose it is intended for. If a feature intended for
>> wide-scale automated data extraction is especially susceptible to spamming
>> attacks, then it is unlikely to be useful for wide-scale automated data
>> extraction.
>>
>
> It's no more susceptible to spam than existing HTML, as per my previous
> response.
>
>
Perhaps this is why general-purpose search engines do not rely
(entirely) on metadata and markup semantics to classify content, and
neither does Yahoo with SearchMonkey. The SearchMonkey documentation
points out that metadata never affects page rank, nor are semantics
interpreted for any other purpose; metadata only affects the additional
information presented to the user, at the user's request, and only if
the user has chosen to get information of a certain kind (gathered by a
certain data service). Spammy metadata can thus be considered contained
in this case: it might corrupt SearchMonkey's additional data, but not
the user's overall experience with the search engine. From this point of
view, SearchMonkey is a wide-range but small-scale use case (with
respect to each tool and each site the user might enable), because the
user can easily choose which sources to trust (e.g. which data services
to use, or which sites to consult for additional information), and in
any case he can get enough information without metadata.
On the other hand, a client UA implementing a feature based entirely on
metadata could not easily contain abused metadata and still bring valid
information to the user's attention, nor could the average user easily
tell trusted and spammy sites apart, because he wouldn't understand the
problem (and a site with spammy metadata might still contain information
the user was interested in previously, or in a different context). In
SearchMonkey, by contrast, the average user would notice that something
doesn't work in the enhanced results, but he would still get the basic
information he was looking for. Thus there are different requirements
to take into account for different scenarios (SearchMonkey and a client
UA are two such different scenarios).
Moreover, SearchMonkey is a kind of centralised service based on
distributed metadata. By default it doesn't need collaboration from any
other UA (that is, it doesn't need support for metadata in other
software), although it allows custom data services to extract metadata
autonomously, but always for SearchMonkey's purposes. It only requires
that web sites adhering to the project (or just willing to provide
additional information) embed some kind of metadata for the sole purpose
of making it available to SearchMonkey services, or at least that
authors create appropriate metadata and send it to Yahoo (as dataRSS
embedded in an Atom document). That is, SearchMonkey seems to me a clear
example of a use case for metadata that requires no changes to the
HTML5 spec, since SearchMonkey treats every kind of supported metadata
as if it were custom, private metadata; whatever happens to such
metadata client-side, even if it is just stripped by a browser, doesn't
really matter.
Furthermore, SearchMonkey supports several kinds of metadata: not only
RDFa, but also eRDF, microformats and dataRSS external to the document.
So why should SearchMonkey be the reason to introduce explicit support
for RDFa and not also for eRDF, which doesn't require new attributes,
just a parser? One might think one solution is better than the other,
and this might be true in theory, but what really counts is what people
find easier to use, and experience with SearchMonkey might determine
that (that is, let's see what people use more often, then decide what
is most needed).
Moreover, RDFa was designed for XHTML, so it can't be introduced into
the HTML serialization just by defining a few new attributes: a
processor needs, or might need, some knowledge of /namespaces/, so the
whole "family" of *xmlns* attributes (with and without prefixes) would
have to be specified for use with the HTML serialization, unless an
alternative mechanism similar to the one chosen for eRDF were defined,
and that might result in a new, hybrid mechanism (stitching together
pieces of eRDF and RDFa). But if we introduce xmlns and xmlns:<prefix>
into the HTML serialization, why not prefixed attributes as well? That
is, can RDFa be introduced into the HTML serialization "as is", without
resorting to the whole of XML extensibility? This should be taken into
account too, because just adding new attributes to the language might
work fine for XML-serialized documents but not for HTML-serialized ones.
This means RDFa support might be more difficult than it seems at first
glance, whereas it might not be needed for custom and/or small-scale
use cases (and I think SearchMonkey is one such case).
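To illustrate why namespace knowledge matters, here is a minimal
sketch, assuming a hypothetical, simplified RDFa-like scanner. It is not
a conforming RDFa processor: prefix mappings are treated as
document-global rather than scoped to elements, and only the "property"
attribute is examined. It shows that a CURIE such as "foaf:name" can
only be expanded to a full URI when an xmlns:foaf declaration is
available; the names CurieCollector and expand_properties are
illustrative only.

```python
from html.parser import HTMLParser


class CurieCollector(HTMLParser):
    """Hypothetical, simplified scanner (not a conforming RDFa processor).

    Records xmlns:<prefix> declarations and expands CURIEs found in
    'property' attributes. Simplification: prefix mappings are treated
    as document-global instead of being scoped to the declaring element.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace URI
        self.expanded = []   # one entry per CURIE seen (None if unresolvable)

    def handle_starttag(self, tag, attrs):
        # First pass: collect declarations on this element, so a CURIE
        # may use a prefix declared on the same element.
        for name, value in attrs:
            if name.startswith("xmlns:") and value is not None:
                self.prefixes[name[len("xmlns:"):]] = value
        # Second pass: expand every CURIE in a 'property' attribute.
        for name, value in attrs:
            if name == "property" and value:
                for curie in value.split():
                    self.expanded.append(self.expand(curie))

    def expand(self, curie):
        prefix, sep, reference = curie.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + reference
        # Without an xmlns:<prefix> mapping the CURIE cannot be resolved.
        return None


def expand_properties(markup):
    """Return the expanded 'property' URIs found in markup."""
    parser = CurieCollector()
    parser.feed(markup)
    parser.close()
    return parser.expanded
```

With the declaration present, the property expands to the FOAF URI;
with the same markup minus xmlns:foaf (as would happen if the HTML
serialization did not support it), the scanner has nothing to resolve
the prefix against and yields None.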
>> Nobody is suggesting that user agents derive any behavior from <title>, so
>> it doesn't matter if <title> is spammed or not.
>>
>
> And RDFa does not mandate any specific behavior, only the ability to
> express structure. The power lies in products like SearchMonkey that
> make use of this structure with innovative applications.
>
> Can one imagine tools that make poor use of this structured data so that
> they incentivize spam? Absolutely. Is this the bar for HTML5? If bad or
> poorly conceived applications can be imagined, then it's not in the
> standard?
>
>
I think the right question is whether there are effective
countermeasures to contain bad uses and make the possible damage less
significant than the advantages of good uses. When a feature in the
standard is thought to be a possible security (or privacy) issue,
countermeasures are proposed. Since spam is an immediate possible issue
for abused metadata, especially in wide-scale, automated data
extraction, we should also think about possible countermeasures to be
specced out along with the RDFa attributes.
WBR, Alex
--