[whatwg] Trying to work out the problems solved by RDFa

Calogero Alex Baldacchino alex.baldacchino at email.it
Sat Jan 3 18:23:57 PST 2009


Toby A Inkster wrote:
> Calogero Alex Baldacchino wrote:
>
>> My concern is: is RDFa really suitable for everyone and for Web
>> automation? My own answer, at first glance, is no. That's because RDF(a)
>> can perhaps address nicely very niche needs, where determining how much
>> data can be trusted is not a problem, but in general misuses AND
>> deliberate abuses may harm automation heavily
>
> If your agent isn't going to trust the data gleaned from RDFa, then 
> why should it trust the data gleaned from the web page's natural 
> language? If the page has been authored by a reprobate that cannot be 
> trusted to put honest and correct data in a few RDFa attributes, why 
> should we trust their prose text?
>

If you sell computers but your site talks about cars, I'll never buy a 
notebook from you; you're not cheating me, only yourself, and damaging 
your business. But if someone believes cars are searched for more often 
than computers (just an example), he may use false metadata to cheat 
any UA relying on metadata instead of prose, and take me to a store 
selling computers instead of cars.

Reliability of metadata (with respect to the data they describe) is a 
separate issue from reliability of content: it's not up to any UA to 
understand AND filter content based on whether the author can be trusted 
to be telling the truth (that would be a form of censorship), but if I 
ask the UA to bring me a page about horses, I don't want it to bring me 
a page about v.i.a.g.r.a. (that's spam). Thus it is up to any UA 
relying on metadata to understand AND filter them based on their 
reliability.

> An oft-quoted answer is that the prose text is "visible" whereas the 
> RDFa is somehow "invisible". Apart from the fact that UIs which make 
> use of data pulled in from RDFa will make this data visible, there is 
> also the fact that RDFa, unlike an external RDF/XML file, or some 
> metadata embedded in a <script> block, makes use of as much visible 
> data as possible: visible links, visible text, etc.
>
>     <p>My name is <span property="foaf:name"
>       about="#me">Toby Inkster</span>.</p>
>
> If you can't trust someone to correctly mark up what their name is, 
> then why trust them to mark up what deserves <em>phasis? Why believe 
> the <address> they provide? What if the instance they marked up with 
> <dfn> is not really the defining one? What if a <var> is really a 
> constant?
>

I don't really need proper markup to understand that a name is a name, a 
variable is a variable, a definition is a definition, and so on; you can 
use plain text and I'll understand your content the same way. If someone 
makes a mistake when combining a <dfn> with an anchor, the result may be 
a broken link, perhaps making me look for a better site. If someone 
misuses <var> or <em>, the worst possible consequence is a bad 
presentation. A bad presentation can be an attempt to cheat a UA (as 
when people put a lot of keywords in a page and style them with the 
same color as the background to cheat search engines), but only when it 
is a deliberate choice, not a misuse (and I'm mainly concerned with 
abuses). Anyway, it is easier to cheat a UA by means of false 
metadata than to cheat a human being by means of wrong markup.

If markup like

<p>We sell <a href="www.cheatingcarseller.com" property="foaf:name" 
content="Toby Inkster">cars</a></p>

appears in an advertisement, I'll notice it's about cars and I'll choose 
whether to follow it or not, based on my interest at the moment. But if 
I query a semantic UA that blindly relies on metadata for "Toby 
Inkster", I might get a page of a car webstore instead of your homepage 
(for instance).
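
To make the gap between hidden metadata and visible text concrete, here 
is a minimal sketch in Python of a check a cautious UA might run. It 
assumes that a `content` value sharing no text with what the reader 
sees is worth flagging; the class and function names are hypothetical, 
this is not a real RDFa processor, and it ignores nested elements with 
the same tag name.

```python
from html.parser import HTMLParser


class MetadataMismatchFinder(HTMLParser):
    """Collect (property, content, visible text) for every element that
    carries both a `property` and a `content` attribute."""

    def __init__(self):
        super().__init__()
        self._open = []     # annotated elements not yet closed
        self.findings = []  # completed (property, content, visible_text)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "property" in a and "content" in a:
            self._open.append([tag, a["property"], a["content"], []])

    def handle_data(self, data):
        # Visible text belongs to every annotated element still open.
        for entry in self._open:
            entry[3].append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes no nested element reuses the same tag.
        if self._open and self._open[-1][0] == tag:
            _, prop, content, parts = self._open.pop()
            self.findings.append((prop, content, "".join(parts).strip()))


def mismatches(html):
    """Return findings whose hidden content and visible text do not overlap."""
    finder = MetadataMismatchFinder()
    finder.feed(html)
    return [
        (prop, content, text)
        for prop, content, text in finder.findings
        if content.lower() not in text.lower()
        and text.lower() not in content.lower()
    ]
```

Fed the advertisement above, `mismatches` reports the pair "Toby 
Inkster" / "cars"; Toby's own example has no `content` attribute, so 
nothing is flagged there. A real filter would need far more than a 
substring comparison.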

Furthermore, I started my replies from a mail by Charles McCathieNevile 
that explicitly talked about trusted data and (mainly) small use cases, 
not wide-scale web automation. If there's no agreement about what kind 
of needs are best addressed by RDFa, maybe I have to agree with the 
people saying that the technology must grow and become more mature (or, 
at least, better understood) before it is merged into the HTML5 
specification (and 2023 is far enough away to accomplish such a goal 
:-) ). And I repeat my suggestion to map RDFa attributes to data-rdfa-* 
attributes and build RDFa processor plugins for the most common 
browsers, to test HTML5 and RDFa convergence on a wider scale before 
having browsers natively support RDFa in HTML5 documents (for the 
purposes of a test, but not only, I don't think "data-rdfa-property" vs 
"rdfa:property" vs "property" would be much of a problem).
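
As a rough illustration of why the spelling should matter little to a 
plugin, the sketch below (Python, with a hypothetical helper name) 
normalizes the three spellings into one view of an element's 
attributes. The precedence order, plain name first, is my assumption 
and not something any specification defines.

```python
# RDFa-specific attribute names; href and src are left out because
# HTML already defines them.
RDFA_ATTRIBUTES = (
    "about", "content", "datatype", "property",
    "rel", "resource", "rev", "typeof",
)


def rdfa_attributes(attrs):
    """Map an element's attribute dict to its RDFa attributes, accepting
    `name`, `rdfa:name` and `data-rdfa-name` (first match wins)."""
    result = {}
    for name in RDFA_ATTRIBUTES:
        for spelling in (name, "rdfa:" + name, "data-rdfa-" + name):
            if spelling in attrs:
                result[name] = attrs[spelling]
                break
    return result
```

For instance, `rdfa_attributes({"data-rdfa-property": "foaf:name", 
"data-rdfa-about": "#me"})` yields the same dictionary as the plain 
`property` and `about` attributes would.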

I'm not saying RDFa is a bad thing, or that it is useless; I just don't 
think any kind of markup can perfectly fit the semantics of "random" 
content for the purposes of a "global", wide-scale and automatic 
classification of content.

Best regards,
Alex
 
 