[whatwg] Trying to work out the problems solved by RDFa
Calogero Alex Baldacchino
alex.baldacchino at email.it
Sat Jan 3 08:51:53 PST 2009
Charles McCathieNevile wrote:
>>> The results of the first set of Microformats efforts were some pretty
>>> cool applications, like the following one demonstrating how a web
>>> browser could forward event information from your PC web browser to
>>> phone via Bluetooth:
>> It's a technically very interesting application. What has the adoption
>> rate been like? How does it compare to other solutions to the problem,
>> like CalDav, iCal, or Microsoft Exchange? Do people publish calendar
>> events much? There are a lot of Web-based calendar systems, like
>> or WebCalendar. Do people expose data on their Web page that can be used
>> to import calendar data to these systems?
> In some cases this data is indeed exposed to Webpages. However,
> anecdotal evidence (which unfortunately is all that is available when
> trying to study the enormous collections of data in private intranets)
> suggests that this is significantly more valuable when it can be done
> within a restricted access website.
>>> In short, RDFa addresses the problem of a lack of a standardized
>>> semantics expression mechanism in HTML family languages.
>> A standardized semantics expression mechanism is a solution. The lack
>> of a solution isn't a problem description. What's the problem that a
>> standardized semantics expression mechanism solves?
> There are many many small problems involving encoding arbitrary data
> in pages - apparently at least enough to convince you that the data-*
> attributes are worth incorporating.
> There are many cases where being able to extract that data with a
> simple toolkit from someone else's content, or using someone else's
> toolkit without having to tell them about your data model, solves a
> local problem. The data-* attributes, because they do not represent a
> formal model that can be manipulated, are insufficient to enable
> sharing of tools which can extract arbitrary modelled data.
That's because the data-* attributes are meant to create custom models
for custom use cases, not (necessarily) involving interchange and (let
me say) "agnostic extraction" of data. However, data-* attributes might
be used to "emulate" support for RDFa attributes, so that each one is
mapped to a corresponding "data-rdfa-<attribute>" one and vice versa. I
don't think "data-rdfa-about" vs "about" would make a great difference,
at least in a test phase, since it wouldn't be much different from
"rdfa:about", which might be used to embed RDFa attributes in some
XML language (e.g. an "external" markup embedded in an XHTML document
through the extension mechanism).
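To make the mapping concrete, here is a minimal sketch in Python of the
name translation in both directions. The "data-rdfa-" prefix is the one
suggested above; the attribute list and the function names are only
illustrative assumptions, not something taken from the RDFa or HTML5
specs.

```python
from typing import Dict, Optional

# Assumption: an illustrative subset of RDFa attribute names.
RDFA_ATTRS = ("about", "property", "resource", "typeof", "datatype", "content")
PREFIX = "data-rdfa-"  # prefix suggested in the message above


def to_data_attr(name: str) -> str:
    """Map an RDFa attribute name to its emulated data-* form."""
    if name not in RDFA_ATTRS:
        raise ValueError("not an RDFa attribute in this sketch: " + name)
    return PREFIX + name


def from_data_attr(name: str) -> Optional[str]:
    """Map an emulated data-* name back to the RDFa name, or None."""
    if name.startswith(PREFIX) and name[len(PREFIX):] in RDFA_ATTRS:
        return name[len(PREFIX):]
    return None


def emulate(attrs: Dict[str, str]) -> Dict[str, str]:
    """Rename RDFa attributes to data-rdfa-*; leave everything else alone."""
    return {(to_data_attr(k) if k in RDFA_ATTRS else k): v for k, v in attrs.items()}


def restore(attrs: Dict[str, str]) -> Dict[str, str]:
    """Inverse of emulate(): rename data-rdfa-* back to RDFa attributes."""
    return {(from_data_attr(k) or k): v for k, v in attrs.items()}
```

Since the translation is a plain one-to-one renaming, restore(emulate(x))
gives back x for any attribute set that doesn't already use the prefix.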
It seems there are several problems which RDFa may address (beside
other, more custom models) for organization-wide internal use and
intranet publication, without an explicit requirement of external
interchange. When both HTML5-specific features and RDFa attributes are
felt to be necessary, it shouldn't be too difficult to create a custom
parser, conforming to the RDFa spec and making use of data-* attributes,
to be plugged into a browser supporting HTML5 (and data-*). It could be
tested internally first, then exposed to the community, so that
HTML5+RDFa can be tested on a wider scale (especially once similar
parsers are provided for all the main browsers). Widespread adoption
would then demonstrate an effective need to merge RDFa into the HTML5
spec (or to standardize an approach based on data-* attributes).
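As a rough idea of what such a parser might start from, here is a small
Python sketch built on the standard html.parser module. It is NOT a
conforming RDFa processor: it assumes the hypothetical "data-rdfa-"
prefix, handles only "about", "property" and "content", and does no
CURIE or base URI resolution. The class and function names are made up
for the example.

```python
from html.parser import HTMLParser

PREFIX = "data-rdfa-"  # assumption: the emulation prefix discussed above
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DataRdfaSketch(HTMLParser):
    """Collects (subject, predicate, literal) tuples from data-rdfa-* attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.triples = []
        self._stack = []  # frames: [tag, subject, pending predicate, text chunks]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        about = a.get(PREFIX + "about")
        inherited = self._stack[-1][1] if self._stack else ""
        subject = about if about is not None else inherited
        predicate = a.get(PREFIX + "property")
        content = a.get(PREFIX + "content")
        if predicate is not None and content is not None:
            # An explicit content attribute wins over the element text.
            self.triples.append((subject, predicate, content))
            predicate = None
        if tag not in VOID:
            self._stack.append([tag, subject, predicate, []])

    def handle_data(self, data):
        # Text belongs to every open element still waiting for a literal.
        for frame in self._stack:
            if frame[2] is not None:
                frame[3].append(data)

    def handle_endtag(self, tag):
        # Close the nearest matching open element (and anything left open inside it).
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                closing = self._stack[i:]
                del self._stack[i:]
                for _, subject, predicate, chunks in reversed(closing):
                    if predicate is not None:
                        self.triples.append((subject, predicate, "".join(chunks).strip()))
                break


def extract_triples(html):
    parser = DataRdfaSketch()
    parser.feed(html)
    parser.close()
    return parser.triples
```

Fed a fragment such as
<div data-rdfa-about="#event"><span data-rdfa-property="dc:title">Launch
party</span></div>, extract_triples() would report the "#event" subject
with a "dc:title" literal; a real implementation would of course have to
follow the full RDFa processing rules.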
That is, since RDFa can be "emulated" somehow in HTML5 and tested
without changing the current specification, perhaps there isn't a strong
need for early adoption of the former; instead, an "emulated" merge
might be tested first within the current timeline.
>> What is the cost of having different data use specialised formats?
> If the data model, or a part of it, is not explicit as in RDF but is
> implicit in code made to treat it (as is the case with using scripts
> to process things stored in arbitrarily named data-* attributes, and
> is also the case in using undocumented or semi-documented XML formats),
> it requires people to understand the code as well as the data model in
> order to use the data. In a corporate situation where hundreds or tens
> of thousands of people are required to work with the same data, this
> makes the data model very fragile.
I'm not sure RDF(a) solves such a problem. AIUI, RDFa just binds (XML)
properties and attributes (in the form of CURIEs) to RDF concepts,
modelling a certain kind of relationship, whereas it relies on external
schemata to define such properties. Any undocumented or semi-documented
XML format may lead to misuse and, thus, to unreliably modelled data.
If the problem is the correct definition of the properties themselves,
it is not clear to me how just creating an explicit relationship
between properties is enough to ensure that a property really represents
a subject and not a predicate or an object (in its wrongly documented
schema). Perhaps it is enough to parse them, and perhaps it can
"inspire" a better definition of the external schemata (if the RDFa
"vision" of data as triples suits the actual data to be modelled), but
if the problem is the right understanding of "what represents what"
because of a lack of documentation, I think that's something RDF/RDFa
can't solve.
I think the same applies to data-* attributes, because _they_ describe
data (and data semantics) in a custom model and thus _they_ need to be
documented for others to be able to manipulate them; the use of a custom
script rather than a built-in parser does not change much from this
point of view.
> [not clear what the context was here, so citing as it was]
>>> > I don't think more metadata is going to improve search engines. In
>>> > practice, metadata is so highly gamed that it cannot be relied upon.
>>> > In fact, search engines probably already "understand" pages with far
>>> > more accuracy than most authors will ever be able to express.
>>> You are correct, more erroneous metadata is not going to improve search
>>> engines. More /accurate/ metadata, however, IS going to improve search
>>> engines. Nobody is going to argue that the system could not be gamed. I
>>> can guarantee that it will be gamed.
>>> However, that's the reality that we have to live with when introducing
>>> any new web-based technology. It will be mis-used, abused and
>>> The question is, will it do more good than harm? In the case of RDFa
>>> /and/ Microformats, we do think it will do more good than harm.
>> For search engines, I am not convinced. Google's experience is that
>> natural language processing of the actual information seen by the actual
>> end user is far, far more reliable than any source of metadata. Thus
>> from Google's perspective, investing in RDFa seems like a poorer investment
>> than investing in natural language processing.
> Indeed. But Google is something of an edge case, since they can afford
> to run a huge organisation with massive computer power and many
> engineers to address a problem where a "near-enough" solution brings
> them the users who are in turn the product they sell to advertisers.
> There are many other use cases where a small group of people want a
> way to reliably search trusted data.
I think the point with general purpose search engines is a different
one: natural language processing, while expensive, gives a far more
accurate solution than RDFa and/or any other kind of metadata can bring
to a problem where data can never be assumed to be trustworthy (and
where, instead, a data processor must be able to determine the data's
level of trust without any external aid). Since there is no "direct"
relationship between the semantics expressed by RDFa and the real
semantics of a web page's content, relying on RDFa metadata would lead
to widespread cheating, as happened when the keywords meta tag was
introduced. Thus, a trust chain/evaluation mechanism (such as the use
of signatures) would be needed, and so a general purpose search engine
relying on RDFa would seem to work more like a search directory, where
human beings analyse content to classify pages: this gives more
accurate results, but also a smaller and very slowly growing database
of classified sites (since obviously there will always be far more
sites not caring about metadata and/or about making their metadata
trusted than sites using trusted RDFa metadata).
(The same reasoning may apply to a local search made by a browser in its
local history: results are reliable as far as the expressed semantics are
reliable, that is, as far as the source is reasonably trusted, which may
not be true in general. Without a trust evaluation mechanism, misuse and
deliberate abuse would be the most common case, and such a mechanism
would, in turn, restrict the number of pages where the presence of
RDF(a) metadata is really helpful.)
My concern is that any data model requiring some level of trust to
achieve well-working interoperability may address only very small (and
niche) use cases. Even if a lot of such niche use cases might be
grouped into a whole category consistently addressed by RDFa (perhaps
beside other models), the result might not be a significant enough use
case to fit the current specification guidelines (which are somewhat
hostile to (XML) extensibility, as far as I've understood them) --
though they might be changed when and if really needed.