[whatwg] What RDF does Re: Trying to work out...
Calogero Alex Baldacchino
alex.baldacchino at email.it
Tue Feb 3 19:16:26 PST 2009
Charles McCathieNevile wrote:
> On Fri, 09 Jan 2009 12:54:08 +1100, Calogero Alex Baldacchino
> <alex.baldacchino at email.it> wrote:
>> I admit I'm not very expert in RDF use, so I have a few questions.
>> Specifically, I can guess the advantages of using the same
>> (carefully modelled, well-known) vocabulary or vocabularies; but when
>> two organizations develop their own vocabularies, similar yet
>> different, to model the same kind of information, is merging the data
>> enough? Can a processor give more than a collection of triples, to be
>> interpreted afterwards based on knowledge of the vocabularies used?
> RDF consists of several parts. One of the key parts explains how to
> make an RDF vocabulary self-describing in terms of other vocabularies.
>> I mean, I assume my tools can extract RDF(a) data from whatever
>> document, but my query interface is based on my own vocabulary: when
>> I merge information from an external vocabulary, do I need to
>> translate one vocabulary into the other (or at least modify the
>> query backend so that certain CURIEs are recognized as representing
>> the same concepts - e.g. to tell my software that 'foaf:name' and
>> 'ex:someone' are equivalent, for my purposes)? If so, merging data
>> might be the minor part of the work compared with non-RDF(a)
>> metadata (that is, I'd have tools to extract and merge data anyway,
>> and once I had translated the external metadata into my format, I
>> could use my own tools to merge the data), especially if the same
>> model is used both by my organization and the external one
>> (therefore requiring an easier translation).
> If a vocabulary is described, then you can do an automated translation
> from one RDF vocabulary to another while keeping your original query,
> based on your original vocabulary. This is one of the strengths of RDF.
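A minimal, self-contained sketch of that idea, using plain tuples rather than an RDF library: a vocabulary description that declares the hypothetical property 'ex:someone' equivalent to 'foaf:name' lets a query written against one vocabulary run over data expressed in the other. All URIs and data here are illustrative.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
EX_SOMEONE = "http://example.org/vocab#someone"  # hypothetical vocabulary term

# Triples merged from an external source that uses its own vocabulary.
triples = {
    ("http://example.org/people/1", EX_SOMEONE, "Alice"),
    ("http://example.org/people/2", FOAF_NAME, "Bob"),
}

# The external vocabulary's self-description, reduced here to a predicate
# equivalence map (in real RDF this would be an owl:equivalentProperty
# statement inside the vocabulary itself).
equivalences = {EX_SOMEONE: FOAF_NAME}

def normalize(triples, equivalences):
    """Rewrite predicates onto their canonical equivalents."""
    return {(s, equivalences.get(p, p), o) for (s, p, o) in triples}

def names(triples):
    """A 'query' phrased purely in the foaf vocabulary."""
    return sorted(o for (s, p, o) in triples if p == FOAF_NAME)

print(names(normalize(triples, equivalences)))  # ['Alice', 'Bob']
```

The query never needs to know about 'ex:someone'; only the normalization step consults the vocabulary description.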
Certainly, this is a strong benefit. However, when comparing different
vocabularies in depth against their basic descriptions (if any), I
suspect one may find vocabularies which are not described in terms of
each other, or of a third common vocabulary, so a translation might be
needed anyway. This might be true of small-time users developing a
vocabulary for internal use before starting an external partnership, or
regardless of any partnership. Sometimes, small-time users may find it
easier and faster to "reinvent the wheel" and modify it as their
problems evolve; someone might be unable to afford an extensive
investigation to find an existing vocabulary fulfilling his
requirements, or to develop a new one jointly with a partner who has
similar but slightly different needs, which could lead to a longer
process of mediating between those needs. In such a case, I wouldn't
expect that person to look for existing, more generic vocabularies
capable of describing the new one so as to ensure the widest possible
interchange of data. That is, until a requirement for interchange
arises, designing a vocabulary with interchange in mind may be
over-engineering; and once the requirement does arise, addressing it
with a translation, or with a description in terms of whichever
vocabulary happens to be involved (each time the problem recurs), might
be easier and faster than engineering a good generic description once
and for all.
Anyway, let's assume we're going to deal with well-described
vocabularies. Is the automated translation the task of a
parser/processor creating a graph of triples, or the task of a query
backend? And what are the requirements for a UA, from this perspective?
Must it just parse the triples and build a graph, or also take account
of a vocabulary description? Must it be a complete query backend? Must
it also provide a query interface? How basic or advanced must that
interface be? I think we should answer questions like these, and try to
figure out the possible problems arising from each answer and possible
solutions to them, because the concern here should be what UAs must do
with RDF embedded in a non-RDF (and non-XML) document.
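To make those questions concrete, here is a hypothetical layering of the tasks just listed; each function is one candidate "UA requirement", and everything after merge() could equally live in a separate query backend rather than in the UA. The parse() stage is stubbed, since a real UA would extract RDFa from markup.

```python
def parse(document):
    """Stage 1: extract raw (subject, predicate, object) triples.
    Stubbed; a real UA would parse RDFa out of the host markup."""
    return set(document)

def merge(*graphs):
    """Stage 2: merging RDF graphs is just set union of their triples."""
    return set().union(*graphs)

def apply_description(graph, equivalent_properties):
    """Stage 3 (a UA requirement, or a backend's job?): use a
    vocabulary description to rewrite predicates onto a common
    vocabulary."""
    return {(s, equivalent_properties.get(p, p), o) for (s, p, o) in graph}

# Illustrative data: two documents, two vocabularies (CURIEs as strings).
g1 = parse({("#a", "ex:someone", "Alice")})
g2 = parse({("#b", "foaf:name", "Bob")})
merged = merge(g1, g2)
translated = apply_description(merged, {"ex:someone": "foaf:name"})
print(sorted(translated))
```

Note how cheap stages 1 and 2 are compared with stage 3 and beyond (querying); where the line is drawn determines how heavy the UA requirement becomes.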
>> Thus, I'm thinking the most valuable benefit of using RDF/RDFa is
>> the assurance that both parties are using the very same data model,
>> despite the possible use of different vocabularies -- it seems to me
>> that the concept of triples consisting of a subject, a predicate and
>> an object is somewhat similar to a many-to-many association in a
>> database, whereas one might prefer a one-to-many approach - though
>> the former might be a natural choice for modelling data which are
>> usually sparse, as in document prose.
> I don't see the analogy, but yes, I think the big benefit is being
> able to ensure that you know the data model without knowing the
> vocabulary a priori - since this is sufficient to automate the process
> of merging data into your model.
I understand the benefit with respect to well-known and/or
well-described vocabularies, but I wonder whether an average small-time
user would produce a well-described vocabulary or a very custom one. In
the latter case, a good knowledge of the foreign vocabulary would be
needed before querying it, and I guess the translation can't be
automated: it requires a level of understanding which might be close to
that needed to translate from a (more or less) different model. In that
case, the benefit of automatically merging data from similar models
might be lost against a non-automated translation which might be as
difficult as translating from a different model (though one with
sufficient verbal documentation - that is, a natural-language
description, which should be easier to produce than a code-level
description), given that translated data should be easy to merge.
I'm pushing this point because I think it should be clear which
scenario is more likely to happen, to avoid introducing features
perfectly designed for the same people who can develop a "perfect"
vocabulary with a "perfect" generic description - whom I suppose to be
the same people who can afford to develop a generic toolkit on their
own, or to adjust an existing one (thus, they might be pleased with
basic support and a basic API) - but not for most small-time users, who
might develop a custom vocabulary the same way they develop a custom
model, thus needing more custom tools (again, basic support and a basic
API might satisfy their needs better than a complete backend working
fine with well-described vocabularies but not with completely unknown
ones, which would require custom development anyway).
Assuming this is true, there should be evidence that the people who'd
produce a "bad" vocabulary do not prefer a completely custom model,
because, if they were the great majority, we would risk investing
resources (on the UA side, if we made this a general requirement) to
help people who may be pleased with the help but not really need it
(because they're not small-time users, perhaps, and can do it on their
own without too much effort). This doesn't mean their requirements are
less significant or less worth taking into account, but in general UA
developers might not be very happy to invest their resources in
implementing something which is, or appears, over-engineered with
respect to the real needs "in the wild"; thus we should carefully
establish how strong the need to support RDFa is, and accurately define
the support requirements for UAs.