[whatwg] Fuzzbot (Firefox RDFa semantics processor)

Sun Jan 11 21:23:04 PST 2009

Ian Hickson wrote:
> 
>> They have already solved some problems with RDF and wish only to adapt 
>> this generalized solution to work in HTML, while you wish to re-solve 
>> all of these problems from the ground up.
> 
> I don't necessarily wish to resolve the problems -- if they have existing 
> good solutions, I'm all in favour of reusing them. I just want to know 
> what those problems are that we're solving, so that we can make sure that 
> the solutions we're adopting are in fact solving the problems we want to 
> solve. It would be irresponsible to add features without knowing why.
> 

I would assume that our resident proponents are already satisfied that 
their higher-level problems have been solved, and this is why they're 
frustrated that you won't just let them map their existing solutions 
into HTML all in one fell swoop.

I'm not sure I'd put myself into the "RDF proponent" bucket, but I do 
know one use-case of RDF that I've encountered frequently so I'll post 
it as a starting point.

The FOAF schema for RDF[0] addresses the problem of making personal 
profile data machine-readable along with some of the relationships 
between people. From the outside looking in, it seems that the goal they 
set themselves was to make machine-readable the sort of information you 
find on a social networking site.

One problem this can solve is that an agent can, given a URL that 
represents a person, extract some basic profile information such as the 
person's name along with references to other people that person knows. 
This can further be applied to allow a user who provides his own URL 
(for example, by signing in via OpenID) to bootstrap his account from 
existing published data rather than having to re-enter it.

The Google Social Graph API[1] apparently uses FOAF (when serialized 
as XML) as one of its data sources, so that given a URL that 
represents a person, it can return a list of URLs that represent 
friends of that person.

The Google Profiles application[2] makes use of the output of the Social 
Graph API to suggest URLs that a user might want to list on his profile 
page, so the user only needs to fill in a couple of URLs by hand.

So, to distill that into a list of requirements:

- Allow software agents to extract profile information for a person, of 
the kind often exposed on social networking sites, from a page that 
"represents" that person.

   There are a number of existing solutions for this:
     * FOAF in RDF serialized as XML, Turtle, RDFa, eRDF, etc.
     * The vCard format
     * The hCard microformat
     * The PortableContacts protocol[3]
     * Natural Language Processing of HTML documents

- Allow software agents to determine who a person lists as their friends 
given a page that "represents" that person.

   Again, there are competing solutions:
     * FOAF in RDF serialized as XML, Turtle, RDFa, eRDF, etc.
     * The XFN microformat[4]
     * The PortableContacts protocol[3]
     * Natural Language Processing of HTML documents
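
As a rough sketch of what the XFN option asks of a consuming agent, 
the following uses only Python's standard html.parser module to pull 
out the links a page marks as friends. The sample page, the example.org 
URLs, and the choice to treat only "friend", "acquaintance" and 
"contact" as friendship values are assumptions made for illustration; 
this is not how any particular consumer is known to work.

```python
from html.parser import HTMLParser

# Assumption: only these XFN friendship values count as "friends" here.
XFN_FRIEND_VALUES = {"friend", "acquaintance", "contact"}


class XFNLinkParser(HTMLParser):
    """Collects href values of <a> elements whose rel lists an XFN value."""

    def __init__(self):
        super().__init__()
        self.friends = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel is a space-separated token list; a valueless attribute is None.
        rel_tokens = set((attr_map.get("rel") or "").lower().split())
        if href and rel_tokens & XFN_FRIEND_VALUES:
            self.friends.append(href)


def extract_friend_urls(html_text):
    """Return, in document order, the URLs linked with a friendship rel."""
    parser = XFNLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.friends


# Hypothetical page that "represents" a person.
SAMPLE_PAGE = """
<p>People I know:
  <a href="http://alice.example.org/" rel="friend met">Alice</a>,
  <a href="http://bob.example.org/" rel="acquaintance">Bob</a>, and
  <a href="http://example.org/about">an unrelated link</a>.
</p>
"""

if __name__ == "__main__":
    print(extract_friend_urls(SAMPLE_PAGE))
```

The link text stays human-readable while the rel attribute carries the 
machine-readable relationship, so nothing is stated twice.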

-----------------------------------------------

Assuming that the above is a convincing problem domain, now let's add in 
the following requirement:

- Allow the above to be encoded without duplicating the data in both 
machine-readable and human-readable forms.

Now our solution list is reduced to (assuming we consider both 
requirements together):
     * FOAF in RDF serialized as RDFa or eRDF
     * The hCard microformat + the XFN microformat
     * Natural Language Processing of HTML documents
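
To make the "no duplication" requirement concrete for the hCard option, 
here is a minimal sketch, again using only Python's standard 
html.parser, that reads a person's name straight out of the visible 
markup. It is deliberately simplified: it looks only for the hCard "fn" 
class, does not check that the element sits inside a "vcard" container, 
and assumes no unclosed void elements (such as a bare <br>) appear 
inside the name. The sample markup is hypothetical.

```python
from html.parser import HTMLParser


class HCardNameParser(HTMLParser):
    """Collects the text content of elements carrying the class "fn"."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._depth = 0    # > 0 while inside an element with class "fn"
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # Nested element inside the name: track it so we know when
            # the outer "fn" element really closes.
            self._depth += 1
        elif "fn" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.names.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_names(html_text):
    """Return the formatted names found in the markup, in document order."""
    parser = HCardNameParser()
    parser.feed(html_text)
    parser.close()
    return parser.names


# Hypothetical profile fragment: the visible text is also the data.
SAMPLE_CARD = """
<div class="vcard">
  <span class="fn">Alice <b>Example</b></span>
  lives somewhere on the web.
</div>
"""

if __name__ == "__main__":
    print(extract_names(SAMPLE_CARD))
```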

All three of the above options address the use-cases as I stated them -- 
the Social Graph API apparently uses all three if you're willing to 
consider a MySpace-specific "screen-scraper" as Natural Language 
Processing -- so what would be the advantages of the first solution?

  * Existing RDF-based systems can use an off-the-shelf RDFa or eRDF 
parser and get the same data model (RDF triples of FOAF predicates) that 
they were already getting from the XML and Turtle RDF serializations, 
reducing the amount of additional work that must be done to consume this 
format.

  * FOAF has an extensive vocabulary that's based on fields that have 
been observed on social networking sites, while hCard is built on vCard 
which has a more constrained scope intended for the sort of entries 
you'd expect to find in an "address book".

  * FOAF has been adopted -- usually in the RDF/XML serialization -- by 
a number of social networking sites (e.g. LiveJournal), so they are 
presumably already somewhat familiar with the FOAF vocabulary and may 
therefore be able to adopt it more easily in the RDFa or eRDF 
serializations.
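
The first advantage above can be illustrated with a small sketch. The 
consumer below works purely on (subject, predicate, object) triples, so 
it does not care which serialization they were parsed from. The two 
triple lists are hand-written stand-ins for what an RDF/XML parser and 
an RDFa parser might each produce for the same profile; no real parser 
is invoked, and the example.org URIs are hypothetical. The FOAF 
namespace URI and the name/knows properties are the real ones.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://alice.example.org/#me"  # hypothetical person URI
BOB = "http://bob.example.org/#me"      # hypothetical person URI


def summarize_person(triples, person):
    """Return the foaf:name and foaf:knows values for one subject."""
    name = None
    knows = []
    for subject, predicate, obj in triples:
        if subject != person:
            continue
        if predicate == FOAF + "name":
            name = obj
        elif predicate == FOAF + "knows":
            knows.append(obj)
    # Triples are an unordered set, so sort for a stable result.
    return {"name": name, "knows": sorted(knows)}


# Stand-in for the output of an RDF/XML parser (assumption).
FROM_RDF_XML = [
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
]

# Stand-in for the output of an RDFa parser over an HTML page (assumption).
FROM_RDFA = [
    (ALICE, FOAF + "knows", BOB),
    (ALICE, FOAF + "name", "Alice"),
]

if __name__ == "__main__":
    print(summarize_person(FROM_RDF_XML, ALICE))
    print(summarize_person(FROM_RDFA, ALICE))
```

Both calls yield the same summary, which is the sense in which an 
RDF-based system gets the new serialization nearly for free.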

Though there are of course also some disadvantages:

  * Some sites are already publishing XFN and/or hCard, so consuming 
software would need to continue to support these in addition to 
FOAF-in-HTML-somehow, which is more work than supporting only XFN and 
hCard. (In other words, "XFN/hCard already work today".)

  * RDFa requires extensions to the HTML language, while XFN, hCard and 
NLP do not.

  * Many existing FOAF parsers are not actually RDF parsers but are 
rather using stock XML parsers and assuming a particular tree layout, so 
they would not be able to reuse any code in processing triples from RDFa 
or eRDF.
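
The last point can be shown with a short sketch. The hypothetical 
"FOAF parser" below is really just an ElementTree path lookup that 
assumes a foaf:Person element directly under rdf:RDF. The two documents 
are equivalent RDF/XML (the first uses the typed-node abbreviation, the 
second spells out rdf:type), yet the layout-dependent code only finds 
the name in the first. The function name and sample data are 
assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Layout 1: typed node element, the form many FOAF files use.
TYPED_NODE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://alice.example.org/#me">
    <foaf:name>Alice</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

# Layout 2: the same triples written with rdf:Description and rdf:type.
DESCRIPTION = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://alice.example.org/#me">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>Alice</foaf:name>
  </rdf:Description>
</rdf:RDF>"""


def naive_names(xml_text):
    """Find names by tree position, as a layout-dependent parser would."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("foaf:Person/foaf:name", NS)]


if __name__ == "__main__":
    print(naive_names(TYPED_NODE))
    print(naive_names(DESCRIPTION))
```

Code written this way has no triple-level model to feed from an RDFa 
or eRDF parser, so little of it would carry over.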

-------------------------------------

Is this the sort of thing you're looking for, Ian?

Much of the above section could be applied to any other RDF vocabulary 
with a bit of search and replace, but I'll leave that to others since 
FOAF is the only RDF vocabulary with which I have any experience.

(And if I've misrepresented any of the facts about FOAF or RDF, I'm 
happy to be corrected. I'm writing this only in an attempt to move the 
discussion forward; I'm currently neutral on whether RDFa should be 
adopted into HTML5.)

[0] http://www.foaf-project.org/
[1] http://code.google.com/apis/socialgraph/
[2] http://www.google.com/support/accounts/bin/answer.py?answer=97703&hl=en
[3] http://portablecontacts.net/
[4] http://www.gmpg.org/xfn/