[whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector
shelleyp at burningbird.net
Sat Jan 17 08:55:01 PST 2009
The debate about RDFa highlights a disconnect in the decision making
related to HTML5.
The purpose behind RDFa is to provide a way to embed complex information
into a web document, in such a way that a machine can extract this
information and combine it with other data extracted from other web
pages. It is not a way to document private data, or data that is meant
for internal use only; the data is for external extraction and combination.
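As a rough illustration of "extract and combine", here is a minimal Python sketch. It is not a conforming RDFa processor: it handles only the `about`, `property`, and `rel`/`href` attributes, leaves prefixes such as `foaf:` unresolved, and the class name, helper function, and example.org pages are all hypothetical, made up for this example.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not touch the subject stack.
VOID = {"br", "img", "link", "meta", "hr", "input"}


class TinyRDFaParser(HTMLParser):
    """Collects (subject, predicate, object) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = set()
        self._stack = [""]     # subject in scope for each open element
        self._pending = None   # (subject, property) waiting for its text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # `about` sets a new subject; otherwise inherit from the parent element.
        subject = a.get("about", self._stack[-1])
        if "rel" in a and "href" in a:
            self.triples.add((subject, a["rel"], a["href"]))
        if "property" in a:
            self._pending = (subject, a["property"])
        if tag not in VOID:
            self._stack.append(subject)

    def handle_data(self, data):
        # The first non-blank text after a `property` attribute is its value.
        if self._pending and data.strip():
            self.triples.add((*self._pending, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        if tag not in VOID and len(self._stack) > 1:
            self._stack.pop()


def extract(html):
    parser = TinyRDFaParser()
    parser.feed(html)
    parser.close()
    return parser.triples


# Two hypothetical pages that describe the same person.
PAGE_A = '''
<div about="http://example.org/alice#me">
  <span property="foaf:name">Alice</span>
  <a rel="foaf:knows" href="http://example.org/bob#me">Bob</a>
</div>
'''
PAGE_B = '''
<p about="http://example.org/alice#me">
  Works on <a rel="foaf:homepage" href="http://example.org/alice/">her site</a>
</p>
'''

# Combining data from separate documents is a plain set union of triples.
merged = extract(PAGE_A) | extract(PAGE_B)
friends = sorted(o for s, p, o in merged
                 if s == "http://example.org/alice#me" and p == "foaf:knows")
```

Because both pages use the same subject URL, the statements from each merge without any site-specific glue code, and `friends` can be read straight from the combined set.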
An earlier email between Martin Atkins and Ian Hickson had the following:
"On Sun, 11 Jan 2009, Martin Atkins wrote:
> One problem this can solve is that an agent can, given a URL that
> represents a person, extract some basic profile information such as the
> person's name along with references to other people that person knows.
> This can further be applied to allow a user who provides his own URL
> (for example, by signing in via OpenID) to bootstrap his account from
> existing published data rather than having to re-enter it.
> So, to distill that into a list of requirements:
> - Allow software agents to extract profile information for a person
> exposed on social networking sites from a page that "represents" that
> person.
> - Allow software agents to determine who a person lists as their friends
> given a page that "represents" that person.
> - Allow the above to be encoded without duplicating the data in both
> machine-readable and human-readable forms.
> Is this the sort of thing you're looking for, Ian?
Yes, the above is perfect. (I cut out the bits that weren't really "the
problem" from the quote above -- the above is what I'm looking for.)
The most critical part is "allow a user who provides his own URL to
bootstrap his account from existing published data rather than having to
re-enter it". The one thing I would add would be a scenario that one would
like to be able to play out, so that we can see if our solution would
enable that scenario.
"I have an account on social networking site A. I go to a new social
networking site B. I want to be able to automatically add all my
friends from site A to site B."
There are presumably other requirements, e.g. "site B must not ask the
user for the user's credentials for site A" (since that would train people
to be susceptible to phishing attacks). Also, "site A must not publish the
data in a manner that allows unrelated users to obtain privacy-sensitive
data about the user", for example we don't want to let other users
determine relationships that the user has intentionally kept secret.
It's important that we have these scenarios so that we can check if the
solutions we consider are actually able to solve these problems, these
scenarios, within the constraints and requirements we have."
It would seem that Ian agrees both that a) there is a need to provide a
way to document complex information in a consistent, machine-readable
form, and that b) the purpose of this data is external consumption,
rather than internal use. Where the disconnect comes in is that he
believes RDF, and its web page serialization technique, RDFa, are only
one of a set of possible solutions.
Yet at the same time, he references how the MathML and SVG people
provide sufficient use cases to justify the inclusion of both of these
into HTML5. But what is MathML? What does it solve? It is a way to include
mathematical formulas in a document in a formatted manner. What is SVG?
A way to embed vector graphics into a web page, in such a way that the
individual elements described by the graphics can become part of the
So, why accept that we have to use MathML in order to solve the problems
of formatting mathematical formulas? Why not start from scratch, and
devise a new approach?
So, why accept that we have to use SVG in order to solve the problems of
vector graphics? Why not start from scratch, and devise a new approach?
Come to think of it, I think we should also question the use of the
canvas element. After all, if the problem set is that we need the
ability to animate graphics in a web page using a non-proprietary
technology, then wouldn't something like SVG work for this purpose?
Isn't the canvas element redundant? But then, perhaps we should start
over from the beginning and just create a new graphics capability from
scratch, and reject both canvas and SVG.
We don't reject MathML, though. Neither do we reject SVG or canvas. Or
any other of a number of entities being included in HTML5, including
SQL. Why? Because they have a history of use, extensive documentation as
to purpose and behavior, and there are a considerable number of
implementations that support the specifications. It doesn't make sense
to start from scratch. It makes more sense to make use of what already
exists.
I have to ask, then: why do we isolate RDF and RDFa for special
handling? If we can accept that SQL is a natural database query
mechanism, and SVG is a natural for vector graphics, and the canvas
element is the proper choice for script-enabled bitmaps, and
MathML...well, you get the picture--if we can accept these mature,
well-documented representatives of each of their genres as the de facto
implementations, enough to incorporate each into HTML5, why then do we
demand that RDF and its web page serialization technique, RDFa, must
"prove" themselves, when we don't demand the same from other external
objects and specifications?
To do so is not consistent. To continue to do so demonstrates that
perhaps other issues are at play in regards to RDF/RDFa.
Martin provided a use case that Ian acknowledges is justified. Ipso
facto, we do not need to continue providing use cases for this type of
requirement. We have established that the requirement/need/desire to
incorporate data into a web page that is consistently machine readable,
which can be consistently extracted, and consistently combined with data
from other documents using automated processes is a legitimate need. RDF
was designed specifically for this purpose, is a mature specification,
with extensive documentation, and one can find many different
implementations of its use. The use of RDF for FOAF is just one of many
uses; RSS 1.0 was another, as are the version of RDF embedded within
photos and CC licensing--these are all based on the same model.
In other words, if we accept that SVG is the de facto implementation of
vector graphics (as compared to something such as, say, VML), and we
accept the same for MathML, the canvas element, SQL, and so on, to not
accept RDF as the de facto implementation for the purpose behind which
it was designed, is to single out RDF/RDFa for "special handling" within
the group. It is to demand more from it than has been demanded from any
other element included in HTML5.
In particular, as has been documented elsewhere, very little is needed
to support RDFa within HTML5. The requirements are much less than those
for the canvas element, SVG, MathML, and even SQL. So the task, itself,
is not daunting. Not as daunting as, say, the alt attribute.
This then returns us to my earlier supposition: To not support RDF/RDFa
as the de facto implementation of complex, structured data is not
consistent. To continue to do so demonstrates that perhaps other issues
are at play in regards to RDF/RDFa. Such inconsistencies are not in the
best interest of anyone developing a new specification meant for widespread
use on the web. If, as I believe, the inconsistency reflects an
underlying bias against the concept behind RDF, which is that true web
semantics is based on structured data, not natural language processing,
or not exclusively based on natural language processing, then I believe
it's important to highlight such bias, and deal with it accordingly.