[whatwg] Trying to work out the problems solved by RDFa

Fri Jan 2 09:52:35 PST 2009

On Fri, Jan 2, 2009 at 12:12 AM, Charles McCathieNevile
<chaals at opera.com> wrote:
> On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi at takkaria.org> wrote:
>
>> On 2009-01-01 15:24, Toby A Inkster wrote:
>>>
>>> The use cases for RDFa are pretty much the same as those for
>>> Microformats.
>>
>> Right, but microformats can be used without any changes to the HTML
>> language, whereas RDFa requires such changes.  If they fulfill the same use
>> cases, then there's not much point in adding RDFa.
>
> ...

Why the non-response?  This is precisely the point of contention.
Things aren't added to the spec on a whim.  Things get added when it
is demonstrated that authors will significantly benefit from the
inclusion of the feature in the language.  Microformats (used as an
example only) use only features already in the language, and thus do
not need any spec support.  If they already solve the problem
adequately, then there is no need to go further.

>>> So why RDFa and not Microformats?
>
> (I think the question should be why RDFa is needed *as well as* µformats)

This is correct.  Microformats exist already.  They solve current
problems.  Are there further problems that Microformats don't address
which can be solved well by RDFa?  Are these problems significant
enough to authors to be worth addressing in the spec, or can we wait
and let the community work out its own solutions further before we
make a move?  We generally want to wait until a given item is truly
established before speccing it, so that we can work with existing
use-cases and solve known problems.  To do otherwise risks us
inventing use-cases that don't commonly exist in reality, solving
non-problems while leaving gaping holes that will cause authors
problems down the line.

For an example (used several times, but that's because it's a really
good example), consider <video>.  Flash-based video players are
already extremely common.  We know how people use them, we know what
authors generally expect from them, and we know what problems exist
with how they are currently implemented and used.  We also feel that
extending the language would allow us to solve these problems, and
help authors significantly.  Thus, <video>.

Microformats are the metadata equivalent of Flash-based video players.
They are hacks used to allow authors to accomplish something not
explicitly accounted for in the language.  Are there significant
problems with this approach?  Is metadata embedding used widely enough
to justify extending the language for it, or are the current hacks
(Microformats, in this case) enough?  Are current metadata embedding
practices mature enough that we can be relatively sure we're solving
actual problems with our extension?  These are all questions that must
be asked of any extension to the language.

>>> Firstly, RDFa provides a single unified parsing algorithm that
>>> Microformats do not. ...
>
>> This is not necessarily beneficial.  If you have separate parsing
>> algorithms, you can code in shortcuts for common use-cases and thus optimise
>> the authoring experience.
>
> On the other hand, you cannot parse information until you know how it is
> encoded, and information encoded in RDFa can be parsed without knowing more.
>
> And not only can you optimise your parsing for a given algorithm, you can
> also do so for a known vocabulary - or you can optimise the post-parsing
> treatment.

What is the benefit to authors of having an easily machine-parsed
format?  (Note: this is completely separate from the question of the
benefits of metadata at all.)  Are they greater than the benefits of a
format that is harder to parse, but easier for authors to write?

>
>>  Also, as has been pointed out before in the distributed extensibility
>> debate, parsing is a very small part of doing useful things with content.
>
> Yes. However many of the use cases that I think justify the inclusion of
> RDFa are already very small on their own, and valuable when several
> vocabularies are combined. So being able to do off-the-shelf parsing is
> valuable, compared to working out how to parse a combination of formats
> together.

Can you provide these use-cases?  The discussion has an astonishing
dearth of use-cases by which we can evaluate the effectiveness of
proposals.

>>> Secondly, as the result of having one single parsing algorithm,
>>> decentralised development is possible. If I want a way of marking up my
>>> iguana collection semantically, I can develop that vocabulary without
>>> having to go through a central authority.
>>
>> You can develop vocabularies without going through a central authority
>> already, via class or id, and many people already do.
>>
>>> Because URIs are used to
>>> identify vocabulary terms, I can be sure that my vocabulary won't clash
>>> with other people's vocabularies.
>>
>> Again, you can do this with class, by putting your domain name in the
>> class attribute.  It also depends on how much of an issue you think clashes
>> will be with an iguana collection-- I would suggest that due to the
>> specialised nature of the markup, clashes would be quite unlikely.
>
> It depends how many people work on iguana collections - or Old Norse and
> Anglo Saxon text, which was the use case that got me involved in the Web in
> the very early 90s. It turns out that people don't, in the µformats world,
> use unambiguous names, especially when they are privately developing their
> own information. By contrast, those who come from an RDF world do this by
> habit.

Is this a problem that needs to be solved in the spec, or is it one
that can be solved socially?  More importantly, is it a problem that
needs to be solved at all?  Is there any indication that use of
ambiguous names produces significant problems for authors?

>>> It can be argued that going through a
>>> community to develop vocabularies is beneficial, as it allows the
>>> vocabulary to be built by "many minds" - RDFa does not prevent this, it
>>> just gives people alternatives to community development.
>>
>> RDFa does not give anything over what the class attribute does in terms of
>> community vs individual development, so this doesn't really speak in RDFa's
>> favour.
>
> In principle no, but in real world usage the class attribute is considered
> something that is primarily local, whereas RDFa is generally used by people
> who have a broader outlook on the desirable permanence and re-usability of
> their data.

Can we extract a requirement from this, then?

>>> Lastly, there are a lot of parsing ambiguities for many Microformats.
>>> One area which is especially fraught is that of scoping. The editors of
>>> many current draft Microformats[1] would like to allow page authors to
>>> embed licensing data - e.g. to say that a particular recipe for a pie is
>>> licensed under a Creative Commons licence. However, it has been noted
>>> that the current rel=license Microformat can not be re-used within these
>>> drafts, because virtually all existing rel=license implementations will
>>> just assume that the license applies to the whole page rather than just
>>> part of it. RDFa has strong and unambiguous rules for scoping - a
>>> license, for example, could apply to a section of the page, or one
>>> particular image.
>>
>> Are there other cases where this granularity of scoping would be genuinely
>> helpful?  If not, it would seem better to work out a solution for scoping
>> licence information...
>
> Yes.
>
> Being able to describe accessibility of various parts of content, or point
> to potential replacement content for particular use cases, benefits
> enormously from such scoping (this is why people who do industrial-scale
> accessibility often use RDF as their infrastructure). ARIA has already taken
> the approach of looking for a special-purpose way to do this, which
> significantly bloats HTML but at least allows important users to satisfy
> their needs to be able to produce content with certain information included.
>
> Government and large enterprises produce content that needs to be
> maintained, and being able to include production, cataloguing, and similar
> metadata directly, scoped to the document, would be helpful. As a trivial
> example, it would be useful to me in working to improve the Web content we
> produce at Opera to have a nice mechanism for identifying the original
> source of various parts of a page.

Can we distill this into use-cases, then?  You, as an author, want to
be able to specify the original source of a piece of content.  What's
the practical use of this?  Does it require an embedded,
machine-readable vocabulary to function?  Are existing solutions
adequate (for example, footnotes)?

>> What would you do with scoped copyright information, anyway?  I can see
>> images being an issue, but ideally information about a resource should be
>> kept in that resource, and as such the licence should be embedded in the
>> image rather than given by a Web page.  In the case of particular sections
>> having particular licences, is there any practical use of marking up
>> different sections with different licences over just doing that with text?
>
> Mash-ups. If they have a use-case, and I think it is widely accepted that
> they do, then it would seem obvious that being able to identify the source
> of each part, and any conditions that vary between different sources, is a
> use case.

Not quite.  Specifically, is there any practical use for marking up
various sections of a site with licensing information specific to that
section *in an embedded, machine-readable manner*?  Are the existing
solutions adequate (for example, simply putting a separate copyright notice
on each section, or noting the various copyrights on a licensing
page)?
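
For concreteness, here is a rough Python sketch of what "embedded,
machine-readable" scoping would ask of a consumer.  It is an
illustration only, not the RDFa processing algorithm and not any
existing rel=license parser: it assumes a license link applies to the
nearest enclosing element that carries an `about` attribute, and
otherwise to the whole page.  The class name, the example.org URLs,
and the `#pie-recipe` identifier are all made up, and relative
subjects are left unresolved.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"img", "br", "hr", "meta", "link", "input"}


class LicenseScopes(HTMLParser):
    """Collect (subject, license URL) pairs from rel="license" links.

    Simplifying assumption: the subject is the nearest enclosing `about`
    value, falling back to the page URL. Real RDFa has more rules.
    """

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.stack = []   # (tag, inherited subject) for each open element
        self.found = []   # (subject, license URL) pairs, in document order

    def current_subject(self):
        return self.stack[-1][1] if self.stack else self.page_url

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = attrs.get("about") or self.current_subject()
        rel = (attrs.get("rel") or "").split()
        if "license" in rel and attrs.get("href"):
            self.found.append((subject, attrs["href"]))
        if tag not in VOID:
            self.stack.append((tag, subject))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


# Hypothetical page: one page-wide licence, one licence scoped to a recipe.
sample = """
<body>
<a rel="license" href="http://example.org/page-licence">page licence</a>
<div about="#pie-recipe">
  <p>Pie recipe...</p>
  <a rel="license" href="http://example.org/recipe-licence">recipe licence</a>
</div>
</body>
"""

scopes = LicenseScopes("http://example.org/page")
scopes.feed(sample)
for subject, licence in scopes.found:
    print(subject, "->", licence)
```

A consumer that ignores `about`, as the quoted text says existing
rel=license implementations do, would attribute both links to the
page.  Whether anyone needs the distinction badly enough to justify
extending the language is the open question.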

(Note: I responded to your email rather than the OP because it
presented better points to respond to.)

~TJ