[whatwg] RDFa

Sat Aug 23 08:16:47 PDT 2008

+cc: Paul Miller of Talis, who worked on the AHDS report mentioned below.

Henri Sivonen wrote:
> On Aug 23, 2008, at 02:43, Ben Adida wrote:
> 
>> Why would you reinvent URIs in a way that they can't be de-referenced?
> 
> To avoid having misleading affordances.
> http://en.wikipedia.org/wiki/Affordance
> 
>> We want one parser, with variability and innovation in the vocabulary 
>> definition only.
> 
> Having one parser seems appealing compared to using the native 
> mechanisms of each of HTML (<meta>, <link>), PDF (document information 
> dictionary), PNG (tEXt chunk), etc. at first, but the vision that tools 
> handle this all when you remix culture already requires the tools to 
> support reading and writing the file formats they remix. When you 
> already have format-native key-value read/write capability, the ability 
> to build and mine RDF *graphs* becomes an additional burden.

It may not be obvious to those who haven't followed the history, or who 
were at school at the time, but many of us did indeed invest a lot of 
time and effort using name/value metadata structures in HTML. For 
example, the Dublin Core project was built on this technology base from 
1994/5 onwards, and the experience of metadata implementors using it was 
one of the drivers for the creation of RDF. At the time there was no 
WHATWG to talk to, but the metadata community *did* talk to W3C.

See http://dublincore.org/about/history/

Early on, the Dublin Core community faced a lot of pressure towards 
feature creep: new elements/terms to address the needs of various groups 
who liked Dublin Core but wanted some specifics added. This situation 
gave rise to the 'Warwick Framework', defined in 1996:
http://www.dlib.org/dlib/july96/lagoze/07lagoze.html
[[
  While there was consensus among the attendees that the concept of a 
simple metadata set is useful, there were a number of fundamental 
questions concerning the real utility of the Dublin Core as it was 
defined at the end of the preceding workshop. Does the very loosely 
defined Dublin Core really qualify as a "standard" that can be read and 
processed programmatically? Should the number of the core elements be 
expanded, to increase semantic richness, or reduced, to improve 
ease-of-use by authors and/or web publishers? Will authors reliably 
attach core metadata elements to their content? Should a core metadata 
set be restricted to only descriptive cataloging information or should 
it include other types of metadata such as administrative information, 
linkage data, and the like? What is the relationship of the Dublin Core 
to other developing work in metadata schemes, particularly in those 
areas such as rights management information (terms and conditions)?

The workshop attendees concluded that the answer to these questions and 
the route to progress on the metadata issue lay in the formulation a 
higher-level context for the Dublin Core. This context should define how 
the Core can be combined with other sets of metadata in a manner that 
addresses the individual integrity, distinct audiences, and separate 
realms of responsibility of these distinct metadata sets.
]]

For an implementor report typical of the experience from this era, i.e. 
with name/value pairs, see the UK Arts and Humanities Data Service 
document http://ahds.ac.uk/public/metadata/discovery.html, which was 
presented at the October 1997 Helsinki workshop of the Dublin Core. At 
the time I was involved with the ROADS internet cataloguing project and 
can vouch that we hit a similar ceiling with attribute/value metadata.

From the appendix (http://ahds.ac.uk/public/metadata/disc_09.html), 
here are some of the attribute/value structures they were forced to 
squash their metadata records into:

DC.creator.corporateName.1
	Canterbury Archaeological Trust

DC.creator.phone.1
	+44 227 462062

DC.creator.personalName.2
	Paul Miller

DC.creator.affiliation.2
	Archaeology Data Service

...this expresses name, affiliation and contact information for a number 
of contributors to a work. Another example describes several 
contributors along with their roles (actor, director, etc.). Again, the 
attribute/value representations needed numeric indexes 
('DC.creator.role.9') to disambiguate which individual was being described.
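To make the contortion concrete, here is a minimal Python sketch (the 
function names, the '#work' subject and the blank-node labels are 
hypothetical, chosen only for illustration). It takes the four pairs 
above, recovers one record per contributor from the trailing numeric 
index, and re-expresses each record as a node in a small graph. In the 
flat form the index suffix is the only thing holding a contributor's 
fields together; in the graph form that job is done by a node identifier.

```python
from collections import defaultdict

# The attribute/value pairs quoted above from the AHDS appendix.
PAIRS = [
    ("DC.creator.corporateName.1", "Canterbury Archaeological Trust"),
    ("DC.creator.phone.1", "+44 227 462062"),
    ("DC.creator.personalName.2", "Paul Miller"),
    ("DC.creator.affiliation.2", "Archaeology Data Service"),
]


def group_by_index(pairs):
    """Rebuild one record per contributor; the trailing index is the only link."""
    records = defaultdict(dict)
    for key, value in pairs:
        # "DC.creator.personalName.2" -> ("DC.creator", "personalName", "2")
        element, subfield, index = key.rsplit(".", 2)
        records[(element, int(index))][subfield] = value
    return dict(records)


def to_triples(records, work="#work"):
    """Give each record its own (blank) node instead of a numeric suffix."""
    triples = []
    for (element, index), fields in sorted(records.items()):
        node = f"_:{element.split('.')[-1]}{index}"  # e.g. "_:creator2"
        triples.append((work, element, node))
        for subfield, value in fields.items():
            triples.append((node, subfield, value))
    return triples


for triple in to_triples(group_by_index(PAIRS)):
    print(triple)
```

Nothing in the flat keys says that '.1' and '.2' denote different 
people; that convention has to be agreed out of band.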

>>> What barrier is there to building reusable vocabularies?
>>
>> The follow-your-nose principle is missing, which is fairly essential for
>> discovering the meaning of vocabularies (partially automatically, not by
>> doing a Google search.)
> 
> The partial automation with RDFa doesn't go very far. If a program 
> automatically dereferences http://creativecommons.org/ns# and parses the 
> result as RDFa, the program now has a human-readable string for each 
> property--not exactly something that the program can act on further 
> without human help.

Looking at this example,

           <div id="license" about="#license" typeof="rdf:Property">
               <h4>cc:license</h4>
               A <a rel="rdfs:domain" href="#Work">Work</a>
               <span property="rdfs:label">has license</span>
               a <a rel="rdfs:range" href="#License">License</a>. <br />

               (a <a rel="rdfs:subPropertyOf"
                     href="http://purl.org/dc/terms/license">subproperty of dc:license</a>,
               <a rel="owl:sameAs"
                  href="http://www.w3.org/1999/xhtml/vocab#license">the same as xhtml:license</a>)
           </div>
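As a rough illustration, assume the page's base URI is 
http://creativecommons.org/ns (so that '#license' resolves to 
http://creativecommons.org/ns#license) and the usual rdf/rdfs/owl prefix 
mappings. Under those assumptions the snippet above expresses the 
triples below. They are transcribed by hand rather than produced by an 
RDFa parser, and the describe() helper is hypothetical. The point is that 
a program gets domain, range and super-property links it can follow, not 
just a label.

```python
# Namespace URIs for the prefixes used in the snippet.
CC = "http://creativecommons.org/ns#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Triples expressed by the <div id="license"> snippet, transcribed by hand.
LICENSE = CC + "license"
TRIPLES = [
    (LICENSE, RDF + "type", RDF + "Property"),   # typeof="rdf:Property"
    (LICENSE, RDFS + "domain", CC + "Work"),     # rel="rdfs:domain"
    (LICENSE, RDFS + "label", "has license"),    # property="rdfs:label"
    (LICENSE, RDFS + "range", CC + "License"),   # rel="rdfs:range"
    (LICENSE, RDFS + "subPropertyOf", "http://purl.org/dc/terms/license"),
    (LICENSE, OWL + "sameAs", "http://www.w3.org/1999/xhtml/vocab#license"),
]


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(triples, prop):
    """Collect what the schema says about a property, beyond its label."""
    return {
        "label": objects(triples, prop, RDFS + "label"),
        "domain": objects(triples, prop, RDFS + "domain"),
        "range": objects(triples, prop, RDFS + "range"),
        "superproperties": objects(triples, prop, RDFS + "subPropertyOf"),
        "same_as": objects(triples, prop, OWL + "sameAs"),
    }


for key, values in describe(TRIPLES, LICENSE).items():
    print(key, values)
```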

Actually, we can do a fair bit more than simply have human-readable 
strings. In the CC case, for example, we've got a sub-property 
relationship between cc:license and dc:license. RDF vocabularies often 
(more often, even) define relationships amongst classes too, and between 
classes and properties. For example, the SIOC vocabulary defines a class 
sioc:User as a subclass of foaf:OnlineAccount; this is mechanically 
evident from http://rdfs.org/sioc/ns# . Similarly, 
http://trac.usefulinc.com/doap defines the DOAP vocabulary, with its 
schema at http://usefulinc.com/ns/doap# (the web server is misconfigured 
regarding the MIME type right now). DOAP defines a class doap:Project 
that subclasses FOAF's 'Project' class, and which comes with a number of 
properties describing open source software projects. Again, this is 
mechanically evident.

As the ccREL paper explains, and as I can confirm with respect to FOAF, 
it is very useful to allow related projects to define related classes 
and properties but manage their evolution separately. It's a strategy 
for making incremental progress without a single project or organization 
carrying the burden of total coordination. Edd and friends in the DOAP 
project, for example, can keep developing new properties for describing 
projects. Elsewhere on the Web, we can annotate the URI for 
'foaf:Project', e.g. with translations: 
http://svn.foaf-project.org/foaftown/foaf18n/foaf-kr.rdf tells us that a 
Korean rdfs:label for http://xmlns.com/foaf/0.1/Project is 
"프로젝트 (어떤 형태의 협업)." (roughly, "project (some form of collaboration)").

The DOAP list is busy figuring out how they might model customer 
relationships with respect to DOAP's notion of project (within DOAP or 
elsewhere, depending on complexity); see 
http://lists.usefulinc.com/pipermail/doap-interest/2008-August/000338.html 
... but whatever they come up with will be linked back to other 
information about FOAF's broader notion of Project.

So while it is useful to have human-readable strings (including 
translations), we also get simple relationships between independently 
defined vocabulary terms. The RDFS basics here are sub-property, 
sub-class, range and domain. Without clear Web identifiers for 
vocabulary terms, I believe this kind of distributed, collaborative 
approach becomes significantly harder. And I believe the experience of 
many in the Dublin Core metadata scene since the mid-90s backs this up...
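Here is a minimal sketch of what a consumer can do with those four 
basics: naive forward chaining over the RDFS entailment rules for domain 
(rdfs2), range (rdfs3), sub-property (rdfs7) and sub-class (rdfs9), 
applied until no new triples appear. The vocabulary triples restate the 
SIOC, DOAP and CC relationships mentioned above. The example.org instance 
data is made up for illustration. A real system would use an RDF library 
and a proper reasoner rather than this quadratic loop.

```python
# Standard namespace URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SUBCLASS, SUBPROP = RDFS + "subClassOf", RDFS + "subPropertyOf"
DOMAIN, RANGE = RDFS + "domain", RDFS + "range"
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
DOAP = "http://usefulinc.com/ns/doap#"
CC = "http://creativecommons.org/ns#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for made-up instance data

# Vocabulary-level relationships, as described in the message.
SCHEMA = [
    (SIOC + "User", SUBCLASS, FOAF + "OnlineAccount"),
    (DOAP + "Project", SUBCLASS, FOAF + "Project"),
    (CC + "license", SUBPROP, DCT + "license"),
    (CC + "license", DOMAIN, CC + "Work"),
    (CC + "license", RANGE, CC + "License"),
]

# Made-up instance data (hypothetical).
DATA = [
    (EX + "proj", RDF_TYPE, DOAP + "Project"),
    (EX + "song", CC + "license", EX + "license"),
]


def rdfs_closure(triples):
    """Apply rdfs2, rdfs3, rdfs7 and rdfs9 repeatedly until a fixpoint is reached."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if s2 == p and p2 == SUBPROP:  # rdfs7: s p o, p subPropertyOf q => s q o
                    new.add((s, o2, o))
                if s2 == p and p2 == DOMAIN:   # rdfs2: s p o, p domain C => s type C
                    new.add((s, RDF_TYPE, o2))
                if s2 == p and p2 == RANGE:    # rdfs3: s p o, p range C => o type C
                    new.add((o, RDF_TYPE, o2))  # (a real reasoner would skip literal objects)
                if p == RDF_TYPE and s2 == o and p2 == SUBCLASS:
                    new.add((s, RDF_TYPE, o2))  # rdfs9: s type C, C subClassOf D => s type D
        if new <= graph:  # nothing new was derived
            return graph
        graph |= new


closure = rdfs_closure(SCHEMA + DATA)
for triple in sorted(closure - set(SCHEMA + DATA)):
    print(triple)
```

The derived triples say that the project is also a foaf:Project, and 
that the song has a dcterms:license, is a cc:Work, and points to a 
cc:License. None of these were stated directly; each follows from a 
vocabulary published and maintained separately.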

cheers,

Dan

--
http://danbri.org/

By the way, for an example of browsing this kind of data structure, see 
http://mqlx.com/~david/parallax/