danbri at danbri.org
Sat Aug 23 08:16:47 PDT 2008
+cc: Paul Miller of Talis, who worked on the AHDS report mentioned below.
Henri Sivonen wrote:
> On Aug 23, 2008, at 02:43, Ben Adida wrote:
>> Why would you reinvent URIs in a way that they can't be de-referenced?
> To avoid having misleading affordances.
>> We want one parser, with variability and innovation in the vocabulary
>> definition only.
> Having one parser seems appealing compared to using the native
> mechanisms of each of HTML (<meta>, <link>), PDF (document information
> dictionary), PNG (tEXt chunk), etc. at first, but the vision that tools
> handle this all when you remix culture already requires the tools to
> support reading and writing the file formats they remix. When you
> already have format-native key-value read/write capability, the ability
> to build and mine RDF *graphs* becomes an additional burden.
It may not be obvious to those who haven't followed the history, or who
were at school at the time, but many of us did indeed invest a lot of
time and effort using name/value metadata structures in HTML. For
example, the Dublin Core project started from this technology base
back in 1994/5, and the experience of metadata implementors
using it was one of the drivers for the creation of RDF. At the time
there was no WHATWG to talk to, but the metadata community *did* talk to W3C.
Early on, the Dublin Core community found a lot of pressure for
feature-creep: new elements/terms to address the needs of various groups
who liked Dublin Core, but wanted some specifics added. This situation
gave rise to the 'Warwick Framework', defined in 1996 -
While there was consensus among the attendees that the concept of a
simple metadata set is useful, there were a number of fundamental
questions concerning the real utility of the Dublin Core as it was
defined at the end of the preceding workshop. Does the very loosely
defined Dublin Core really qualify as a "standard" that can be read and
processed programmatically? Should the number of the core elements be
expanded, to increase semantic richness, or reduced, to improve
ease-of-use by authors and/or web publishers? Will authors reliably
attach core metadata elements to their content? Should a core metadata
set be restricted to only descriptive cataloging information or should
it include other types of metadata such as administrative information,
linkage data, and the like? What is the relationship of the Dublin Core
to other developing work in metadata schemes, particularly in those
areas such as rights management information (terms and conditions)?
The workshop attendees concluded that the answer to these questions and
the route to progress on the metadata issue lay in the formulation of a
higher-level context for the Dublin Core. This context should define how
the Core can be combined with other sets of metadata in a manner that
addresses the individual integrity, distinct audiences, and separate
realms of responsibility of these distinct metadata sets.
For an implementor report typical of the experience from this era, i.e.
with name/value pairs, see the UK Arts and Humanities Data Service
document http://ahds.ac.uk/public/metadata/discovery.html which was
presented at the Oct'97 Helsinki workshop of the Dublin Core. At the
time I was involved with the ROADS internet cataloguing project and can
vouch that we hit a similar ceiling with attribute/value metadata.
From the appendix, http://ahds.ac.uk/public/metadata/disc_09.html ...
here are some of the attribute/value structures they were forced to squash
their metadata records into.
Canterbury Archaeological Trust
+44 227 462062
Archaeology Data Service
...this expresses name, affiliation and contact information for a number
of contributors to a work. Another example describes several
contributors along with their roles (actor, director, etc). Again the
attribute/value representations contained numeric indexes
('DC.creator.role.9') to disambiguate which individual was being described.
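To make the numeric-index problem concrete, here is a rough sketch of what a
consumer has to do to recover per-person records from flat name/value pairs.
The key names and values are invented for illustration, modelled only on the
'DC.creator.role.9' pattern above; they are not taken from the AHDS report.

```python
import re
from collections import defaultdict

# Hypothetical flattened metadata: names and values are illustrative only,
# following the 'DC.creator.role.9' naming pattern.
flat = [
    ("DC.creator.name.1", "Person A"),
    ("DC.creator.role.1", "director"),
    ("DC.creator.name.2", "Person B"),
    ("DC.creator.role.2", "actor"),
    ("DC.title", "Example Work"),
]

# A trailing ".<digits>" is the only hint that several pairs belong together.
INDEXED = re.compile(r"^(?P<prefix>.+)\.(?P<field>[^.]+)\.(?P<index>\d+)$")

def regroup(pairs):
    """Split pairs into plain attributes and index-grouped records."""
    grouped = defaultdict(dict)
    plain = {}
    for key, value in pairs:
        m = INDEXED.match(key)
        if m:
            # (prefix, index) identifies one individual being described.
            grouped[(m["prefix"], int(m["index"]))][m["field"]] = value
        else:
            plain[key] = value
    return plain, dict(grouped)

plain, creators = regroup(flat)
```

The grouping lives entirely in a naming convention that every producer and
consumer has to agree on out of band, which is the ceiling described above.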
>>> What barrier is there to building reusable vocabularies?
>> The follow-your-nose principle is missing, which is fairly essential for
>> discovering the meaning of vocabularies (partially automatically, not by
>> doing a Google search.)
> The partial automation with RDFa doesn't go very far. If a program
> automatically dereferences http://creativecommons.org/ns# and parses the
> result as RDFa, the program now has a human-readable string for each
> property--not exactly something that the program can act on further
> without human help.
Looking at this example,
<div id="license" about="#license" typeof="rdf:Property">
A <a rel="rdfs:domain" href="#Work">Work</a> <span
property="rdfs:label">has license</span> a <a rel="rdfs:range"
href="#License">License</a>. <br />
(a <a rel="rdfs:subPropertyOf"
href="http://purl.org/dc/terms/license">subproperty of dc:license</a>,
href="http://www.w3.org/1999/xhtml/vocab#license">the same as
Actually we can do a fair bit more than simply have human readable
strings. For example from the CC case, we've got a sub-property
relationship between cc:license and dc:license. RDF often (more often,
even) has relationships amongst classes too, and between classes and
properties. So for example, the SIOC vocabulary defines a class
sioc:User as a subclass of foaf:OnlineAccount; this is mechanically
evident from http://rdfs.org/sioc/ns# .... similarly,
http://trac.usefulinc.com/doap defines the DOAP vocabulary, schema here
- http://usefulinc.com/ns/doap# (webserver misconfigured re mimetype
right now). DOAP defines a class doap:Project that subclasses FOAF's
'Project' class, and which comes with a number of properties describing
opensource software projects. Again this is mechanically evident. As the
ccREL paper explains, and I can confirm w.r.t. FOAF, it is very useful
to allow related projects to define related classes and properties but
manage their evolution separately. It's a strategy for making
incremental progress without a single project/organization carrying the
burden of total coordination. Edd and friends in the DOAP project, for
example, can keep developing new properties for describing projects.
Elsewhere in the Web, we can be annotating the URI for 'foaf:Project'
e.g. with translations.
http://svn.foaf-project.org/foaftown/foaf18n/foaf-kr.rdf tells us that a
Korean rdfs:label for http://xmlns.com/foaf/0.1/Project is "프로젝트 (어
떤 형태의 협업).". The DOAP list is busy figuring out how they might
want (within DOAP or elsewhere, depending on complexity) to model
customer relationships w.r.t. DOAP's notion of project, see
... but whatever they come up with will be linked back to other
information about FOAF's broader notion of Project.
So while it is useful to have human readable strings (including
translations) we also get simple relationships between independently
defined vocabulary terms. RDFS basics here are sub-property, sub-class,
range and domain. Without clear Web identifiers for vocabulary terms I
believe this kind of distributed, collaborative approach becomes
significantly harder. And I believe the experience of many in the Dublin
Core metadata scene since the mid-90s backs this up...
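As a minimal sketch of what "mechanically evident" can mean, the following
applies just the sub-property and sub-class rules to a handful of triples. The
schema statements are the ones mentioned in this message; the 'ex:' instance
data and the infer() helper are assumptions made up for illustration, and a
real system would use an RDF library rather than string tuples.

```python
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Schema statements as described in the vocabularies discussed above.
schema = {
    ("cc:license", SUBPROP, "dc:license"),
    ("sioc:User", SUBCLASS, "foaf:OnlineAccount"),
    ("doap:Project", SUBCLASS, "foaf:Project"),
}

# Hypothetical instance data (the 'ex:' names are invented for this sketch).
data = {
    ("ex:photo", "cc:license", "ex:some-license"),
    ("ex:tool", TYPE, "doap:Project"),
}

def infer(triples):
    """Apply the two rules repeatedly until no new triples appear."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # s p o, and p is a sub-property of o2  =>  s o2 o
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                # s is of type o, and o is a subclass of o2  =>  s is of type o2
                if p2 == SUBCLASS and p == TYPE and s2 == o:
                    new.add((s, TYPE, o2))
        if new <= known:
            return known
        known |= new

inferred = infer(schema | data)
```

A consumer that only knows dc:license or foaf:Project can then still make use
of data published with the more specific cc:license or doap:Project terms.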
For an example of browsing this kind of data structure btw see