hsivonen at iki.fi
Sun Aug 24 12:17:23 PDT 2008
On Aug 23, 2008, at 18:16, Dan Brickley wrote:
> It may not be obvious to those who haven't followed the history, or
> who were at school at the time, but many of us did indeed invest a
> lot of time and effort using name/value metadata structures in HTML.
> For example, the Dublin Core project began with this technology base
> beginning back in 1994/5, and the experience of metadata
> implementors using it was one of the drivers for the creation of
> RDF. At the time there was no WHATWG to talk to, but the metadata
> community *did* talk to W3C.
I don't doubt that there's metadata that doesn't fit into name-value
pairs nicely. However, the title of the work, the license for the work
as a whole, attribution wish (a natural-language string with
potentially multiple names, commas and "and"s) and a single
attribution URL all fit into name-value pairs, so for CC licensing, a
graph seems like overkill.
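To make the name-value claim concrete, here is a minimal sketch in Python. The field names ("attribution-name", "attribution-url") and all values are hypothetical, made up for illustration; they are not taken from any Creative Commons or HTML specification. The point is only that the four items listed above fit a flat mapping from string keys to string values, which can be written out as HTML meta elements.

```python
from html import escape

# Hypothetical field names and values, chosen only for illustration.
cc_metadata = {
    "title": 'Remix of "Harbour at Dawn"',
    "license": "http://creativecommons.org/licenses/by/3.0/",
    # A natural-language attribution wish with several names in one string.
    "attribution-name": "Alice Example, Bob Example and Carol Example",
    "attribution-url": "http://example.org/harbour",
}


def to_meta_elements(metadata):
    """Render each name-value pair as one HTML <meta> element."""
    return [
        '<meta name="{}" content="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in metadata.items()
    ]


for element in to_meta_elements(cc_metadata):
    print(element)
```

Nothing here needs a graph: each value is found by a single lookup on an opaque key.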
Of course, there's the issue of conveying that data for each subwork
of a larger work that remixes many works. But can we expect John Q.
Public to convey that data so that there's something to be DRY with in
a case where the subworks aren't independent files that could carry
their own metadata? That is, if the larger work remixes multiple
photos in a single Theora video stream or into one large JPEG file,
can we really expect tools (or John Q. Public manually) to be able to
address into the larger work in such a way that any syntax other than
natural language identifies which subwork had which license and
> Does the very loosely defined Dublin Core really qualify as a
> "standard" that can be read and processed programmatically?
Thanks for the pointers to history. I wasn't aware that the Dublin
Core community had itself documented this fundamental problem with
Dublin Core so early on. I have run into this problem myself when in a
past project I inherited a metadata spec that my predecessors had
modeled after Dublin Core without having experience of developing
> +44 227 462062
In this particular instance, it seems to me that the main problem
isn't that the metadata doesn't fit into key-value pairs but that the
metadata that doesn't fit probably doesn't *really* need to be recorded as
metadata. If you are creating a document search engine, does the user
ever want to search documents by the authors' phone numbers? If the
user searches by other criteria, does the phone number *really* need
to be extractable for display in search results?
I realize it sounds offensive to suggest that someone doesn't need the
metadata they say they need, but when I worked (briefly) on metadata
for long-term preservation of digital files in the National Archives
of Finland, it became apparent pretty quickly that at least some
metadata specs aren't driven by considering what absolutely *must* be
there to satisfy realistic use cases but by modeling what *could* be
said about the domain and inventing fields for everything *just in
> Looking at this example,
> <div id="license" about="#license" typeof="rdf:Property">
> A <a rel="rdfs:domain" href="#Work">Work</a> <span
> property="rdfs:label">has license</span> a <a rel="rdfs:range"
> href="#License">License</a>. <br />
> (a <a rel="rdfs:subPropertyOf"
> href="http://purl.org/dc/terms/license">subproperty of dc:license</a>,
> <a rel="owl:sameAs"
> href="http://www.w3.org/1999/xhtml/vocab#license">the same as
> xhtml:license</a>)
> Actually we can do a fair bit more than simply have human readable
> strings. For example from the CC case, we've got a sub-property
> relationship between cc:license and dc:license.
> So while it is useful to have human readable strings (including
> translations) we also get simple relationships between independently
> defined vocabulary terms.
And in www-archive:
On Aug 23, 2008, at 23:59, Ben Adida wrote:
> Henri Sivonen wrote:
>> Also, in this case, the prefix cc is actually more persistent than
>> URI, since Creative Commons has changed the namespace URI of its RDF
>> vocabulary without changing the canonical prefix (from
>> http://web.resource.org/cc/ to http://creativecommons.org/ns#).
> Highly misleading statement, since we are also creating equivalences
> between the old and new namespace. That's the power of RDF.
How common is it that user-facing applications that use RDF metadata
dereference namespace URIs, load declarations of equivalence or
subclass relationships between properties and successfully map
vocabularies created after the creation of the application to the
vocabulary understood by the application? Are there known instances of
applications that were programmed to process
http://web.resource.org/cc/ metadata in an XML-wise correct way (i.e.
not using regular expressions matching on "cc:") and that automatically
processed http://creativecommons.org/ns# metadata right by
autodiscovering the equivalence? (These are not rhetorical questions. I
really don't know and am curious. My intuition suggests that this
wouldn't be a common occurrence.)
Where some see "the power of RDF", others see "the RDF tax". There's a
tradeoff between making the common case simple and making things
powerful for the less common and more complex cases. The simple case
is finding out what license a document is under. Compared to looking
up a string value by unstructured opaque string key from within the
file, it's very different to extract an RDF graph from a file,
dereference all namespace URIs using a network connection relying on
hosts being reachable, load data describing equivalence and subclass
relations--perhaps recursively--and simplify until the application
sees a value connected to a property it is programmed to know about.
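The difference between the two cases can be sketched roughly in Python. Everything below is an assumption made for illustration: the triples, the "equivalent" marker and the function names are hypothetical, and the vocabulary data is hard-coded where a real application would have to fetch it over the network by dereferencing namespace URIs. This is not how any particular RDF library works.

```python
OLD_NS = "http://web.resource.org/cc/"
NEW_NS = "http://creativecommons.org/ns#"
LICENSE_URL = "http://creativecommons.org/licenses/by/3.0/"
EQUIVALENT = "equivalent"  # hypothetical stand-in for an equivalence predicate

# Simple case: an opaque string key mapped to a string value.
name_value_metadata = {"license": LICENSE_URL}


def license_from_pairs(metadata):
    return metadata.get("license")


# Graph case: (subject, predicate, object) triples found in the document.
document_triples = [("", NEW_NS + "license", LICENSE_URL)]

# Assumed vocabulary data; in practice this would have to be downloaded.
vocabulary_triples = [(NEW_NS + "license", EQUIVALENT, OLD_NS + "license")]


def known_aliases(prop, vocabulary):
    """Collect every property reachable from prop via equivalence links."""
    seen = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in vocabulary:
            if predicate != EQUIVALENT:
                continue
            if subject == current:
                other = obj
            elif obj == current:
                other = subject
            else:
                continue
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen


def license_from_graph(document, vocabulary, understood_property):
    """Find a value for a property the application was programmed to know."""
    aliases = known_aliases(understood_property, vocabulary)
    for _subject, predicate, obj in document:
        if predicate in aliases:
            return obj
    return None


print(license_from_pairs(name_value_metadata))
print(license_from_graph(document_triples, vocabulary_triples, OLD_NS + "license"))
```

An application that only knows the old property finds the license only if the equivalence data is available to it; with an empty vocabulary list the graph lookup returns None, while the key lookup has no such dependency.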