[whatwg] Link rot is not dangerous

Shelley Powers shelleyp at burningbird.net
Fri May 15 10:25:28 PDT 2009


Dan Brickley wrote:
> On 15/5/09 18:20, Manu Sporny wrote:
>> Kristof Zelechovski wrote:
>>> Therefore, link rot is a bigger problem for CURIE
>>> prefixes than for links.
>>
>> There have been a number of people now that have gone to great lengths
>> to outline how awful link rot is for CURIEs and the semantic web in
>> general. This is a flawed conclusion, based on the assumption that there
>> must be a single vocabulary document in existence, for all time, at one
>> location. This has also led to a false requirement that all
>> vocabularies should be centralized.
>>
>> Here's the fear:
>>
>> If a vocabulary document disappears for any reason, then the meaning of
>> the vocabulary is lost and all triples depending on the lost vocabulary
>> become useless.
>>
>> That fear ignores the fact that we have a highly available document
>> store available to us (the Web). Not only that, but these vocabularies
>> will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).
>>
>> IF a vocabulary document disappears, which is highly unlikely for
>> popular vocabularies - imagine FOAF disappearing overnight, then there
>> are alternative mechanisms to extract meaning from the triples that will
>> be left on the web.
>>
>> Here are just two of the possible solutions to the problem outlined:
>>
>> - The vocabulary is restored at another URL using a cached copy of the
>> vocabulary. The site owner of the original vocabulary either re-uses the
>> vocabulary, or re-directs the vocabulary page to another domain
>> (somebody that will ensure the vocabulary continues to be provided -
>> somebody like the W3C).
>> - RDFa parsers can be given an override list of legacy vocabularies that
>> will be loaded from disk (from a cached copy). If a cached copy of the
>> vocabulary cannot be found, it can be re-created from scratch if 
>> necessary.
>>
>> The argument that link rot would cause massive damage to the semantic
>> web is just not true. Even if there is minor damage caused, it is fairly
>> easy to recover from it, as outlined above.
>
> A few other points:
>
> 1. It's for the community of vocabulary-creators to help each other 
> out w.r.t. hosting/publishing these: I just nudged a friend to put 
> another 5 years on the DNS rental for a popular namespace. I think we 
> should put a bit more structure around these kinds of habit, so that 
> popular namespaces won't drop off the Web through accident.
>
> 2. digitally signing the schemas will become part of the story, I'm 
> sure. While it's a bit fiddly, there are advantages to having other 
> mechanisms beyond URI de-referencing for knowing where a schema came from
>
> 3. Parties worried about external dependencies when using namespaces 
> can always indirect through their own namespace, whose schema document 
> can declare subclass/subproperty relations to other URIs
>
> cheers
>
> Dan
>
>
>
>
The most important point to take from all of this, though, is that link 
rot within the RDF world is an extremely rare and unlikely occurrence. 
I've been working with RDF for close to a decade, and link rot has never 
been an issue.

One of the very first uses of RDF, in RSS 1.0, for feeds, is still in 
existence, still viable. You don't have to take my word; check it out 
yourselves:

http://purl.org/rss/1.0/

Even if link rot does occur, and I want to strongly emphasize "if", both 
Manu and Dan have demonstrated multiple ways of ensuring that no meaning 
is lost and nothing is broken. However, I hope that people are open 
enough to take away from their discussions that they are trying to 
treat this concern respectfully, and trying to demonstrate that there's 
more than one solution. This does not form a "proof" that "Oh my god, 
if we use RDF, we're doomed!"
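
To make the override list Manu describes a bit more concrete, here is a 
rough Python sketch. It is only an illustration under stated assumptions: 
the function name load_vocabulary, the overrides mapping, and the fetch 
callback are all hypothetical and do not come from any actual RDFa parser.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def load_vocabulary(
    uri: str,
    overrides: Dict[str, Path],
    fetch: Callable[[str], str],
) -> Optional[str]:
    """Return the vocabulary document for `uri`, or None if unavailable.

    `overrides` (hypothetical) maps legacy vocabulary URIs to cached
    copies on disk. `fetch` (hypothetical) dereferences a URI and is
    assumed to raise OSError when the document cannot be retrieved.
    """
    # Vocabularies on the override list are read from the cached copy
    # on disk and are never dereferenced over the network.
    cached = overrides.get(uri)
    if cached is not None and cached.is_file():
        return cached.read_text(encoding="utf-8")

    # Everything else is dereferenced normally; a failure yields None
    # so the caller can decide how to recover.
    try:
        return fetch(uri)
    except OSError:
        return None
```

The override is checked first so that a vocabulary known to have moved or 
vanished is served from the cached copy without waiting on a dead URL.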

Also, don't lose sight of the fact that this is really no more serious 
an issue than, say, a company that originated "com.sun.*" being purchased 
by another company that uses "com.oracle.*". And you can't say, "Well, 
that's not the same", because it is.

The only "safe" bet is to designate some central authority and give them 
power over every possible name. Then we run the massive risk of this 
system failing (and this applies to microdata's reverse DNS as well as 
RDF's URIs), or of it being taken over by an entity that sees such a data 
store as a way to make a great profit. We also defeat the very principle 
on which semantic data on the web rests, and that's true whether you 
support microdata or RDF.

Shelley





