[whatwg] Link rot is not dangerous

Laurens Holst laurens.nospam at grauw.nl
Sat May 16 16:58:58 PDT 2009


Tab Atkins Jr. schreef:
>> Ho, ho, you’re making a big leap there! By me explaining that dereferencible
>> URIs are not needed to make RDF work on a core level, which makes RDF
>> robust, do not jump to the conclusion that it is of no benefit! URIs are
>> there for the benefit of linking, and help discoverability a lot (just like
>> HTML hyperlinks do). Spidering the semantic web in a follow-your-nose style
>> is effective. Incidentally, if an ontology disappears from its original
>> address, this kind of spidering will likely lead you to a copy thereof
>> stored elsewhere. For example on a different spider which has the triples
>> cached.
>>     
>
> You had just stated in the previous email, however, that few (if any)
> major consumers of RDFa *use* what is located on the far end of the
> URI.  If they're not even paying attention to it, where is the value
> in it?
>   

I said that the ontologies were not used by many RDF consumers. This is 
because they can be computationally expensive, especially for large data 
sets, not because they are useless.

I think the clearest way I can put this is by comparison:

Your argument is like arguing against XML or JSON Schemas, concluding 
that because they are externally referenced and not used by most XML or 
JSON applications, they are useless, and in fact that XML and JSON 
themselves are useless. This is clearly false; removing a reference to a 
schema from a document, or a document not having a schema, does not make 
the document itself useless, nor the document format it is expressed in.

Although RDF Schema and OWL are definitely part of the ‘RDF ecosystem’,
they are built on top of the base RDF framework and are not in
themselves required for RDF to function. However, a schema does provide
a useful description of the document structures and can express certain
semantics, and is thus a worthy technology in its own right.

> I don't really understand the 'discoverability' argument here, at
> least in the context of it being similar to HTML hyperlinks.
> Hyperlinks are useful for people because they make it simple to
> navigate to a new page.  You just click and it works, no need to
> copypasta the address into a new browser window.
>   

By what means the user dereferences the link is not relevant. What
matters is that a URI is there, identifying a unique location on the
web and thus contributing to the web of linked documents that we call
the World Wide Web. Without links and URIs, there would be no ‘web’.
There would be a big set of networked yet isolated computers that each
live in their own walled garden.

Links provide discoverability of data provided elsewhere, by indicating 
a location. Users can find other documents because of this. Search 
engines like Google can spider the web based on this.

The Web of Linked Data is Tim Berners-Lee’s vision of a WWW for data.

> I'm also not sure how a rotted link helps you compare vocabularies
> with other spiders, which in a hypothetical world you are
> communicating with (at this point we're *far* into theory, not
> practice).  Any uniquifier would allow you to compare things in the
> same way, no?
>   

A single rdfs:seeAlso statement pointing at the republished ontology,
in just one place, will allow a spider to ‘follow its own nose’ and
find the triples of the ontology at the republished location. This
republication can be anywhere: a new ontology location, or a copy
cached by another spider that republishes the triples it harvests on
the web (such as archive.org [1]).
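
To make the follow-your-nose idea concrete, here is a minimal sketch in
plain Python, without an RDF library or network access. The WEB
dictionary, the example.com / example.org / example.net URLs and the
crawl() helper are all made up for illustration; only the rdf: and rdfs:
namespace URIs are real.

```python
# Hypothetical in-memory "web": each URL maps to the triples published there.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

GAME = "http://example.org/vocab#Game"  # made-up type; its own URL has rotted

WEB = {
    # http://example.org/vocab is deliberately absent: the link is dead.
    "http://example.com/data": [
        ("_:a", RDF_TYPE, GAME),
        (GAME, RDFS_SEE_ALSO, "http://mirror.example.net/vocab-copy"),
    ],
    "http://mirror.example.net/vocab-copy": [
        (GAME, RDF_TYPE, RDFS_CLASS),
    ],
}

def crawl(start):
    """Collect all triples reachable from start by following rdfs:seeAlso."""
    seen, queue, triples = set(), [start], []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        for triple in WEB.get(url, []):  # a rotted URL simply yields nothing
            triples.append(triple)
            if triple[1] == RDFS_SEE_ALSO:
                queue.append(triple[2])
    return triples

found = crawl("http://example.com/data")
```

Even though the type's own address is gone, the spider still ends up
with the schema triple from the mirror, because one seeAlso statement
told it where to look.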

I agree we’re getting far into the theory-not-practice realm, which is
why Shelley is right in saying that in practice vocabularies are served
from a location that is well cared for, e.g. using services like PURL to
provide permanent URLs, or having solid organisational backing. Philip
Taylor’s list [2] does not do much to discredit this.

[Side note: To point out some flaws in Philip’s list: many of the URLs
in his ‘404’ and ‘not responding’ lists are experimental. Additionally,
the list does not report usage frequency. Finally, it does not (and
cannot, obviously) show whether there was any RDF Schema at those
locations in the first place. As I explained before, I can make up the
following RDF triple right here on the spot, and there would be nothing
wrong with it:

_:a rdf:type <http://grauw.nl/rdf#Game>

The type referenced as this triple’s object has no ontology at its
location. The fact that it is a type is inferred from it being
referenced through rdf:type, and that is enough. There is no requirement
that this type resolves to a document containing RDF Schema triples. A
creative example of this on the list is “java:java.util.Date”.]
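
As a small sketch of that inference (plain Python tuples standing in for
triples; the helper names and the second triple about _:b are made up
for illustration), a consumer can work out what is a type, and what is
an instance of it, without ever dereferencing anything:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    ("_:a", RDF_TYPE, "http://grauw.nl/rdf#Game"),
    ("_:b", RDF_TYPE, "java:java.util.Date"),  # hypothetical second statement
]

def types_in_use(triples):
    """Anything appearing as the object of rdf:type is being used as a type."""
    return {o for (_s, p, o) in triples if p == RDF_TYPE}

def instances_of(triples, type_uri):
    """Subjects declared to be of the given type; no network lookup involved."""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == type_uri)

game_instances = instances_of(triples, "http://grauw.nl/rdf#Game")
```

Neither function cares whether the type URI resolves to a document; it
is compared purely as an identifier.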

>> You are now only considering the ontologies, that is, types and properties.
>> You’re forgetting (or ignoring) that in RDF, objects are also named with
>> URIs so that data at other locations can refer to it. You know, that ‘web of
>> linked data’ people refer to, core principle of RDF. No ‘simple’ scheme
>> based on what Ian proposed can provide a sufficient level of uniqueness for
>> that. URIs are the best and most natural fit for use as web-scale
>> identifiers.
>>     
>
> Define 'sufficient', as used here.  I believe that this is an area
> where absolute uniqueness is not a requirement.  Worst case, you get a
> little bit of data pollution with weird triples being produced by
> badly-written pages.  Perhaps your browser offers to add an event to
> your calendar when no event shows up on the page, or a fraction of a
> search engine's microdata collection is spurious.  Neither of these
> are big deals.
>
> That being said, I agree that URIs provide a very convenient source of
> uniqueness.  Ian's microdata allows them to be used either in normal
> form or in reverse-domain form; either way provides the necessary
> uniqueness.
>   

I am talking about individual triples for MANY pieces of data here. Take 
for example the identifier of the band Coldplay on Zitgist:

  <http://zitgist.com/music/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234>

A reverse domain version of such an identifier would look like this:

  com.zitgist.music.artist.cc197bad-dc9c-440d-a5b5-d52ba2e14234

How exactly is this really shorter, or different in any way other than
‘for the sake of doing it differently’, while failing to build upon the
well-known concept of URIs? Note that you can open the above URL in your
browser of choice and view the data.
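
To put a number on ‘shorter’, here is a rough sketch of the conversion.
The mapping rule in to_reverse_domain() is my own assumption (reverse
the host labels, then append the path segments with dots); the
Microdata proposal does not define such a conversion.

```python
from urllib.parse import urlsplit

def to_reverse_domain(uri):
    """Assumed mapping: reversed host labels, then path segments, dot-joined."""
    parts = urlsplit(uri)
    host_labels = parts.hostname.split(".")
    path_segments = [seg for seg in parts.path.split("/") if seg]
    return ".".join(list(reversed(host_labels)) + path_segments)

uri = "http://zitgist.com/music/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234"
reversed_form = to_reverse_domain(uri)

# Number of characters saved by the reverse-domain notation.
saved = len(uri) - len(reversed_form)
```

The only characters that disappear are those of the "http://" scheme
prefix, which is exactly the part that makes the identifier something
you can browse to.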

Also, creating a framework to configure DNS servers to resolve to useful 
documents for these domains will be pretty tedious. If you ask me, Hixie 
using the ‘reverse DNS’ notation in his Microdata proposal is just a 
trick to pretend he is using something that is different from what RDF 
uses. If the domain were not ‘reversed’, people would see the similarity 
with URIs too easily.

Note that in RDF, if you do not need this global identification, you can
easily create anonymous nodes called blank nodes (‘bnodes’). Also, URIs
can be written in relative form, often making a triple statement as
simple as about="#laurens".

Example of some completely anonymous statements using bnodes (aside from 
using basic RDF building blocks):

_:a rdf:type _:Game
_:Game rdf:type rdfs:Class
_:Game rdfs:label "Game"

So as you can see, RDF also caters for the use cases you mentioned above 
where uniqueness is not required. In RDFa, you achieve this by using a 
‘typeof’ attribute without corresponding ‘about’ attribute.

If you reuse properties from widely-used vocabularies though (such as 
FOAF, or Dublin Core), it seems obvious that they need to be identified 
globally to avoid namespace conflicts. Instead of long 
‘org.foaf-project.Person’ identifiers as Hixie proposes, RDF uses URIs 
and most RDF serialisations go for a (shorter) prefix-based 
‘foaf:Person’ solution, which IMO is pretty user-friendly.
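
A minimal sketch of how such prefixes work, in plain Python. The
PREFIXES table and the expand() helper are illustrative only, not any
particular library's API; the two namespace URIs are the usual FOAF and
Dublin Core element ones.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local

person = expand("foaf:Person")
```

The short form is only a notation: what ends up in the triple is still
the globally unique URI.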

~Laurens

[1] http://web.archive.org/web/*/http://www.grauw.nl/foaf.rdf
[2] http://philip.html5.org/data/rdf-namespace-status.txt

-- 
Note: New email address! Please update your address book.

~~ Ushiko-san! Kimi wa doushite, Ushiko-san nan da!! ~~
Laurens Holst, student, Utrecht University, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com
