[whatwg] Link rot is not dangerous
Laurens Holst
laurens.nospam at grauw.nl
Sat May 16 16:58:58 PDT 2009
Tab Atkins Jr. wrote:
>> Ho, ho, you’re making a big leap there! Just because I explained that
>> dereferenceable URIs are not needed to make RDF work at a core level, which
>> makes RDF robust, do not jump to the conclusion that they are of no benefit!
>> URIs are there for the benefit of linking, and help discoverability a lot
>> (just like HTML hyperlinks do). Spidering the semantic web in a
>> follow-your-nose style is effective. Incidentally, if an ontology disappears
>> from its original address, this kind of spidering will likely lead you to a
>> copy of it stored elsewhere, for example on a different spider that has the
>> triples cached.
>>
>
> You had just stated in the previous email, however, that few (if any)
> major consumers of RDFa *use* what is located on the far end of the
> URI. If they're not even paying attention to it, where is the value
> in it?
>
I said that the ontologies were not used by many RDF consumers. This is
because processing them can be computationally expensive, especially for
large data sets, not because they are useless.
I think the clearest way I can put this is by comparison.
Your argument is like arguing against XML or JSON Schemas, concluding
that because they are externally referenced and not used by most XML or
JSON applications, they are useless, and in fact that XML and JSON
themselves are useless. This is clearly false: removing a reference to a
schema from a document, or a document not having a schema, makes neither
the document itself nor the format it is expressed in useless.
Although RDF Schema and OWL are definitely part of the ‘RDF ecosystem’,
they are built on top of the base RDF framework and are not in
themselves required for RDF to function. However, a schema does provide
a useful description of the document structures and can express certain
semantics, and is thus a worthy technology in its own right.
> I don't really understand the 'discoverability' argument here, at
> least in the context of it being similar to HTML hyperlinks.
> Hyperlinks are useful for people because they make it simple to
> navigate to a new page. You just click and it works, no need to
> copypasta the address into a new browser window.
>
The means by which the user dereferences the link is not relevant. What
matters is that a URI is there, identifying a unique location on the
World Wide Web and thus contributing to the web of linked documents that
we call the World Wide Web. Without links and URIs, there would be no
‘web’, just a big set of networked yet isolated computers, each living in
its own walled garden.
Links make data published elsewhere discoverable by indicating its
location. Users can find other documents this way, and search engines
like Google can spider the web based on them.
The Web of Linked Data is Tim Berners-Lee’s vision of a WWW for data.
> I'm also not sure how a rotted link helps you compare vocabularies
> with other spiders, which in a hypothetical world you are
> communicating with (at this point we're *far* into theory, not
> practice). Any uniquifier would allow you to compare things in the
> same way, no?
>
A single rdfs:seeAlso statement, in just one place, pointing to the
republished copy will allow a spider to ‘follow its nose’ and find the
triples of the ontology in the republished location. This republication
can be anywhere: a new ontology location, or a copy cached by another
spider that republishes the triples it harvests on the web (such as
archive.org [1]).
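To make the follow-your-nose idea concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory WEB dictionary standing in for HTTP (URL to triples, None for a dead link), and all example.org / example.net URLs and the vocab#plays property are made up; only the RDFS URIs are real. The spider skips the rotted ontology address and still reaches the republished copy through rdfs:seeAlso.

```python
from collections import deque

RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical stand-in for the web: URL -> list of (subject, predicate, object)
# triples. None models a rotted (404) address. All example.* URLs are made up.
WEB = {
    "http://example.org/data": [
        ("http://example.org/data#me", "http://example.org/vocab#plays",
         "http://example.org/vocab#Game"),
        # Points at the ontology's original address, which has since rotted.
        ("http://example.org/data#me", RDFS_SEEALSO, "http://example.org/vocab"),
        # Points at a republished copy of the same ontology triples.
        ("http://example.org/vocab#Game", RDFS_SEEALSO,
         "http://mirror.example.net/vocab"),
    ],
    "http://example.org/vocab": None,
    "http://mirror.example.net/vocab": [
        ("http://example.org/vocab#Game", RDFS_LABEL, "Game"),
    ],
}


def fetch(url):
    """Return the triples published at url, or None if it cannot be reached."""
    return WEB.get(url)


def follow_your_nose(start):
    """Breadth-first crawl from start, following rdfs:seeAlso links."""
    seen, queue, harvested = {start}, deque([start]), []
    while queue:
        triples = fetch(queue.popleft())
        if triples is None:
            continue  # dead link: skip it and keep crawling
        for s, p, o in triples:
            harvested.append((s, p, o))
            if p == RDFS_SEEALSO and o not in seen:
                seen.add(o)
                queue.append(o)
    return harvested


for triple in follow_your_nose("http://example.org/data"):
    print(triple)
```

The dead address costs the spider nothing but one failed lookup; the label triple still arrives from the mirror.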
I agree we’re getting far into the theory-not-practice realm, which is
why Shelley is right in saying that in practice vocabularies are served
from locations that are well cared for, e.g. by using services like PURL
to provide permanent URLs, or by having solid organisational backing, and
Philip Taylor’s list [2] does not do much to discredit this.
[Side note: To point out some flaws in Philip’s list: many of the sites
in his ‘404’ and ‘not responding’ lists are experimental URLs.
Additionally, the list does not include usage frequency. Finally, it does
not (and obviously cannot) show whether there was any RDF Schema at
those locations in the first place, because, as I explained before, I
can make up the following RDF triple right here on the spot, and there
would be nothing wrong with it:
_:a rdf:type <http://grauw.nl/rdf#Game>
The type referenced as this triple’s object has no ontology at this
location. The fact that it is a type is inferred from it being referenced
through rdf:type, and that is enough. There is no requirement that this
type resolve to a document containing RDF Schema triples. A creative
example of this on the list is “java:java.util.Date”.]
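A small Python sketch of why no schema document is needed: the RDFS vocabulary itself declares rdfs:Class as the range of rdf:type, so a triple like the one above already entails that its object is a class. The function below applies only that one rule (a full RDFS reasoner does much more), and the second triple, reusing “java:java.util.Date” from the list with a made-up blank node subject, is purely illustrative.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"


def infer_classes(triples):
    """Return the triples entailed by 'rdf:type rdfs:range rdfs:Class'.

    Every object of an rdf:type triple is concluded to be an rdfs:Class.
    Nothing is dereferenced: the conclusion comes from the triples alone.
    """
    return {(o, RDF_TYPE, RDFS_CLASS) for _s, p, o in triples if p == RDF_TYPE}


graph = [
    ("_:a", RDF_TYPE, "http://grauw.nl/rdf#Game"),
    ("_:d", RDF_TYPE, "java:java.util.Date"),  # hypothetical second resource
]
for triple in sorted(infer_classes(graph)):
    print(triple)
```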
>> You are now only considering the ontologies, that is, types and properties.
>> You’re forgetting (or ignoring) that in RDF, objects are also named with
>> URIs so that data at other locations can refer to them. You know, the ‘web
>> of linked data’ people refer to, a core principle of RDF. No ‘simple’ scheme
>> based on what Ian proposed can provide a sufficient level of uniqueness for
>> that. URIs are the best and most natural fit for use as web-scale
>> identifiers.
>>
>
> Define 'sufficient', as used here. I believe that this is an area
> where absolute uniqueness is not a requirement. Worst case, you get a
> little bit of data pollution with weird triples being produced by
> badly-written pages. Perhaps your browser offers to add an event to
> your calendar when no event shows up on the page, or a fraction of a
> search engine's microdata collection is spurious. Neither of these
> are big deals.
>
> That being said, I agree that URIs provide a very convenient source of
> uniqueness. Ian's microdata allows them to be used either in normal
> form or in reverse-domain form; either way provides the necessary
> uniqueness.
>
I am talking about identifiers for MANY individual pieces of data here.
Take, for example, the identifier of the band Coldplay on Zitgist:
<http://zitgist.com/music/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234>
A reverse-domain version of such an identifier would look like this:
com.zitgist.music.artist.cc197bad-dc9c-440d-a5b5-d52ba2e14234
How exactly is this shorter, or different, other than ‘for the sake of
doing it differently’, while failing to build upon the well-known
concept of URIs? Note that you can browse to the above URL in your
browser of choice and view the data.
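To put numbers on this, a Python sketch with a hypothetical to_reverse_domain helper (my own illustration, not taken from Ian's proposal) that drops the scheme, reverses the host labels and joins the path segments with dots:

```python
from urllib.parse import urlsplit


def to_reverse_domain(uri):
    """Hypothetical helper: rewrite an http(s) URI into reverse-domain form.

    The scheme is dropped, host labels are reversed, and path segments are
    joined with dots. Query strings and fragments are ignored for simplicity.
    """
    parts = urlsplit(uri)
    host = ".".join(reversed(parts.hostname.split(".")))
    path = [segment for segment in parts.path.split("/") if segment]
    return ".".join([host] + path)


uri = "http://zitgist.com/music/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234"
rev = to_reverse_domain(uri)
print(rev)
print(len(uri), len(rev))
```

For the Zitgist identifier the two forms differ by exactly the seven characters of "http://". The mapping also loses information: once host and path are joined by dots, the string alone no longer shows where the domain ends.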
Also, creating a framework to configure DNS servers so that these domains
resolve to useful documents would be pretty tedious. If you ask me,
Hixie’s use of the ‘reverse DNS’ notation in his Microdata proposal is
just a trick to pretend he is using something different from what RDF
uses. If the domain were not ‘reversed’, people would see the similarity
with URIs too easily.
Note that in RDF, if you do not need this global identification, you can
easily create anonymous nodes called blank nodes (‘bnodes’). Also, URIs
can be written in relative form, so a statement’s subject is often as
simple as about="#laurens".
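Resolving such a relative reference is ordinary URI resolution against the document's base address. A Python sketch using the standard library's urljoin, assuming (hypothetically) that the statements appear in the foaf.rdf document from footnote [1]; the music.rdf reference is made up for illustration:

```python
from urllib.parse import urljoin

# Assumption: the relative references below appear in this document.
base = "http://www.grauw.nl/foaf.rdf"

print(urljoin(base, "#laurens"))            # http://www.grauw.nl/foaf.rdf#laurens
print(urljoin(base, "music.rdf#coldplay"))  # http://www.grauw.nl/music.rdf#coldplay
```

The short form in the markup still yields a full, globally unique URI once resolved.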
Here is an example of some completely anonymous statements using bnodes
(apart from the basic RDF building blocks rdf:type, rdfs:Class and rdfs:label):
_:a rdf:type _:Game
_:Game rdf:type rdfs:Class
_:Game rdfs:label "Game"
So, as you can see, RDF also caters for the use cases you mentioned above
where uniqueness is not required. In RDFa, you achieve this by using a
‘typeof’ attribute without a corresponding ‘about’ attribute.
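Blank node labels are local to the document they appear in, so if two documents both use _:Game, merging them must keep the two nodes apart (RDF calls this standardizing blank nodes apart). A minimal Python sketch, assuming triples are plain strings, with prefixed names like rdf:type as shorthand and literals kept in quotes so they never start with "_:"; the second document is made up:

```python
import itertools


def _rename(term, mapping, fresh):
    """Map a blank node label to a fresh one; leave other terms unchanged."""
    if term.startswith("_:"):
        if term not in mapping:
            mapping[term] = f"_:b{next(fresh)}"
        return mapping[term]
    return term


def merge_graphs(*graphs):
    """Merge graphs of string triples, renaming blank nodes apart per graph."""
    fresh = itertools.count()
    merged = set()
    for graph in graphs:
        mapping = {}  # labels are only meaningful within this one graph
        for s, p, o in graph:
            merged.add((_rename(s, mapping, fresh), p, _rename(o, mapping, fresh)))
    return merged


doc1 = [
    ("_:a", "rdf:type", "_:Game"),
    ("_:Game", "rdf:type", "rdfs:Class"),
    ("_:Game", "rdfs:label", '"Game"'),
]
doc2 = [("_:Game", "rdfs:label", '"Board game"')]  # hypothetical other document

merged = merge_graphs(doc1, doc2)
for triple in sorted(merged):
    print(triple)
```

No global identifier is involved, yet the two _:Game nodes stay distinct after merging.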
If you reuse properties from widely used vocabularies, though (such as
FOAF or Dublin Core), it seems obvious that they need to be identified
globally to avoid namespace conflicts. Instead of long
‘org.foaf-project.Person’ identifiers as Hixie proposes, RDF uses URIs,
and most RDF serialisations go for a (shorter) prefix-based
‘foaf:Person’ solution, which IMO is pretty user-friendly.
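Expanding such a prefixed name is just concatenation with the declared namespace URI. A Python sketch; the prefix table is hard-coded for illustration, whereas a real document declares its own prefixes (e.g. with xmlns:foaf in RDFa):

```python
# Illustrative prefix table; a document would declare these itself.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no namespace declared for {name!r}")
    return prefixes[prefix] + local


print(expand("foaf:Person"))  # http://xmlns.com/foaf/0.1/Person
print(expand("dc:title"))     # http://purl.org/dc/elements/1.1/title
```

Because comparison happens on the expanded URIs, two documents may even bind different prefixes to the same namespace without any conflict.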
~Laurens
[1] http://web.archive.org/web/*/http://www.grauw.nl/foaf.rdf
[2] http://philip.html5.org/data/rdf-namespace-status.txt
--
~~ Ushiko-san! Why are you Ushiko-san!? ~~
Laurens Holst, student, Utrecht University, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com