[whatwg] Link rot is not dangerous

Sat May 16 11:04:12 PDT 2009

2009/5/16 Laurens Holst <laurens.nospam at grauw.nl>:
> Tab Atkins Jr. schreef:
>> Once you remove discovery as a strong requirement, then you remove the
>> need for large urls, and that removes the need for CURIEs, or any
>> other form of prefixing.  You still want to uniquify your identifiers
>> to avoid accidental clashes, but that's not that hard, nor is it
>> absolutely necessary.  The system can be robust and usable even with a
>> bit of potential ambiguity if small authors design their private
>> vocabs badly.  As a bonus, everything gets simpler.  Essentially it
>> devolves into something relatively close to Ian's microdata proposal,
>> perhaps with datatype added in (though I do question how necessary
>> that is, given a half-intelligent parser can recognize things as
>> numbers or dates).
>
> Ho, ho, you’re making a big leap there! Just because I explained that
> dereferenceable URIs are not needed to make RDF work at a core level (which
> makes RDF robust), do not jump to the conclusion that they are of no benefit!
> URIs are there for the benefit of linking, and help discoverability a lot
> (just like HTML hyperlinks do). Spidering the semantic web in a
> follow-your-nose style is effective. Incidentally, if an ontology disappears
> from its original address, this kind of spidering will likely lead you to a
> copy of it stored elsewhere, for example on a different spider that has the
> triples cached.

You had just stated in the previous email, however, that few (if any)
major consumers of RDFa *use* what is located on the far end of the
URI.  If they're not even paying attention to it, where is the value
in it?

I don't really understand the 'discoverability' argument here, at
least in the context of it being similar to HTML hyperlinks.
Hyperlinks are useful for people because they make it simple to
navigate to a new page.  You just click and it works, no need to
copy-paste the address into a new browser window.

I'm also not sure how a rotted link helps you compare vocabularies
with other spiders that, in this hypothetical world, you are
communicating with (at this point we're *far* into theory, not
practice).  Any uniquifier would let you compare things in the
same way, no?

> You are now only considering the ontologies, that is, types and properties.
> You’re forgetting (or ignoring) that in RDF, objects are also named with
> URIs so that data at other locations can refer to them. You know, the ‘web of
> linked data’ people refer to, a core principle of RDF. No ‘simple’ scheme
> based on what Ian proposed can provide a sufficient level of uniqueness for
> that. URIs are the best and most natural fit for use as web-scale
> identifiers.

Define 'sufficient', as used here.  I believe that this is an area
where absolute uniqueness is not a requirement.  Worst case, you get a
little bit of data pollution with weird triples being produced by
badly-written pages.  Perhaps your browser offers to add an event to
your calendar when no event shows up on the page, or a fraction of a
search engine's microdata collection is spurious.  Neither of these
is a big deal.

That being said, I agree that URIs provide a very convenient source of
uniqueness.  Ian's microdata allows them to be used either in normal
form or in reverse-domain form; either way provides the necessary
uniqueness.

> And then there is of course also the thing that there is already an existing
> framework, which has already been here for a long time, has had a lot of
> clever people work on it and is gaining in popularity, and here we have
> ‘HTML5’ wanting to reinvent the wheel and making an entirely new framework
> ‘just for them’. You’d think that of all places, in a standards body people
> would be compelled to adopt existing standards :).

There are compelling reasons to make any proposal at least *compatible*
with RDF.  Ian's microdata does this, though not
perfectly/completely.  I've said in another thread that I dislike
*all* of the inline microdata proposals.  RDFa sucks, Ian's microdata
sucks, they all suck.  They force structure completely inline, which
solves what I feel is a minority issue (carrying microdata along when
copy-pasting source code) while introducing several larger downsides
(carrying possibly *incorrect* microdata along when copy-pasting source,
duplicating meta structure when a regular page structure could
obviate it, etc.).  These are exactly the same problems that
inline event handlers and inline @style attributes have.  I think Ian
is trying to limit the suckiness by at least making it as simple as
possible to write.  It's probably at most half as difficult to write
properly, while solving 90% or more of the cases that RDFa does.  This
is an effort that I'm in favor of.

I won't be using RDF in my pages at all unless I know that I can use
something like RDF-EASE or CRDF; they allow me to just write my page
as normal, then specify what the page's data means in a separate file.
Plus, honestly, CRDF's inline syntax seems just as expressive as
microdata and RDFa, while being easier to write than either of them.
Take this example from Ian's proposal (I know that some of the
names are slightly out of date now):

<div item="org.w3.spec">
  <h1 property="org.w3.name">HTML5</h1>
  <a property="org.w3.url" href="/TR/html5">Current Version</a>
  <div property="org.w3.status" item>
    <p>Level: <span property="org.w3.level">WD</span>
    <p>Date: <span property="org.w3.pubdate">03/02/2009</span>
    <p>Deadline: <span property="org.w3.deadline">02/03/2009</span>
  </div>
  <p>Working Group: <span property="org.w3.wg">HTMLWG</span>
</div>

This can be written using inline CRDF as:
<script type="text/crdf">
@namespace w3 http://www.w3.org/;
</script>
<div crdf="@|subject; @|typeof: w3|item">
  <h1 crdf="w3|name">HTML5</h1>
  <a crdf="w3|url:attr(href)" href="/TR/html5">Current Version</a>
  <div crdf="@|subject; @|typeof: w3|status">
    <p>Level: <span crdf="w3|level">WD</span></p>
    <p>Date: <span crdf="w3|pubdate">03/02/2009</span></p>
    <p>Deadline: <span crdf="w3|deadline">02/03/2009</span></p>
  </div>
  <p>Working Group: <span crdf="w3|wg">HTMLWG</span></p>
</div>

I believe this communicates everything necessary for an RDF
serialization of the content, but in a somewhat more concise manner
than Ian's microdata and in a *much* more easily understandable manner
than RDFa.
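To make the RDF-serialization claim concrete, here is a rough Python sketch of how a consumer might pull triples out of the inline `crdf` attributes. It is not taken from any CRDF implementation, and its semantics are assumptions: `@|subject` mints a fresh blank node that descendants inherit, `@|typeof` becomes an `rdf:type` triple, a bare property takes the element's text, and `attr(x)` reads attribute `x`. The `w3` prefix is hard-coded from the `@namespace` rule rather than parsed, and the markup is a well-formed copy of the example so that `xml.etree` accepts it.

```python
import itertools
import re
import xml.etree.ElementTree as ET

# Assumption: the prefix map is hard-coded from "@namespace w3 http://www.w3.org/;".
NAMESPACES = {"w3": "http://www.w3.org/"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The inline-CRDF markup from the example, without the <script> block.
MARKUP = """<div crdf="@|subject; @|typeof: w3|item">
  <h1 crdf="w3|name">HTML5</h1>
  <a crdf="w3|url:attr(href)" href="/TR/html5">Current Version</a>
  <div crdf="@|subject; @|typeof: w3|status">
    <p>Level: <span crdf="w3|level">WD</span></p>
    <p>Date: <span crdf="w3|pubdate">03/02/2009</span></p>
    <p>Deadline: <span crdf="w3|deadline">02/03/2009</span></p>
  </div>
  <p>Working Group: <span crdf="w3|wg">HTMLWG</span></p>
</div>"""


def expand(qname):
    """'w3|name' -> 'http://www.w3.org/name'"""
    prefix, local = qname.split("|", 1)
    return NAMESPACES[prefix] + local


def parse_decls(text):
    """'a; b: c' -> [('a', None), ('b', 'c')]"""
    decls = []
    for part in filter(None, (p.strip() for p in text.split(";"))):
        name, sep, value = part.partition(":")
        decls.append((name.strip(), value.strip() if sep else None))
    return decls


def extract(markup):
    triples, counter = [], itertools.count()

    def visit(elem, subject):
        decls = parse_decls(elem.get("crdf", ""))
        if any(name == "@|subject" for name, _ in decls):
            subject = f"_:b{next(counter)}"  # fresh blank node for this element
        for name, value in decls:
            if name == "@|typeof":
                triples.append((subject, RDF_TYPE, expand(value)))
            elif name != "@|subject":
                # attr(x) reads attribute x; anything else falls back to the element's text.
                attr = re.fullmatch(r"attr\((\w+)\)", value or "")
                obj = elem.get(attr.group(1)) if attr else "".join(elem.itertext()).strip()
                triples.append((subject, expand(name), obj))
        for child in elem:
            visit(child, subject)  # descendants inherit the nearest subject

    visit(ET.fromstring(markup), None)
    return triples


print(*extract(MARKUP), sep="\n")
```

Run as-is, it prints eight triples: type, name, URL and working group for the `_:b0` item, and type, level, date and deadline for the nested `_:b1` status node.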

And for fun, the same thing in standard CRDF:
<div class="item">
  <h1>HTML5</h1>
  <a href="/TR/html5">Current Version</a>
  <div class="status">
    <p class="level">Level: <span>WD</span></p>
    <p class="pubdate">Date: <span>03/02/2009</span></p>
    <p class="deadline">Deadline: <span>02/03/2009</span></p>
  </div>
  <p class="wg">Working Group: <span>HTMLWG</span></p>
</div>
<script type="text/crdf">
@namespace w3 http://www.w3.org/;
.item {
  @|subject;
  @|typeof: w3|item;
}

.item h1 {
  w3|name;
}

.item h1 + a {
  w3|url: attr(href);
}

.item .status {
  @|subject;
  @|typeof: w3|status;
}

.item .status .level span {
  w3|level;
}

.item .status .pubdate span {
  w3|pubdate;
}

.item .status .deadline span {
  w3|deadline;
}

.item .wg span {
  w3|wg;
}
</script>

Obviously quite a bit longer in screen inches, but you see the same
thing when comparing a single instance of inline @style to the
equivalent CSS.  The code *looks* clean, though - it looks like HTML
*should* look.  This especially shines when you realize that it allows
you to extract triples from multiple items on a single page and across
an entire site just by adding a few classes (possibly useful for
styling anyway) and then including this crdf file.  (It would normally
be <link>ed in, rather than written in a <script> tag.)  If we're sure
that we can rely on a very specific structure, we don't even need most
of the classes - we can just use positional selectors instead.
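The stylesheet version can be sketched the same way. This is a hypothetical Python outline, not CRDF's actual processing model: it strips `@namespace` rules with a regex, matches each rule right to left against every element (supporting only tag and `.class` selectors with the descendant and `+` combinators, which is all this example needs), and merges the matching declarations in stylesheet order. The subject, `rdf:type`, text and `attr()` semantics are assumptions about CRDF's intent, and the markup is a well-formed copy of the class-annotated example.

```python
import itertools
import re
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# The CRDF stylesheet from the example above, one rule per line.
STYLESHEET = """
@namespace w3 http://www.w3.org/;
.item { @|subject; @|typeof: w3|item; }
.item h1 { w3|name; }
.item h1 + a { w3|url: attr(href); }
.item .status { @|subject; @|typeof: w3|status; }
.item .status .level span { w3|level; }
.item .status .pubdate span { w3|pubdate; }
.item .status .deadline span { w3|deadline; }
.item .wg span { w3|wg; }
"""
# The class-annotated markup, made well-formed so an XML parser accepts it.
MARKUP = """<div class="item">
  <h1>HTML5</h1>
  <a href="/TR/html5">Current Version</a>
  <div class="status">
    <p class="level">Level: <span>WD</span></p>
    <p class="pubdate">Date: <span>03/02/2009</span></p>
    <p class="deadline">Deadline: <span>02/03/2009</span></p>
  </div>
  <p class="wg">Working Group: <span>HTMLWG</span></p>
</div>"""

def parse_decls(text):
    """'a; b: c' -> [('a', None), ('b', 'c')]"""
    decls = []
    for part in filter(None, (p.strip() for p in text.split(";"))):
        name, sep, value = part.partition(":")
        decls.append((name.strip(), value.strip() if sep else None))
    return decls

def parse_selector(text):
    """'.item h1 + a' -> ['.item', ' ', 'h1', '+', 'a']; only ' ' and '+' combinators."""
    parts = []
    for token in text.split():
        if token != "+" and parts and parts[-1] != "+":
            parts.append(" ")  # implicit descendant combinator
        parts.append(token)
    return parts

def parse_stylesheet(text):
    namespaces = dict(re.findall(r"@namespace\s+(\w+)\s+([^;\s]+);", text))
    body = re.sub(r"@namespace[^;]*;", "", text)
    rules = re.findall(r"([^{}]+)\{([^{}]*)\}", body)
    return namespaces, [(parse_selector(s), parse_decls(d)) for s, d in rules]

def matches(elem, parts, parents):
    """Match a parsed selector right to left, as CSS engines do."""
    simple = parts[-1]
    if simple.startswith("."):
        ok = simple[1:] in elem.get("class", "").split()
    else:
        ok = elem.tag == simple
    if not ok or len(parts) == 1:
        return ok
    combinator, rest, parent = parts[-2], parts[:-2], parents.get(elem)
    if combinator == "+":  # adjacent sibling: only the immediately preceding element
        siblings = list(parent) if parent is not None else []
        return any(cur is elem and matches(prev, rest, parents)
                   for prev, cur in zip(siblings, siblings[1:]))
    while parent is not None:  # descendant: any ancestor may match the rest
        if matches(parent, rest, parents):
            return True
        parent = parents.get(parent)
    return False

def extract(markup, stylesheet):
    namespaces, rules = parse_stylesheet(stylesheet)
    root = ET.fromstring(markup)
    parents = {child: parent for parent in root.iter() for child in parent}
    triples, counter = [], itertools.count()
    def expand(qname):  # 'w3|name' -> 'http://www.w3.org/name'
        prefix, local = qname.split("|", 1)
        return namespaces[prefix] + local
    def visit(elem, subject):
        # Collect declarations from every matching rule, in stylesheet order.
        decls = [d for parts, ds in rules if matches(elem, parts, parents) for d in ds]
        if any(name == "@|subject" for name, _ in decls):
            subject = f"_:b{next(counter)}"  # fresh blank node for this element
        for name, value in decls:
            if name == "@|typeof":
                triples.append((subject, RDF_TYPE, expand(value)))
            elif name != "@|subject":
                # attr(x) reads attribute x; anything else falls back to the element's text.
                attr = re.fullmatch(r"attr\((\w+)\)", value or "")
                obj = elem.get(attr.group(1)) if attr else "".join(elem.itertext()).strip()
                triples.append((subject, expand(name), obj))
        for child in elem:
            visit(child, subject)  # descendants inherit the nearest subject
    visit(root, None)
    return triples

print(*extract(MARKUP, STYLESHEET), sep="\n")
```

Because the rules key off classes rather than per-element attributes, feeding it `"<body>" + MARKUP + MARKUP + "</body>"` yields sixteen triples describing two independent items, with no extra annotation in the second copy.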

~TJ