danbri at danbri.org
Tue Aug 26 00:32:21 PDT 2008
Ian Hickson wrote:
> On Sat, 23 Aug 2008, Julian Reschke wrote:
>> Again you're confusing HTTP URLs with URIs.
>> Using URIs as identifiers allows lots of identification schemes other
>> than HTTP, in particular ones that are not based on DNS, or that use
>> DNS, but include a timestamp to address the concern of "losing" a domain
>> name (tag URI scheme).
> Sure, but most people use HTTP URIs anyway for namespaces.
> You can use any URI or any system you want with class="". The key is just
> to make it unique enough that clashes won't happen. In practice, names
> like "dc:title" are actually quite unique enough. But people can use much
> more unique ones if desired, all the way to full URIs.
I'm certainly in favour of making mainstream namespace names prettier.
But this design worries me, since it requires guesswork and heuristics
on the part of consumer code to figure out if class = "info.age" or
"museum.acquisitionDate" is intended as a URI or not. I'll air the worry
first, and then sketch an approach that makes me worry less and which
might have some of the characteristics that you value (such as not
depending on separate xmlns-like declarations of abbreviations, and not
being too ugly to look at).
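A rough sketch of the guesswork a consumer would be stuck with. This is only an illustration: the function name and the tiny TLD set are hypothetical, and the fact that no such set can safely be hardcoded is part of the worry.

```python
# Hypothetical, deliberately tiny TLD list; a real consumer could not
# rely on any hardcoded list staying complete.
KNOWN_TLDS = {"com", "org", "net", "info", "museum"}


def guess_domain_readings(name):
    """Return the ways a dotted class name could be read as a domain."""
    labels = name.lower().split(".")
    readings = []
    if len(labels) >= 2:
        if labels[-1] in KNOWN_TLDS:
            readings.append("forward")   # e.g. creationDate.info
        if labels[0] in KNOWN_TLDS:
            readings.append("reversed")  # e.g. info.age read as age.info
    return readings


for name in ("info.age", "museum.acquisitionDate", "creationDate.info", "dc:title"):
    print(name, guess_domain_readings(name))
```

Nothing in the markup says which reading, if any, the author meant; the heuristic can only guess.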
You mentioned earlier that the RDFish practices around downloading and
interpreting schemas from the Web are news to you. I'll take up an action
to document some of the things we do in that area (eg. with SPARQL for
data merging), probably as a blog post.
Doing so would help as background on my next point, which is that making
it ambiguous whether a URI was declared is something that would need
careful security review, to ensure that data consumers are aware that
they should not expect property definitions found at the domain to be
consistent with the intended meaning of the markup.
Sketch of a scenario:
1. Alice deploys <span class="creationDate.info">1979</span> to describe a
museum artifact. She calls it this because it marks up some information
about the creation date of some real world thing, and because
'creationDate' is already in use for describing page creation dates, in
the CSS library she's using.
2. Bob buys himself the Internet domain creationDate.info and wires up a
webserver to respond with an RDFa schema defining creationDate as a
sub-property of http://ecommerce.example.com/vocab#priceInEuros.
3. Charlie's code downloads Alice's markup, parses out the RDFa and,
noticing that creationDate.info seems to be dereferenceable, goes to
fetch the schema. For every triple "x creationDate y" in the document,
it also generates "x ecom:priceInEuros y". Perhaps Bob is selling
other museum artifacts and wants to make Alice's look more expensive. Or
cheaper. Or to make her data look corrupted so that certain consumers
won't include her listing. Or maybe he wants to buy the item cheaply and
is probing for bugs in Alice's online shopping system.
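The scenario can be sketched in a few lines. This is a toy model under stated assumptions, not how any real RDFa consumer is written: the "web" is a plain dict standing in for HTTP fetches, schemas are lists of (sub-property, super-property) pairs, and the function names and the subject "artifact42" are made up for illustration.

```python
PRICE = "http://ecommerce.example.com/vocab#priceInEuros"


def fetch_schema(name, web):
    """Stand-in for dereferencing a name over HTTP: a dict lookup."""
    return web.get(name, [])


def expand_triples(triples, web):
    """Naively add one triple per super-property the fetched schema declares."""
    result = list(triples)
    for subject, prop, value in triples:
        for sub, sup in fetch_schema(prop, web):
            if sub == prop:
                result.append((subject, sup, value))
    return result


# Alice's data, as Charlie's code extracted it.
alice = [("artifact42", "creationDate.info", "1979")]

# Bob's schema, served from the domain he bought.
bob_web = {"creationDate.info": [("creationDate.info", PRICE)]}

inferred = expand_triples(alice, bob_web)
for triple in inferred:
    print(triple)
```

With Bob's schema in place, Charlie's data now also says the artifact costs 1979 euros, although Alice never wrote that.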
In other words, the fact that Alice's markup only *appears* to be using
an Internet domain opens her up to risk that someone will go buy that
domain, and put a fake schema there which affects the likely
interpretation of her markup. This exposure is increased by our
uncertainty about ICANN strategy: we can't rely on the assumption that
there are only a tiny handful of TLDs. We can probably rely on them
being expensive at the top level, but not on having a hardcoded list.
Icann has announced it will allow the creation of any new top-level
domains, albeit at a considerable cost.
As well as opening the door to an influx of new web addresses, Icann has
also said that it will allow Japanese, Chinese, Arabic and Cyrillic
characters to be used in registrations for the first time.
"It's a massive increase in the real estate of the internet. It will
allow groups, communities and businesses to express their identities
online," says Paul Twomey, chief executive of Icann, speaking to the Times.
The RDF approach generally has been to make it very clear which chunks
of data contain URIs, and whether they can be relative or not. Other
markup systems have adopted a similar approach. These share the merit
of making such ambiguity much less of a problem (although there are
other attacks of course).
Lately I've been thinking that perhaps we can get something less ugly
than "http://" in the markup, yet specify rules that allow expansion to
http:// or https:// while keeping it clear whether the markup author
really intends to cite some domain/page as vocabulary documentation.
For example <p>I'm <span property="info.foaf/age">1979</span> years old</p>
(if FOAF was documented at http://foaf.info/age and we specified the
property attribute to use Java-style names, and be declared relative to
the http:// scheme).
Or <p>I'm <span property="foaf/age">1979</span> years old</p>
(if I spend $100k at ICANN to buy a TLD 'foaf')
or <p>I'm <span property="com.xmlns.foaf.age">1979</span> years old</p>
(if I did some Apache config sysadmin on xmlns.com)
<p>I'm <span property="http://xmlns.com/foaf/0.1/age">1979</span> years old</p>
(if this was written out in fullest form, and if the 'age' property
already existed in FOAF).
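One possible reading of the expansion rule behind these examples, as a sketch only. It assumes that everything before the first "/" is a reverse-ordered host name, that a value with no "/" is all host name, and that values already carrying a scheme are left alone; the function name is hypothetical and none of this is specified anywhere.

```python
def expand_property(value, scheme="http"):
    """Expand a Java-style property name into a full URI (assumed rule)."""
    if "://" in value:
        return value  # already written out in fullest form
    host, _, path = value.partition("/")
    # Reverse the dot-separated labels: info.foaf -> foaf.info
    forward_host = ".".join(reversed(host.split("."))).lower()
    return f"{scheme}://{forward_host}/{path}"


for name in (
    "info.foaf/age",
    "foaf/age",
    "com.xmlns.foaf.age",
    "http://xmlns.com/foaf/0.1/age",
):
    print(name, "->", expand_property(name))
```

Under this reading the all-dots form lands on a host like age.foaf.xmlns.com, which is where the server configuration would come in.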
Such a design would open things to a marketplace in a real sense.
Parties who wanted nice short URLs for their properties could beg,
borrow or buy the appropriate domain names. The reverse-domain format
from Java would be a bit unusual for people used to the HTTP/browser
way. Perhaps property="age.foaf.xmlns.com" is equally readable?
The main cost here is that our prettification strategy is syntactically
indistinguishable from relative URIs. So we could only reliably use it
in attributes where we know we don't have a relative URI. For
properties, that seems fine. For the subjects and objects of statements
(ie. the things the properties apply to, or take as values) this would
require further thought.
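To make the clash concrete, here is what a standard URI resolver does with one of the pretty names when it is treated as an ordinary relative reference. The base URL is a made-up example page.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical URL of the page carrying the markup.
base = "http://museum.example.org/catalogue/item.html"

# A generic resolver sees "info.foaf/age" as a relative path.
resolved = urljoin(base, "info.foaf/age")
print(resolved)

# Nothing in the string itself marks it as anything other than relative.
has_scheme = bool(urlparse("info.foaf/age").scheme)
print(has_scheme)
```

So the same string means http://foaf.info/age in a property attribute under the sketched rule, and a path on the museum's own site anywhere relative URIs are allowed.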
Am I making any sense here? (regardless of whether you agree...)