ian at hixie.ch
Tue Aug 26 01:42:58 PDT 2008
On Tue, 26 Aug 2008, Dan Brickley wrote:
> You mentioned earlier that the RDFish practices around downloading and
> interpreting schemas from the Web are news to you. I'll take up an action
> to document some of the things we do in that area (eg. with SPARQL for
> data merging), probably as a blog post.
> Doing so would help as background on my next point, which is that making
> it ambiguous whether a URI was declared is something that would need
> careful security review, to ensure that data consumers are aware that
> they should not expect property definitions found at the domain to be
> consistent with the intended meaning of the markup.
Yes, it would be very helpful to have this background. As I mentioned in
earlier e-mails, I'm trying to understand the core problem being solved
here, and I don't yet have a good enough understanding of that to really
be able to evaluate most of the rest of the discussion. (I haven't yet
studied Manu's e-mail carefully, but I will do so.)
> Sketch of a scenario:
> 1. Alice deploys <class="creationDate.info">1979</class> to describe a
> museum artifact. She calls it this because it marks up some information
> about the creation date of some real world thing, and because
> 'creationDate' is already in use for describing page creation dates, in
> the CSS library she's using.
> 2. Bob buys himself the Internet domain creationDate.info and wires up a
> webserver to respond with an RDFa schema defining creationDate as a
> sub-property of http://ecommerce.example.com/vocab#priceInEuros.
I have no idea what this means or why anyone would want to do that, but
> 3. Charlie's code downloads Alice's markup, parses out the RDFa, and,
> noticing that creationDate.info seems to be de-referencable, goes to
> fetch the schema.
Step 3 seems totally crazy on several levels, but let's continue:
> For every triple "x creationDate y" in the document, it also generates
> "x ecom:priceInEuros y" too. Perhaps Bob is selling other museum
> artifacts and wants to make Alice's look more expensive. Or cheaper. Or
> to make her data look corrupted so that certain consumers won't include
> her listing. Or maybe he wants to buy the item cheaply and is probing
> for bugs in Alice's online shopping system.
Why would Charlie ever depend on Bob for anything to do with Alice's site?
That seems like a disaster waiting to happen.
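
To make the scenario concrete, here is a minimal sketch of the inference
step Dan describes, under some loud assumptions: triples are plain
(subject, predicate, object) tuples, the fetched schema is reduced to a
dictionary mapping a property to its declared super-property, and only one
level of sub-property is followed. The function name and data layout are
hypothetical; this is not how any particular RDFa or RDFS toolkit works.

```python
# Hypothetical sketch of one-level sub-property inference.
# Triples are (subject, predicate, object) tuples; the "schema" is a dict
# mapping a property name to the super-property it claims to refine.

def apply_subproperty_rules(triples, sub_property_of):
    """Return the original triples plus one inferred triple for every
    triple whose predicate has a declared super-property."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        parent = sub_property_of.get(predicate)
        if parent is not None:
            inferred.add((subject, parent, obj))
    return inferred


# Alice's markup, as Charlie's parser might see it (illustrative data).
alice_triples = {("artifact", "creationDate", "1979")}

# What Bob's server at creationDate.info claims.
bob_schema = {
    "creationDate": "http://ecommerce.example.com/vocab#priceInEuros",
}

result = apply_subproperty_rules(alice_triples, bob_schema)
```

With Bob's schema applied, `result` contains Alice's original triple and a
second one stating that the artifact has a priceInEuros of "1979", which is
the corruption the scenario is worried about.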
For that matter, why would Charlie trust Alice _or_ Bob? Bob could easily
just lie about his own prices, or, if Charlie is busy downloading things from
Bob's site, could just feed up bogus data about Alice directly, without
having to go through the indirection layer of defining what Alice is doing
to mean something when it doesn't really mean anything. Similarly, Alice
could just include totally bogus data on her site, about either her own
stuff or about Bob's.
If Charlie wants to work with Alice's site, he should agree with Alice
about what vocabularies they're going to use, and then only use that.
That's how standards work: you agree on common vocabularies and then use
those for interoperability. For example, everyone agrees on HTML's
vocabulary as a way to describe documents (and now applications).
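
A rough sketch of the "agree on a vocabulary and only use that" approach,
assuming the same tuple representation of triples as in Dan's scenario.
The set of agreed property names and the function name are hypothetical;
the point is only that nothing is fetched from the network and unknown
properties are ignored rather than looked up.

```python
# Hypothetical sketch: Charlie only honours properties he has agreed on
# with Alice in advance, and never fetches schemas from the network.

AGREED_VOCABULARY = frozenset({"creationDate"})  # assumed, fixed out of band


def filter_agreed(triples, agreed=AGREED_VOCABULARY):
    """Keep only triples whose predicate is in the agreed vocabulary."""
    return [t for t in triples if t[1] in agreed]


incoming = [
    ("artifact", "creationDate", "1979"),
    ("artifact", "http://ecommerce.example.com/vocab#priceInEuros", "1979"),
]

accepted = filter_agreed(incoming)
```

Here `accepted` keeps only the creationDate triple; the price triple is
dropped because its property was never part of the agreement.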
Anyway. I assume that I'm missing something that is part of the problem
that is being solved, so maybe this will make more sense after I've read
> In other words, the fact that Alice's markup only *appears* to be using
> an Internet domain opens her up to risk that someone will go buy that
> domain, and put a fake schema there which affects the likely
> interpretation of her markup.
This same problem exists with URIs. What happens if everyone is pointing
to w3.org for their definition of "price", and then someone hacks the W3C
servers and suddenly the whole Web's meaning changes for whoever is using
this magic "follow your nose" principle?
Anyway, I don't think you should ever dereference something that isn't an
actual URI. That's what URIs are for.
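
As an illustration of that last point, a consumer could refuse to fetch
anything that is not an absolute http(s) URI. This is a minimal sketch
using Python's standard urllib.parse; the function name is hypothetical,
and the choice to accept only the http and https schemes is an assumption
made for the example.

```python
from urllib.parse import urlsplit


def looks_like_dereferenceable_uri(value):
    """Return True only for absolute http(s) URIs with a host part.

    A bare token such as "creationDate.info" has no scheme, so it is
    rejected even though it happens to look like a domain name.
    """
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


bare_token = looks_like_dereferenceable_uri("creationDate.info")
real_uri = looks_like_dereferenceable_uri(
    "http://ecommerce.example.com/vocab#priceInEuros"
)
```

Under these assumptions `bare_token` is False and `real_uri` is True, so
Charlie's code would never go looking for a schema at creationDate.info.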
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'