[whatwg] RDFa

Ian Hickson ian at hixie.ch
Tue Aug 26 01:42:58 PDT 2008

On Tue, 26 Aug 2008, Dan Brickley wrote:
> You mentioned earlier that the RDFish practices around downloading and 
> interpreting schemas from the Web are news to you. I'll take up an action 
> to document some of the things we do in that area (eg. with SPARQL for 
> data merging), probably as a blog post.
> Doing so would help as background on my next point, which is that making 
> it ambiguous whether a URI was declared is something that would need 
> careful security review, to ensure that data consumers are aware that 
> they should not expect property definitions found at the domain to be 
> consistent with the intended meaning of the markup.

Yes, it would be very helpful to have this background. As I mentioned in 
earlier e-mails, I'm trying to understand the core problem being solved 
here, and I don't yet have a good enough understanding of that to really 
be able to evaluate most of the rest of the discussion. (I haven't yet 
studied Manu's e-mail carefully, but I will do so.)

> Sketch of a scenario:
> 1. Alice deploys <class="creationDate.info">1979</class> to describe a 
> museum artifact. She calls it this because it marks up some information 
> about the creation date of some real world thing, and because 
> 'creationDate' is already in use for describing page creation dates, in 
> the CSS library she's using.
> 2. Bob buys himself the Internet domain creationDate.info and wires up a 
> webserver to respond with an RDFa schema defining creationDate as a 
> sub-property of http://ecommerce.example.com/vocab#priceInEuros.

I have no idea what this means or why anyone would want to do that, but 
let's continue:

> 3. Charlie's code downloads Alice's markup, parses out the RDFa, and, 
> noticing that creationDate.info seems to be dereferenceable, goes to 
> fetch the schema.

Step 3 seems totally crazy on several levels, but let's continue:

> For every triple "x creationDate y" in the document, it also generates 
> "x ecom:priceInEuros y". Perhaps Bob is selling other museum 
> artifacts and wants to make Alice's look more expensive. Or cheaper. Or 
> to make her data look corrupted so that certain consumers won't include 
> her listing. Or maybe he wants to buy the item cheaply and is probing 
> for bugs in Alice's online shopping system.

Why would Charlie ever depend on Bob for anything to do with Alice's site? 
That seems like a disaster waiting to happen.
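
To make the quoted scenario concrete, here is a minimal sketch of the 
inference step it describes, assuming triples are plain Python tuples. 
The names infer, SUBPROPERTY_OF and "artifact" are hypothetical and are 
not taken from any RDFa library; the mapping stands in for whatever 
Charlie's code would have fetched from Bob's server.

```python
# Hypothetical stand-in for the schema fetched from Bob's domain:
# it claims creationDate is a sub-property of ecom:priceInEuros.
SUBPROPERTY_OF = {
    "creationDate": "ecom:priceInEuros",
}


def infer(triples, subproperty_of):
    """Return the input triples plus one extra triple for every
    predicate that the schema declares to be a sub-property."""
    result = list(triples)
    for subject, predicate, obj in triples:
        parent = subproperty_of.get(predicate)
        if parent is not None:
            result.append((subject, parent, obj))
    return result


# Alice's markup, reduced to a single (subject, predicate, object) triple.
alice = [("artifact", "creationDate", "1979")]
inferred = infer(alice, SUBPROPERTY_OF)
```

With Bob's mapping in place, Alice's year now also shows up as a price 
in euros, even though nothing on Alice's site changed.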

For that matter, why would Charlie trust Alice _or_ Bob? Bob could easily 
just lie about his own prices, or, if Charlie is busy downloading things from 
Bob's site, could just feed up bogus data about Alice directly, without 
having to go through the indirection layer of defining what Alice is doing 
to mean something when it doesn't really mean anything. Similarly, Alice 
could just include totally bogus data on her site, about either her own 
stuff or about Bob's.

If Charlie wants to work with Alice's site, he should agree with Alice 
about what vocabularies they're going to use, and then only use that. 
That's how standards work: you agree on common vocabularies and then use 
those for interoperability. For example, everyone agrees on HTML's 
vocabulary as a way to describe documents (and now applications).
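
As a rough illustration of the agreed-vocabulary approach, the sketch 
below keeps only triples whose predicate is on a list fixed in advance 
and never fetches anything. The vocabulary contents and the name 
filter_agreed are assumptions made up for this example.

```python
# Hypothetical vocabulary that Alice and Charlie agreed on out of band.
AGREED_VOCABULARY = frozenset({"creationDate", "title"})


def filter_agreed(triples, agreed=AGREED_VOCABULARY):
    """Keep only triples whose predicate is in the agreed vocabulary."""
    return [t for t in triples if t[1] in agreed]


incoming = [
    ("artifact", "creationDate", "1979"),
    ("artifact", "ecom:priceInEuros", "1979"),  # not agreed, so dropped
]
accepted = filter_agreed(incoming)
```

Anything outside the agreed set is simply ignored, so a third party 
cannot change what the accepted data means.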

Anyway. I assume that I'm missing something that is part of the problem 
that is being solved, so maybe this will make more sense after I've read 
Manu's e-mail.

> In other words, the fact that Alice's markup only *appears* to be using 
> an Internet domain opens her up to risk that someone will go buy that 
> domain, and put a fake schema there which affects the likely 
> interpretation of her markup.

This same problem exists with URIs. What happens if everyone is pointing 
to w3.org for their definition of "price", and then someone hacks the W3C 
servers and suddenly the whole Web's meaning changes for whoever is using 
this magic "follow your nose" principle?

Anyway, I don't think you should ever dereference something that isn't an 
actual URI. That's what URIs are for.
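
One way to express that rule is sketched below, using Python's standard 
urllib.parse. The function name is hypothetical, and restricting the 
check to http and https is an assumption of this example rather than 
something stated in the thread.

```python
from urllib.parse import urlsplit


def is_dereferenceable_uri(token):
    """Return True only for absolute http(s) URIs with a host part.

    A bare token such as "creationDate.info" has no scheme, so it is
    treated as an opaque name and never fetched.
    """
    parts = urlsplit(token)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


looks_like_domain = is_dereferenceable_uri("creationDate.info")
real_uri = is_dereferenceable_uri(
    "http://ecommerce.example.com/vocab#priceInEuros"
)
```

Under this check, the class name from the scenario is never looked up, 
while the vocabulary URI is.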

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
