[whatwg] Persistent storage is critically flawed.

Mon Aug 28 17:23:12 PDT 2006

On Mon, 28 Aug 2006, Shannon Baker wrote:
> > 
> > This is mentioned in the "Security and privacy" section; the third 
> > bullet point here for example suggests blocking access to "public" 
> > storage areas:
> > 
> >   http://whatwg.org/specs/web-apps/current-work/#user-tracking
> 
> I did read the suggestions and I know the authors have given these 
> issues thought. However, my concern is that the solutions are all 
> 'suggestions' rather than rules. I believe the standard should be more 
> definitive to eliminate the potential for browser inconsistencies.

The problem is that the solution is to use a list that doesn't exist yet. 
If the list existed and was firmly established and proved usable, then we 
could require its use, but since it is still being developed (by the 
people trying to implement the Storage APIs), we can't really require it.

> > Basically, for the few cases where an author doesn't control his 
> > subdomain space, he should be careful. But this goes without saying. 
> > The same requirement (that authors be responsible) applies to all Web 
> > technologies, for example CGI script authors must be careful not to 
> > allow SQL injection attacks, must check Referer headers, must ensure 
> > POST/GET requests are handled appropriately, and so forth.
> 
> As I pointed out, this only gives control to the parent domain, not the 
> child, without regard for the real-world political relationship between 
> the two. Also the implication here is that the 'parent' domain is more 
> trustworthy and important than the child - that it should always be able 
> to read a subdomain's private user data. The spec doesn't give the 
> developer a chance to be responsible when it hands out user data to 
> anybody in the domain hierarchy without regard for whether they are a 
> single, trusted entity or not. Don't blame the programmer when the spec 
> dictates who can read and write the data with no regard for the author's 
> preferences. CGI scripts generally do not have this limitation so your 
> analogy is irrelevant.

It seems that what you are suggesting is that foo.example.com cannot trust 
example.com, because example.com could then steal data from 
foo.example.com. But there's a much simpler attack scenario for 
example.com: it can just take over foo.example.com directly. For example, 
it could insert new HTML code containing <script> tags (which is exactly 
what geocities.com does today, for example!), or it could change the DNS 
entries (which is what, e.g., dyndns.org could do).

There is an implicit trust relationship here already. There is no point 
making the storage APIs more secure than the DNS and Web servers they rely 
on. That would be like putting a $500 padlock on a paper screen.

> > Indeed; users of geocities.com shouldn't be using this service, and 
> > geocities themselves should put their data (if any) in a private 
> > subdomain space.
>
> Geocities and other free-hosting sites generally have a low server-side 
> storage allowance. This means these sites have a _greater_ need for 
> persistent storage than 'real' domains.

They can use it if they want. It just won't be secure. This is true 
regardless of how we design the API, since the Web server can insert 
arbitrary content into their site.

> > It doesn't. The solution for mysite.geocities.com is to get their own 
> > domain.
>
> That's a bit presumptuous. In fact it's downright offensive. The user 
> may have valid reasons for not buying a domain. Is it the WHATWG's role 
> to dictate hosting requirements in a web standard?

I'm just stating a fact of life. If you want a secure data storage 
mechanism, you don't host your site on a system where you don't trust the 
hosting provider.

> I accept that such a list is probably the answer, however I believe the 
> list should itself be standardised before becoming part of a web 
> standard - otherwise more UA inconsistency.

I think we should change the spec once the list is ready, yes. This isn't 
yet the case, though. In the meantime, I don't think it's wise for us to 
restrict the possible security solutions; a UA vendor might come up with a 
better (and more scalable) solution.

Note that the problems you raise also exist (and have long existed) with 
cookies; at least the storage APIs default to a safe state in the general 
case instead of defaulting to an unsafe state.

> > One could create much more complex APIs, naturally, but I do not see 
> > that this would solve the problems. It wouldn't solve the issue of 
> > authors who don't understand the security implications of their code, 
> > for instance. It also wouldn't prevent the security issue you 
> > mentioned -- why couldn't all *.geocities.com sites cooperate to 
> > violate the user's privacy? Or *.co.uk sites, for that matter? (Note 
> > that it is already possible today to do such tracking with cookies; in 
> > fact it's already possible today even without cookies if you use 
> > Referer tracking, and even without Referer tracking one can use IP and 
> > User-Agent fingerprinting combined with log analysis to perform quite 
> > thorough tracking.)
>
> None of those techniques are reliable.

Neither are the data storage APIs. I assure you that the above methods are 
accurate enough to violate people's privacy if sites cooperate. (If 
sites cooperate in detail, for example by all including <iframe>s and 
sending as much info as they can on the user in ?query parameters, then it 
might even be more accurate than mechanisms that use data storage APIs.)

> I don't have a solution to this other than to revoke this proposal or 
> prevent the sharing of storage between sites. I accept tracking is 
> inevitable but we shouldn't be making it easier either.

The spec lists a number of ways to mitigate this, and browser vendors will 
pick the ones that make sense. Once a clear solution is available, we can 
make the spec more detailed.

Note that there are certain mitigation mechanisms (such as data 
expiration) regarding which we can never make strong requirements, because 
different UAs will be in different contexts. For example, you can't ever 
require a UA that has a readonly filesystem to not expire its data at the 
end of the session. And similarly, you can't require such a UA to expire 
data if the filesystem it runs from always starts with the same data, 
timestamped at the UA's launch time (so it always seems new). (I've seen 
both of these scenarios in the context of browsers and cookies.)

> > Certainly one could add a .readonly field or some such to storage data 
> > items, or even fully fledged ACL APIs, but I don't think that should 
> > be available in a first version, and I'm not sure it's really useful 
> > in later versions either.
>
> Any more or less complex or useful than the .secure flag? Readonly is an 
> essential attribute in any shared data system from databases to 
> filesystems. Would you advocate that all websites be world-writable just 
> to simplify the API?

Clearly not. The API is constructed entirely around not being 
world-writable: it has separate spaces, each of which is usable only by 
certain domains.
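
As a rough sketch of that scoping idea (this is not the spec's algorithm; 
the function name is made up for illustration, and it assumes that a space 
named for a domain is usable by that domain and by its subdomains only):

```typescript
// Hypothetical helper, not part of any spec: may a page served from
// `pageHost` use the storage space named `areaDomain`?
// Assumption: a space is usable by its own domain and by subdomains of it.
function canAccessArea(pageHost: string, areaDomain: string): boolean {
  const host = pageHost.toLowerCase();
  const area = areaDomain.toLowerCase();
  if (host === area) {
    return true;
  }
  // Compare against ".area" so that "badexample.com" does not match
  // the space for "example.com".
  const suffix = "." + area;
  return host.slice(-suffix.length) === suffix;
}
```

Under that assumption, foo.example.com can use the example.com space, while 
badexample.com and unrelated domains cannot.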

> Not that it should be hard to implement .readonly, as we already have 
> metadata with each key.

It's not that each individual additional feature is hard, it's that the 
sum total is harder the more features you add. That's why it's better to 
start very simple, and add features gradually over time, based on user 
need and implementation experience.

> > I don't really understand what this is referring to. Could you show an 
> > example of the transaction/callback system you refer to? The API is 
> > intended to be really simple, just specify the item name and there you 
> > go.
>
> I'm referring to the "storage" event described in 5.9.6 which is fired in 
> all active pages as data changes. This is an unusual procedure that 
> needs a better justification than those given in the spec. If the event 
> pulls me out of my current function then how am I going to do anything 
> useful with the application state (without really knowing where 
> execution was interrupted)?

Events are not re-entrant. This event will never fire while other scripts 
are running. (This isn't clear in the spec yet because I haven't yet 
written the general event dispatch section.)
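
To illustrate what "not re-entrant" means here, a toy model (the EventLoop 
class and its method names are invented for this sketch; it is not how any 
browser is actually implemented): events raised while a script is running 
are queued, and only dispatched once that script has finished.

```typescript
type Task = () => void;

// Toy run-to-completion loop: tasks are queued and run one at a time.
class EventLoop {
  private queue: Task[] = [];
  private running = false;

  // Queue a task. It is never run synchronously from inside another task.
  post(task: Task): void {
    this.queue.push(task);
  }

  // Drain the queue, running each task to completion before the next.
  run(): void {
    if (this.running) {
      return; // already draining; do not re-enter
    }
    this.running = true;
    try {
      while (this.queue.length > 0) {
        const task = this.queue.shift()!;
        task();
      }
    } finally {
      this.running = false;
    }
  }
}

const log: string[] = [];
const loop = new EventLoop();

loop.post(() => {
  log.push("script: start");
  // Data changes while the script is running: the event is only queued.
  loop.post(() => log.push("storage event handler"));
  log.push("script: end");
});
loop.run();
// log is now: "script: start", "script: end", "storage event handler"
```

The handler runs only after the script that was executing has returned, so 
it never interrupts a function halfway through.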

> > While I agree that there are valid concerns, I believe they are all 
> > addressed explicitly in the spec, with suggested solutions.
>
> Your points are also quite valid; however, they ignore the root of my 
> concerns - which is that the spec leaves too much up to the UA to 
> resolve. I don't see how you can explicitly define something with a 
> suggestion! The whole spec kind of 'hopes' that many disparate 
> companies/groups will cooperate to make persistent storage work 
> consistently across browsers. They might, but given both Microsoft's and 
> Netscape's track records I think things need to be more concrete in such 
> an important spec.

Hopefully the above comments have explained why the spec is not yet very 
explicit about this. I agree that for the "public" domain space we should 
make the spec more explicit in due course. I'm not sure we'll ever be able 
to be completely explicit, though; there are some scenarios where you 
can't expect the UA to have the full list of subdomains. (For example, a 
browser on a tightly constrained device may not have enough disk space to 
store such a large data file.)
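
For illustration only, here is the kind of check such a list would enable. 
The list contents and the function name below are made up for this sketch; 
the real list does not exist yet, and a UA might match against it quite 
differently:

```typescript
// Illustrative sample only: the real list is still being developed.
const SAMPLE_PUBLIC_DOMAINS: string[] = ["com", "co.uk", "geocities.com"];

// Hypothetical check: is the storage space named `areaDomain` one that is
// shared by unrelated sites, and so a candidate for blocking?
function isPublicArea(areaDomain: string, publicDomains: string[]): boolean {
  return publicDomains.indexOf(areaDomain.toLowerCase()) !== -1;
}
```

With this sample list, the spaces for "co.uk" and "geocities.com" count as 
public, while the space for "example.co.uk" does not. A UA without room for 
the full list could not make this distinction reliably, which is the 
limitation described above.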

> As a quick thought, the simplest approach might just be to require the 
> site send a secret hash or public key in order to prove it 'owns' the 
> key. The secret could even be a timestamp of the exact time the key was 
> set or just a hash of the user's site login, e.g.:
> 
> DOMAIN    KEY    SECRET                      DATA
> foo.bar   baz    kj43h545j34h6jk534dfytyf    A string.

I don't really see how that would work. Could you explain it in more 
detail?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'