[whatwg] Persistent storage is critically flawed.
Shannon Baker
shannon at arc.net.au
Sun Aug 27 23:00:25 PDT 2006
Ian Hickson wrote:
>
> This is mentioned in the "Security and privacy" section; the third
> bullet point here for example suggests blocking access to "public"
> storage areas:
>
> http://whatwg.org/specs/web-apps/current-work/#user-tracking
>
I did read the suggestions and I know the authors have given these
issues thought. However, my concern is that the solutions are all
'suggestions' rather than rules. I believe the standard should be more
definitive to eliminate the potential for browser inconsistencies.
> Yes, there's an entire section of the spec discussing this in detail,
> with suggested solutions.
>
Again, the key word here is 'suggest'.
> Indeed, the spec suggests blocking such access.
>
Suggest. See where I'm going with this? The spec is too loose.
> There generally is; but for the two cases where there are not, see:
>
> http://whatwg.org/specs/web-apps/current-work/#storage
>
> ...and:
>
> http://whatwg.org/specs/web-apps/current-work/#storage0
>
> Basically, for the few cases where an author doesn't control his
> subdomain space, he should be careful. But this goes without saying.
> The same requirement (that authors be responsible) applies to all Web
> technologies, for example CGI script authors must be careful not to
> allow SQL injection attacks, must check Referer headers, must ensure
> POST/GET requests are handled appropriately, and so forth.
>
As I pointed out, this only gives control to the parent domain, not the
child, without regard for the real-world political relationship between
the two. The implication is also that the 'parent' domain is more
trustworthy and important than the child - that it should always be able
to read a subdomain's private user data. The spec doesn't give the
developer a chance to be responsible when it hands out user data to
anybody in the domain hierarchy, whether or not they are a single,
trusted entity. Don't blame the programmer when the spec dictates who
can read and write the data with no regard for the author's
preferences. CGI scripts generally do not have this limitation, so your
analogy is irrelevant.
> Indeed; users of geocities.com shouldn't be using this service, and
> geocities themselves should put their data (if any) in a private
> subdomain space.
Geocities and other free-hosting sites generally have a low server-side
storage allowance. This means these sites have a _greater_ need for
persistent storage than 'real' domains.
> It doesn't. The solution for mysite.geocities.com is to get their own
> domain.
That's a bit presumptuous. In fact it's downright offensive. The user
may have valid reasons for not buying a domain. Is it the WHATWG's role
to dictate hosting requirements in a web standard?
> The spec was written in conjunction with UA vendors. It is realistic
> for UA vendors to provide a hardcoded list of TLDs; in fact, there is
> significant work underway to create such a list (and have it be
> regularly updated). That work was originally started for use in HTTP
> Cookie implementations, which have similar problems, but would be very
> useful for Storage API implementations (although, again as noted in
> the draft, not imperative for a secure implementation if the author is
> responsible).
I accept that such a list is probably the answer; however, I believe the
list should itself be standardised before becoming part of a web
standard - otherwise we will get more UA inconsistency.
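To make the idea concrete, here is a rough sketch (Python, for brevity) of how a UA might use such a list to decide which storage domains a page may touch. The suffix set and the function names are invented for illustration and are not taken from the draft or from any real list; the point is only that every UA would need the same list to give the same answer.

```python
from typing import List

# Hypothetical, hand-picked entries. A real UA would need a maintained,
# shared list; this set is an assumption made for the example only.
SHARED_SUFFIXES = {"com", "net", "uk", "co.uk", "geocities.com"}


def is_shared_suffix(domain: str) -> bool:
    """True if the domain is on the (assumed) list of shared suffixes."""
    return domain.lower().strip(".") in SHARED_SUFFIXES


def allowed_storage_domains(host: str) -> List[str]:
    """Return the host and each parent domain that is not a shared suffix."""
    labels = host.lower().strip(".").split(".")
    # Walk from the full host name up to the top-level label.
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    return [d for d in candidates if not is_shared_suffix(d)]
```

With this list, a page on mysite.geocities.com is confined to its own storage area, while www.example.co.uk may also share with example.co.uk. Two UAs shipping different lists would disagree on exactly these answers.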
> One could create much more complex APIs, naturally, but I do not see
> that this would solve the problems. It wouldn't solve the issue of
> authors who don't understand the security implications of their code,
> for instance. It also wouldn't prevent the security issue you
> mentioned -- why couldn't all *.geocities.com sites cooperate to
> violate the user's privacy? Or *.co.uk sites, for that matter? (Note
> that it is already possible today to do such tracking with cookies; in
> fact it's already possible today even without cookies if you use
> Referer tracking, and even without Referer tracking one can use IP and
> User-Agent fingerprinting combined with log analysis to perform quite
> thorough tracking.)
None of those techniques are reliable. My own web logs show most users
have the Referer field turned off. Cookies can be safely deleted after
every session without a major impact on site function (I may have to
log in again). IP tracking is mitigated by proxies and NATs. The trouble
with this proposal is that it would allow important data to get lumped
in with tracking data, while the spec suggests that UAs should only
delete the storage when explicitly asked to do so. I don't have a
solution to this other than to revoke this proposal or prevent the
sharing of storage between sites. I accept tracking is inevitable but we
shouldn't be making it easier either.
> Certainly one could add a .readonly field or some such to storage data
> items, or even fully fledged ACL APIs, but I don't think that should
> be available in a first version, and I'm not sure it's really useful
> in later versions either.
Is it any more complex, or any less useful, than the .secure flag?
Readonly is an essential attribute in any shared data system, from
databases to filesystems. Would you advocate that all websites be
world-writable just to simplify the API? Not that it should be hard to
implement .readonly, as we already have metadata with each key.
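As a rough illustration of how little is involved, here is a sketch in Python of a shared store where each item carries a readonly flag alongside its value. The class names, the "owner" field and the rule that only the creating domain may overwrite a read-only item are my assumptions for the example, not anything the draft defines.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Item:
    value: str
    owner: str              # domain that created the item (assumed metadata)
    readonly: bool = False  # hypothetical flag, analogous to .secure


class SharedStorage:
    """Toy model of a storage area shared by several domains."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def set_item(self, caller: str, key: str, value: str,
                 readonly: bool = False) -> None:
        existing = self._items.get(key)
        if existing is None or existing.owner == caller:
            # New item, or the owner updating its own item.
            self._items[key] = Item(value, caller, readonly)
        elif existing.readonly:
            raise PermissionError(f"{key!r} is read-only for {caller}")
        else:
            # Writable shared item: other domains may change the value only.
            existing.value = value

    def get_item(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        return None if item is None else item.value
```

Here a.example.com can publish a value that b.example.com may read but not overwrite, which is the behaviour I would expect from any shared data system.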
> I don't really understand what this is referring to. Could you show an
> example of the transaction/callback system you refer to? The API is
> intended to be really simple, just specify the item name and there you
> go.
I'm referring to the "storage" event described in 5.9.6, which is fired
in all active pages as data changes. This is an unusual procedure that
needs a better justification than those given in the spec. If the event
pulls me out of my current function then how am I going to do anything
useful with the application state (without really knowing where
execution was interrupted)?
> While I agree that there are valid concerns, I believe they are all
> addressed explicitly in the spec, with suggested solutions.
Your points are also quite valid; however, they ignore the root of my
concerns - which is that the spec leaves too much up to the UA to
resolve. I don't see how you can explicitly define something with a
suggestion! The whole spec kind of 'hopes' that many disparate
companies/groups will cooperate to make persistent storage work
consistently across browsers. They might, but given both Microsoft's and
Netscape's track records I think things need to be more concrete in such
an important spec.
> I would be interested in seeing a concrete proposal for a better
> solution; I don't really see what a better solution would be.
I'm not sure myself but I don't think it can stay the way it is. I would
be happy to offer a better proposal or update the current one given
enough time to consider it.
As a quick thought, the simplest approach might just be to require that
the site send a secret hash or public key in order to prove it 'owns' the
key. The secret could even be a timestamp of the exact time the key was
set or just a hash of the user's site login, e.g.:
DOMAIN   KEY  SECRET                    DATA
foo.bar  baz  kj43h545j34h6jk534dfytyf  A string.
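A very rough sketch of how that might behave, written in Python just to show the logic. The class and method names are invented, and storing a SHA-256 digest of the secret rather than the secret itself is my own assumption; none of this is in the draft.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple


class OwnedStorage:
    """Toy store where each (domain, key) row is guarded by a secret."""

    def __init__(self) -> None:
        # (domain, key) -> (digest of secret, data)
        self._rows: Dict[Tuple[str, str], Tuple[str, str]] = {}

    @staticmethod
    def _digest(secret: str) -> str:
        return hashlib.sha256(secret.encode("utf-8")).hexdigest()

    def _matches(self, row: Tuple[str, str], secret: str) -> bool:
        # Constant-time comparison of the stored and presented digests.
        return hmac.compare_digest(row[0], self._digest(secret))

    def set_item(self, domain: str, key: str, secret: str, data: str) -> None:
        row = self._rows.get((domain, key))
        if row is not None and not self._matches(row, secret):
            raise PermissionError("secret does not match the owner of this key")
        self._rows[(domain, key)] = (self._digest(secret), data)

    def get_item(self, domain: str, key: str, secret: str) -> Optional[str]:
        row = self._rows.get((domain, key))
        if row is None or not self._matches(row, secret):
            return None
        return row[1]
```

Whoever first sets foo.bar/baz with the secret above can read and update it; anyone presenting a different secret gets nothing back and cannot overwrite it.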
Just one idea.
Shannon
Web Developer