[whatwg] Storage quota introspection and modification

Wed Apr 7 18:05:05 PDT 2010

On Wed, 10 Mar 2010, Ian Fette (イアンフェッティ) wrote:
>
> As I talk with more application developers (both within Google and at 
> large), one thing that consistently gets pointed out to me as a problem 
> is the notion of the opaqueness of storage quotas in all of the new 
> storage mechanisms (Local Storage, Web SQL Database, Web Indexed 
> Database, the Filesystem API being worked on in DAP, etc). First, 
> without being able to know how large your quota currently is and how 
> much headroom you are using, it is very difficult to plan in an 
> efficient manner. For instance, if you are trying to sync email, I think 
> it is reasonable to ask "how much space do I have," as opposed to just 
> getting halfway through an update and finding out that you hit your 
> quota, rolling back the transaction, trying again with a smaller subset, 
> realizing you still hit your quota, etc.

I agree. I would recommend this be resolved at the browser level, though. 
Instead of having a hard quota limit, a better solution would be to have a 
soft limit which, when exceeded, informs the user that the app is using a 
lot of disk space, and allows the user to either do nothing or enforce a 
hard quota. Similarly, when the browser detects that a site that hasn't 
been used in a while is using a lot of disk space, it might inform the 
user that there's stuff that could be purged. Or the browser could detect 
when the disk is getting full, and offer to the user a way to manage 
existing apps with data.

> I would like to see a method added, for any "storage mechanism", 
> something like "GetCurrentQuota()" and "GetCurrentQuotaUsed()". (I 
> specifically don't care what they're called or the details, e.g. whether 
> they need to be callbacks, I just want to see us address this use case.)

I don't think such an API makes sense, because the range of possible 
implementation strategies is so wide that the numbers it reported would 
often be meaningless.

What if you're storing in the cloud? Do you return +Inf?
What if you're storing on a device whose free space is fluctuating wildly? 
Do you report a different magnitude each time?
What if you're going to ask the user when the quota reaches 5MB, and don't 
yet know if the user will grant permission to give the app an extra 5TB?
What if you're compressing the data so you can store 5MB of uncompressible 
video, but you can store 10MB of plain text?
What if you're storing the data as UTF-32, but other browsers use UTF-8, 
and the app is storing ASCII text? Do you have to report a number four 
times smaller than reality?
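
To make the encoding and compression questions above concrete, here is a 
minimal sketch. It is not taken from any browser; the function name 
storage_footprints and the sample text are made up for illustration. It 
only shows that the same logical data can occupy very different numbers of 
bytes depending on how a backend chooses to store it.

```python
import os
import zlib


def storage_footprints(text):
    """Return the byte cost of `text` under a few hypothetical backends."""
    utf8 = text.encode("utf-8")
    # "utf-32-le" is used so that no byte-order mark is added to the count.
    utf32 = text.encode("utf-32-le")
    # Incompressible stand-in (e.g. video) of the same length as the UTF-8 text.
    random_blob = os.urandom(len(utf8))
    return {
        "utf8": len(utf8),
        "utf32": len(utf32),
        "utf8_compressed": len(zlib.compress(utf8)),
        "random_compressed": len(zlib.compress(random_blob)),
    }


sizes = storage_footprints("Hello, offline storage! " * 1000)
for backend, size in sizes.items():
    print(backend, size)
```

For ASCII input the UTF-32 figure is four times the UTF-8 figure, the 
repetitive text compresses to a small fraction of its size, and the random 
blob does not shrink at all. A single "bytes remaining" number cannot be 
right for all of these at once.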

There are too many variables to make such an API usable, IMHO.

> Secondly, I think we need a better answer for obtaining a larger quota. 
> Let's think for a moment about the use cases -- in most instances, I am 
> going to make a decision that I want to use an application offline. (I 
> personally would not expect to browse to a site and then just happen to 
> be able to use it offline, nor do I expect users to have that 
> expectation or experience. Rather, I expect going through some sort of 
> flow like clicking something that says "Yes, I want to use Application X 
> offline".) At this point, I want to get any permissioning issues out of 
> the way. I don't want to wait until the data sync to the Web XXX 
> Database starts failing 10 minutes later to pop up an infobar that is no 
> longer in the user's current flow / state of mind, I don't want to then 
> pop up another infobar 30 minutes later saying their Filesystem quota is 
> full, etc. I want to be able to get this out of the way all at once, at 
> a point in time where I believe the user is actually in the mindset / 
> task of deciding that they want to use my web application. I 
> specifically do not want to have to deal with 4 different infobars, 
> potentially at 4 different points in time, to use an application I have 
> decided I want to use.

All of the above modes of interaction are possible today. It's up to the 
browser implementor to decide which one it should offer.

> To that point, I would like to see a method added (presumably that can 
> only be called in response to a user action) that would allow my page to 
> request a bundle of permissions.

I strongly disagree that any such mechanism is a good idea. We should 
never ask the user for permissions explicitly. Users would just click yes.

What we can do is say "this page can go offline" and the user can then say 
how much disk space he is ok with giving the page (typically choosing 
between none "I don't want this site on my computer", 5MB "I want this 
site to be able to do basic safe things", and unlimited "I trust this 
application developer and want their app").

> I am going to go out on a limb here and include geolocation in this 
> bundle. Some sort of a callback type API where I pass in a list of 
> permissions that I want, the UA is free to display this to the user in 
> whatever mechanism it determines appropriate (if at all -- e.g. if the 
> user has already denied geolocation and that's being requested again, as 
> a UA I might decide not to present that request). That is, I could pass 
> in something like ["LocalStorageQuota", 20*1024*1024 /* 20M */, 
> "WebSQLQuota", 1*1024*1024*1024 /* 1GB */, "FileSystemQuota", 
> 10*1024*1024*1024 /* 10GB */, "Geolocation", true], and the callback 
> could then (as a convenience) indicate the current quota for all of the 
> things that I asked for, so that I could figure out whether the user 
> accepted, denied, or accepted and modified the request and how I can 
> then proceed (or not proceed). Again, I don't care terribly about the 
> details of how the API looks, the above is just meant for illustration.

I think this would be a horrible user experience and would lead to hostile 
sites having basically full access to the user's computer without their 
knowledge.

On Wed, 10 Mar 2010, Jeremy Orlow wrote:
> 
> That said, I agree with you...as long as we can do it in a manner that's 
> completely unobtrusive and not in the "Do you want this app to work: yes 
> or no" style (where yes implies giving them tons of permissions).  
> Ideally with an <input> type.  Perhaps the input could have parameters 
> that give the "recommended" values and then leave the rest up to the UA 
> to help advise the user?

I could see having some sort of <input> control that allows the user to 
opt in to giving the page more permissions. (This would be something that 
would make sense to add to the <bb> element I had proposed, in fact.) What 
we would need here is implementation experience: what features are needed? 
Does it work? Do users like it? Do app devs like it?

On Wed, 10 Mar 2010, Scott Hess wrote:
> 
> An alternative to providing a measure of what your remaining quota is 
> would be to provide an estimate of the minimum amount of additional data 
> which the system has a high confidence that it can store for you. That 
> number can probably be generated more reliably, and it sidesteps some of 
> the issues with what quota really is (after considering straight storage 
> overhead and the overhead needed to deal with transactions).

That's a better solution from the point of view of avoiding many of the 
aforementioned problems, but does it solve the problem for authors?

Is there a browser implementor who would be willing to try implementing 
something like window.navigator.minimumQuotaRemaining who could get 
implementation experience for us here?
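
As a rough sketch of what such a conservative estimate might compute 
(purely an assumption on my part, not a proposal for how any browser does 
or should do it): take the headroom, subtract a reserve for transaction 
overhead, and divide by the worst-case storage expansion. The function 
name, the parameter names, and the default values below are all 
hypothetical.

```python
def minimum_quota_remaining(quota_bytes, used_bytes,
                            worst_case_expansion=4.0,
                            transaction_reserve_bytes=64 * 1024):
    """Lower bound on the extra bytes of data that can still be stored.

    worst_case_expansion: assumed ratio of on-disk bytes to logical bytes
        (4.0 models ASCII text kept as UTF-32).
    transaction_reserve_bytes: assumed space held back for journaling.
    """
    headroom = quota_bytes - used_bytes - transaction_reserve_bytes
    if headroom <= 0:
        return 0
    # Floor division keeps the estimate on the pessimistic side.
    return int(headroom // worst_case_expansion)


print(minimum_quota_remaining(5 * 1024 * 1024, 1 * 1024 * 1024))
```

The result deliberately understates what can be stored, which is the 
point: the app is promised a floor, not an exact figure.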

On Wed, 10 Mar 2010, Mike Shaver wrote:
> 
> It generally seems that "desktop" mail clients behave in the undesirable 
> way you describe, in that I've never seen one warn me about available 
> disk space, and I've had several choke on a disk being surprisingly 
> full.  And yet, I don't think it causes a lot of problems for users.  
> One reason for that is likely that most users don't operate in the red 
> zone of their disk capacity; a reason for THAT might be that the OS 
> tells them that they're getting close, and that many of their apps start 
> to fail when they get full, so they are more conditioned to react 
> appropriately when they're warned.  (Also, today's disks are gigantic, 
> so if you fill one up it's usually a WTF sort of moment.)
> 
> Part of that is also helped by the fact that they're managing a single 
> quota, effectively, which might point to a useful simplification: when 
> the disk gets close to full, and there's "a lot" of data in the storage 
> cache, the UA could prompt the user to do some cleanup.  Just as with 
> cleaning their disk, they would look for stuff they had forgotten was 
> still on there ("I haven't used Google Reader in ages!") or didn't know 
> was taking up so much space ("Flickr is caching *how* much image data 
> locally?").  The browser could provide a unified interface for setting a 
> limit, forbidding any storage, compressing to trade space for perf; on 
> the desktop users need to configure those things per-application, if 
> such are configurable at all.  If I really don't like an app's disk 
> space usage on the desktop, I can uninstall it, for which the web 
> storage analogue would perhaps be setting a small/zero quota, or just 
> not going there.
> 
> One thing that could help users make better quota decisions is a way for 
> apps to opt in to sub-quotas: gmail might have quotas for contact data, 
> search indexing, message bodies, and attachments.  I could decide that 
> on my netbook I want message bodies and contact data, but will be OK 
> with slow search and missing attachments.  An app like Remember The Milk 
> might just use one quota for simplicity, but with the ability to expose 
> distinct storage types to the UA, more complex web applications could 
> get sophisticated storage management "for free".
> 
> So I guess my position is this: I think it's reasonable for apps to run 
> into their quota, and to that end they should probably synchronize data 
> in priority order where they can distinguish (and if they were going to 
> make some decision based on the result of a quota check, presumably they 
> can).  User agents should seek to make quota management as 
> straightforward as possible for users.  One reasonable approach, IMO, is 
> to assume that if there is space available on the disk, then an app 
> they've "offlined" can use it.  If it hurts, don't go back to that site, 
> or put it in a quota box when you get the "achievement unlocked: 1GB of 
> offline data" pop-up.

I more or less agree with the above (though I think we should punt on 
subquota features for now). In general, this really seems like a quality 
of implementation issue, not a Web platform API issue.
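
The priority-order synchronization suggested above needs no quota 
introspection at all. A minimal sketch, assuming a store that simply 
refuses writes once an undisclosed limit is reached (FakeStore, 
QuotaExceededError, and sync_in_priority_order are illustrative names, not 
part of any spec):

```python
class QuotaExceededError(Exception):
    """Raised by the store when a write would exceed its hidden limit."""


class FakeStore:
    """Stand-in for a storage backend whose quota the app cannot query."""

    def __init__(self, hidden_capacity):
        self._capacity = hidden_capacity
        self._used = 0
        self.items = {}

    def put(self, key, data):
        if self._used + len(data) > self._capacity:
            raise QuotaExceededError(key)
        self.items[key] = data
        self._used += len(data)


def sync_in_priority_order(store, batches):
    """Write (priority, key, data) batches, most important first.

    Lower priority numbers are written first. A batch that does not fit is
    skipped, and later (less important) batches are still attempted in
    case they are small enough to fit.
    """
    stored, skipped = [], []
    for _, key, data in sorted(batches, key=lambda batch: batch[0]):
        try:
            store.put(key, data)
            stored.append(key)
        except QuotaExceededError:
            skipped.append(key)
    return stored, skipped


store = FakeStore(hidden_capacity=100)
batches = [
    (2, "attachments", b"x" * 50),
    (0, "contacts", b"x" * 30),
    (3, "search-index", b"x" * 10),
    (1, "message-bodies", b"x" * 60),
]
stored, skipped = sync_in_priority_order(store, batches)
print(stored, skipped)
```

Here the contacts and message bodies land first, the attachments do not 
fit, and the small search index still does. The app degrades gracefully 
without ever knowing the number.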

On Thu, 11 Mar 2010, Jeremy Orlow wrote:
> 
> It might also be nice if there's a way for them to specify "these are 
> the files that are really important to me" and "these are the files that 
> are simply a cache" since they're two fairly different use cases.  And 
> one of them the UA can clean up without getting user permission and one 
> it can't. (I've suggested something like this be added to IndexedDB 
> anyhow.)

I agree that this would make sense to add in due course.

> I wonder if we should have some way for web apps to give the browser an 
> address for cleaning up the app's resource usage without deleting 
> everything.

This could be somewhat dangerous if you consider the context for why we 
have quotas in the first place, namely minimising the impact of hostile 
pages. We wouldn't want the attempt to clear a cache of some hostile page 
to actively go and load that page!

On Thu, 11 Mar 2010, Ian Fette (イアンフェッティ) wrote:
> 
> I think apps will have to deal with hitting quota as you describe, 
> however with a normal desktop app you usually have a giant disk relative 
> to what the user actually needs. When we're talking about shipping 
> something with a 5MB or 50MB default quota, that's a very different 
> story than my grandfather having a 1TB disk that he is never going to 
> use. Even with 50MB (which is about as much freebie quota as I think I 
> am comfortable giving at the moment), you will blow through that quite 
> quickly if you want to sync your email. The thing that makes this worse 
> is that you will blow through it at some random point (as there is no 
> natural "installation" point from the APIs we have. You just get some 
> freebie appcache, web *** database quota etc.)

I think the solution here is to change from a hard 5MB limit to a soft 
limit as described above, not to add an installation point. The 
installation point in desktop apps is a bug, not a feature -- the Web not 
having it is a huge positive for the platform.

> You seem to propose "if the user has offlined the app, set the default 
> quota to be unlimited and provide better ways for the user and 
> application to manage this when there is pressure on disk space." I 
> would personally be in favor of this approach, if only we had a good way 
> to define what it meant to "offline the app". Right now, appcache, 
> database, everything is advisory. The browser runs across an appcache 
> manifest and magically makes it available offline. The browser gets a 
> request to store a new database and the assumption in the spec seems to 
> be that there is some freebie quota, and then when you hit it some UA 
> magic happens. There is no real way in the spec for the user to tell the 
> browser "I actually want to use this site offline."

It's entirely up to the user and the user agent how this works. Just as 
easily as automatically making everything be offlinable, one can imagine a 
browser that doesn't do any offlining by default, but enables a toolbar 
button when the app supports going offline, which, when clicked, offlines 
the app with a huge quota. Similarly one can imagine the browser showing a 
gauge in the status bar showing how much disk space the app is using, and 
allowing the user to change it at any time.

These are features that are all supported by the spec, but are UI issues 
that the spec shouldn't get involved in, IMHO.

On Mon, 15 Mar 2010, Ian Fette (イアンフェッティ) wrote:
>
> My initial reaction was that I don't know how much I buy into the 
> "subquota" part (vs named quota in general). E.g. if we can't enforce 
> any of the subquota distinctions beyond a same-origin level, it seems of 
> limited use. Upon further thought though, if you assume apps you trust 
> are well behaved, then this may actually be a good idea. Would make 
> displaying this information to users easier as well, even if relatively 
> few users ever do go into options UI.
> 
> At this point, if named subquota would meet the requirements I initially 
> put forth (request a set of resource quotas that I think I need, and get 
> a callback if I get them), and ideally be able to interrogate some sort 
> of information about the named subquota (be it "how many bytes are free" 
> vs "what are you reasonably sure I can store" I really don't care), I am 
> all for it ;-)
> 
> Is there some "named subquota" thread that I need to +1?

The most helpful thing to do, if we think named subquotas would help, 
would be to come up with an API proposal for it and then implement it 
experimentally, to see how well it works in practice.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'