[whatwg] localStorage, the storage mutex, document.domain, and workers

Thu Sep 17 01:32:02 PDT 2009

SUMMARY:

I haven't removed the storage mutex. I don't see any other workable 
solution to the problem.

We can't change the API, as much as we'd like to, because it's already 
shipped in IE, and it would simply be dumb for us to screw Microsoft over 
here. It's hard enough to get them to implement specs without giving their 
management more reasons for not being on the cutting edge.

We can't simply remove the storage mutex, because then we'll have race 
conditions up the wazoo, especially for localStorage. We could probably 
get away with letting document.cookie be unprotected, since IE has done 
that for years and cookies aren't a high-traffic API, but who knows how 
many bugs that's really causing. If implementations don't implement the 
mutex around document.cookie access, I can't blame them, but it's a risk 
that each implementor needs to evaluate on their own; the spec isn't going 
to condone it.

I have made sure that as far as I am aware, there are no ways to 
synchronously invoke script from another origin in HTML5 without dropping 
the mutex (specifically, I've made history.back() and changing of 
document.domain drop the mutex), and I've made it so that you can only 
access localStorage objects of your effective script origin (i.e. changing 
document.domain means your localStorage is no longer accessible). This 
means that it should be possible to implement the storage mutex on a 
per-domain basis. This should minimise the potential damage.
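For illustration, here is a minimal sketch of a per-origin storage mutex. It assumes a single UA process that records, for each effective script origin, which event loop currently holds that origin's mutex. `StorageMutexRegistry`, `tryAcquire` and `releaseAll` are hypothetical names, and a real UA would block the requesting event loop rather than return `false`:

```javascript
// Hypothetical model of a storage mutex keyed by effective script origin.
// Event loops are identified by opaque ids. A real UA would block the
// requesting event loop instead of returning false.
class StorageMutexRegistry {
  constructor() {
    this.holders = new Map(); // origin -> id of the event loop holding its mutex
  }

  // Returns true if `loopId` now holds (or already held) the mutex for `origin`.
  tryAcquire(origin, loopId) {
    const holder = this.holders.get(origin);
    if (holder !== undefined && holder !== loopId) return false;
    this.holders.set(origin, loopId);
    return true;
  }

  // Called when the event loop finishes its task, or when it performs an
  // operation that drops the mutex (e.g. changing document.domain).
  releaseAll(loopId) {
    for (const [origin, holder] of this.holders) {
      if (holder === loopId) this.holders.delete(origin);
    }
  }
}

const registry = new StorageMutexRegistry();
const first = registry.tryAcquire('https://a.example', 1);     // true
const second = registry.tryAcquire('https://b.example', 2);    // true: different origin
const contended = registry.tryAcquire('https://a.example', 2); // false: loop 1 holds it
registry.releaseAll(1);
const afterRelease = registry.tryAcquire('https://a.example', 2); // true
```

Pages from different origins never contend in this model, so a script that hangs while holding the mutex only stalls event loops that want the same origin's storage.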

I haven't added localStorage to workers, because it'd be too easy to grab 
the mutex and try to see if Pi is finite using a brute-force approach, 
leaving all the other event loops that try to grab the mutex SOL.

I haven't added an async callback to localStorage yet, because I don't 
imagine that most authors will use it given that they don't need to and 
see no problems when testing (you'd only see issues if you tested having 
two apps from the same domain both doing storage updates in long scripts).

LESSONS LEARNT

If we ever define a new API that needs a lock of some kind, the way to do 
it is to use a callback, so that the UA can wait for the lock 
asynchronously, and then run the callback once it has it. (This is what 
the Web Database spec does, for instance.)
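Here is a minimal sketch of that callback pattern. It uses a hypothetical `TaskQueue` as a stand-in for the UA's event loop, drained explicitly so the example is deterministic. It does not reproduce the Web Database API. It only shows why nothing ever blocks: a request made while the lock is held is queued, and its callback runs as a later task.

```javascript
// Hypothetical stand-in for the UA's event loop: tasks run one at a time,
// and runAll() drains the queue so the example is deterministic.
class TaskQueue {
  constructor() {
    this.tasks = [];
  }
  post(task) {
    this.tasks.push(task);
  }
  runAll() {
    while (this.tasks.length > 0) this.tasks.shift()();
  }
}

// A lock that is never waited on synchronously: requests are queued, and
// each callback is posted as a task once the lock is free.
class CallbackLock {
  constructor(queue) {
    this.queue = queue;
    this.held = false;
    this.waiting = [];
  }
  request(callback) {
    this.waiting.push(callback);
    this.grantNext();
  }
  grantNext() {
    if (this.held || this.waiting.length === 0) return;
    const callback = this.waiting.shift();
    this.held = true;
    this.queue.post(() => {
      try {
        callback();
      } finally {
        // The lock is released as soon as the callback returns.
        this.held = false;
        this.grantNext();
      }
    });
  }
}

const queue = new TaskQueue();
const lock = new CallbackLock(queue);
const log = [];
lock.request(() => {
  log.push('a');
  // Requesting again while holding the lock doesn't deadlock; it just queues.
  lock.request(() => log.push('c'));
});
lock.request(() => log.push('b'));
queue.runAll();
// log is now ['a', 'b', 'c']
```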

On Thu, 3 Sep 2009, timeless wrote:
> On Sun, Aug 30, 2009 at 4:06 AM, Ian Hickson<ian at hixie.ch> wrote:
> > Upon further consideration I've renamed getStorageUpdates() to 
> > yieldForStorageUpdates().
> 
> If getStorageUpdates() actually returned how *many* updates there were, 
> it could be a vaguely useful name.
> 
> If the answer is 0, then my application knows it doesn't need to try to 
> figure out how the world changed, right?

The answer 0 would only be useful if you were to grab the storage mutex 
again immediately upon receiving the answer, which defeats the point of 
the call.

On Fri, 4 Sep 2009, Jeremy Orlow wrote:
> 
> I like this idea except for one problem:  It doesn't tell whether 
> something got changed without your knowledge.  If you call alert, access 
> a plugin, etc it's possible to drop the lock.  I think some sort of 
> global counter, variable, etc would be more valuable since it solves 
> both problems.

What's the use case for this? I don't understand what a script making use 
of this would look like. (I mean, I can see how to write a demo that shows 
using it, but how would a real site use it?)

On Fri, 4 Sep 2009, Chris Jones wrote:
>
> I'd like to propose that HTML5 specify different schemes than a 
> conceptual global storage mutex to provide consistency guarantees for 
> localStorage and cookies.
> 
> Cookies would be protected according to Benjamin Smedberg's post in the 
> "[whatwg] Storage mutex and cookies can lead to browser deadlock" 
> thread. Roughly, this proposal would give scripts a consistent view of 
> document.cookie until they completed.  AIUI this is stronger consistency 
> than Google Chrome provides today, and anecdotal evidence suggests even 
> their weaker consistency hasn't "broken the web."

I think if we're willing to run the risk of pages stomping on each other, 
we should just go ahead and not have locking at all, rather than trying to 
show a consistent view that we know to be a lie.

That is, if this is going to fail when run in parallel:

   // getCookie() and setCookie() are assumed helpers around document.cookie
   var counter = Number(getCookie('counter')) || 0;
   counter += 1;
   setCookie('data' + counter, data);
   setCookie('counter', counter);

...then there's not much point worrying about whether the view is 
consistent. Either way, data loss will occur.
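To make the failure concrete, this sketch simulates two pages running that script with no lock. Generators stand in for the scripts, and each `yield` marks a point where the other page could run. A `Map` stands in for the cookie jar. This illustrates the interleaving only; it is not how cookies are stored.

```javascript
// A Map stands in for the shared cookie jar (values are strings, like cookies).
function* page(jar, data) {
  let counter = Number(jar.get('counter')) || 0;
  yield; // the other page may run here
  counter += 1;
  jar.set('data' + counter, data);
  yield; // ...and here
  jar.set('counter', String(counter));
}

// Alternate one step of each script until both finish (no lock).
function runInterleaved(a, b) {
  let doneA = false;
  let doneB = false;
  while (!doneA || !doneB) {
    if (!doneA) doneA = a.next().done;
    if (!doneB) doneB = b.next().done;
  }
}

// Run one script to completion before starting the other (what a lock gives you).
function runSerially(a, b) {
  while (!a.next().done) {}
  while (!b.next().done) {}
}

const racy = new Map([['counter', '0']]);
runInterleaved(page(racy, 'from A'), page(racy, 'from B'));
// racy: counter is '1' and data1 is 'from B'; page A's data was overwritten

const locked = new Map([['counter', '0']]);
runSerially(page(locked, 'from A'), page(locked, 'from B'));
// locked: counter is '2', data1 is 'from A', data2 is 'from B'
```

Note that each page's own reads are perfectly consistent in the racy run. That is the point above: a consistent view doesn't prevent the lost write.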

> localStorage would be changed in a non-backwards-compatible way.

I think this is a non-starter, as described above.

> [transactions]

The problem with a system where transactions might fail based on timing is 
that there's not much for the author to do but just try again, and that's 
really not a good pattern to encourage. It's also likely that many authors 
won't notice the failure case, leading to just another kind of race 
condition very similar to what we're trying to avoid in the first place.
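Here is a small model of that failure mode. It assumes a hypothetical `OptimisticStore` whose commits fail if anyone else committed since the snapshot was taken. The only recovery is a retry loop like the hypothetical `updateWithRetry`. An author who ignores the return value of `commit()` silently loses data.

```javascript
// Hypothetical optimistic store: a commit succeeds only if nobody else
// committed since the snapshot was taken.
class OptimisticStore {
  constructor() {
    this.data = new Map();
    this.version = 0;
  }
  snapshot() {
    return { version: this.version, data: new Map(this.data) };
  }
  commit(snap) {
    if (snap.version !== this.version) return false; // conflict: caller must retry
    this.data = snap.data;
    this.version += 1;
    return true;
  }
}

// The pattern authors would be pushed towards: redo the work until it sticks.
function updateWithRetry(store, mutate) {
  for (;;) {
    const snap = store.snapshot();
    mutate(snap.data);
    if (store.commit(snap)) return;
  }
}

const store = new OptimisticStore();
const s1 = store.snapshot();
const s2 = store.snapshot();
s1.data.set('draft', 'A');
s2.data.set('draft', 'B');
const ok1 = store.commit(s1); // true
const ok2 = store.commit(s2); // false: ignoring this loses 'B'
const draftAfterConflict = store.data.get('draft'); // 'A'

updateWithRetry(store, (data) => data.set('draft', 'B'));
```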

On Fri, 4 Sep 2009, Chris Jones wrote:
> 
> My problem with storage mutex boils down to the fact that by the letter 
> of the spec, a script can lock out the UA indefinitely by just reading a 
> cookie.

Not indefinitely -- only until the UA kills the script. Also, it should be 
possible to make these per-origin locks, so it would only lock up the pages 
that share its event loop or that are same-origin (or share the event loop 
of same-origin content). And then only if they themselves need the mutex.

On Sat, 5 Sep 2009, Chris Jones wrote:
> 
> The cases I've thought of so far where we will probably have to break 
> storage-mutex semantics are
> 
>   * clear private data
>   * close tab
>   * quit UA
>   * "slow script" timeout
>   * store-to-disk failure
>   * crash

I don't follow. Why would these break storage mutex semantics?

I would presume that the first one would just grab the mutex like anyone 
else. The "close tab" and "quit UA" cases don't seem to need the storage 
mutex. The "slow script" timeout would just spin the event loop, which 
releases the mutex. The "store to disk failure" and "crash" cases seem 
like exceptional cases unrelated to the storage mutex.

> In the "secret-storage-mutex world," if you agree that the cases above 
> imply that the UA will have to interrupt scripts, then it's possible for 
> scripts to make changes to localStorage that are only partially applied, 
> going by the letter of the storage mutex spec.

Sure, but there are far easier ways for that to happen, like an exception 
firing in the middle of the script.

> I think that for sites that would care (e.g. gmail), partially-applied 
> changes are a bad thing.  And as I argued in the OP, I think 
> localStorage should be designed only with sites like gmail in mind.

I don't think it is a good idea to ignore the other sites, in particular 
because frankly a case like GMail is far more likely to want to use the 
Web Database feature with full transactions going on than localStorage. 
However, it's quite possible to build crash-safe transactions on top of 
localStorage if you have the implied locking semantics of the storage 
mutex, so if authors want to handle that case, they can.
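One way to get crash-safe updates on top of localStorage is a write-ahead journal. The sketch below rests on two assumptions. First, the storage mutex keeps other same-origin scripts from interleaving with `commitAtomically` and `recover`. Second, the reserved key `__journal` is used for nothing else. `MemoryStorage` is an in-memory stand-in with the same `getItem`/`setItem`/`removeItem` methods as the Storage interface, so the example runs outside a browser. The `crashAfter` parameter exists only to simulate a script dying part-way through.

```javascript
// In-memory stand-in for a Storage object.
class MemoryStorage {
  constructor() {
    this.map = new Map();
  }
  getItem(key) {
    return this.map.has(key) ? this.map.get(key) : null;
  }
  setItem(key, value) {
    this.map.set(key, String(value));
  }
  removeItem(key) {
    this.map.delete(key);
  }
}

const JOURNAL = '__journal'; // hypothetical reserved key

// Record the intended writes, apply them, then clear the journal. If the
// script dies part-way through applying, the journal survives.
function commitAtomically(storage, changes, crashAfter = Infinity) {
  storage.setItem(JOURNAL, JSON.stringify(changes));
  let applied = 0;
  for (const [key, value] of Object.entries(changes)) {
    if (applied++ >= crashAfter) throw new Error('simulated crash');
    storage.setItem(key, value);
  }
  storage.removeItem(JOURNAL);
}

// Run before using the data: finish any commit that a crash interrupted.
function recover(storage) {
  const pending = storage.getItem(JOURNAL);
  if (pending === null) return false;
  for (const [key, value] of Object.entries(JSON.parse(pending))) {
    storage.setItem(key, value);
  }
  storage.removeItem(JOURNAL);
  return true;
}

const storage = new MemoryStorage();
try {
  commitAtomically(storage, { balance: '90', lastTransfer: '10' }, 1);
} catch (err) {
  // the "crash" happened after only the first key was written
}
const beforeRecovery = storage.getItem('lastTransfer'); // null: half-applied
const recovered = recover(storage); // true: the journal was replayed
```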

On Tue, 8 Sep 2009, Robert O'Callahan wrote:
> On Tue, Sep 8, 2009 at 8:45 PM, Jeremy Orlow <jorlow at chromium.org> 
> wrote:
> > 
> > You'd have to implement it via a mutex.  An optimized implementation 
> > could wait until the first operation that can't be un-done before 
> > acquiring it, and do everything optimistically until then.  This is 
> > the same situation as WebDatabase if I understand it correctly.
> 
> AFAIK WebDatabase transactions can't have side effects outside the 
> database, so they can be implemented optimistically with automatic 
> retries so aborts aren't exposed to the developer.

WebDatabase transactions just always grab a lock on the DB. You decide 
when creating the transaction whether it's a read lock or a write lock. If 
you get a read lock, writes fail. If you get a write lock, you get an 
exclusive lock for the entire database before your callback runs. They can 
have side-effects, and they never get retried, because they can't fail 
unpredictably.
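The locking described above behaves like a readers-writer lock. The sketch below models it with hypothetical `DatabaseLock` and `Transaction` classes. These are not the Web Database API. They show three things: read transactions share the lock, a write transaction needs it exclusively, and a write attempted under a read lock fails every time rather than depending on timing. Here `tryAcquire` returning `false` stands for "would have to wait".

```javascript
// Read locks are shared; a write lock excludes everyone else.
class DatabaseLock {
  constructor() {
    this.readers = 0;
    this.writer = false;
  }
  tryAcquire(mode) {
    if (mode === 'read') {
      if (this.writer) return false;
      this.readers += 1;
      return true;
    }
    if (this.writer || this.readers > 0) return false;
    this.writer = true;
    return true;
  }
  release(mode) {
    if (mode === 'read') this.readers -= 1;
    else this.writer = false;
  }
}

// A transaction carries its mode; writes are refused in read mode.
class Transaction {
  constructor(mode, table) {
    this.mode = mode;
    this.table = table;
  }
  read(key) {
    return this.table.get(key);
  }
  write(key, value) {
    if (this.mode !== 'write') throw new Error('write attempted in a read transaction');
    this.table.set(key, value);
  }
}

const lock = new DatabaseLock();
const r1 = lock.tryAcquire('read');  // true
const r2 = lock.tryAcquire('read');  // true: readers share the lock
const w1 = lock.tryAcquire('write'); // false: readers present
lock.release('read');
lock.release('read');
const w2 = lock.tryAcquire('write'); // true
const r3 = lock.tryAcquire('read');  // false: the writer has it exclusively

const table = new Map([['subject', 'hello']]);
const readTx = new Transaction('read', table);
let writeRefused = false;
try {
  readTx.write('subject', 'changed');
} catch (err) {
  writeRefused = true;
}
```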

On Tue, 8 Sep 2009, Aaron Boodman wrote:
> 
> Here are two cases I know of where it is possible to have synchronous 
> script execution across origins:
> 
> * Plugins. It is possible for script to invoke a plugin function in one 
> frame, and for the plugin to synchronously execute script in another 
> frame. We have addressed this in the spec by saying that invoking a 
> plugin releases the storage mutex, but that doesn't really solve the 
> problem. We are exchanging violation of run-to-completion for deadlock. 
> I guess it is an improvement, but it is still a bug.

If you call a plugin, all bets are off anyway. I mean, the plugin could go 
and corrupt your database directly, for all you know.

> * In WebKit, onresize is invoked synchronously. You can cause
> cross-origin synchronous script execution by resizing an iframe.
> AFAIK, the spec does not disallow this event from being synchronous.

I'm not aware of any spec that fires 'resize' events currently. They 
should either fire asynchronously or release the mutex, though.

On Wed, 9 Sep 2009, Chris Jones wrote:
> 
> But in general, nesting transactions, both { localStorage { Web DataBase 
> } } and { localStorage { localStorage } }, is something the spec should 
> explicitly disallow.  There's not a clearly best way to resolve the 
> semantic problems that arise.  (Note that preventing nested transactions 
> also eliminates deadlock concerns for mutex implementations.)

You can't nest async Web Database transactions in anything; the Web 
Database callbacks are called from the top-level event loop. You can get 
deadlocks in Workers with sync transactions (e.g. opening a write 
transaction inside another transaction, or a read transaction inside a 
write transaction), but they're not race-condition deadlocks; they're 
reproducible logic errors. (I expect UAs will detect those cases and just 
fire the timeout exception immediately, though the spec doesn't 
technically require that.)

On Tue, 8 Sep 2009, Chris Jones wrote:
> 
> No one has responded directly to my original proposal of making 
> |window.localStorage === undefined| until |window.transaction| or 
> whatever has been accessed.  Unlike your proposal and a similar one from 
> Jeremy, mine is a "safe" (non-racy) way for spec-compliant UAs to "bend" 
> backwards compatibility without explicitly breaking it.  Web apps slow 
> to change should theoretically be prepared for |window.localStorage === 
> undefined| anyway, as not all UAs support localStorage.  And if a UA 
> doesn't support window.transaction, a web app that cares never needs to 
> worry about racy localStorage because in non-compliant UAs, 
> |window.transaction === undefined|.

The problem is that sites that _don't_ check will work fine in IE8, and 
fail everywhere else, and if IE8 gets enough market share that people 
write pages that work with IE without checking other browsers, as happened 
with IE6 for example, then we'll be forced to implement this anyway, 
except Microsoft will get bad PR for being incompatible and acting 
monopolistically. On the other hand, if, say, Firefox implements this 
proposal, and it gets enough market share that the opposite happens, then 
IE is SOL, because people will be calling .transaction() and it'll fail in 
IE. Either way, Microsoft end up being burnt for having done the right 
thing, implementing cutting edge specs, and then we run the risk of them 
backing off even further from doing that again.

On Fri, 4 Sep 2009, Jonas Sicking wrote:
> >> 
> >> I really liked Darin's (?) suggestion of allowStorageUpdates as that
> >> seems to exactly describe the intended use of the function. We no longer
> >> prevent other pages from updating the storage.
> >
> > "allow" implies a state change, which I don't think really matches what is
> > happening here. ("How do I disallow updates?")
> 
> I don't understand why you associate "allow" with "state change"? It
> could just as well be allowing anything else. The word "Updates" is
> much more associated with "state change" I'd say. And that word occurs
> in your proposal too. Really it should probably be allowStorageAccess
> or yieldForStorageAccess to be more correct.

A method called allowFoo() feels to me like it should be paired with 
another disallowFoo() and feels like it sets a flag to enable foo or 
disable foo. But maybe that's just me.

Anyway I think yieldForStorageUpdates() is fine. I somewhat prefer 
getStorageUpdates() but I agree that getFoo() methods should return 
something.

On Wed, 9 Sep 2009, Jens Alfke wrote:
> On Sep 4, 2009, at 2:38 PM, Ian Hickson wrote:
> > 
> > Right. My point is the site can do that already, since we don't ever
> > _stop_ the site from using the local storage area. It can prompt you for
> > a name directly, without UA involvement.
> 
> I'm sorry, I don't understand that. We must somehow be talking about
> completely different things. Sure, the site can call prompt() to get a name,
> but how can the site then create a file on the user's machine?

Just setting anything in localStorage does it automatically.

> > > You can't store 50MB of data in a cookie. I'm talking about the 
> > > entire local storage of a site here, not a 40-byte session ID or 
> > > something.
> > 
> > Then that distinction can be what is exposed in the UI to distinguish 
> > "boring" storage from "important" storage.
> 
> Size isn't an indicator of importance, if that's what you mean. A 
> 100-byte email draft could be vitally important, while 50MB of 
> downloaded cached textures for a game isn't. "Importance" is an 
> application-specific property and the UA's not going to be able to guess 
> it.

Well I don't think we can rely on the sites to declare it, either, so I 
guess we're back to it all being important.

On Wed, 9 Sep 2009, Michael Nordman wrote:
>
> If this feature existed, we likely would have used it for offline Gmail 
> to coordinate which instance of the app (page with gmail in it) should 
> be responsible for sync'ing the local database with the mail service. In 
> the absence of a feature like this, instead we used the local database 
> itself to register which page was the 'syncagent'. This involved 
> periodically updating the db by the syncagent, and periodic polling by 
> the would be syncagents waiting to possibly take over. Much ugliness. 

Shared workers are the better solution to this.

On Thu, 10 Sep 2009, James Robinson wrote:
> 
> Is it really too late for DB and localStorage?  I'm still trying to get 
> used to the standards process used here but I thought the idea with UAs 
> implementing draft specs is that the feedback from the experience can be 
> used to refine the spec - a few UAs have implemented synchronous access 
> to a single resource from multiple threads and it appears to be 
> problematic.
> Wouldn't that mean it's a good time to revise the problematic parts?

Web Database is still very much in flux. However, localStorage is more or 
less fixed because Microsoft shipped it and their release cycle means they 
won't ship changes to it before it is solidly embedded in too many sites 
to risk changes.

On Fri, 11 Sep 2009, Jeremy Orlow wrote:
> 
> If the intention is to split localStorage into multiple domains, I think 
> we should do just that.  For example, we could add 
> |window.getLocalStorage(name)| that would create a new object that 
> implements the storage interface.  |window.localStorage| could be an 
> alias to |window.getLocalStorage(null)| maybe?  We could then specify 
> that you can only hold a lock on one storage interface at once.

With a sync access model, this wouldn't help. You'd still need one lock 
for each storage area, otherwise a single event loop could end up trying 
to acquire two locks while another is doing the opposite, and they would 
deadlock.
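The deadlock shows up clearly in a small wait-for-graph model. The sketch assumes that each storage area has its own lock and that a blocked event loop waits on exactly one area. `request` and `hasCycle` are hypothetical helpers. A cycle in the graph means none of the event loops in it can ever proceed.

```javascript
// Hypothetical model: one lock per storage area.
const holder = new Map();     // area -> event loop holding its lock
const waitingFor = new Map(); // event loop -> area it is blocked on

// Follow "loop waits for area, area is held by loop" edges from `start`.
// Returns true if the chain leads back to `start`, i.e. a deadlock.
function hasCycle(start) {
  const visited = new Set();
  let loop = start;
  while (waitingFor.has(loop) && !visited.has(loop)) {
    visited.add(loop);
    loop = holder.get(waitingFor.get(loop));
    if (loop === start) return true;
  }
  return false;
}

function request(loop, area) {
  const owner = holder.get(area);
  if (owner === undefined || owner === loop) {
    holder.set(area, loop);
    return 'acquired';
  }
  waitingFor.set(loop, area);
  return hasCycle(loop) ? 'deadlock' : 'blocked';
}

const steps = [
  request(1, 'areaA'), // 'acquired'
  request(2, 'areaB'), // 'acquired'
  request(1, 'areaB'), // 'blocked': loop 2 holds areaB
  request(2, 'areaA'), // 'deadlock': loop 1 holds areaA and waits on loop 2
];
```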

We could have an additional API for async localStorage access that did 
this division of data, but it's not clear that the additional value of 
this makes it worth it given that we'd have to still offer the current API 
as well. I don't think many people would use the new API.

On Thu, 10 Sep 2009, Scott Hess wrote:
>
> I think that you can either give web developers a strong set of 
> concurrent-programming primitives, or you can give them a weak set and 
> let them make up what they think they need out of those.  The nice thing 
> about providing very basic primitives is that it's more likely that 
> good developers can compose higher-level operations with the primitives.

The good developers are in the minority (through no fault of the others 
-- it's just that the bar to developing on the Web is so low that many 
more people who would never try programming some other platform are able 
to do useful things on the Web despite their minimal knowledge).

I think that providing any kind of explicit locking or mutex system to Web 
authors would be like passing them a loaded gun with the safety off, and 
them not being able to tell which side the bullet was going to come out 
of. All the cool kids would be using it, so clearly they should do...

I really think it would have spectacularly bad results.

On Wed, 9 Sep 2009, Darin Fisher wrote:
> 
> By the way, you can already pretty much create my acquireLock / 
> releaseLock API on top of SharedWorkers today, but in a slightly 
> crappier way.

How? Since the API is completely async, you can't make a spinlock.

> Yes, I wholeheartedly agree.  Note: my concern is that there is no good 
> implementation for the storage mutex.  Implicitly dropping it at weird 
> times is not a good answer.

I think the few cases where it happens are actually pretty reasonable:

 - changing document.domain
 - history.back(), .forward(), .go(n)
 - invoking a plugin
 - alert(), confirm(), prompt(), print()
 - showModalDialog()
 - yieldForStorageUpdates()

To examine each in turn:

 - changing document.domain
   After this, you can't access your storage area anymore anyway.

 - history.back(), .forward(), .go(n)
   Usually this unloads your document anyway. In the case where you're 
   navigating an iframe, I could see some confusion arising, but it is 
   likely quite a rare operation.

 - invoking a plugin
   All bets are off when calling a plugin.

 - alert(), confirm(), prompt(), print()
 - showModalDialog()
   These all open long-term modal dialogs, so it makes sense that the 
   storage area could change while they're running.

 - yieldForStorageUpdates()
   That's the point of the method.
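The behaviour the list above describes can be modelled as follows. The mutex is taken implicitly on the first storage access in a task. It is held until the task ends, and any of the listed operations drops it early. The sketch is a hypothetical single-page model, not spec text: `ImplicitStorageMutex`, `touchStorage`, `perform` and the operation labels are all invented for illustration. It counts acquisitions, because each re-acquisition is a point where another page's changes can become visible.

```javascript
// Hypothetical labels for the operations listed above that drop the mutex.
const DROPS_MUTEX = new Set([
  'document.domain=',
  'history.back', 'history.forward', 'history.go',
  'plugin call',
  'alert', 'confirm', 'prompt', 'print',
  'showModalDialog',
  'yieldForStorageUpdates',
]);

class ImplicitStorageMutex {
  constructor() {
    this.held = false;
    this.acquisitions = 0;
  }
  // Any localStorage access takes the mutex if it isn't already held.
  touchStorage() {
    if (!this.held) {
      this.held = true;
      this.acquisitions += 1;
    }
  }
  // Only the operations in DROPS_MUTEX release it early.
  perform(operation) {
    if (DROPS_MUTEX.has(operation)) this.held = false;
  }
  // Returning to the event loop always releases it.
  endTask() {
    this.held = false;
  }
}

const mutex = new ImplicitStorageMutex();
mutex.touchStorage();       // first access: acquire
mutex.perform('JSON.parse'); // ordinary work: still held
mutex.touchStorage();       // still held; no other page can write
mutex.perform('alert');     // modal dialog drops the mutex
mutex.touchStorage();       // re-acquired; storage may have changed meanwhile
mutex.endTask();
// mutex.acquisitions === 2 and mutex.held === false
```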

On Wed, 9 Sep 2009, Darin Fisher wrote:
> 
> Yeah, if you had to call an API that asynchronously acquires exclusive 
> access to storage, then I believe that would nicely address most of the 
> issues.  It is the solution we have for database transactions.
> 
> I say "most" because I'm not sure that it eliminates the need to drop 
> the storage mutex in the showModalDialog case.
> 
> If I call showModalDialog from within a database transaction, and then 
> showModalDialog tries to create another database transaction, should I 
> expect that the transaction can be started within the nested run loop of 
> the modal dialog?  If not, then it may cause the app to get confused and 
> never allow the dialog to be closed (e.g., perhaps the close action is 
> predicated on a database query.)

You can get into a situation where the database query started from 
showModalDialog() fails to ever acquire the lock and call its callback, 
yes. The user can always manually dismiss the modal window, though, so 
it's not a disaster (the callback would then run when the window was 
closed and the parent transaction finished). It's also reproducible, so 
it's not going to be that hard to debug.

The same problem exists in workers, where you have synchronous 
transactions.

On Thu, 10 Sep 2009, Jeremy Orlow wrote:
> 
> We could just disallow showModalDialog and any other trouble APIs like 
> that during localStorage and database "transactions".  Doing so seems 
> better than silently dropping transactional semantics.

Right now, if the showModalDialog() is called from a write transaction, 
nested transactions on the same database always fail, and from a read 
transaction, nested write transactions always fail (in both cases, 
assuming the user doesn't close the window and let the callback fire once 
the parent transaction has finished). I think that's fine, we don't need 
to either drop transactional semantics or block all nested transactions or 
modal dialogs of any kind. We already have an error callback that fires 
when the UA times out, and the UA could detect this situation and log a 
warning to the console identifying the likely cause of the timeout, for 
debugging purposes. I don't think it's a particularly big problem.

On Wed, 9 Sep 2009, Darin Fisher wrote:
> 
> It may not be so easy to disallow showModalDialog.  Imagine if you 
> script a plugin inside the transaction, and before returning, the plugin 
> scripts another window, where it calls showModalDialog.  There could 
> have been a process hop there.

If your script is linking processes in this way then you can have far more 
serious deadlocks already (imagine two plugins both doing this but going 
in the opposite direction).

On Wed, 9 Sep 2009, Maciej Stachowiak wrote:
> 
> I'm really hesitant to expose explicit locking to the Web platform. 
> Mutexes are incredibly hard to program with correctly, and we will 
> surely end up with stuck locks, race conditions, livelocks, and so 
> forth. For Workers I was happy that we managed to avoid any locking 
> primitives by using a message-passing model, but explicit mutexes would 
> ruin that.

I agree.

On Tue, 15 Sep 2009, Jeremy Orlow wrote:
>
> [storage in workers]
> One possible solution is to add an asynchronous callback interface for
> LocalStorage into workers.  For example:
> 
> function myCallback(localStorage) {
>   localStorage.accountBalance = Number(localStorage.accountBalance) + 100;
> }
> executeLocalStorageCallback(myCallback);
> 
> [...] Of course, it's still possible for a poorly behaving worker to do 
> large amounts of computation in the callback, but hopefully the fact 
> they're executing in a callback makes the developer more aware of the 
> problem.

I don't think we want to hang user experience on a hope that authors will 
do the right thing. We've lost every time we've made that bet in the past.

The only way I could see making localStorage visible to workers is with a 
mechanism that took a finite list of operations to do and that could do 
them asynchronously. However, short of inventing a new language like SQL, 
or an elaborate API with some sort of symbolic expressions, I don't really 
see how to expose that sanely.
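To illustrate the "finite list of operations" idea, here is a sketch with an invented operation format. It supports `set`, `remove` and `add` on numeric strings, and nothing like it is in any spec. Because the batch is plain data, the UA could apply all of it while holding the lock, and the worker would never run its own code with the lock held. The limitation is just as visible: anything the format can't express, such as a conditional update, needs yet another kind of operation.

```javascript
const KNOWN_OPERATIONS = new Set(['set', 'remove', 'add']);

// Apply a batch of declarative operations to a Map-backed storage area.
// Intended to run entirely while the UA holds the storage lock.
function applyBatch(area, operations) {
  // Validate everything first so a bad entry can't leave a half-applied batch.
  for (const op of operations) {
    if (!KNOWN_OPERATIONS.has(op.type)) throw new Error('unknown operation: ' + op.type);
  }
  for (const op of operations) {
    if (op.type === 'set') {
      area.set(op.key, String(op.value));
    } else if (op.type === 'remove') {
      area.delete(op.key);
    } else {
      // 'add': storage values are strings, so convert explicitly.
      const current = Number(area.get(op.key) ?? '0');
      area.set(op.key, String(current + op.amount));
    }
  }
}

const area = new Map([['accountBalance', '50']]);
applyBatch(area, [
  { type: 'add', key: 'accountBalance', amount: 100 },
  { type: 'set', key: 'lastUpdate', value: 'today' },
]);

let rejected = false;
try {
  applyBatch(area, [
    { type: 'remove', key: 'lastUpdate' },
    { type: 'multiply', key: 'accountBalance', amount: 2 },
  ]);
} catch (err) {
  rejected = true; // nothing from this batch was applied
}
```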

We could let the script see a stale copy of the data and have it say when 
it wants its changes committed back, but then we'd overwrite data left, 
right and centre, as described earlier. Other solutions like explicit 
locks and transactions that can fail unpredictably are equally big 
pitfalls for Web authors.

At least with what we have now the worst case scenario is pretty well 
understood and is actually no worse than all pages from a particular 
origin doing "while (true) {}" all at once, which is already possible, 
and just results in a "slow script" dialog.

On Tue, 15 Sep 2009, Jonas Sicking wrote:
> 
> First off, I agree that not having localStorage in workers is a big 
> problem that we need to address.
> 
> If I were designing the localStorage interface today I would use the 
> above interface that you suggest. Grabbing localStorage can only be done 
> asynchronously, and while you're using it, no one else can get a 
> reference to it. This way there are no race conditions, but also no way 
> for anyone to have to lock.
> 
> So one solution is to do that in parallel to the current localStorage 
> interface. Let's say we introduce a 'clientStorage' object. You can only 
> get a reference to it using a 'getClientStorage' function. This function 
> is available both to workers and windows. The storage is separate from 
> localStorage so no need to worry about the 'storage mutex'.

I think we should be very careful before introducing a fourth storage 
mechanism to make sure that whatever we introduce really is something 
that's going to be very useful and really solve problems. I'd really 
rather not rush into adding yet another mechanism at this point.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'