[whatwg] localStorage feedback

Wed Dec 2 02:06:34 PST 2009

Summary: I haven't removed the storage mutex, but I did adjust when it is 
released slightly to handle some cases that had been missed before.

On Thu, 29 Oct 2009, Darin Fisher wrote:
> On Mon, Oct 12, 2009 at 7:07 PM, Ian Hickson <ian at hixie.ch> wrote:
> > >
> > > the problem here is that localStorage is a pile of global variables. 
> > > we are trying to give people global variables without giving them 
> > > tools to synchronize access to them.  the claim i've heard is that 
> > > developers are not savy enough to use those tools properly.  i agree 
> > > that developers tend to use tools without fully understanding them.  
> > > ok, but then why are we giving them global variables?
> >
> > The global variables have implicit locks such that you can build the 
> > tools for explicit locking on top of them:
> >
> >   // run this first, in one script block
> >   var id = localStorage['last-id'] + 1;
> >   localStorage['last-id'] = id;
> >   localStorage['email-ready-' + id] = "0"; // "begin"
> >
> >   // these can run each in separate script blocks as desired
> >   localStorage['email-subject-' + id] = subject;
> >   localStorage['email-from-' + id] = from;
> >   localStorage['email-to-' + id] = to;
> >   localStorage['email-body-' + id] = body;
> >
> >   // run this last
> >   localStorage['email-ready-' + id] = "1"; // "commit"
>
> Dividing up work like this into separate SCRIPT elements to scope the 
> locking seems really awkward to me.

Oh I agree, I'm just saying that your original point -- that we are trying 
to give people global variables without giving them tools to synchronize 
access to them -- is mistaken insofar as we _are_ trying to give them 
tools to synchronise access, that's what this whole thread is basically 
about. It just happens that those tools have unfortunate side-effects 
(they can require synchronous blocking of other processes running code for 
the same domain).

> > > The current API exposes race conditions to the web.  The implicit 
> > > dropping of the storage lock is that.  In Chrome, we'll have to drop 
> > > an existing lock whenever a new lock is acquired.  That can happen 
> > > due to a variety of really odd cases (usually related to nested 
> > > loops or nested JS execution), which will be difficult for 
> > > developers to predict, especially if they are relying on third-party 
> > > JS libraries.
> > >
> > > This issue seems to be discounted for reasons I do not understand.
> >
> > You can only lose the lock in very specific conditions. Those 
> > conditions are rarely going to interact with code that actually does 
> > storage work in a way that relies on the lock:
> >
> >  - changing document.domain
> >  - history.back(), .forward(), .go(n)
> >  - invoking a plugin
> >  - alert(), confirm(), prompt(), print()
> >  - showModalDialog()
> >  - yieldForStorageUpdates()
> >
> > I discussed this in more detail here: 
> > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-September/023059.html
>
> You are right that the conditions are specific, but I don't know that 
> that is the exhaustive list.

If it's not, then we should fix it.

> Rather than defining unlock points, I would implement implicit unlocking 
> by having any nested attempt to acquire a lock cause the existing lock 
> to be dropped. Nesting can happen in the cases you mention, but 
> depending on the UA, it may happen for other reasons too.

I don't think this is equivalent at all. Most of the release points are 
points where the thread is about to hang (sync XHR, alert(), etc) and we 
want to unlock other processes. The plugin case is about third-party code 
causing a deadlock through undetectable thread synchronisation (e.g. where 
a plugin causes code in another process to try to grab the lock and waits 
for that to complete before handing control back to the UA). The nesting 
defence only helps in the case of preventing deadlocks with two processes 
trying to grab different domains' locks in reverse order from each other, 
as far as I can tell, which is just a subset of the cases where it is 
released per the spec.

> This combined with the fact that most people use JS libraries means that 
> the coder is not going to have an easy time knowing when these specific 
> conditions are met.  I don't think defining a set of allowed unlock 
> points is sufficient to make this API not be a minefield for users.

I don't think anyone is arguing that this is ideal. I do think it's orders 
of magnitude better than having the API be always racy even without any 
libraries being involved.

> For example, a JS library might evolve to use flash for something small 
> (like storage or sound) that it previously didn't use when I first 
> developed my code. Voila, now my storage lock is released out from under 
> me.

At least this is documentable.

On Sat, 31 Oct 2009, Darin Fisher wrote:
> 
> Unlock if yieldForStorageUpdates is called.
> Unlock when returning from script execution.
> Unlock if another attempt to lock occurs (any form of nesting).
> 
> In the third case, I'd probably log something to the JS console to alert 
> developers.

You'll also need to unlock for cases like alert(), prompt(), sync XHR, and 
so on (the enumerated cases above that temporarily hang the thread).

This seems like a fine implementation strategy. Please let me know if you 
find any cases where it happens other than those in the spec though.

> So, we are currently on track to support this feature without locking.  
> In the future, we might add locking.  I've also considered other 
> solutions, like copy-on-write, which could obviously lead to data loss, 
> but at least it would ensure stability/consistency within a scripts 
> execution.  I would like it if the spec were open to such 
> implementations.

The spec would either require such behaviour or not; I don't think we 
should explicitly allow multiple different (non-interoperable) behaviours. 
That's worse than the race conditions, IMHO.

On Mon, 2 Nov 2009, Robert O'Callahan wrote:
> 
> I think this would make the spec too dependent on implementation 
> details. If your implementation needlessly or accidentally "nests" 
> script execution --- e.g. by firing an event synchronously that should 
> be, or could be, asynchronous --- then you'd still conform to your spec 
> while the behaviour you present to authors gets quietly worse.
> 
> If your description is (or can be, after suitable modifications) 
> equivalent to what the spec currently says, but the equivalence is 
> subtle (which it would be!), then I think they should *both* be in the 
> spec, and the spec should say they're equivalent. Then if we find 
> they're not equivalent, we clearly have a bug in the spec which must be 
> fixed --- not carte blanche to proceed in an undesirable direction. It 
> would be a sort of spec-level assertion.

I would rather not require potentially conflicting things in the spec. 
Historically, I've found that people tend to just ignore one of the 
requirements (picking whichever one is most convenient for them to ignore) 
and not report it.

On Mon, 2 Nov 2009, Darin Fisher wrote:
> 
> I think the spec currently calls attention to only some situations that 
> could lead to nesting of implicitly acquired storage locks.
>
> I previously described some other situations, which you and others 
> indicated should be treated as WebKit and IE bugs.  I didn't look very 
> far to dig those up.  After some more thought, I came up with these 
> additional cases that the spec doesn't cover:
> 
> 1a) Given a page (domain A) containing an iframe (domain B), have the 
> outer page navigate the inner frame to "about:blank".  This navigation 
> completes synchronously, and the unload handler for the iframe runs 
> before the navigation request completes.  This is true of all browsers.

Good point. I've made the navigation algorithm release the mutex, and made 
some related changes to make sure that all synchronous events are 
similarly safe. (To make things easier to reason about, I also made all 
history.go() calls and anything else that navigates the history simply 
always release the mutex.)

> 1b) Suppose the inner page has a pending XMLHttpRequest when the outer 
> frame navigates the inner frame.  The XHR's onabort handler would run 
> before the navigation to "about:blank" completes.

Per XHR2, the 'abort' event doesn't fire in this case.

> 2) Set a break point in the Mozilla JS debugger.  This runs a nested 
> event loop each time you single step so that it can drive the rest of 
> the browser UI.
> 3) Install a Firefox extension that runs a nested event loop in response 
> to an event generated by content.  I debugged many Firefox crashes 
> resulting from extensions that do this kind of thing for various 
> reasons.

Both of these are non-conforming situations. There's a whole bunch of 
situations that act differently under the debugger; releasing the storage 
mutex in those cases is a UA-specific behaviour and is the least of your 
problems as a programmer.

> It seems odd to me that this behavior was put into the spec without any 
> implementation experience to guide it.  The only multi-process 
> implementations that I know of do not have a storage mutex.

Implementation experience _did_ guide it. In fact, that is pretty much the 
only reason this is in the spec.

On Tue, 3 Nov 2009, Darin Fisher wrote:
> 
> Will we just keep amending the spec each time we find such a possible 
> case?

Yes.

> I think it is far saner to say that any nesting leads to unlocking the 
> storage mutex.  The spec can then list cases where this nesting might 
> occur.

It wouldn't make sense to spec it that way. The spec just has one mutex, 
it's not per-origin -- that you can implement it per-domain is an 
optimisation that is enabled by design, but it isn't how the spec defines 
it. So you can never block, the way the spec is written (since there's no 
way to ever do a synchronisation of multiple processes).

In addition, as mentioned above, the storage mutex is released in many 
cases other than just those that could lead to a thread grabbing the mutex 
on behalf of two origins at once.

I guess I could add a note in the "obtain the storage mutex" algorithm 
saying that if the UA implements the storage mutex per-origin, then it 
should never be able to reach this algorithm while holding the mutex from 
an origin different than the one it's about to try to grab. Would that 
help? I'm not sure it makes sense to say that, though, given the lack of 
per-origin mutexes in the spec.

> >> 2) Set a break point in the Mozilla JS debugger.  This runs a nested 
> >> event loop each time you single step so that it can drive the rest of 
> >> the browser UI.
> >>
> >> 3) Install a Firefox extension that runs a nested event loop in 
> >> response to an event generated by content.  I debugged many Firefox 
> >> crashes resulting from extensions that do this kind of thing for 
> >> various reasons.
> >
> > These are internal Mozilla issues and should not be allowed to 
> > influence the design of the Web platform. They'll probably change for 
> > multi-process anyway.
> 
> OK, but my point is that the spec should afford implementors with the 
> ability to unlock the storage mutex at other times for reasons not 
> mentioned in the spec.

How do we do that without making the mutex-free implementation correct?

> OK. Please note my objection to the storage mutex.

I think everyone objects to the storage mutex. The problem is we have 
nothing better. Race conditions aren't better.

On Tue, 3 Nov 2009, Jeremy Orlow wrote:
> 
> If we do this, we need to re-visit ways that scripts can tell whether 
> the lock has been dropped.  I can't remember which idea was most in 
> favor last time we talked about it, but a counter that increments every 
> time LocalStorage is unlocked sticks out in my mind.  (Scripts can check 
> the counter, do something that could cause unlocking, and then verify 
> the counter is still the same after.)

The counter could be a boolean -- does the event loop have the lock [for 
the origin of the first script]. I would recommend implementing this as 
a property window.navigator.webkitStorageUpdatesLocked which is true when 
the lock is obtained and false when it is released. If people find it 
useful, we can add it to the language.

> Another option that just came to mind is to have some flag that says 
> "throw an exception whenever there's been a serialization violation".

Could you elaborate on this?

On Wed, 4 Nov 2009, Rob Ennals wrote:
> 
> How about this for a solution for the localStorage mutex problem:
> 
> "the user agent MAY release the storage mutex on *any* API operation 
> except localStorage itself"
> 
> This guarantees that the common case of "several storage operations in a 
> row with nothing in-between" works, but gives the implementors the 
> freedom to release the storage mutex wherever else they find they need 
> to.
> 
> I ran this by a few people at the W3C tpac (where I am now) and everyone 
> I talked to seemed to think this would work.

On Thu, 5 Nov 2009, Jeremy Orlow wrote:
>
> Leaving the language open might not be terribly useful to a typical web 
> developer since they're not going to read the spec and probably aren't 
> going to have a very firm idea of whether what they're doing could 
> unlock storage or not.  Experimentation wouldn't work very well since 
> each platform could be wildly different (since a lot of possible 
> behaviors fall between the MAY and the MAY NOT in the proposed spec).
> 
> Another concern is that the worst case performance aspects of 
> LocalStorage remain.  I cringe every time I think of one event loop 
> blocking another.
>
> But I will admit that the average time would be better--especially if 
> we're unlocking fairly aggressively.

On Sat, 7 Nov 2009, Chris Jones wrote:
> 
> IMHO, this is actually worse than the current proposal of a global mutex 
> :S. This proposal makes atomicity guarantees not only library-dependent, 
> but browser-implementation-dependent.  For example
> 
>   a = localStorage.x()
>   jquery.foo()
>   b = localStorage.y()
> 
> If |jquery.foo()| were, say, parsing JSON or determining selector 
> matching, it might involve "browser API calls" in some browser, and in 
> others not.
> 
> Worse, if |jquery.foo()| involves accessing browser-managed things like 
> computed DOM attributes, then even in the *same* browser it might result 
> in sometimes needing a "browser API call", and sometimes only needing a 
> JS-only call.  (Depending on DOM attribute cache status, if present.)
> 
> This of course of depends on the definition of "browser API call", but I 
> interpret this as approximately meaning "calling from JS to C++".
> 
> These objections are in addition to those made by Jeremy Orlow 
> concerning a script-managed, possibly cross-process mutex, which I also 
> find unpleasant.

On Sun, 8 Nov 2009, Adam Barth wrote:
> 
> As I mentioned to Ian at TPAC, one way to make this more predictable is 
> to release the lock on *every* function call and return.  This provides 
> content enough atomicity to build whatever locks it needs.

On Mon, 9 Nov 2009, Robert O'Callahan wrote:
> 
> Then simple refactorings that should be semantics-preserving --- like 
> extracting a block of code into a shared function --- would break 
> atomicity. This also places a new burden on JS implementations.

On Mon, 23 Nov 2009, Rob Ennals wrote:
>
> The planned model is that if you don't want to lose the storage mutex 
> then you shouldn't call any code that you didn't write yourself. Any 
> external library (e.g. jquery) should be assumed to release the storage 
> mutex. Perhaps that should be made explicit in the spec?
> 
> We want to guarantee essential things like "several localStorage ops in 
> a row are atomic", and make sure any language-level transformations that 
> people expect to be semantics preserving remain so (e.g. moving code 
> into a function), while still giving the browser the ability to release 
> the mutex whenever it might need to.

On Tue, 24 Nov 2009, Rob Ennals wrote:
>
> Yes. I think that *any* DOM operation, including getElementById should 
> be allowed to release the storage mutex. If we start specifying 
> particular categories of API that do or do not release the mutex then 
> thing will just get confusing, or we will find that we have missed 
> something important.
> 
> The basic rule of thumb should be that if you want to do an atomic 
> storage operation then you do the following:
> 
> 1.) Gather all the data that you want to use in your storage transaction 
> (using whatever DOM call you need)
>
> 2.) Perform a sequence of storage operations, not interrupted by any 
> calls to any API or external library.
> 
> This is consistent with the way that systems programmers write 
> concurrent sections. You gather all your data using whatever 
> potentially-blocking APIs you need (e.g. opening files), and then hold 
> your lock for as brief a period of time as possible while doing the 
> updates you need.
> 
> If you run your browser in "super-warnings-enabled" mode then you could 
> have it warn you if you did anything remotely suspect between calls to 
> localStorage (e.g. calling a function defined by an external javascript 
> file or calling an API).
> 
> As you say, from a programmers point of view, it would be much nicer if 
> programmers could use whatever APIs they liked, without having to worry 
> about losing the storage mutex, however, as we have seen earlier in this 
> list, that isn't something that the browser vendors believe they can 
> implement. Some operations unavoidably *have* to release the storage 
> mutex, and it is surprisingly difficult to specify what operations these 
> are, or when they might be called.
> 
> If we are going to fix the current spec, and give users something that 
> they can understand, then I see little alternative but to tell them that 
> *all* APIs and external libraries may release the storage mutex. It 
> isn't perfect, but it does at least give users an invariant that they 
> can understand, that they can be sure will always hold, and that can be 
> implemented by all browsers. Moreover, it allows a programmer to do 
> essential basic things like implement a test and set lock.

On Tue, 24 Nov 2009, Jonas Sicking wrote:
> 
> It seems to me that the only difference between what we have now (in the 
> spec), and your suggestion, would be that implementations could in more 
> cases claim to follow the spec. I.e. the only difference seems to be one 
> of PR for implementations. Nothing that would actually help web authors.

On Tue, 24 Nov 2009, Boris Zbarsky wrote:
> On 11/24/09 7:46 PM, Rob Ennals wrote:
> > E.g., I may want to write:
> > 
> > x = localStorage.bla
> > localStorage.bla = x+1
> 
> If that's happening at global scope, the latter line involves global 
> variable access, which involves DOM operations (in particular, looking 
> for windows of that name), no?

On Tue, 24 Nov 2009, Rob Ennals wrote:
>
> Yeah. Such things could be confusing. Perhaps we would want to 
> explicitly specify that global variable access does not release the 
> storage mutex (I'm assuming there is no reason you would ever want it to 
> do so?).
> 
> More generally, the basic principle of this proposal is to specify what 
> *doesn't* release the storage mutex, rather than specifying what *does*, 
> since previous efforts to enumerate everything that releases the storage 
> mutex seem to have got stuck.

On Tue, 24 Nov 2009, Mike Shaver wrote:
>
> How would the browser distinguish between
> 
> storage-operation-1
> inadvertent-API-call
> storage-operation-2-that-should-be-atomic-with-1
> 
> and
> 
> storage-operation-1
> API-calls-to-gather-data-for-another-transaction
> storage-operation-2-that-is-unrelated-to-1
> 
> ?  Seems like that's a necessary distinction if it's not to just warn 
> all over the place uselessly!

On Tue, 24 Nov 2009, Rob Ennals wrote:
> 
> I can imagine that this could get particularly noisy if you use a 
> library which decides to use localStorage itself, and so you aren't 
> aware that localStorage has been used previously elsewhere.
> 
> One simple fix to avoid these warnings would be to introduce two 
> functions, startAtomic and endAtomic, that a user uses to explicitly 
> declare what they think is atomic. startAtomic declares that you are 
> starting an atomic localStorage section, and returns a token. endAtomic 
> declares that you are ending an atomic localStorage section, and takes 
> the token as an argument.
> 
> All localStorage that isn't between startAtomic and endAtomic is assumed 
> to be part of the same atomic transaction. The only reason to use 
> startAtomic and endAtomic is to suppress warnings by making it clear 
> that things are *intended* to be part of different atomic blocks. They 
> would typically be used only in libraries, and not by a user's own code.

On Wed, 25 Nov 2009, Mike Shaver wrote:
> On Wed, Nov 25, 2009 at 6:20 AM, Ian Hickson <ian at hixie.ch> wrote:
> > Reading or writing a property on a native object doesn't do it, so
> >
> >   window['x'].document.forms['y'].value = 'foo';
> >
> > ...doesn't release the mutex, though this (identical code) would:
> >
> >   window['x'].document.forms.namedItem('y').value = 'foo';
> >
> > ...because of namedItem() call.
> 
> I don't think that we can reasonably expect web developers to do that 
> kind of analysis, since they are likely to be working through libraries 
> and other sources of indirection.
> 
> (Especially since I bet there is a lot of documentation describing those 
> two cases as being entirely identical in behaviour.)

On Wed, 25 Nov 2009, Boris Zbarsky wrote:
> On 11/25/09 6:20 AM, Ian Hickson wrote:
> >     - script calling a method implemented in native code on a host object
> 
> If this is a MUST, this seems like a possible compat issue depending on 
> whether the method is native or library-provided, at the very least. 
> There's also been talk at least in Gecko of self-hosting some DOM 
> methods in JS (e.g. getElementById), at which point they will no longer 
> be implemented in native code.
> 
> I'm not sure this is a fatal issue (and I haven't been following this 
> thread closely enough in general to be sure of anything); just pointing 
> out that it's an issue.

On Wed, 25 Nov 2009, Darin Fisher wrote:
> 
> I had a similar thought as it pertains to Chrome.  I also worry about 
> having to do some "storage mutex" processing for every native call.  
> That seems like unfortunate overhead.

On Wed, 25 Nov 2009, Jeremy Orlow wrote:
> 
> Good point.  So even if it's a MUST, it'll be dependent on the platform. 
> And thus not terribly "standard".
> 
> I'd say the current language is better than this proposed change.

I've left the spec as is, since Rob's proposal didn't get a good 
reception.

On Wed, 25 Nov 2009, Jeremy Orlow wrote:
> 
> I know that we've discussed approximations of run to completion before, 
> but maybe it's worth one more shot:  What if on the first use of 
> document.cookie or local storage we took a snapshot of both and used 
> that during the task's execution.  All writes would be queued up until 
> the task finishes, at which point they'd be written to the central 
> version of the cookie and/or local storage.  This would provide a 
> consistent view of data for the duration of the task and would solve 
> almost all the atomicity problems except |document.cookie = 
> document.cookie + "foo";|.  For that, I'd suggest adding a method that 
> allows scripts to do atomic modifications to storage within a callback.
> 
> I can understand everyone's desire to have completely serializable 
> semantics for local storage access and cookies (if you don't count the 
> servers' interaction with them), but maybe we need to go back to use 
> cases.  In a world with WebDatabase/WebSimpleDB, I really don't see 
> anyone turning to LocalStorage except for more basic uses.  Most of 
> which I'm guessing need consistent reads much more than serialization of 
> everything.
> 
> And let's be realistic.  IE has had this problem with document.cookie 
> for a long time.  And IE8 now has this problem with localStorage.  
> Given that in the best case (MS and all others implement the storage 
> mutex) web developers will not be able to assume localStorage and 
> document.cookie access is atomic for many years at a minimum, I think 
> we're being pretty unrealistic about how much the storage mutex is going 
> to improve anyone's life.  Let's come up with an approximation, give 
> developers a callback for atomic access, and be done with it.

As far as I can tell, this wouldn't remove race problems. It would in fact 
make it impossible to avoid them, since as far as I can tell you can't 
build a sane locking mechanism based on the above.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'