[whatwg] Structured clone algorithm on LocalStorage

Tue Sep 29 23:48:56 PDT 2009

On Tue, Sep 29, 2009 at 12:19 AM, Darin Fisher <darin at chromium.org> wrote:
> On Thu, Sep 24, 2009 at 11:57 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>>
>> On Thu, Sep 24, 2009 at 9:04 PM, Darin Fisher <darin at chromium.org> wrote:
>> > On Thu, Sep 24, 2009 at 4:43 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>> >>
>> >> On Thu, Sep 24, 2009 at 10:52 AM, Darin Fisher <darin at chromium.org>
>> >> wrote:
>> >> > On Thu, Sep 24, 2009 at 10:40 AM, Jonas Sicking <jonas at sicking.cc>
>> >> > wrote:
>> >> >>
>> >> >> On Thu, Sep 24, 2009 at 1:17 AM, Darin Fisher <darin at chromium.org>
>> >> >> wrote:
>> >> >> > On Thu, Sep 24, 2009 at 12:20 AM, Jonas Sicking <jonas at sicking.cc>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Wed, Sep 23, 2009 at 10:19 PM, Darin Fisher
>> >> >> >> <darin at chromium.org>
>> >> >> >> wrote:
>
> ... snip ...
>
>>
>> >> >> >> > multi-core is the future.  what's the opposite of fine-grained
>> >> >> >> > locking?
>> >> >> >> >  it's not good ;-)
>> >> >> >> > the implicit locking mechanism as spec'd is super lame.
>> >> >> >> >  implicitly
>> >> >> >> > unlocking under
>> >> >> >> > mysterious-to-the-developer circumstances!  how can that be a
>> >> >> >> > good
>> >> >> >> > thing?
>> >> >> >> > storage.setItem("y",
>> >> >> >> > function_involving_implicit_unlocking(storage.getItem("x")));
>> >> >> >>
>> >> >> >> I totally agree on all points. The current API has big
>> >> >> >> imperfections.
>> >> >> >> However I haven't seen any workable counter proposals so far, and
>> >> >> >> I
>> >> >> >> honestly don't believe there are any as long as our goals are:
>> >> >> >>
>> >> >> >> * Don't break existing users of the current implementations.
>> >> >> >> * Don't expose race conditions to the web.
>> >> >> >> * Don't rely on authors getting explicit locking mechanisms
>> >> >> >> right.
>> >> >> >>
>> >> >> >
>> >> >> > The current API exposes race conditions to the web.  The implicit
>> >> >> > dropping of the storage lock is that.  In Chrome, we'll have to
>> >> >> > drop
>> >> >> > an existing lock whenever a new lock is acquired.  That can happen
>> >> >> > due to a variety of really odd cases (usually related to nested
>> >> >> > loops
>> >> >> > or nested JS execution), which will be difficult for developers to
>> >> >> > predict, especially if they are relying on third-party JS
>> >> >> > libraries.
>> >> >> > This issue seems to be discounted for reasons I do not understand.
>> >> >>
>> >> >> I don't believe we've heard about this before, so that would be the
>> >> >> reason it hasn't been taken into account.
>> >> >>
>> >> >> So you're saying that Chrome would be unable to implement the current
>> >> >> storage mutex as specified in spec? I.e. one that is only released
>> >> >> at
>> >> >> the explicit points that the spec defines? That seems like a huge
>> >> >> problem.
>> >> >
>> >> > No, no... my point is that to the application developer, those
>> >> > "explicit"
>> >> > points will appear quite implicit and mysterious.  This is why I
>> >> > called
>> >> > out third-party JS libraries.  One day, a function that you are using
>> >> > might transition to scripting a plugin, which might cause a nested
>> >> > loop, which could then force the lock to be released.  As a
>> >> > programmer,
>> >> > the unlocking is not explicit or predictable.
>> >>
>> >> Ah, indeed, this is a problem. However the unfortunate fact remains
>> >> that so far no other workable solution has been proposed.
>> >
>> > OK, so we agree that the current solution doesn't meet the goals you
>> > stated above :-(
>>
>> Well, it addresses them as long as users are aware of the risk, and
>> properly document whether their various library functions will release
>> the lock or not. However I agree that it's unlikely that they will do
>> so correctly.
>
> I thought the point of not having lock APIs was that users shouldn't have
> to understand locks ;-)  The issue I've raised here is super subtle.  We
> have not succeeded in avoiding subtlety!

I think we're mostly in agreement. What I'm not sure about is what you
are proposing we do with localStorage? Remove it from the spec? Change
the API? Something else?
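
To make the hazard concrete, here is a minimal sketch of the lost update being discussed. It is only an in-memory model: the Map stands in for localStorage, and functionInvolvingImplicitUnlocking and the otherPage callback are hypothetical stand-ins for a library call that drops the storage mutex and for whatever other page runs in the gap.

```javascript
// Hypothetical in-memory model; not a real browser API.
const store = new Map([["x", "1"]]);

// Stand-in for a library call that implicitly releases the storage mutex
// (e.g. by spinning a nested event loop), letting another page run.
function functionInvolvingImplicitUnlocking(value, otherPage) {
  otherPage(); // another context writes while we are "unlocked"
  return String(Number(value) + 1);
}

// Read-modify-write that the author believes is atomic.
function incrementX(otherPage) {
  const seen = store.get("x");
  store.set("x", functionInvolvingImplicitUnlocking(seen, otherPage));
}

// The other page also increments x during the gap.
incrementX(() => store.set("x", String(Number(store.get("x")) + 1)));

console.log(store.get("x")); // prints "2": two increments ran, one was lost
```

Nothing in incrementX hints that the lock was dropped, which is the subtlety being raised.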

>> >> > Moreover, there are other examples which have been discussed on the
>> >> > list.  There are some DOM operations that can result in a frame
>> >> > receiving
>> >> > a DOM event synchronously.  That can result in a nesting of storage
>> >> > locks,
>> >> > which can force us to have to implicitly unlock the outermost lock to
>> >> > avoid
>> >> > deadlocks.  Again, the programmer will have very poor visibility into
>> >> > when
>> >> > these things can happen.
>> >>
>> >> So far I don't think it has been shown that these events need to be
>> >> synchronous. They all appear to be asynchronous in gecko, and in the
>> >> case of different-origin frames, I'm not even sure there's a way for
>> >> pages to detect if the event was fired asynchronously or not.
>> >
>> > IE and WebKit dispatch some of them synchronously.  It's hard to say
>> > which
>> > is correct or if it causes any web compat issues.  I'm also not sure that
>> > we
>> > have covered all of the cases.
>>
>> It still seems to me that it's extremely unlikely that pages depend on
>> cross origin events to fire synchronously. I can't even think of a way
>> to test if a browser dispatches these events synchronously or not. Can
>> you?
>
> i agree that it seems uncommon.  maybe there could be some odd app that
> does something after resizing an iframe that could be dependent on the
> event handler setting some data field.  this kind of thing is probably even
> less common in the cross-origin case.

But how would you read that data field in the cross-origin frame? I
think it might be possible, but extremely hard.

>> >> >> >> But, as imperfect as the current API is, I think the following is
>> >> >> >> a
>> >> >> >> decent way forward:
>> >> >> >>
>> >> >> >> * Allow pages that want the convenience of localStorage to use
>> >> >> >> it.
>> >> >> >> For
>> >> >> >> multi-process browsers this will mean poor UI *for pages that use
>> >> >> >> localStorage*. Especially when said pages hold on to localStorage
>> >> >> >> for
>> >> >> >> a long time.
>> >> >> >> * Add alternative APIs that don't suffer from the same problems.
>> >> >> >> More
>> >> >> >> below.
>> >> >> >>
>> >> >> >> >> > In addition, this argument assumes that Microsoft (and other
>> >> >> >> >> > UAs)
>> >> >> >> >> > will
>> >> >> >> >> > implement the structured clone version of LocalStorage.  Has
>> >> >> >> >> > anyone
>> >> >> >> >> > (or
>> >> >> >> >> > can
>> >> >> >> >> > anyone) from Microsoft comment on this?
>> >> >> >> >>
>> >> >> >> >> Given that I've never heard Microsoft commit to a web standard,
>> >> >> >> >> ever,
>> >> >> >> >> I
>> >> >> >> >> doubt that we'll hear anything here. Or that the lack of
>> >> >> >> >> hearing
>> >> >> >> >> anything means we can draw any conclusions.
>> >> >> >> >>
>> >> >> >> >> > This is not a small feature to add.  Yes, it's smaller than
>> >> >> >> >> > creating
>> >> >> >> >> > a
>> >> >> >> >> > new
>> >> >> >> >> > storage mechanism (that everyone is willing to adopt), but I
>> >> >> >> >> > still
>> >> >> >> >> > think
>> >> >> >> >> > that's what we should be looking at.  Rather than polishing
>> >> >> >> >> > a
>> >> >> >> >> > turd.
>> >> >> >> >>
>> >> >> >> >> I do think that localStorage is a decent API that developers
>> >> >> >> >> will
>> >> >> >> >> want
>> >> >> >> >> to, and should, use. I think we should look into adding an async
>> >> >> >> >> accessor
>> >> >> >> >> to
>> >> >> >> >> get a storage object so that people can use a
>> >> >> >> >> localStorage-like
>> >> >> >> >> API
>> >> >> >> >> while avoiding risks of blocking. This would also allow
>> >> >> >> >> sharing
>> >> >> >> >> data
>> >> >> >> >> between worker threads and the main window.
>> >> >> >> >
>> >> >> >> > i think the async callback to get a storage object is an
>> >> >> >> > improvement,
>> >> >> >> > but
>> >> >> >> > i'm not sure that it addresses all of the problems.  for
>> >> >> >> > example,
>> >> >> >> > if
>> >> >> >> > a
>> >> >> >> > worker
>> >> >> >> > wants to read values from storage, compute, and then put a
>> >> >> >> > value
>> >> >> >> > into
>> >> >> >> > storage, it would probably do all of this from the storage
>> >> >> >> > callback.
>> >> >> >> >  that
>> >> >> >> > would result in holding the lock for a long time, which would
>> >> >> >> > lock
>> >> >> >> > out
>> >> >> >> > any
>> >> >> >> > other threads, including non-worker threads.
>> >> >> >> > the problem here is that localStorage is a pile of global
>> >> >> >> > variables.
>> >> >> >> >  we
>> >> >> >> > are
>> >> >> >> > trying to give people global variables without giving them
>> >> >> >> > tools
>> >> >> >> > to
>> >> >> >> > synchronize
>> >> >> >> > access to them.  the claim i've heard is that developers are
>> >> >> >> > not
>> >> >> >> > savvy
>> >> >> >> > enough
>> >> >> >> > to use those tools properly.  i agree that developers tend to
>> >> >> >> > use
>> >> >> >> > tools
>> >> >> >> > without
>> >> >> >> > fully understanding them.  ok, but then why are we giving them
>> >> >> >> > global
>> >> >> >> > variables?
>> >> >> >> > there has to be a better answer.
>> >> >> >>
>> >> >> >> I actually described a potential solution in the thread on
>> >> >> >> worker
>> >> >> >> storage.
>> >> >> >>
>> >> >> >> The problem you describe is a worker holding on to the storage
>> >> >> >> for
>> >> >> >> a
>> >> >> >> very long (indefinite) time, thereby locking out other
>> >> >> >> threads/windows
>> >> >> >> from accessing the same storage area. This seems inevitable if we
>> >> >> >> want
>> >> >> >> to prevent race conditions while at the same time not forcing the
>> >> >> >> complexities of locks onto web developers. The WebDatabase API
>> >> >> >> suffers
>> >> >> >> from exactly the same problem.
>> >> >> >
>> >> >> > Hmm... are you saying that from the SQLStatementCallback used to
>> >> >> > read
>> >> >> > some data out of the database, you might compute on that data, and
>> >> >> > then
>> >> >> > issue an executeSql call to write a computed result, and that in
>> >> >> > this
>> >> >> > scenario,
>> >> >> > the fact that it is the same transaction means that other threads
>> >> >> > are
>> >> >> > locked
>> >> >> > out of accessing the same database?  I hadn't considered chaining
>> >> >> > executeSql
>> >> >> > calls like this to keep the transaction alive.  Hmm...
>> >> >>
>> >> >> Indeed.
>> >> >>
>> >> >> >> However, we can lessen the problem. By adding multiple storage
>> >> >> >> areas,
>> >> >> >> we can allow a worker to use one storage area, while allowing
>> >> >> >> other
>> >> >> >> parties to simultaneously use other storage areas. This way, if a
>> >> >> >> worker and a window aren't sharing data at all, they never get in
>> >> >> >> the
>> >> >> >> way of each other.
>> >> >> >>
>> >> >> >> So a very simplistic design would be something like the
>> >> >> >> following:
>> >> >> >>
>> >> >> >> getStorageArea(name, callback)
>> >> >> >>
>> >> >> >> when called will asynchronously call the callback parameter once
>> >> >> >> the
>> >> >> >> storage area named by the first parameter becomes available. The
>> >> >> >> callback receives the storage area as an argument. We would also
>> >> >> >> have
>> >> >> >> the function
>> >> >> >>
>> >> >> >> getMultipleStorageAreas(names, callback)
>> >> >> >>
>> >> >> >> Same as above, but names is an array of strings indicating
>> >> >> >> multiple
>> >> >> >> storage areas that need to be acquired before the callback is
>> >> >> >> called.
>> >> >> >> The callback receives all the areas in an array as an argument.
>> >> >> >> This
>> >> >> >> function allows transferring data between multiple storage areas
>> >> >> >> without risking racing.
>> >> >> >>
>> >> >> >> There are several problems with this, such as the names being sort
>> >> >> >> of crappy, and that getting storage areas as an array isn't very
>> >> >> >> friendly.
>> >> >> >> However you get the basic idea.
>> >> >> >>
>> >> >> >> We don't even need to use Storage objects for this. In fact, I
>> >> >> >> hope
>> >> >> >> mozilla will in a not too distant future come up with an
>> >> >> >> alternative
>> >> >> >> proposal to the WebDatabase SQL API. Something like this might
>> >> >> >> fit
>> >> >> >> into such a proposal as I think that'll have multiple separate
>> >> >> >> storage
>> >> >> >> areas anyway.
>> >> >> >>
>> >> >> >> / Jonas
>> >> >> >
>> >> >> >
>> >> >> > Maybe we should just invent a similar transaction method for
>> >> >> > name/value
>> >> >> > storage?  Wouldn't that be better than inventing a new idiom?
>> >> >> >  Ideally,
>> >> >> > we'd also make reads and writes on storage be asynchronous.  The
>> >> >> > transaction would then be usable to hold the lock across multiple
>> >> >> > asynchronous reads and writes.  Since local storage is backed by
>> >> >> > disk,
>> >> >> > it seems like a more ideal local storage API would not
>> >> >> > require synchronous
>> >> >> > filesystem access.
>> >> >>
>> >> >> Not quite following what you're suggesting, but there's lots of ways
>> >> >> to design this. The critical part is to allow grabbing (with
>> >> >> associated locking) of just a subset of the available storage space.
>> >> >>
>> >> >> / Jonas
>> >> >
>> >> >
>> >> > I was suggesting that we only provide asynchronous getItem / setItem
>> >> > calls,
>> >> > where
>> >> > each call is parameterized by a transaction.  This is how the database
>> >> > API works.
>> >>
>> >> Not quite sure I follow your proposal. How would you for example
>> >> increase the value of a property by one without risking race
>> >> conditions? Or keep two values in different properties in sync? I.e.
>> >> so that if you update one always update the other, so that they never
>> >> have different values.
>> >>
>> >> / Jonas
>> >
>> >
>> > Easy.  Just like with database, the transaction is the storage lock.
>> >  Any
>> > storage
>> > operation performed on that transaction is done atomically.  However,
>> > all
>> > storage
>> > operations are asynchronous.  You basically string together asynchronous
>> > storage
>> > operations by using the same transaction for each.
>> > We could add methods to get/set multiple items at once to simplify life
>> > for
>> > the coder.
>>
>> I think I still don't understand your proposal, could you give some
>> code examples?
>>
>
>
> ripping off database:
> interface ValueStorage {
>   void transaction(in DOMString namespace,
>                    in ValueStorageTransactionCallback callback);
> };
> interface ValueStorageTransactionCallback {
>   void handleEvent(in ValueStorageTransaction transaction);
> };
> interface ValueStorageTransaction {
>   void readValue(in DOMString name, in ValueStorageReadCallback callback);
>   void writeValue(in DOMString name, in DOMString value);
> };
> interface ValueStorageReadCallback {
>   void handleEvent(in ValueStorageTransaction transaction,
>                    in DOMString value);
> };
> then, to use these interfaces, you could implement thread-safe increment:
> window.localStorage.transaction("slice", function(transaction) {
>   transaction.readValue("foo", function(transaction, fooValue) {
>     // fooValue arrives as a DOMString; convert before incrementing.
>     transaction.writeValue("foo", String(Number(fooValue) + 1));
>   });
> });
> to fetch multiple values, you could do this:
> var values = [];
> var numValues = 10;
> function readNextValue(transaction) {
>   if (values.length === numValues)
>     return;  // done!
>   var index = values.length;
>   transaction.readValue("value" + index, function(transaction, value) {
>     values.push(value);
>     readNextValue(transaction);
>   });
> }
> window.localStorage.transaction("slice", readNextValue);
> This has the property that all IO is non-blocking and the "lock" is held
> only
> for a very limited scope.  The programmer is however free to extend the
> life of the lock as needed.

What do you mean when you say the "lock" is held for only a very limited
scope? You still want to prevent modifications for as long as the
transaction is being used, right? I.e. no modifications can happen
between the read and the write in the first example, and between the
different reads in the second.

/ Jonas
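
For reference, the locking behaviour in question can be sketched with a small in-memory model of the proposed transaction API. Everything here is an assumption made for illustration: ValueStorageModel, its fields, and drain() (a hand-rolled event loop) are hypothetical, and only the transaction/readValue/writeValue names come from the proposal quoted above. The model assumes the lock is held until every read callback issued on the transaction has run.

```javascript
// Hypothetical model of the proposed ValueStorage transaction API.
class ValueStorageModel {
  constructor() {
    this.data = new Map();    // namespace -> Map of name -> value
    this.waiting = new Map(); // namespace -> queued transaction callbacks
    this.locked = new Set();  // namespaces with a live transaction
    this.tasks = [];          // simulated event loop
  }

  transaction(namespace, callback) {
    if (!this.waiting.has(namespace)) this.waiting.set(namespace, []);
    this.waiting.get(namespace).push(callback);
    this.tasks.push(() => this.startNext(namespace));
  }

  startNext(namespace) {
    const queue = this.waiting.get(namespace);
    if (this.locked.has(namespace) || queue.length === 0) return;
    this.locked.add(namespace); // the transaction is the lock
    const values = this.data.get(namespace) || new Map();
    this.data.set(namespace, values);
    let outstanding = 1; // the transaction callback itself counts as work
    const tx = {
      readValue: (name, cb) => {
        outstanding++;
        this.tasks.push(() => { cb(tx, values.get(name)); finish(); });
      },
      writeValue: (name, value) => { values.set(name, String(value)); },
    };
    // Release the lock only once no read callbacks remain pending.
    const finish = () => {
      if (--outstanding > 0) return;
      this.locked.delete(namespace);
      this.startNext(namespace);
    };
    queue.shift()(tx);
    finish();
  }

  // Run simulated tasks until the queue is empty.
  drain() { while (this.tasks.length) this.tasks.shift()(); }
}

const storage = new ValueStorageModel();
const increment = (tx) =>
  tx.readValue("foo", (tx2, v) => tx2.writeValue("foo", Number(v || 0) + 1));
storage.transaction("slice", increment);
storage.transaction("slice", increment);
storage.drain();
console.log(storage.data.get("slice").get("foo")); // prints "2"
```

In this model the second transaction's start task runs between the first transaction's read and its write, finds the namespace locked, and only proceeds after the first write lands, so neither increment is lost. The lock is still held for as long as the programmer keeps chaining reads.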