[whatwg] SQL section feedback

Tue May 6 22:14:45 PDT 2008

On Thu, 10 Apr 2008, Dimitri Glazkov wrote:
>
> In the current SQL storage spec 
> (http://www.whatwg.org/specs/web-apps/current-work/multipage/section-sql.html), 
> all database operations can be nicely tucked onto a separate thread, so 
> that they don't block the UI thread, except for one place: openDatabase 
> has to query version information and open or create the database.
> 
> This seems a bit out-of-sync (oh no, bad pun) with the rest of the spec, 
> where everything is asynchronous. Would it be more logical/practical to 
> explicitly (per spec) move the actual opening of the database off the 
> main thread? Like so:
> 
> Verifying database version and opening/creation of the database occurs 
> at pre-flight of the transaction, unless the database is already open.
> 
> Thus, no potential UI thread blockage by the database operations during 
> openDatabase invocation, as well as no need to raise the 
> INVALID_STATE_ERR exception.
> 
> What do you think?

This seems like something that UAs could optimise -- knowing what 
databases and what version each origin has seems similar to having 
immediate access to name/value pairs and to cookies, both of which are 
already synchronous. And making it use a callback would make this even 
less usable. :-)

What do other vendors think? Anyone?

On Mon, 12 Nov 2007, Aaron Boodman wrote:
> On 10/31/07, Ian Hickson <ian at hixie.ch> wrote:
> > I agree, but like you I don't know exactly what to say about this. 
> > This is an area where implementation experience will help. It may be, 
> > for instance, that nobody uses more than one database per app, and a 
> > prompt ends up being fine.
> 
> I think one simple way to address this is to move the metadata into the 
> application itself. For example:
> 
> <html applicationName="Google Reader">

I've added <meta name=application-name> to the spec.

However, the name here is really the name of the database, not the name of 
the app.

> You can imagine other things eventually going in to this application 
> metadata, such as icons at various sizes.

> I also think the estimated bytes thing is not that useful. It's going to 
> be hard for developers to estimate this and they're all just going to 
> put 64k or something they copied from an example.
> 
> I think it should be up to the UA to make sure that the database does 
> not grow "too large" and to prompt the user for access to more storage 
> when necessary on behalf of the application.

I agree that these arguments might not be useful. I propose that we see 
what authors do and what browsers do and remove the feature if it turns 
out to be useless.

On Tue, 27 Nov 2007, Brady Eidson wrote:
>
> In the provided spec, openDatabase() can fail in exactly one case, leading to
> an exception:
> "If the database version provided is not the empty string, and the database
> already exists but has a different version, then the method must raise an
> INVALID_STATE_ERR exception."
> 
> For the sake of providing implementation feedback, I'd like to add another
> specified failure condition:
> If the user agent decides for policy reasons that a database should not be
> opened/created (the origin's quota has already been met/exceeded, the user has
> disabled databases altogether, etc), there should be an exception.
> 
> For the sake of argument, I can't think of any existent error code that works
> better than INVALID_STATE_ERR.  NO_DATA_ALLOWED_ERR fits based on name alone,
> but not based on intended use.

Added. Note that a full quota is not a reason to prevent access (disabled 
databases is). I used a security exception btw, not INVALID_STATE_ERR.

On Fri, 8 Feb 2008, Alexey Proskuryakov wrote:
> 
>   It seems to be a natural idea to save Web application state from an 
> unload event handler. But is it guaranteed that client-side database API 
> is still functional at this point? And if it is - can one queue up more 
> statements and/or transactions from statement callbacks?
> 
>   There needs to be some limits put on this, as otherwise a script could 
> continue to use resources indefinitely after a browser window is closed. 
> But I do not see where it is specified, explicitly or implicitly.

It is not safe to do asynchronous things from onunload.

The right thing to do is use onbeforeunload, prompt the user to save, have 
the user cancel the load, save the data, and then have the user retry the 
close or navigation.

(Though note that onbeforeunload isn't yet specced.)

On Fri, 8 Feb 2008, Brady Eidson wrote:
>
> Though it would be very nice to have defined semantics wrt the unload 
> handler, I always thought this work could be done with the 
> beforeonunload handler, instead.
> 
> As far as the unload handler question, what are the semantics for XHR?  
> Seems the application of "saving some application state via I/O" is the 
> same concept here, so the same rules should apply.

Right. Anything asynchronous runs the risk of being shut down when the 
page navigates.

On Tue, 12 Feb 2008, Geoffrey Garen wrote:
>
> I see two options here:
> 
> 1. Delay leaving the page indefinitely, until all outstanding database 
> operations have completed.
> 
> 2. Leave the page immediately, canceling all outstanding database 
> operations.
> 
> Option #1 seems undesirable because it allows a malicious or poorly 
> programmed website to hijack the browser. That's a pretty bad user 
> experience -- one that the database API's asynchronous callbacks were 
> specifically designed to avoid.
> 
> Option #2 is not ideal, but it's workable. A beforeunload event handler 
> can detect unsaved changes and prompt the user to cancel the navigation 
> and save. Once the save succeeds, the page can update its UI to indicate 
> that navigation is OK.
> 
> So, I recommend option #2.

Agreed.

On Mon, 25 Feb 2008, Ralf Stoltze wrote:
>
> - In 4.3.2, the spec defines the concept of origin, with respect to 
> script elements. However, the term is also used in combination with 
> browsing contexts and databases. 4.11.2 says: "Each origin has an 
> associated set of databases."
> 
> So what is the origin of a database?
> - the originating host of the script which creates a database?
> - the origin of the document that script belongs to?

This was actually already defined, but I've moved it into its own 
paragraph to be clearer.

> - From 4.11.2:
> "Otherwise, if the database provided is the empty string, [...]"
> 
> I think this must read "the database version provided".

Fixed.

> - 4.11.3 defines that placeholders simply have to be replaced with 
> values from the arguments array. As I understand, this does not per se 
> ban SQL injections. Will the spec define *how* to replace placeholders, 
> including how to escape and quote values?

Yeah, this will be defined when we define the SQL language subset.

> - From 4.11.3:
> "A mostly arbitrary limit of five megabytes per origin is recommended."
> 
> The session/local storage part defines a quota on a per domain basis. Is 
> there any reason for this inconsistency (since both specs are now based 
> on the origin model)? Circumventing origin restrictions with subdomains 
> is the same for local storage and database storage.

I've merged the two sections to avoid this inconsistency.

> - From 4.11.4:
> "If no rows were returned, then the object will be empty."
> 
> What does emtpy mean? Getting an SQLResultSetRowList that does neither
> have a length attribute nor an item() method?

(This isn't a conformance criteria or normative requirement, it's just a 
statement of fact.)

Clarified.

> - From 4.11.6, step 6:
> "Each statement has a statement, a result set callback, and optionally
> an error callback."
> 
> That looks like the result set callback is mandatory, which is not (from 
> the interface definition).

Fixed.

> - I've seen some discussion on this list regarding the order of 
> execution of statements within one transaction. However, I believe that 
> this was related to an older version of the spec (which had implicit 
> transactions?).
> 
> Based on 4.11.6, step 6.7, I assume the following snippet to always
> execute in order 1, 2, 3?
> 
> db.transaction(function(tx) {
>   tx.executeSql('query 1', null, function(tx, rs) {
>     tx.executeSql('query 2', null, function(tx, rs) {
>     });
>   });
>   tx.executeSql('query 3', null, function(tx, rs) {
>   });
> });

No, this will always execute 1, 3, 2.

> - Small typo in 4.11.5: "dependending"

Fixed.

> - From 4.11.7:
> "In contrast with the localStorage feature, which intentionally allows
> data to be accessed across multiple domains, protocols, and ports
> (albeit in a controlled fashion), [...]"
> 
> Is this still true? My understanding of the current version is that
> third-party scripts can access the localStorage associated with the
> origin of the document they are belonging to, but not any other.

This is now gone.

> - cont'd:
> "this database feature is limited to scripts running with the same
> origin as the database."
>
> Seems like my English is too limited here. What does "running with"
> mean? 
> - the originating host of the script?
> - the origin of the document that script belongs to?
> 
> Again, this comes down to defining the origin of a database.

This section seems to be gone (I can't find "running with" in the spec 
anymore).

On Tue, 26 Feb 2008, Ralf Stoltze wrote:
> 
> So step 3 "Replace each ? placeholder" can be skipped if the underlying 
> DB architecture already has a similar mechanism.

Well, the "underlying DB architecture" is part of the UA, so the UA is 
still doing step 3. I don't really care how. :-)

> A statement can only be queued when executeSql() is invoked.
> 
> It's not clear to me when the top-level executeSql() methods are invoked 
> with respect to the overall processing model from 4.11.6.

They're invoked when the code runs.

If it helps, consider that the following:

 db.transaction(function(tx) {
   tx.executeSql('query 1', null, function(tx, rs) {
     tx.executeSql('query 2', null, function(tx, rs) {
     });
   });
   tx.executeSql('query 3', null, function(tx, rs) {
   });
 });

...is equivalent to:

 function q(tx, rs) {
   tx.executeSql('query 2', null);
 }

 function t(tx) {
   tx.executeSql('query 1', null, q);
   tx.executeSql('query 3', null);
 }

 db.transaction(t);

Here you see the execution order more obviously.

> 4.11.2:
> "The transaction() method takes one or two arguments. When called, the
> method must immediately return and then asynchronously run the
> transaction steps"
> 
> That doesn't necessarily mean to me that any executeSql() is ever
> invoked before the transaction steps are run. Thus, the transaction
> queue would be empty (what makes no sense). 
> 
> Is it meant that all top-level executeSql() methods are executed before
> the transaction steps are run? In this case, I also see execution order
> 1,3,2.

The transaction steps are what call the callback that contains the calls 
to executeSql().

On Wed, 9 Apr 2008, Aaron Boodman wrote:
> >
> > For the sake of simplicity in the API, I see no reason why there 
> > couldn't be a single version of changeVersion() with the 2 strings and 
> > 2 callbacks. Then if a user truly wanted to call changeVersion() but 
> > wasn't interested in the results, they could just pass null for both 
> > of the callbacks.  I suppose the same could be said about 
> > executeSql().
> >
> > But that might go against some DOM'ey or Javascript'ey convention that 
> > I'm not familiar with.
> >
> > I don't feel all too strongly about this.
> 
> Sorry I'm (really) late on this.
> 
> Anyway, I think these two callbacks should be optional. For several 
> reasons:
> 
> 1. It's basically the same API as transaction(), so I feel like it 
> should be consistent.
> 
> 2. I feel like if an argument allows null, then it is logically 
> optional. It feels cleaner to me that it is actually optional as well. 
> That is, it seems arbitrary to me that sometimes in the database api 
> "optional" arguments can either be unspecified or null, but sometimes 
> they have to be specified and null.
> 
> 3. It seems like a valid use case to me to not have an error handler. 
> This isn't in the spec, but I had assumed that unhandled database errors 
> would end up calling window.onerror, and then bubbling to the console, 
> if the UA has one. It's pretty common in JS to not handle every single 
> error, but instead allow unexpected errors to bubble to window.onerror 
> and log them there. This is what gmail does for example. It would be 
> unfortunate to have to have an error handle there just to immediately 
> rethrow so that you don't lose any information.
> 
> Brady, do any of these help convince you at all?

Right now it has three callbacks, and the third is the generic success 
one, which I assume all users will always want to hook into (otherwise, 
what are they doing?). So adding more variants seems pointless. I haven't 
changed this.

On Thu, 10 Apr 2008, Dimitri Glazkov wrote:
>
> It would be "a good thing" to specify some standard way to convert 
> argument types for the executeSql call. This question was first raised 
> on public-html list 
> (http://lists.w3.org/Archives/Public/public-html/2008Feb/0401.html), 
> suggesting that the types, not directly supported by the database engine 
> (SQLite seems to be the logical choice) are treated as strings.
> 
> In other words, if the type of the argument is not null, string, double, 
> or int, the argument is implicitly converted to to string.
> 
> As an alternative, I would like to present a slightly different 
> approach: if, during step 3 of the executeSql algorithm 
> (http://www.whatwg.org/specs/web-apps/current-work/multipage/section-sql.html#executing), 
> an argument is found to be unpalatable for the underlying database 
> implementation, the statement is marked bogus.
> 
> Though a bit more stringent than the implicit conversion, this approach 
> explicitly prevents any subtle coercion bugs and will probably lead to 
> more debuggable code.
> 
> What are your thoughts on this, oh wise WHATWG oracle?

I think we should get implementation experience here and update the spec 
to match implementations once there are several implementations that have 
shown how to do this.

On Thu, 10 Apr 2008, Aaron Boodman wrote:
> 
> So, we have some experience with this, having built several large 
> applications using an earlier version of the database API. Our current 
> implementation does coerce to string (we had the same initial assumption 
> as you), and having learned our lesson, we would like to fix this in v2.
> 
> Here are a couple reasons why we would like to make this more strict:
> 
> 1) There are lots of strange edge cases. SQLite does not support 
> boolean. Would you coerce it to a string? That means that if you insert 
> <false> and then select it, you get back something that evaluates to 
> <true>. Similar problem for <undefined>. Maybe a more sensible coercion 
> for these two types would be to int, but it seems weird to make 
> arbitrary distinctions for some popular input types, and you're still 
> losing information.
> 
> 2) Permanence. Coercion is a nice convenience most of the time in JS, 
> but I think with a permanent store, the damage you can do accidentally 
> goes up and warrants more care. An author probably will not find out 
> that he accidentally coerced undefined to string until long after the 
> damage has been done. It may be difficult to fix the data retroactively. 
> It doesn't seem worth the convenience to me. We actually tried to fix 
> this bug a little while after launch and found non-Google applications 
> in the wild accidentally storing <undefined>. So this is a real concern 
> for us.
> 
> 3) Forward compatibility. If an implementation allows all valid js types 
> and coerces unsupported ones to supported ones, then it can never add 
> supported types. This is bad for implementors.
> 
> So for all these reasons, the Gears team will probably not be 
> implementing coercion in our implementation of the HTML5 database spec. 
> I'm not sure whether this belongs in the spec (it is tied to SQLite 
> details), but I think it is something vendors should keep in mind, and 
> it would be nice if we aligned on.

I've made a brief note of this in the spec. Thanks for the input. It will 
be interesting to see what else people say.

On Thu, 10 Apr 2008, Dimitri Glazkov wrote:
>
> Is there any reason not to make name and version of the openDatabase() 
> method in the SQL storage spec optional 
> (http://www.whatwg.org/specs/web-apps/current-work/multipage/section-sql.html)? 
> In some implementations, there may not be an opportunity to offer a UI 
> where these parameters will be used.

Assuming you mean the display name and expected size (this was clarified 
in a later e-mail I haven't included here), they are optional in the sense 
that UAs can ignore them. However, it makes no sense to make them optional 
to the author, since the UAs that _do_ use them will need them always 
specified.

On Thu, 10 Apr 2008, Aaron Boodman wrote:
>
> I was referring to was the version parameter to open(). It can currently 
> be the empty string, but it cannot be omitted. Any reason why not? Some 
> (most?) applications will not need this field and it seems unfortunate 
> to force them to pass an empty string.

Well, there are two arguments after it, so omitting it wouldn't work well. :-)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'