[whatwg] Offline Web Apps

Thu Sep 20 01:13:20 PDT 2007

My commentary below.

Overall, I think the basic model is fairly sound. But I do think some  
improvements could be made.

On Sep 6, 2007, at 5:46 PM, Ian Hickson wrote:
>
> Ok, new proposal:
>
> There's a concept of an application cache. An application cache is a  
> group
> of resources, the group being identified by a URI (which typically  
> happens
> to resolve to a manifest). Resources in a cache are either top-level  
> or
> not; top-level resources are those that are HTML or XML and when  
> parsed
> with scripting disabled have <html application="..."> with the value  
> of
> the attribute pointing to the same URI as identifies the cache.
>
> When you visit a page you first check to see if you have that page  
> in a
> cache as a known top-level page.

Is there any need to treat "top-level" resources differently? If the  
user directly loads a PNG, JPG or for that matter PDF that's part of  
an offline manifest, I think it makes sense to serve it from the app  
cache.

It seems like it would simplify the model a bit for the offline  
cache to treat all items in the manifest as part of the application  
when visited directly.

The only problem here is the potential inconsistency if an HTML or XML  
document doesn't have the <html application="..."> declaration at the  
top, but is still cited in some other app's manifest. Then it would be  
treated as part of the application if an app page citing that manifest  
was visited first, but not if it wasn't. I think this is ok though and  
may even be a desirable behavior. For instance, you might not want a  
single flickr photo page to be an app by itself, but you'd still want  
it to be treated as part of the app domain for someone who had visited  
the main application page.
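
As a rough sketch of the lookup I have in mind (the AppCache class and
find_app_cache function are made-up names for illustration, not
proposed API), any URL stored in an application cache is served from
it, with no special case for top-level documents:

```python
from typing import Dict, List, Optional


class AppCache:
    """Hypothetical model of one application cache, keyed by manifest URI."""

    def __init__(self, manifest_uri: str) -> None:
        self.manifest_uri = manifest_uri
        self.entries: Dict[str, bytes] = {}  # URL -> cached response body


def find_app_cache(url: str, caches: List[AppCache]) -> Optional[AppCache]:
    """Return the cache holding url, whatever its type; None means load normally."""
    for cache in caches:
        if url in cache.entries:
            return cache
    return None
```

Under this model a photo page with no application="" attribute gets a
None result until some app page citing the manifest has been visited,
and is served from that app's cache afterwards.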

>
> If you do, skip the next two paragraphs; the 'new cache' flag is set  
> to
> false.
>
> If you don't, you fetch the URL. If it has no application=""  
> attribute,
> then do whatever the normal thing to do is. Ignore the rest of this.
>
> The presence of the attribute indicates that it's expecting an  
> application
> cache to apply. The presence is detected at parse time, and must be
> present on the first <html> start tag before any other start tags.  
> Check
> that the attribute's value is same-origin safe. If it isn't, pretend  
> the
> attribute wasn't there (and ignore the rest of this). Otherwise,  
> check to
> see if you already have a cache for the given URI. If you don't,  
> create a
> new cache identified by the given URI. In any case, save this  
> resource to
> the identified cache, as a known top-level page for that cache.  
> Then, act
> as if you had known about the cache when you started (next step),  
> except
> with the 'new cache' flag set to true.
>
> Load the page from the cache and display it.

I assume any resource that's not found in the cache can be loaded  
normally (it would have to be if this is a brand new cache). Actually,  
I'm not sure "from the cache" makes sense here given the next sentence.

> Any resources that the page
> tries to fetch using GETs that aren't XMLHttpRequest'ed must be  
> taken from
> the cache, if available.

Is it really the right thing for XMLHttpRequest to bypass reading from  
the cache? It makes sense to me that they wouldn't be implicitly  
stored in the cache, but I don't think the data you get for a URI  
should depend on whether you used XMLHttpRequest or loaded it in a  
frame. To be fair, I'm not sure why you'd want to do an XHR for a  
resource that then gets served from the offline cache. But I'm also  
not sure why you'd list an item in your manifest that you then wanted  
to load with XHR. So it seems simpler to omit this slight complication.

> When they aren't, the resources must be fetched then stored in the  
> cache.

If there is an explicit manifest, it seems wrong to store things in  
the cache that aren't in the manifest but are part of this page. That  
means you get the union of the manifest and things the page loads,  
which will make offline behavior hard to debug I think. It would be  
better to fetch the manifest (possibly getting it from the existing  
application cache, if any) before proceeding. Then you'd know which of  
the resources loaded as part of this page belong in the cache up  
front. That would affect the following steps.
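
A minimal sketch of that rule, assuming the manifest has already been
fetched and parsed into a set of URLs (load_resources and its fetch
argument are hypothetical stand-ins, not anything in the proposal):

```python
from typing import Callable, Dict, Iterable, Set


def load_resources(
    requested: Iterable[str],
    manifest_urls: Set[str],
    cache: Dict[str, bytes],
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Load every requested URL, storing only manifest entries in the cache."""
    loaded: Dict[str, bytes] = {}
    for url in requested:
        if url in cache:
            loaded[url] = cache[url]  # serve from the application cache
            continue
        body = fetch(url)  # hypothetical network fetch
        loaded[url] = body
        if url in manifest_urls:
            cache[url] = body  # listed in the manifest, so it belongs in the cache
    return loaded
```

The cache then holds exactly what the manifest lists, never the union
of the manifest and whatever the page happened to load.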

> Once the UA is ready to do so the UA must go on to the next steps.  
> UAs may
> do this immediately, or may wait for the original page load to  
> complete,
> or may delay it up to a UA-defined minimum delay.
>
> If 'new cache' is true, and the cache identifier URI is the same as  
> the
> URI that was just downloaded and put in the cache: Do nothing.
>
> If 'new cache' is true, and the cache identifier URI is different  
> from the
> URI that was just downloaded: Fetch the resource identified by that  
> URI.
> Store it in the cache. If it's a manifest and it parses correctly,
> download all the URIs given in that manifest and add them to the  
> cache. If
> any are HTML files which, when parsed with scripting disabled,  
> trigger the
> application="" handling and have a value that points to the same URI  
> as
> the one identifying this application cache, then mark them as known
> top-levels for this cache.

There would be no need to parse the resources if there were no  
distinction drawn between top-level and other resources in the cache.

> If 'new cache' is false: Create a new cache. Fetch the resource with  
> the
> URI of the cache identifier. If it's a manifest, and it has changed  
> from
> what's in the last cache, and it parses correctly, download all the  
> URIs
> in that manifest and add them to the new cache.

I would suggest going a little beyond the http caching rules. I  
propose that if the manifest is unchanged (as defined below), the UA  
doesn't need to download anything. This makes it possible to give the  
manifest a fairly short http expiration, so that checks for updates  
are relatively frequent, but make the checks themselves extremely  
cheap. This would require some modifiable version field in the  
manifest to let it change when the contents of a referenced resource  
have changed, but the set of resources hasn't.

A UA may consider the manifest "unchanged" if any of the following  
conditions applies:

- If the http freshness lifetime of either the copy in the offline  
cache or the copy in the normal browser cache has not expired
- If a conditional request relative to a copy in either the offline  
cache or the browser cache (via If-Modified-Since or If-None-Match)  
gives a 304 Not Modified response
- For non-http protocols, if it appears unmodified using whatever  
caching scheme is appropriate to the protocol

But if none of these applies, the UA should not compare the actual  
manifest data and should assume the manifest has changed and refetch  
the resources (possibly using a cache).
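
The http part of the "unchanged" test could be sketched roughly as
follows. CachedManifest, manifest_unchanged and the revalidate
callback (which stands in for a conditional request and returns the
status code) are names invented for this example, and the age
calculation is simplified to "now minus fetch time" rather than the
full http age algorithm:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CachedManifest:
    """Hypothetical record of a stored manifest copy (offline or browser cache)."""
    fetched_at: float          # seconds since the epoch
    freshness_lifetime: float  # seconds, derived from the http caching headers


def manifest_unchanged(
    copies: Iterable[CachedManifest],
    now: float,
    revalidate: Callable[[CachedManifest], int],
) -> bool:
    copies = list(copies)
    # Condition 1: some stored copy is still fresh, so no request is needed.
    if any(now - c.fetched_at < c.freshness_lifetime for c in copies):
        return True
    # Condition 2: a conditional request against some copy yields 304.
    if any(revalidate(c) == 304 for c in copies):
        return True
    # Otherwise assume the manifest changed; do not compare manifest bytes.
    return False
```

Note that the fresh case returns before any network traffic happens,
which is what makes frequent update checks cheap.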

Note that if the manifest is generated dynamically server-side, it  
can be made to appear new whenever any resource it points to has  
changed, while still using ETags to save a lot of needless http  
traffic when nothing has.

Also, another reason to check manifest freshness before proceeding  
with a page load is to give the app some way of knowing that it is  
going to upgrade. Then it could choose to display custom upgrading UI  
instead of proceeding with a normal load of all its resources. In this  
case, though, it would need an event when the upgrade finishes  
successfully and also one when it fails.

> If the manifest has an upgrader entry, use that as the upgrader as  
> described below. Otherwise, if
> it's not a manifest but an HTML/XML file, and it has changed from  
> what's
> in the last cache, use that as the upgrader as described below. If  
> it's a
> manifest that misparsed, or if it's another kind of file, then act  
> as if
> it the URI just pointed to the top level page being loaded (and use  
> that
> as the upgrader as described below). If the newly updated cache  
> doesn't
> contain the current top-level page, then fetch that too.

I think it would be preferable if a value that isn't either the empty  
string or a reference to a valid manifest were treated as if the  
application attribute was unset. The rules above make it too easy to  
mistakenly think you are using a manifest when actually you are using  
implicit application mode, in a way that may not readily show up in  
offline testing. Plus, getting rid of the ability to define an  
application via an HTML file other than the current one removes the  
need for the hidden background browsing context thing, which seems  
like a whole mess of needless implementation complexity.

> When a file is fetched by the main page loading in a background  
> browsing
> context, the loads are conditional loads, so that files that haven't
> changed since the previous update are directly copied from the old  
> cache.

This should (of course) apply to loads of resources cited in a  
manifest and the manifest itself, as suggested above.

> If the newly update cache's copy of the top level page being shown  
> is no
> longer categorised as a "known top-level" for this cache (e.g.  
> because it
> doesn't have an <html application> attribute any more) then inform the
> user, e.g. an infobar saying something like "This application may no
> longer be available. (( View new page in a new window )) (( Delete
> application from cache )) (( Keep application in cache and check for
> updates later )) [x]". The first of these buttons would just show the
> background browsing context in the foreground. The second would  
> delete the
> webapp cache and reload the page from the normal cache, and the third
> would just not do anything special. Don't run the upgrader in this  
> case.

Not distinguishing "top-level" resources would remove the need to  
present such potentially confusing UI to the user. (A page with  
implicit manifest, i.e. pointing to itself as the cache, could simply  
cease to get special caching if a version is loaded that doesn't have  
<html application=""> set).

> If any of the files being updated in the new cache are 4xx or 5xx,  
> or fail
> for some other reason (e.g. DNS errors, user went offline), then the  
> UA
> should alert the user to this fact somehow (infobar maybe) -- "An  
> error
> occurred while updating the application. (( View details )) [x]" --  
> and
> then wait a few minutes (or longer if it can tell it'll fail again)  
> before
> trying again.

I think this is inappropriate. The offline model should work with  
intermittent connections or in captive wifi networks, and showing this  
kind of error to the user seems unhelpful. What's wrong with just  
using the complete old version and trying the update again later?
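
Roughly what I mean, as a sketch only (try_update is a made-up name,
and I'm assuming the hypothetical fetch callback raises OSError on any
network or server failure):

```python
from typing import Callable, Dict, Iterable, Tuple


def try_update(
    old_cache: Dict[str, bytes],
    manifest_urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[Dict[str, bytes], bool]:
    """Build a complete new cache, or keep the old one untouched on any failure."""
    new_cache: Dict[str, bytes] = {}
    for url in manifest_urls:
        try:
            new_cache[url] = fetch(url)
        except OSError:
            # Discard the partial update; the caller can retry later.
            return old_cache, False
    return new_cache, True
```

The old cache is never modified, so a failed attempt on a flaky or
captive network leaves the app running on a complete, consistent
version with nothing to report to the user.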

> Upgrader:
> Create a hidden browsing context.
> Load the upgrader in it.

I don't like this whole upgrader idea. Parsing HTML and CSS and  
executing JavaScript seems like an inefficient way to do an app  
update. I think it is reasonable to require a manifest file for  
multipage apps, and writing an HTML/CSS/JS upgrader that can cover all  
pages of a multipage app does not seem significantly easier than  
creating a manifest file. The implicit manifest idea seems handy as a  
quick way to handle one-page apps but it does not seem reasonable for  
the multipage case, and this would obviate the need for an upgrader.

> Just before onload, fire an 'upgrading' event to every instance of a
>  top-level page using a cache with the same identifier.

Whether or not there are upgraders though, I think events should  
dispatch when a manifest-based upgrade either completes or fails (and  
perhaps also when the upgrade starts).

> The event has a handle to the Window object of the hidden browsing
>  context.
> After every 'upgrading' event has been fired, the 'load' event must be
>  fired on the upgrader.
> After that happens, if any of the aforementioned instances are still
>  using old versions of the cache, then the user agent may inform user
>  they can reload to update.

I think it would be preferable to let the apps upgrade themselves  
instead. They could choose to use location.reload() if they are not  
holding any interesting state, or they could offer to save the user's  
state before doing this, or they could make some alternate call that  
requests that all new resource loads for this instance come from the  
freshly upgraded cache, which would let the app perform an upgrade  
manually, preserving current state if feasible.

> The Upgrader can do such things as updating the database schema  
> between
> versions, and when there are multiple instances running, it allows  
> them to
> negotiate who will do that work instead of it happening several times.

I would suggest instead that the instance that triggered the upgrade  
be given a special event so that it can do this and could optionally  
present in-page UI while doing so. This seems simpler than adding a  
hidden browsing context. Changing the schema may require pausing other  
instances, however, if there is no way to lock the database.

> Modal alerts (window.alert, .prompt, etc) in the background page can
> either raise an exception, be ignored, drop a message to a console, or
> possibly display a message over the top of the foreground app's  
> browsing
> context.

To avoid such complexities it would be better to avoid the idea of a  
hidden upgrader. And in-page UI could be more tasteful than prompts or  
alerts.

> The manifest format has:
> a list of URIs.
> optionally a place to have an opaque string which can be changed
>  arbitrarily (this gives authors a way to change the manifest when  
> they
>  want things to be refetched).
> optionally a URI for an upgrader (HTML file).

I'd skip the upgrader part. I would also consider adding optional  
versions of resources where the UA may assume if the version number is  
unchanged it doesn't have to fetch that resource (not even  
conditionally) as part of an upgrade to make the supercaching effect  
even more super, but perhaps that's overkill.
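
To illustrate the optional per-resource versions, here is a sketch
that assumes each manifest has been parsed into a map from URL to a
version string, or None where no version was given (this
representation and the resources_to_fetch name are my invention, not a
proposed manifest syntax):

```python
from typing import Dict, List, Optional


def resources_to_fetch(
    old_versions: Dict[str, Optional[str]],
    new_versions: Dict[str, Optional[str]],
) -> List[str]:
    """List the URLs in the new manifest that still need a (conditional) fetch."""
    to_fetch: List[str] = []
    for url, version in new_versions.items():
        # Skip only when a version is given and matches the old manifest's.
        if version is not None and old_versions.get(url) == version:
            continue
        to_fetch.append(url)
    return to_fetch
```

Resources without a version, new resources, and resources whose
version changed all fall through to the ordinary fetch path.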

> We provide an API that can add files to the cache, and that can be  
> queried
> to determine if we are in upgrader mode or not, and that can swap in a
> new cache without reloading the page, during the 'upgrading' event.

Other API I'd suggest:

1) Request an immediate attempt at upgrade, notwithstanding apparent  
freshness of the manifest. This could be used to force an upgrade in  
"oops" situations where the manifest has a long expiration but a buggy  
version of the app is accidentally shipped and the server gives an  
error to ask the app to update immediately.

2) A way to send messages to other app instances - this way, an  
instance performing a database schema update could ask other instances  
to hold off on database access, or similarly for an instance doing a  
sync of data from the network to the local database.

3) An API to explicitly remove resources from the cache.

I'm not sure if an API to introspect what is currently in the cache is  
needed. I can't think of a use case off hand. But both the Google  
Gears LocalServer API and the Mozilla offline API have this.

> (If a particular URI is in an application cache as a known top- 
> level, but
> later is fetched and found to be a known top-level for another
> application, e.g. because two other pages both fetch that page in  
> their
> manifest and the server returns pages with different application=""  
> links
> for those two apps, then if the page is visited directly, it uses  
> the app
> cache of the last cache to have found it as a top-level. This causes
> problems if visiting the page directly would return yet another cache
> identifier, as then you could only see that page if you'd never seen  
> the
> others. I'm not clear about what to do about that.)
>
> Maybe we should check for updates more often than just when the top- 
> level
> page is loaded. e.g. we could do it on a timer, or on every cache  
> hit when
> online.

I don't think an already-loaded running instance should trigger a  
cache update implicitly, only if it explicitly asks. So I'd advise  
against these.

See also my other email about offline fallback pages. These should be  
specified in the manifest.

A la the Google Gears API, I also think a feature is needed to do  
something useful with <input type="file"> when offline, to save a  
resource for later upload to the server. Preferably this should not  
require round-tripping the data through an ECMAScript string or number  
array, or it will be too inefficient to work for large files.

I also don't see how apps that require login will be able to work  
offline. Do you need to make sure to check the appropriate "remember  
me on this computer" checkbox (perhaps not desirable for the security- 
conscious, and not available on all apps in any case)? Do you get to  
access the app when offline without having to go through login at all  
(which seems like a security issue)?

Regards,
Maciej