[whatwg] Offline Web Apps
mjs at apple.com
Thu Sep 20 01:13:20 PDT 2007
My commentary below.
Overall, I think the basic model is fairly sound. But I do think some
improvements could be made.
On Sep 6, 2007, at 5:46 PM, Ian Hickson wrote:
> Ok, new proposal:
> There's a concept of an application cache. An application cache is a
> group of resources, the group being identified by a URI (which typically
> resolves to a manifest). Resources in a cache are either top-level or
> not; top-level resources are those that are HTML or XML and, when parsed
> with scripting disabled, have <html application="..."> with the value of
> the attribute pointing to the same URI as identifies the cache.
> When you visit a page you first check to see if you have that page in a
> cache as a known top-level page.
Is there any need to treat "top-level" resources differently? If the
user directly loads a PNG, JPG or for that matter PDF that's part of
an offline manifest, I think it makes sense to serve it from the app
cache. It seems like it would simplify the model a bit for the offline
cache to treat all items in the manifest as part of the application
when visited directly.
The only problem here is the potential inconsistency if an HTML or XML
document doesn't have the <html application="..."> declaration at the
top, but is still cited in some other app's manifest. Then it would be
treated as part of the application if an app page citing that manifest
was visited first, but not if it wasn't. I think this is ok though and
may even be a desirable behavior. For instance, you might not want a
single flickr photo page to be an app by itself, but you'd still want
it to be treated as part of the app domain for someone who had visited
the main application page.
> If you do, skip the next two paragraphs; the 'new cache' flag is set to
> false.
> If you don't, you fetch the URL. If it has no application="" attribute,
> then do whatever the normal thing to do is. Ignore the rest of this.
> The presence of the attribute indicates that it's expecting an
> application cache to apply. The presence is detected at parse time, and
> the attribute must be present on the first <html> start tag before any
> other start tags. Check that the attribute's value is same-origin safe.
> If it isn't, pretend the attribute wasn't there (and ignore the rest of
> this). Otherwise, check to see if you already have a cache for the given
> URI. If you don't, create a new cache identified by the given URI. In
> any case, save this resource to the identified cache, as a known
> top-level page for that cache. Then, act as if you had known about the
> cache when you started (next step), with the 'new cache' flag set to
> true.
> Load the page from the cache and display it.
I assume any resource that's not found in the cache can be loaded
normally (it would have to be if this is a brand new cache). Actually,
I'm not sure "from the cache" makes sense here given the next sentence.
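As a rough illustration only, the lookup steps quoted above might be modelled like this. AppCache, visit, same_origin and the fetch hook are hypothetical names, not part of the proposal; the sketch assumes the application="" value is already an absolute URI and treats "same-origin safe" as matching scheme, host and port (default ports are not normalised).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple
from urllib.parse import urlsplit


@dataclass
class AppCache:
    """Hypothetical model of one application cache, keyed by its identifier URI."""
    identifier: str
    resources: Dict[str, str] = field(default_factory=dict)  # URI -> body
    top_level: Set[str] = field(default_factory=set)          # known top-level pages


def same_origin(a: str, b: str) -> bool:
    # Compare (scheme, host, port); default ports are NOT normalised here.
    sa, sb = urlsplit(a), urlsplit(b)
    return (sa.scheme, sa.hostname, sa.port) == (sb.scheme, sb.hostname, sb.port)


Fetch = Callable[[str], Tuple[str, Optional[str]]]


def visit(url: str, caches: Dict[str, AppCache],
          fetch: Fetch) -> Tuple[Optional[AppCache], bool]:
    """Return (cache to load from, 'new cache' flag), or (None, False) for a normal load.

    fetch(url) is a hypothetical hook returning (body, application_attr), where
    application_attr is the <html application="..."> value found with scripting
    disabled, or None if the attribute is absent.
    """
    # Known top-level page in an existing cache: load from it, 'new cache' false.
    for cache in caches.values():
        if url in cache.top_level:
            return cache, False
    # Otherwise fetch the page normally and inspect the attribute.
    body, app_uri = fetch(url)
    if app_uri is None or not same_origin(url, app_uri):
        return None, False  # no attribute, or not same-origin safe: ignore the rest
    # Create the cache if needed and record this page as a known top-level.
    cache = caches.setdefault(app_uri, AppCache(app_uri))
    cache.resources[url] = body
    cache.top_level.add(url)
    return cache, True
```

Dropping the top-level distinction, as suggested above, would reduce the first loop to a membership test on resources.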
> Any resources that the page tries to fetch using GETs that aren't
> XMLHttpRequest'ed must be taken from the cache, if available.
Is it really the right thing for XMLHttpRequest to bypass reading from
the cache? It makes sense to me that they wouldn't be implicitly
stored in the cache, but I don't think the data you get for a URI
should depend on whether you used XMLHttpRequest or loaded it in a
frame. To be fair, I'm not sure why you'd want to do an XHR for a
resource that then gets served from the offline cache. But I'm also
not sure why you'd list an item in your manifest that you then wanted
to load with XHR. So it seems simpler to omit this slight complication.
> When they aren't, the resources must be fetched then stored in the cache.
If there is an explicit manifest, it seems wrong to store things in
the cache that aren't in the manifest but are part of this page. That
means you get the union of the manifest and things the page loads,
which will make offline behavior hard to debug I think. It would be
better to fetch the manifest (possibly getting it from the existing
application cache, if any) before proceeding. Then you'd know which of
the resources loaded as part of this page belong in the cache up
front. That would affect the following steps.
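A minimal sketch of the ordering suggested here, assuming the manifest has already been fetched and parsed into a list of URIs: the page's GET loads are split by manifest membership, so the cache ends up holding exactly the manifest's resources rather than a union with whatever the page happened to load. partition_loads is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def partition_loads(manifest_uris: Iterable[str],
                    page_loads: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split a page's GET loads into (store in app cache, network only).

    The manifest is the single source of truth: loads it doesn't list are
    never written into the application cache.
    """
    listed = set(manifest_uris)
    to_cache: List[str] = []
    network_only: List[str] = []
    for uri in page_loads:
        (to_cache if uri in listed else network_only).append(uri)
    return to_cache, network_only
```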
> Once the UA is ready to do so the UA must go on to the next steps. UAs
> may do this immediately, or may wait for the original page load to
> complete, or may delay it up to a UA-defined minimum delay.
> If 'new cache' is true, and the cache identifier URI is the same as the
> URI that was just downloaded and put in the cache: Do nothing.
> If 'new cache' is true, and the cache identifier URI is different from
> the URI that was just downloaded: Fetch the resource identified by that
> URI. Store it in the cache. If it's a manifest and it parses correctly,
> download all the URIs given in that manifest and add them to the cache.
> If any are HTML files which, when parsed with scripting disabled,
> trigger the application="" handling and have a value that points to the
> same URI as the one identifying this application cache, then mark them
> as known top-levels for this cache.
There would be no need to parse the resources if there were no
distinction drawn between top-level and other resources in the cache.
> If 'new cache' is false: Create a new cache. Fetch the resource with the
> URI of the cache identifier. If it's a manifest, and it has changed from
> what's in the last cache, and it parses correctly, download all the URIs
> in that manifest and add them to the new cache.
I would suggest going a little beyond the http caching rules. I
propose that if the manifest is unchanged (as defined below), the UA
doesn't need to download anything. This makes it possible to give the
manifest a fairly short http expiration, so that checks for updates
are relatively frequent, but to make the checks themselves extremely
cheap. This would require some modifiable version field in the
manifest to let it change when the contents of a referenced resource
have changed, but the set of resources hasn't.
A UA may consider the manifest "unchanged" if any of the following holds:
- If the http freshness lifetime of either the copy in the offline
cache or the copy in the normal browser cache has not expired
- If a conditional request relative to a copy in either the offline
cache or the browser cache (via If-Modified-Since or If-None-Match)
gives a 304 Not Modified response
- For non-http protocols, if it appears unmodified using whatever
caching scheme is appropriate to the protocol
But if none of these applies, the UA should not compare the actual
manifest data and should assume the manifest has changed and refetch
the resources (possibly using a cache).
Note that if the manifest is generated dynamically server-side, then
it can always appear new when any resource it points to has changed
but still easily save a lot of needless http traffic using ETags.
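The "unchanged" test above could be sketched as follows. This is a simplified model, not a full HTTP cache: freshness is just age against a stored lifetime (e.g. from Cache-Control: max-age), the non-http case is omitted, and CachedCopy, manifest_unchanged and the conditional_get hook (which would send If-None-Match and/or If-Modified-Since and return the status code) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class CachedCopy:
    """A stored copy of the manifest, from the offline cache or the browser cache."""
    fetched_at: float            # seconds since the epoch when it was stored
    freshness_lifetime: float    # seconds, e.g. from Cache-Control: max-age
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def is_fresh(copy: CachedCopy, now: float) -> bool:
    # Simplified: fresh while the copy's age is below its freshness lifetime.
    return now - copy.fetched_at < copy.freshness_lifetime


def manifest_unchanged(copies: Iterable[CachedCopy], now: float,
                       conditional_get: Callable[[CachedCopy], int]) -> bool:
    """True if the UA may skip the upgrade without comparing manifest bytes."""
    copies = list(copies)
    # Rule 1: either stored copy is still within its freshness lifetime.
    if any(is_fresh(c, now) for c in copies):
        return True
    # Rule 2: a conditional request against a copy with validators returns 304.
    for c in copies:
        if (c.etag or c.last_modified) and conditional_get(c) == 304:
            return True
    # Otherwise assume it changed and refetch the listed resources.
    return False
```

A server that generates the manifest dynamically can derive its ETag from the versions of the listed resources, so rule 2 answers 304 until one of them actually changes.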
Also, another reason to check manifest freshness before proceeding
with a page load is to be able to provide the app with some way of
knowing that it is going to upgrade. Then it could choose to display
custom upgrading UI instead of proceeding with a normal load of all
its resources. In this case, though, it would need an event when the
upgrade finishes successfully and also one when it fails.
> If the manifest has an upgrader entry, use that as the upgrader as
> described below. Otherwise, if it's not a manifest but an HTML/XML file,
> and it has changed from what's in the last cache, use that as the
> upgrader as described below. If it's a manifest that misparsed, or if
> it's another kind of file, then act as if the URI just pointed to the
> top-level page being loaded (and use that as the upgrader as described
> below). If the newly updated cache doesn't contain the current
> top-level page, then fetch that too.
I think it would be preferable if a value that isn't either the empty
string or a reference to a valid manifest were treated as if the
application attribute was unset. The rules above make it too easy to
mistakenly think you are using a manifest when actually you are using
implicit application mode, in a way that may not readily show up in
offline testing. Plus, getting rid of the ability to define an
application via an HTML file other than the current one removes the
need for the hidden background browsing context thing, which seems
like a whole mess of needless implementation complexity.
> When a file is fetched by the main page loading in a background browsing
> context, the loads are conditional loads, so that files that haven't
> changed since the previous update are directly copied from the old
> cache.
This should (of course) apply to loads of resources cited in a
manifest and the manifest itself, as suggested above.
> If the newly updated cache's copy of the top-level page being shown is
> no longer categorised as a "known top-level" for this cache (e.g.
> because it doesn't have an <html application> attribute any more), then
> inform the user, e.g. an infobar saying something like "This application
> may no longer be available. (( View new page in a new window )) (( Delete
> application from cache )) (( Keep application in cache and check for
> updates later )) [x]". The first of these buttons would just show the
> background browsing context in the foreground. The second would delete
> the webapp cache and reload the page from the normal cache, and the
> third would just not do anything special. Don't run the upgrader in this
> case.
Not distinguishing "top-level" resources would remove the need to
present such potentially confusing UI to the user. (A page with
implicit manifest, i.e. pointing to itself as the cache, could simply
cease to get special caching if a version is loaded that doesn't have
<html application=""> set).
> If any of the files being updated in the new cache are 4xx or 5xx, or
> fail for some other reason (e.g. DNS errors, user went offline), then
> the UA should alert the user to this fact somehow (infobar maybe) -- "An
> error occurred while updating the application. (( View details )) [x]"
> -- then wait a few minutes (or longer if it can tell it'll fail again)
> before trying again.
I think this is inappropriate. The offline model should work with
intermittent connections or in captive wifi networks, and showing this
kind of error to the user seems unhelpful. What's wrong with just
using the complete old version and trying the update again later?
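One way to get the behaviour suggested here is to build the new cache completely before it replaces the old one, and to leave the old cache untouched on any failure. A minimal sketch, assuming a hypothetical fetch hook that returns (status, body) and raises OSError for network-level failures; try_upgrade is an illustrative name and retry scheduling is left to the caller.

```python
from typing import Callable, Dict, Iterable, Tuple

Cache = Dict[str, bytes]


def try_upgrade(old: Cache, manifest_uris: Iterable[str],
                fetch: Callable[[str], Tuple[int, bytes]]) -> Tuple[Cache, bool]:
    """Return (cache to use, upgraded?). The old cache is never modified."""
    new: Cache = {}
    for uri in manifest_uris:
        try:
            status, body = fetch(uri)
        except OSError:
            # DNS error, connection refused, user went offline: keep old version.
            return old, False
        if not 200 <= status < 300:
            # 4xx/5xx: treat the whole upgrade as failed and keep the old version.
            return old, False
        new[uri] = body
    return new, True
```

On a False result the caller would simply try again later, e.g. on a timer, without showing an error to the user.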
> Create a hidden browsing context.
> Load the upgrader in it.
I don't like this whole upgrader idea. Parsing HTML and CSS and
update. I think it is reasonable to require a manifest file for
multipage apps, and writing an HTML/CSS/JS upgrader that can cover all
pages of a multipage app does not seem significantly easier than
creating a manifest file. The implicit manifest idea seems handy as a
quick way to handle one-page apps but it does not seem reasonable for
the multipage case, and this would obviate the need for an upgrader.
> Just before onload, fire an 'upgrading' event to every instance of a
> top-level page using a cache with the same identifier.
Whether or not there are upgraders though, I think events should
dispatch when a manifest-based upgrade either completes or fails (and
perhaps also when the upgrade starts).
> The event has a handle to the Window object of the hidden browsing context.
> After every 'upgrading' event has been fired, the 'load' event must be
> fired on the upgrader.
> After that happens, if any of the aforementioned instances are still
> using old versions of the cache, then the user agent may inform the user
> that they can reload to update.
I think it would be preferable to let the apps upgrade themselves
instead. They could choose to use location.reload() if they are not
holding any interesting state, or they could offer to save the user's
state before doing this, or they could make some alternate call that
requests that all new resource loads for this instance come from the
freshly upgraded cache, which would let it perform an upgrade manually
preserving current state if feasible.
> The Upgrader can do such things as updating the database schema
> versions, and when there are multiple instances running, it allows
> them to
> negotiate who will do that work instead of it happening several times.
I would suggest instead that the instance that triggered the upgrade
be given a special event so that it can do this and could optionally
present in-page UI while doing so. This seems simpler than adding a
hidden browsing context. Changing the schema may require pausing other
instances, however, if there is no way to lock the database.
> Modal alerts (window.alert, .prompt, etc) in the background page can
> either raise an exception, be ignored, drop a message to a console, or
> possibly display a message over the top of the foreground app's
To avoid such complexities it would be better to avoid the idea of a
hidden upgrader. And in-page UI could be more tasteful than prompts or
> The manifest format has:
> a list of URIs.
> optionally a place to have an opaque string which can be changed
> arbitrarily (this gives authors a way to change the manifest when they
> want things to be refetched).
> optionally a URI for an upgrader (HTML file).
I'd skip the upgrader part. I would also consider adding optional
versions of resources where the UA may assume if the version number is
unchanged it doesn't have to fetch that resource (not even
conditionally) as part of an upgrade to make the supercaching effect
even more super, but perhaps that's overkill.
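Neither the proposal nor this reply pins down a syntax, so the following is only a sketch of one possible line-based format covering the parts kept here: a list of URIs, an optional opaque version string, and the optional per-resource versions suggested above (no upgrader). The syntax and the names parse_manifest and needs_fetch are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def parse_manifest(text: str) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Parse a HYPOTHETICAL manifest syntax:
         version <opaque-string>      optional manifest-wide version
         <uri> [<resource-version>]   one resource per line
         # comment
    """
    version: Optional[str] = None
    entries: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] == "version" and len(parts) == 2:
            version = parts[1]
        elif len(parts) == 1:
            entries[parts[0]] = None      # no per-resource version
        elif len(parts) == 2:
            entries[parts[0]] = parts[1]  # URI plus per-resource version
        else:
            raise ValueError(f"bad manifest line: {raw!r}")
    return version, entries


def needs_fetch(old: Dict[str, Optional[str]],
                new: Dict[str, Optional[str]]) -> List[str]:
    """URIs to fetch during an upgrade. Entries whose resource version is
    unchanged are skipped entirely (not even a conditional request)."""
    return sorted(u for u, v in new.items() if v is None or old.get(u) != v)
```

needs_fetch would only run once the manifest itself has been judged changed; entries without a resource version are always fetched, conditionally where the cache holds validators.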
> We provide an API that can add files to the cache, that can be used to
> determine if we are in upgrader mode or not, and that can swap in a new
> cache without reloading the page, during the 'upgrading' event.
Other API I'd suggest:
1) Request an immediate attempt at upgrade, notwithstanding apparent
freshness of the manifest. This could be used to force an upgrade in
"oops" situations where the manifest has a long expiration but a buggy
version of the app is accidentally shipped and the server gives an
error to ask the app to update immediately.
2) A way to send messages to other app instances - this way, an
instance performing a database schema update could ask other instances
to hold off on database access, or similarly for an instance doing a
sync of data from the network to the local database.
3) An API to explicitly remove resources from the cache.
I'm not sure if an API to introspect what is currently in the cache is
needed. I can't think of a use case off hand. But both the Google
Gears LocalServer API and the Mozilla offline API have this.
> (If a particular URI is in an application cache as a known top-level,
> but later is fetched and found to be a known top-level for another
> application, e.g. because two other pages both fetch that page in their
> manifests and the server returns pages with different application=""
> values for those two apps, then if the page is visited directly, it uses
> the app cache of the last cache to have found it as a top-level. This
> causes problems if visiting the page directly would return yet another
> cache identifier, as then you could only see that page if you'd never
> seen the others. I'm not clear about what to do about that.)
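The "last cache to have found it wins" rule amounts to an overwrite-on-update index from page URI to cache identifier. A hypothetical sketch (TopLevelIndex is an illustrative name), which also shows the open problem: the index records only one owner per page, whatever that page would itself declare when visited directly.

```python
from typing import Dict, Optional


class TopLevelIndex:
    """Hypothetical map from a page URI to the cache that last claimed it."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def record(self, page_uri: str, cache_id: str) -> None:
        # Called whenever an update finds page_uri to be a known top-level for
        # cache_id; a later update for another cache overwrites the entry.
        self._owner[page_uri] = cache_id

    def cache_for(self, page_uri: str) -> Optional[str]:
        # Cache to use when page_uri is visited directly, if any.
        return self._owner.get(page_uri)
```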
> Maybe we should check for updates more often than just when the
> top-level page is loaded. e.g. we could do it on a timer, or on every
> cache hit when
I don't think an already-loaded running instance should trigger a
cache update implicitly, only if it explicitly asks. So I'd advise
See also my other email about offline fallback pages. These should be
specified in the manifest.
A la the Google Gears API, I also think a feature is needed to do
something useful with <input type="file"> when offline, to save a
resource for later upload to the server. Preferably this should not
require round-tripping the data through an ECMAScript string or number
array, or it will be too inefficient to work for large files.
I also don't see how apps that require login will be able to work
offline. Do you need to make sure to check the appropriate "remember
me on this computer" checkbox (perhaps not desirable for the security-
conscious, and not available on all apps in any case)? Do you get to
access the app when offline without having to go through login at all
(which seems like a security issue)?