[whatwg] Offline Web Apps

Tue Oct 9 19:29:09 PDT 2007

On Fri, 24 Aug 2007, Maciej Stachowiak wrote:
> > 
> > Could multi-page apps be addressed by letting applications specify 
> > that other applications should be cached (using a similar api to the 
> > one that lets applications programmatically cache resources)?
> 
> I don't think that works very well - you'd have to parse all the HTML, 
> CSS and scripts associated with those other pages just to do the 
> caching. That's a huge cost compared to just downloading the resources. 
> Consider web apps like flickr and upcoming which consist of many many 
> pages. Obviously these specific examples can't cache all of their pages 
> offline but they may well want to cache a significant subset that is 
> interesting to the user.

I took this into account and the spec now uses a manifest with no 
automatic adding to the cache (except for the top-level page itself).

> I think it's easy to extend Ian's idea in a way that keeps it really 
> simple for the simple case, but that works better for the multi-page 
> case or other complex cases where pages load some resources dynamically.
> 
> <html application="manifest-file">
> 
> The manifest file would indicate all resources used by the web app, 
> including other pages, and other resources that may be loaded by the 
> current page but normally would not be at startup (another problem with 
> Ian's proposal IMO). Multiple pages that refer to the same manifest are 
> considered part of the same web app and share the same cache. If you 
> give an empty value for the application attribute, then the implicit 
> thing that Ian describes happens - the resources that the page actually 
> loads are the ones cached.

This is somewhat what the spec says now.

Do people think we should rename the application="" attribute to 
manifest="" or cache-manifest="" or something, by the way?

On Fri, 24 Aug 2007, Dimitri Glazkov wrote:
>
> Intuitively, I think I agree with Maciej. Manifest is not as elegant as 
> "participation by association" approach, but it allows for better 
> packaging of an application. I am thinking about scripts/stylesheets that 
> are typically a limited set of resources, reused throughout an 
> application.

Agreed.

> I also don't yet understand why Ian wants to store multiple versions of 
> resource representations, if same representation is used by multiple 
> apps.  What is the motivation here?

If App1 and App2 both use the same library J, and J is then updated to J2 
and App2 is updated to use J2, we don't want App1 suddenly to be using J2. 
It should keep on using J until such time as App1 is updated.
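
A rough sketch of that scenario, assuming each application cache keeps 
its own snapshot of every resource it uses. The names here (AppCache, 
update, get) are made up for illustration and are not taken from the spec:

```javascript
// Hypothetical model: each application cache keeps its own snapshot of
// the resources it uses, keyed by URL. Nothing is shared between caches.
class AppCache {
  constructor(name) {
    this.name = name;
    this.entries = new Map(); // URL -> cached body
  }

  // Re-download every listed URL from the "server" and snapshot the result.
  update(urls, server) {
    const fresh = new Map();
    for (const url of urls) {
      fresh.set(url, server.get(url));
    }
    this.entries = fresh;
  }

  get(url) {
    return this.entries.get(url);
  }
}

// A stand-in for the network: URL -> current body on the server.
const server = new Map([["/lib/j.js", "J"]]);

const app1 = new AppCache("App1");
const app2 = new AppCache("App2");
app1.update(["/lib/j.js"], server);
app2.update(["/lib/j.js"], server);

// The library changes on the server, and only App2 is updated.
server.set("/lib/j.js", "J2");
app2.update(["/lib/j.js"], server);

console.log(app1.get("/lib/j.js")); // App1 still has its own copy of J
console.log(app2.get("/lib/j.js")); // App2 has picked up J2
```

With a single shared copy, the second update would have changed what 
App1 sees as well.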

> ... and how would one make an app like Twitter or Facebook available 
> offline? Perhaps a markup attribute is not a good idea in this case, 
> where every profile page is potentially an application. I am thinking 
> that only _your_ profile page is an offline app. Right?

I'm not sure what you would do, exactly. The spec has fallback pages for 
this kind of thing now though. It'll be interesting to see how well this 
works.

On Thu, 13 Sep 2007, Aaron Boodman wrote:
> On Sep 6, 2007 5:46 PM, Ian Hickson <ian at hixie.ch> wrote:
> >
> > We provide an API that can add files to the cache, and that can be 
> > queried to determine if we are in upgrader mode or not, and that can 
> > swap in a new cache without reloading the page, during the 'upgrading' 
> > event.
> 
> Given this, and the hidden context that is used while upgrading (not the 
> 'upgrader context', just the one that the page you are viewing is loaded 
> into), isn't it possible to simplify by just doing something like this?
> 
> addFileToCache("otherTopLevelPage.html");
> addFileToCache("yetAnotherTopLevelPage.html");
> addFileToCache("imageThatWasntReferenced.png");
> 
> Then you don't need the manifest anymore, do you?

The manifest is now used in a more critical role than in that proposal.

> I've been thinking about this, and it seems like an interesting idea,
> but to me it creates more complexity than it's worth.

I've dropped the upgrader idea.

On Thu, 20 Sep 2007, Maciej Stachowiak wrote:
> 
> Is there any need to treat "top-level" resources differently? If the 
> user directly loads a PNG, JPG or for that matter PDF that's part of an 
> offline manifest, I think it makes sense to serve it from the app cache.

The spec now does this.

> I assume any resource that's not found in the cache can be loaded 
> normally (it would have to be if this is a brand new cache). Actually, 
> I'm not sure "from the cache" makes sense here given the next sentence.

In the current design, once the cache is primed, resources that aren't on 
the whitelist and aren't in the cache will return errors, and will not be 
loaded normally. This aids development significantly.
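
Roughly, and assuming a primed cache, the lookup behaves like the sketch 
below. The function name, the shape of the cache object and the return 
values are illustrative assumptions, not spec text:

```javascript
// Hypothetical resolution order for a page that belongs to a primed cache:
//   1. in the cache      -> serve the cached copy
//   2. on the whitelist  -> load normally from the network
//   3. anything else     -> error, even if the network is available
function resolve(url, appCache) {
  if (appCache.entries.has(url)) {
    return { source: "cache", body: appCache.entries.get(url) };
  }
  if (appCache.whitelist.has(url)) {
    return { source: "network" };
  }
  return { source: "error" };
}

const primedCache = {
  entries: new Map([["/index.html", "<!DOCTYPE html>..."]]),
  whitelist: new Set(["/api/status"]),
};

console.log(resolve("/index.html", primedCache).source);    // "cache"
console.log(resolve("/api/status", primedCache).source);    // "network"
console.log(resolve("/forgotten.png", primedCache).source); // "error"
```

A resource the author forgot to list fails straight away instead of 
only failing once the user is actually offline.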

> Is it really the right thing for XMLHttpRequest to bypass reading from 
> the cache? It makes sense to me that they wouldn't be implicitly stored 
> in the cache, but I don't think the data you get for a URI should depend 
> on whether you used XMLHttpRequest or loaded it in a frame. To be fair, 
> I'm not sure why you'd want to do an XHR for a resource that then gets 
> served from the offline cache. But I'm also not sure why you'd list an 
> item in your manifest that you then wanted to load with XHR. So it seems 
> simpler to omit this slight complication.

Agreed.

> If there is an explicit manifest, it seems wrong to store things in the 
> cache that aren't in the manifest but are part of this page. That means 
> you get the union of the manifest and things the page loads, which will 
> make offline behavior hard to debug I think. It would be better to fetch 
> the manifest (possibly getting it from the existing application cache, 
> if any) before proceeding. Then you'd know which of the resources loaded 
> as part of this page belong in the cache up front. That would affect the 
> following steps.

We don't want to slow down any page loads (right now the spec only has a 
slowdown for the case of a page being fetched from a cache and immediately 
found to not belong to that cache), certainly not in a way blocking on the 
network. The spec no longer has any implicit caching apart from the 
top-level page, though. Let me know if that's good enough.

> I would suggest going a little beyond the http caching rules. I propose 
> that if the manifest is unchanged (as defined below), the UA doesn't 
> need to download anything. This makes it possible to give the manifest a 
> fairly short http expiration, so that checks for updates are relatively 
> frequent, but make the checks themselves extremely cheap. This would 
> require some modifiable version field in the manifest to let it change 
> when the contents of a referenced resource have changed, but the set of 
> resources hasn't.

Done. (The "modifiable field" being a comment.)
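
For example, assuming the update check simply compares the text of the 
manifest, bumping a comment is enough to make the manifest count as 
changed. The manifest syntax and the helper name below are illustrative 
assumptions only:

```javascript
// Hypothetical manifests: same resource list, only the comment differs.
const oldManifest = [
  "# version 1",
  "index.html",
  "style.css",
  "app.js",
].join("\n");

const newManifest = oldManifest.replace("# version 1", "# version 2");

// Assumed check: any difference in the manifest text counts as a change.
function manifestChanged(cachedText, fetchedText) {
  return cachedText !== fetchedText;
}

console.log(manifestChanged(oldManifest, oldManifest)); // false
console.log(manifestChanged(oldManifest, newManifest)); // true
```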

> A UA may consider the manifest "unchanged" if any of the following conditions
> applies:
> 
> - If the http freshness lifetime of either the copy in the offline cache or
> the copy in the normal browser cache has not expired
> - If a conditional request relative to a copy in either the offline cache or
> the browser cache (via If-Modified-Since or If-Match) gives a 304 Not Modified
> response
> - For non-http protocols, if it appears unmodified using whatever 
> caching scheme is appropriate to the protocol

That's out of scope for the spec; it's just HTTP.

> But if none of these applies, the UA should not compare the actual 
> manifest data and should assume the manifest has changed and refetch the 
> resources (possibly using a cache).

Why? The most common case when a server is sending a manifest that hasn't 
changed but without the appropriate headers will be when the author 
doesn't know how to set them. In this scenario, the author probably also 
doesn't have the best control over the server. Yet this would be, under 
your suggestion, the case where the server is hammered the hardest.

> Also, another [reason] to check manifest freshness before proceeding 
> with a page load is to be able to provide the app with some way of 
> knowing that it is going to upgrade. Then it could choose to display 
> custom upgrading UI instead of proceeding with a normal load of all its 
> resources. In this case though, it would need an event when the upgrade 
> finishes successfully but also one when it fails.

The spec caters for this to some extent, but it needs work based on 
implementation experience.

> I think it would be preferable if a value that isn't either the empty 
> string or a reference to a valid manifest were treated as if the 
> application attribute was unset. The rules above make it too easy to 
> mistakenly think you are using a manifest when actually you are using 
> implicit application mode, in a way that may not readily show up in 
> offline testing. Plus, getting rid of the ability to define an 
> application via an HTML file other than the current one removes the need 
> for the hidden background browsing context thing, which seems like a 
> whole mess of needless implementation complexity.

Done.

> > If any of the files being updated in the new cache are 4xx or 5xx, or 
> > fail for some other reason (e.g. DNS errors, user went offline), then 
> > the UA should alert the user to this fact somehow (infobar maybe) -- 
> > "An error occurred while updating the application. (( View details )) 
> > [x]" -- and then wait a few minutes (or longer if it can tell it'll 
> > fail again) before trying again.
> 
> I think this is inappropriate. The offline model should work with 
> intermittent connections or in captive wifi networks, and showing this 
> kind of error to the user seems unhelpful. What's wrong with just using 
> the complete old version and trying the update again later?

Well, ok, but we don't want to hide errors _too_ much...

> I don't like this whole upgrader idea. Parsing HTML and CSS and 
> executing JavaScript seems like an inefficient way to do an app update. 
> I think it is reasonable to require a manifest file for multipage apps, 
> and writing an HTML/CSS/JS upgrader that can cover all pages of a 
> multipage app does not seem significantly easier than creating a 
> manifest file. The implicit manifest idea seems handy as a quick way to 
> handle one-page apps but it does not seem reasonable for the multipage 
> case, and this would obviate the need for an upgrader.

I dropped upgraders and implicit manifests.

> > Just before onload, fire an 'upgrading' event to every instance of a 
> > top-level page using a cache with the same identifier.
> 
> Whether or not there are upgraders though, I think events should 
> dispatch when a manifest-based upgrade either completes or fails (and 
> perhaps also when the upgrade starts).

Done.
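
A minimal sketch of what that could look like from the page's side. The 
event names and the runUpgrade() helper are hypothetical stand-ins, not 
the names used in the spec:

```javascript
// Hypothetical event names; the spec's actual names may differ.
const cacheEvents = new EventTarget();
const seen = [];

for (const type of ["upgradestart", "upgradecomplete", "upgradefailed"]) {
  cacheEvents.addEventListener(type, (event) => seen.push(event.type));
}

// Stand-in for the UA's update process: fetch every resource, then report
// success only if all of them were fetched.
function runUpgrade(urls, fetchOne) {
  cacheEvents.dispatchEvent(new Event("upgradestart"));
  try {
    for (const url of urls) {
      fetchOne(url);
    }
    cacheEvents.dispatchEvent(new Event("upgradecomplete"));
  } catch (err) {
    cacheEvents.dispatchEvent(new Event("upgradefailed"));
  }
}

// One upgrade that succeeds, and one that fails part-way through.
runUpgrade(["index.html"], () => "ok");
runUpgrade(["index.html"], () => {
  throw new Error("offline");
});

console.log(seen);
```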

> I think it would be preferable to let the apps upgrade themselves 
> instead. They could choose to use location.reload() if they are not 
> holding any interesting state, or they could offer to save the user's 
> state before doing this, or they could make some alternate call that 
> requests all new resource loads for this instance should come from the 
> freshly upgraded cache, which would let it perform an upgrade manually 
> preserving current state if feasible.

Ok, but what about broken apps that do nothing?

> 1) Request an immediate attempt at upgrade, notwithstanding apparent 
> freshness of the manifest. This could be used to force an upgrade in 
> "oops" situations where the manifest has a long expiration but a buggy 
> version of the app is accidentally shipped and the server gives an error 
> to ask the app to update immediately.

Added, though relying on an application-level API to fix application-level 
bustedness seems optimistic. :-)

> 2) A way to send messages to other app instances - this way, an instance 
> performing a database schema update could ask other instances to hold 
> off on database access, or similarly for an instance doing a sync of 
> data from the network to the local database.

We'll address this separately.

> 3) An API to explicitly remove resources from the cache.

Done.

> I'm not sure if an API to introspect what is currently in the cache is 
> needed. I can't think of a use case off hand. But both the Google Gears 
> LocalServer API and the Mozilla offline API have this.

You can iterate over the cache in the current model (though it only gives 
you back the "dynamic" entries).
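
As a sketch, assuming a cache that distinguishes manifest entries from 
script-added ("dynamic") ones, iteration and clean-up could look like 
this. The class and its members (add, remove, length, item) are 
hypothetical here and should not be read as the spec's API:

```javascript
// Hypothetical cache with two kinds of entries: those named in the
// manifest, and "dynamic" ones added by script at runtime.
class SketchCache {
  constructor(manifestUrls) {
    this.manifestEntries = new Set(manifestUrls);
    this.dynamicEntries = [];
  }

  // Add a dynamic entry (ignored if it is already present).
  add(url) {
    if (!this.dynamicEntries.includes(url)) {
      this.dynamicEntries.push(url);
    }
  }

  // Remove a dynamic entry; manifest entries are not affected.
  remove(url) {
    this.dynamicEntries = this.dynamicEntries.filter((u) => u !== url);
  }

  // Iteration only exposes the dynamic entries.
  get length() {
    return this.dynamicEntries.length;
  }

  item(index) {
    return this.dynamicEntries[index];
  }
}

const cache = new SketchCache(["index.html", "app.js"]);
cache.add("tiles/1.png");
cache.add("tiles/2.png");
cache.add("old-bug/stray.png"); // e.g. left behind by an older version

// Enumerate the dynamic entries and drop anything no longer wanted.
const wanted = (url) => url.startsWith("tiles/");
for (let i = cache.length - 1; i >= 0; i--) {
  const url = cache.item(i);
  if (!wanted(url)) {
    cache.remove(url);
  }
}

const remaining = [];
for (let i = 0; i < cache.length; i++) {
  remaining.push(cache.item(i));
}
console.log(remaining); // the two tile entries
```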

> I also don't see how apps that require login will be able to work 
> offline. Do you need to make sure to check the appropriate "remember me 
> on this computer" checkbox (perhaps not desirable for the 
> security-conscious, and not available on all apps in any case)? Do you 
> get to access the app when offline without having to go through login at 
> all (which seems like a security issue)?

Right now, the latter. I don't really see the attack scenario though. Why 
is that a problem? 

On Fri, 21 Sep 2007, Robert O'Callahan wrote:
> 
> -- If you can programmatically force URIs into the offline cache, then 
> you want to be able to enumerate the resources in the offline cache, 
> otherwise there is no way to reliably remove unneeded resources 
> (especially if there was an older, buggy version of the app that may 
> have loaded resources from unexpected URIs).

Fair point. The spec handles this.

> -- Several Web app authors have asked for the ability to test whether a 
> resource is cached, for their online apps. For example, when you're 
> zooming in and out of a map, the application could choose which tile(s) 
> to use for the animation by scaling them up or down. This would also be 
> convenient for offline use, where a resource might not necessarily be in 
> the offline cache but you could use it if it happened to be available.

Interesting. This isn't covered yet, should we add it immediately?

On Mon, 10 Sep 2007, Dimitri Glazkov wrote:
>
> Since, AFAIK, the fragment identifier is not passed onto the server by 
> the UA, I can't see how an application could be designed with proper 
> noscript degradation and reliance on fragment ids for query communication.

Well, an offline app is going to have to be scripted.

> Besides, using query parameters is much more natural for HTML: forms 
> with method=get are the way to build it.

The spec now is agnostic to all this, and just uses fallback pages.

On Thu, 13 Sep 2007, Dimitri Glazkov wrote:
> 
> Another, less cool path would be to use regular expressions or somesuch 
> instead of explicit list.

We have prefix matching now.
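
As an illustration only, here is one way prefix matching could pick a 
fallback page for an uncached URL. The data shape, the helper name and 
the longest-prefix-wins rule are my assumptions for this sketch, not 
something the spec text is being quoted on:

```javascript
// Assumed shape: each entry maps a URL prefix to the page that stands in
// for any uncached URL under that prefix.
const fallbacks = [
  { prefix: "/profiles/", page: "/offline-profile.html" },
  { prefix: "/profiles/me/", page: "/my-profile.html" },
];

// Pick the fallback whose prefix is the longest match, or null if none.
function findFallback(url, entries) {
  let best = null;
  for (const entry of entries) {
    if (
      url.startsWith(entry.prefix) &&
      (best === null || entry.prefix.length > best.prefix.length)
    ) {
      best = entry;
    }
  }
  return best === null ? null : best.page;
}

console.log(findFallback("/profiles/alice", fallbacks));     // "/offline-profile.html"
console.log(findFallback("/profiles/me/photos", fallbacks)); // "/my-profile.html"
console.log(findFallback("/about", fallbacks));              // null
```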

> What if an application could be given an event when the link, clicked on 
> a document that is part of the application leads to a page that is not 
> present in cache? This way, the app could potentially manage the 
> fallback.

That's basically what we have now.

On Thu, 13 Sep 2007, Dimitri Glazkov wrote:
> 
> Distinct, server-reaching URLs (no fragment identifiers) for each page 
> in a web application are a _good_thing_. Packing the whole application 
> into one document and managing history with id hashes and other hacks is 
> not. It's a necessary kludge that you have to do in order to avoid 
> browser context re-initializing, re-parsing scripts, and re-requesting 
> all accompanying graphical and stylistic overhead every time the user 
> clicks on anything.
> 
> I would've loved it if Google Reader had a distinct URL for each click I 
> make on the page, and I am sure Google Reader devs would've loved it 
> too. Except they also would've loved not having to worry about the 
> browser/scripting context change. Instead, they have to essentially 
> reinvent the way web works 
> (http://www.tbray.org/ongoing/When/200x/2006/03/26/On-REST) by 
> overloading fragment identifier with an entire URI management system. I 
> applaud the effort and the result is awesome, but it doesn't make a good 
> bedtime story.
> 
> I guess the vision is that application context transcends and 
> encompasses browser/scripting context somehow.

Note that you can do this now with the pushState() stuff.

On Thu, 13 Sep 2007, Dimitri Glazkov wrote:
> 
> Upon studying the pushState spec a little bit closer, I can't help but 
> think that it's not quite what the doctor ordered. Consider this 
> scenario:
> 
> I start with the following markup (could've borrowed code from Bugzilla, 
> but eh, too lazy):
> 
> <ul>
>    <li><a href="add-bug">Add New Bug</a></li>
>    <li><a href="browse/">Browse Existing Bugs</a></li>
> </ul>
> 
> Now, in order to preserve context when the user clicks on these links, I 
> have to:
> 
> 1. Attach event handler to each link, which:
> 2. Disables event propagation and cancels the event
> 3. Creates (or retrieves from history) the state object
> 4. Pushes state object with the URL, provided in "href" attribute
> 5. Still does XHR request to the target page to retrieve needed information
> 6. Modifies DOM of current page to replace parts or whole of the page
> with the data, retrieved with the XHR.
> 
> Correct me if I am wrong, but a lot of this pseudo-code is essentially 
> replicating browser's hyperlink behavior all over again.
> 
> I am not sure I have a better alternative, but it just seems that there 
> has to be a better way. Especially considering that the way this 
> develops is in our hands at the moment.

If you do have any proposals, please do send them.
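
For reference, the six steps listed above amount to roughly the 
following. The browser pieces are passed in as parameters so the sketch 
stays self-contained; makeLinkHandler, load and render are hypothetical 
names, and load() stands in for the XHR request:

```javascript
// Steps 1-6 from the list above. All parameter names are illustrative.
function makeLinkHandler({ history, load, render }) {
  // Step 1: this returned function is what gets attached to each link.
  return function onClick(event) {
    // Step 2: stop propagation and cancel the default navigation.
    event.stopPropagation();
    event.preventDefault();

    const url = event.target.getAttribute("href");

    // Step 3: create the state object.
    const state = { url };

    // Step 4: push it along with the URL from the href attribute.
    history.pushState(state, "", url);

    // Steps 5 and 6: fetch the target page's data, then patch the page.
    return load(url).then((data) => render(data));
  };
}

// Fakes standing in for the browser, so the flow can be followed.
const pushed = [];
const rendered = [];
const onClick = makeLinkHandler({
  history: { pushState: (state, title, url) => pushed.push(url) },
  load: (url) => Promise.resolve("data for " + url),
  render: (data) => rendered.push(data),
});

const fakeEvent = {
  preventDefault() {},
  stopPropagation() {},
  target: { getAttribute: () => "add-bug" },
};

const done = onClick(fakeEvent);
done.then(() => console.log(pushed, rendered));
```

In a real page the handler would be attached with addEventListener() 
and given window.history, which is where the duplication of the 
browser's own link handling becomes apparent.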

On Thu, 13 Sep 2007, Anne van Kesteren wrote:
> 
> Maybe there should be a way for an application to register which URIs it 
> can handle in offline context and which file will handle them? (This 
> would also make it work if an application was set up to not use query 
> strings.) This does increase the likelihood you get two "separate" 
> applications though and that's not very nice.

This is basically what the spec does now.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'