[whatwg] Caching offline Web applications

Ian Hickson ian at hixie.ch
Fri Oct 17 18:36:31 PDT 2008


This e-mail is an attempt to respond to all the outstanding feedback on 
the offline cache features in HTML5.

Summary of changes:

 * Made the online whitelist be prefix-based instead of exact match.

 * Removed opportunistic caching, leaving only the fallback behavior part.

 * Made fallback URLs be prefix-based instead of only path-prefix based 
   (we no longer ignore the query component).

 * Made application caches scoped to their browsing context, and allowed 
   iframes to start new scopes. By default the contents of an iframe are 
   part of the appcache of the parent, but if you declare a manifest, you 
   get your own cache. (There's a sketch of this just after this list.)

 * Made fallback pages have to be same-origin (security fix).

 * Made the whole model treat redirects as errors to be more resilient in 
   the face of captive portals when offline (it's unclear what else would 
   actually be useful and safe behavior anyway).

 * Fixed a bunch of race conditions by redefining how application caches 
   are created in the first place.

 * Made 404 and 410 responses for application caches blow away the 
   application cache.

 * Made checking and downloading events fire on ApplicationCache objects 
   that join an update process midway.

 * Made the update algorithm check the manifest at the start and at the 
   end and fail if the manifest changed in any way.

 * Made errors on master and dynamic entries in the cache get handled in a 
   non-fatal manner (and made 404 and 410 remove the entry).

 * Changed the API from .length and .item() to .items and .hasItem().
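
To illustrate the iframe scoping change mentioned above, a rough sketch 
(file names are made up):

   <!-- outer.html declares a manifest, so it gets an application cache -->
   <html manifest="outer.manifest">
     ...
     <!-- this iframe's document has no manifest attribute, so it is
          treated as part of outer.html's appcache -->
     <iframe src="widget.html"></iframe>
   </html>

If widget.html instead started with <html manifest="widget.manifest">, it 
would get its own application cache rather than joining the parent's.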


On Wed, 28 May 2008, Anders Carlsson wrote:
> 
> one problem with the online whitelist in cache manifest files is that it 
> matches on whole URLs only.

Fixed.

I also changed the order in which the lists are examined, so that having 
this:

   HTTP://EXAMPLE.COM:80/c

...in the online whitelist doesn't prevent this:

   http://example.com/contact.html

...from ever being served from the cache. That is, if the resource is in 
the cache, it's used from the cache, and the online whitelist isn't 
examined. Also, fallback namespaces now override the online whitelist 
namespaces.
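
For example, given a manifest along these lines (URLs invented purely for 
illustration):

   CACHE MANIFEST
   CACHE:
   http://example.com/contact.html
   NETWORK:
   http://example.com/c
   FALLBACK:
   http://example.com/c/photos/ http://example.com/offline.html

...contact.html is served from the cache even though it matches the 
NETWORK: prefix, because cached entries are consulted first; and an 
uncached URL under /c/photos/ matches both the NETWORK: prefix and the 
FALLBACK: namespace, so it is fetched from the network and falls back to 
offline.html if that fetch fails.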


On Fri, 13 Jun 2008, Honza Bambas wrote:
> 
> I was talking with my colleague about it and we both agree it would be 
> useful (and easier to implement) for ANY resource being fetched 
> inside of a browsing context associated with an application cache to 
> opportunistically cache it and not just do it for results of navigation, 
> i.e. top-level document, iframe source and frame source. A set of 
> pictures/icons/CSS styles could be easily cached this way w/o explicitly 
> listing them in the manifest.
> 
> At this point it would be good to say what the intention/motivation of 
> opportunistic caching itself really is. Maybe I am missing the purpose 
> and am potentially opening a security hole or enabling a kind of attack 
> this way.

The main goal of opportunistic caching was to allow Flickr or Bugzilla to 
continue using URLs on a per-entry basis, so that while offline, if you 
try to go to a page that hasn't yet been seen before, the application can 
catch the navigation attempt and display fallback content (such as an 
error message) instead of the UA saying "file not found".

In particular, there is no feature intended to automatically capture 
resources that aren't in the manifest -- experience has shown that such 
features lead to difficult-to-debug mistakes (resources that aren't used 
on every visit end up missing from the cache in exactly the cases that 
weren't tested). I've now removed opportunistic caching from the spec 
entirely, leaving only the fallback behavior.


On Wed, 16 Jul 2008, Honza Bambas wrote:
>
> When an application cache update is invoked by a document load that is 
> completely fetched from the offline application cache while the browser 
> is in offline mode, what exactly should happen?
> 
> Let's say we always want to have one of the onxxx events on the 
> applicationCache object get called. In case the browser is in offline 
> mode (including when the user switches to offline mode manually), the 
> only reasonable event seems to be an onerror call because the server 
> could not actually be reached. The spec says nothing in particular about 
> the behavior of the update process while the browser is offline.

I'm not really sure I follow. There are all kinds of events fired during 
the cache update process. While offline, you get "checking" followed by 
"error". What's the problem?

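For example (a minimal sketch using the event handler attributes the spec 
already defines):

   var cache = window.applicationCache;
   cache.onchecking = function () {
     // the update process has started; this fires even while offline
   };
   cache.onerror = function () {
     // offline, the manifest fetch fails, so "error" follows "checking"
   };
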

On Tue, 5 Aug 2008, Aaron Boodman wrote:
>
> Some quick notes/questions...
> 
> - I think the manifest should be some structured, extensible format such 
> as XML or JSON. The current text-based format is going to quickly turn 
> into a mess as we add additional fields and rows.

The problem with XML is that it _dramatically_ increases the complexity of 
the code that has to deal with the manifest. You have to check namespaces, 
tag names, node types, etc, just to handle the many bazillion kinds of 
errors that XML enables (and I'm only talking about well-formed XML). JSON 
isn't quite as bad, but it still introduces a good deal of complexity and 
is missing a number of features (such as comments) that we might want.

What we're trying to expose here is really just a list of URLs, for which 
a simple text format with one URL per line seems like the least trouble.
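
That is, something on the order of (contents invented for illustration):

   CACHE MANIFEST
   # v3 -- changing even a comment makes the manifest differ, which is
   # enough to trigger an update
   clock.html
   clock.css
   clock.js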


> - I like the fallback entry feature, but I don't understand why it is 
> coupled to opportunistic caching. On the Gears team, we frequently get 
> requests to add a feature where a certain pattern of URLs will try to go 
> the network first, and if that fails, will fall through to a certain 
> cached fallback URL. But nobody has asked for a way to lazily cache URLs 
> matching a certain pattern. Even if they had, I don't understand what 
> that has to do with the fallback behavior. Can we split these apart, and 
> maybe just remove the opportunistic caching thing entirely?

Done.


> - It seems odd that when you request a resource and the server returns 
> 400 (bad request), we fall back. Maybe it should just be up to the server to
> return an error message that directs the user to the fallback URL? I'm
> not sure about this one, looking for feedback.

The idea is that if the server isn't responding 200 OK, then it is likely 
that you are not actually on the network (e.g. you are being caught by a 
captive portal).


> - Maybe this is obvious, but it's not specified what happens when the
> server returns a redirect for a resource that is being cached. Do we
> cache the redirect chain and replay it?

I've changed the spec to treat a redirect as an error. I couldn't see a 
good way to handle it otherwise (especially considering cross-domain 
redirects).


> - In practice, I expect the number of URLs in the online whitelist is 
> going to be unbounded because of querystrings. I think if this is going 
> to exist, it has to be a pattern.

Done.


> - I know you added the behavior of failing loads when a URL is not in 
> the manifest based on something I said, but now that I read it, it feels 
> a bit draconian. I wish that developers could somehow easily control the 
> space of URLs they expect to be online as well as the ones they expect 
> to be offline. But maybe we should just remove the whole thing about 
> failing loads of resources not in the manifest and online whitelist for 
> v1.

It seems like failing is what one wants from a debugging perspective.



On Wed, 6 Aug 2008, Michael Nordman wrote:
>
> Extensibility would accommodate the addition of experimental sections 
> and entry attributes while retaining compatibility with what has been 
> formally adopted into the spec.

The spec does now support extending the format (unsupported sections are 
ignored, extra tokens after URLs are ignored).



> Seems like there are a few different use cases to accommodate. Splitting 
> this up would add clarity and allow us to make independent decisions 
> about which should be in or out of the spec.
> 
> 1) Being lazy about listing images and css located in well known 
> directories, automatically caching as the app runs.

This is now not supported at all.


> 2) Hijacking parameterized requests and returning a local resource 
> without incurring any network traffic.

This isn't supported; the network is always consulted. But if the network 
request fails, then a fallback resource is used, though it cannot be 
generated on the fly.


> 3) Falling back on a local resource iff server/network error.

This is supported.


> What if... the server is running a newer version of the app than is 
> currently in use... depending on the type of resource it is, I think 
> there could be some unexpected consequences, especially when you consider 
> local SQL data with an expected schema.

The database APIs support versioning natively, which might help here, but 
you are right that there's no good support for this.


> Not sure the "fail if not represented in manifest" is a good idea 
> either... are there unintended consequences lurking here... what does 
> this do in the face of mashups?

I'm not sure I understand; can you elaborate?


On Wed, 6 Aug 2008, Aaron Boodman wrote:
> 
> I think that the unnamed columns make it difficult to use, because you 
> have to remember which column does what. I've used a similar text-based 
> format, the Firefox chrome.manifest format, and I found it difficult for 
> the same reason.

The only unlabeled columns are for fallback namespace -> fallback resource 
mapping. We could make it explicit by requiring punctuation, so instead 
of:

   FALLBACK:
   http://example.com/photos/ http://example.com/photos/fallback.html

...we could have:

   FALLBACK:
   http://example.com/photos/ -> http://example.com/photos/fallback.html

Would that help?


> If we ever needed to have columns that were optional for a given 
> section, it would become difficult to see if they were specified because 
> of the lack of names.

It's always possible to come up with things that a syntax doesn't handle 
well. For example, XML doesn't handle graphs well (where some nodes can 
have multiple children and multiple parents). JSON doesn't handle dates 
and comments well. I don't know that we can do much about that.


> If rows ever get long with more than a few columns, there is no ability 
> to wrap them with the given format. This would come for free with many 
> other formats.

Sure. Also free with XML is support for other character encodings. But 
these features come at a high cost.


> If we ever needed to add any kind of hierarchy, we would have to 
> reinvent something like XML, JSON, or YAML.

If we used XML and ever needed to add overlapping ranges, we would have 
similar problems. There are limitations to any format.


> To me, the idea of inventing a custom one-off format for this is ugly. I 
> realize this is somewhat a matter of taste (unix uses lots of line-based 
> formats and hey, it works fine for them), so I will drop this and just 
> leave it as a vote for XML.

Fair enough. :-)


On Mon, 25 Aug 2008, Michael Nordman wrote:
>
> 5.7.2 Application caches
> I found the terminology used to describe the contents of the cache sometimes
> contradictory and confusing, and it doesn't correspond directly with the
> terminology used in the manifest file syntax. FWIW, some word smithing and
> reconciling the differences could add clarity to the spec.
> 
> *cached resource categories*
> 
> * implicit category
> This categorization applies to html docs which explicitly contain a
> reference to the manifest file via the 'manifest' attribute of their <html>
> tag. I understand they are not necessarily explicitly listed in the manifest
> file, but they may also be explicitly listed. The end result is that a
> resource can be categorized as both 'implicit' and 'explicit'. This is
> confusing. I'd vote to have a different name for clarity's sake... some
> ideas... 'toplevel', 'manifest referencing', 'native' (an awkward play on
> foreign).

Hmm, good point. Renamed them "master entries".


> * manifest category
> Perfect.
> 
> * explicit category
> Ok provided 'implicit' is renamed.
> 
> * fallback category
> The term 'fallback' refers to the prescribed use of these resources for the
> opportunistic-caching namespace in particular. As part of pulling apart
> namespaces vs how to handle hits within a namespace, I'd vote to change the
> name for this category... some ideas... 'namespace-handler'. I'll have 
> more to say about different types of 'namespaces' below.

I'm not really sure what's wrong with "fallback"; do the recent changes 
make this better?


> * opportunistically cached category
> A mouthful, but ok. Another possibility is 'auto-cached' which would work
> well with the 'manually-cached' terminology below.

This is gone now.


> * dynamic category
> I'd like to reserve the term 'dynamic' for a different use of that term
> (more on that in a moment). Some name possibilities for this category...
> 'manually-cached' or 'script-added' or 'programmatically-added'.

I've left this as dynamic for now, but I'll consider changing this as the 
situation develops.


> *flavors of namespaces*
> 
>  * online whitelist
> As mentioned in previous messages, this would need to be some form of
> namespacing or filtering to be useful. A better term for this might be
> 'bypass' since with respect to the appcache, hits here bypass the cache. It's
> not clear if path prefix matching is the best option for filtering out
> requests that should bypass the cache. In working with app developers using
> Gears, the idea of specifying a particular query argument to filter on in
> addition to a path prefix has come up. http://server/pathprefix   +
> &bypassAppCache

I've changed it to just a prefix. Doing things at the query level seems a 
bit odd. The query parameters should be for the server, not the UA.

I haven't changed the name. Bypass seems ok, but then so does "online". 
The only place this gets exposed today, in the manifest, the name is 
"NETWORK:", which seems pretty clear (next to "CACHE:" and "FALLBACK:").


> * opportunistic caching namespaces
> A mouthful but ok. Whatever terminology is used for the category of 
> resulting entries should also be used here... perhaps 'auto-caching 
> namespace'.

This is gone now.


>  * fallback namespace [factored out of opportunistic-caching]
> This form of namespace is addressed by the spec at present, but is
> co-mingled with the auto-caching feature. This is a proposal to detangle
> them from one another. The basic idea is to load the resource as usual, and
> only upon failure fallback to a cached 'namespace-handler'... no
> auto-caching involved.

This is renamed "fallback namespace" now.


> * intercept namespaces [new]
> This form of namespace is not in the spec at present. This is a proposal to
> add it. It is a heavily used feature of the Gears LocalServer. The basic
> idea is to intercept requests into this namespace and satisfy them with a
> cached 'namespace-handler'  without consulting the server.
> 
> *Scriptlets - or dynamic namespace-handlers [new idea]*
> 
> Something we wrestled with in the process of putting together the Gears
> LocalServer was the distinction between intercepting requests for urls and
> identifying the appropiate cached resource for that request. We ended up
> with a declarative manifest file, similar to but different from what is
> contained in this spec. This wasn't an altogether satisfying answer. The
> expressiveness of the language to match/filter requested urls is limited in
> Gears and this spec shares that same characterization.
> 
> Something else we've wrestled with in Gears was having to do awkward
> redesigns in corners of a web application in order to 'take it offline',
> single-sign-on for example. In general, anywhere an application relies on
> HTTP features more than HTML to influence navigation or conditional resource
> loading, it's difficult to address with a static cache.
> 
> So I'd like to propose extending this spec to incorporate 'dynamically
> generated responses'. I think this capability fits into this corner of the
> HTML5 spec because this is most directly useful in the "Offline Web
> Application" scenario. The basic idea is to execute application code
> (script) to produce responses to intercepted resource loads. The application
> code is executed in the background and can formulate a response
> asynchronously.
> 
> Some handwaving at where this could hang off of this spec:
> * Modify namespace-handler entries to have an additional attribute to
> indicate that they are to be executed rather than returned
> 
> And some handwaving at what a scriptlet can do...
> * Can read the request headers and POST body
> * Can set response status code and headers (redirects)
> * Can generate a textual response body
> * Can designate a non-executable cached resource to be returned in response
> * Can decide to 'bypass' handling of a request and defer to the usual
> resource loading
> * Can decide to perform the usual resource loading, but to have the response
> added to the appCache
> * Can access HTML5Database APIs
> * Can utilize XMLHttpRequest to communicate with a server
> 
> This would obviously be a significant addition to the spec, but I do think
> this is worth consideration in the context of 'offline applications'. Based
> on observations of app developers wrestling with Gears, there have been
> several pain points. The HTML5ApplicationCache addresses one of them
> with per-application caches. This addition would address the second of
> them.  (Another pain point has been application deployment).

On Fri, 3 Oct 2008, Philip Tucker wrote:
>
> We've spent the last year or so modifying an existing online application 
> to make it work offline. A secondary motivation was application speed. 
> We used Google Gears to enable this. Although we had some success, we 
> also had challenges. Here are some of the big ones:
> 
>    1. *Significant architecture change*. Our web pages are built on the
>    server without much concern for separation of application logic and user
>    data. User data is usually just "baked in" to the page. In captured/offline
>    mode, we have to serve a generic page that can fill in user data via
>    javascript. This required changing the online application in significant
>    ways just to support offline functionality. And, it required forking the
>    code in many places, leading to maintenance difficulties.
>
>    2. *All-or-nothing captured mode*; i.e., once a URL is captured, it is
>    always served from offline cache. This means if we encounter a bug in the
>    offline app, it's not easy to fall back to the online app. We could serve
>    the offline app off a different URL space, but that makes for harsh
>    transitions between online and offline mode; e.g., on a flaky connection.
>
>    3. *Single sign-on was hard*.
> 
> How I think Michael's proposal addresses these:
> 
>    1. Some applications have cleaner separation of app and user data than
>    others, but one consistent point of clean separation common to all web apps
>    is the HTTP layer, between client and server. Michael's proposal would allow
>    applications a hook at precisely this point. Most client javascript could
>    remain unchanged. XHR servlets would be replaced with client side
>    scriptlets. Server-side templates could be fairly easily migrated to
>    client-side counterparts that populate the template from a local data store.
>
>    2. These scriptlets could also offer a failure mode that would fall
>    through to the online app.
>
>    3. We had to jump through all kinds of hoops to get single sign-on to
>    work. We have to have 2 separate manifests containing all our app URLs, one
>    protected by a cookie and one not. This proposal wouldn't completely ease
>    this problem, but if we're executing a script before the page is loaded we
>    have more options for redirecting to another page when not authenticated,
>    more similar to the way most online authentication schemes work.
> 
> Also, there are some cases where Javascript simply is not powerful enough.
> An example we hit recently is dynamic data in the header (e.g., meta tags,
> styles). We have many cases where these change dynamically. There are ways
> to get around this (put the dynamic page in an iframe, construct HTML in
> javascript, and write it into the iframe), but I think it's clear that this
> is not a great solution. This is processing that should happen before the
> page is handed to the browser.

I haven't added this in this version.

I think this would be an interesting idea. I don't think there has been a 
critical mass of vocal support for this, and it's not clear to me that it 
really resolves enough problems to be worth the significant added 
complexity. I'm also not sure it's a good idea for us to be co-opting the 
network layer to the point of allowing authors to replace it with a "fake" 
server client-side.

I don't really see why login requires network-layer hacks. It seems like 
this is an issue only for migration of existing large-scale apps, of which 
there are relatively few compared to the number of apps we can expect to 
be written on top of this API from scratch in the long term. It doesn't 
make sense to add huge complexity just to support migration of a small set 
of applications for a transition period.

In particular, it seems like a better solution for login is to not protect 
the _app_ behind a cookie, but only protect the _data_, and to then fetch 
the data for the user and store just that client-side. This implies not 
baking the data into the app, which I understand has performance 
implications, but I think we should try to solve those problems separately 
(e.g. supporting multipart documents, or allowing two resources to be 
served mixed together at the HTTP layer, so that the data is available as 
soon as the app is ready for it).


On Mon, 25 Aug 2008, Michael Nordman wrote:
>
> Manifest file section headers:
> * BYPASS: list of url [namespaces/filters]
> * CACHE: list of exact [urls]
> * INTERCEPT: list of [urlnamespaces, namespace-handler url]
> * AUTOCACHE: list of [urlnamespaces, namespace-handler url]
> * FALLBACK: list of [urlnamespaces, namespace-handler url]

I haven't changed the manifest syntax; the current terms seem fine.


On Fri, 29 Aug 2008, Michael Nordman wrote:
> 
> *When is anything ever deleted?*
> 
> Maybe I missed it, but where does appCache deletion happen?

It didn't. It now does, in response to 404 or 410 statuses for manifests 
when doing an update.


> Something that Gears users have done is to serve an empty manifest 
> file. The results are a close approximation to having deleted the 
> resource store. I would vote to have some syntax for expressing 'delete 
> me' in the manifest file for an appCache.

It seems like it would be better to make the absence of a manifest do 
this, since otherwise if someone temporarily puts up a site-wide manifest 
without the site owner knowing, some users (who happened to visit the site 
during the attack window) would never get a successful update and would 
thus never see updates to the site again.


> A new type of event may be warranted for completion of such an update, 
> and when swapCache() is called there would no longer be an appCache 
> associated with the context.

Done.
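
A sketch of how a page might watch for this (the event name here is 
illustrative, not settled):

   window.applicationCache.addEventListener("obsolete", function () {
     // the manifest came back 404/410, so the application cache has been
     // blown away; subsequent loads go straight to the network
   }, false);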


> *Should we revisit the caching semantics for any resource not explicitly 
> listed in the manifest?*
> 
> Unless I missed something, I think the appCache update/validation logic 
> is fundamentally flawed with regard to resources that are not explicitly 
> listed. As presently spec'd, a failure to update/validate any of these 
> resources causes the entire update to fail, and the old version will 
> remain pinned in the cache. Now suppose the app changes its URL space 
> such that some of the resources that got picked up by one of the 
> mechanisms to add new resources (autocaching namespace or manually 
> .add()ed or <html manifest=x>) no longer make sense... I think this 
> means the appCache is stuck in time.

Hm, interesting point.


> One idea is to rephrase this feature in terms closer to std http caching 
> for all entries that do not explicitly appear in the manifest file. In 
> effect, closer to telling the http cache to not purge the resource.
> 
> * at initial cache time
>   - cache the resource
> 
> * at appCache update time
>   - validate all non-explicit entries per usual http caching semantics
>      (so 404s  will remove these entries at update time)
>   - network/server errors do not fail the larger update
>   - beyond that, not sure what to do on network/server errors... remove or
> retain the resources?
>   - perhaps maintain a list of 'failed to update' items that the webapp can
> access via script
> 
> * at resource load time
>    - validate per usual http caching rules going forward
>     (so 404s will remove these entries)
>   - with the following exceptions
>      - use the cached resource as a fallback for network or server(5xx)
> errors
>      - do not purge the resource upon expiration

I'm not a big fan of making these resources act differently than manifest 
resources, but I do agree that they should have different error handling.

I've changed the spec so that 404 and 410 errors cause the resource to be 
removed, and other errors (and redirects) cause the resource to be copied 
from the previous cache, without the whole caching process being canceled.


On Mon, 15 Sep 2008, Dave Camp wrote:
> >
> > * at resource load time
> >   - validate per usual http caching rules going forward
> >     (so 404s will remove these entries)
> >   - with the following exceptions
> >      - use the cached resource as a fallback for network or server(5xx)
> > errors
> >      - do not purge the resource upon expiration
> 
> This seems reasonable, but it seems a bit strange that 
> applicationCache.add() resources will behave differently than 
> explicitly-listed manifest entries (on a particularly slow/flaky 
> wireless network, parts of the application will be quick and others 
> won't).

Agreed.


> On the subject of fallbacks, I don't think the spec is quite clear on 
> how the fallbacks are meant to be loaded.  There seem to be two possible 
> interpretations:
> 
> 1) The fallback resource is loaded by the client as though it were 
> loaded from the original URI - security decisions are made with the 
> original URI, and window.location, bookmarks, history, etc. all reflect 
> the original URI.  This is somewhat analogous to the real server 
> returning fallback content at the original URI.
> 
> 2) The fallback resource is loaded by the client as though it were 
> loaded from the fallback URI for purposes of security decisions, 
> window.location, etc.  But bookmarks, history, etc all reflect the 
> original URI.  This is somewhat analogous to a server redirect (with 
> bookmark/history changes to reflect the original URI), or to a frame at 
> the original URI including the fallback URI (but without the 
> intermediate window object).
>
> We need to decide which of these behaviors makes the most sense. The 
> first seems the most straightforward [...]

Agreed. That is the intent of the current text. Where is it ambiguous?


> I think we'd want a few changes:
> 
> a) The fallback URI should be required to have the same origin as the 
> namespace.

Done.


> b) Maybe there should be some way for the page to know that it was 
> loaded as a fallback.

I could add something to Location, would that work?

   window.location.fallbackHref

...or something? It would return the empty string unless it was a fallback 
case?
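
For illustration, a page could then do something like this (fallbackHref 
being nothing more than the name floated above):

   if (window.location.fallbackHref) {
     // non-empty only when this document was served as a fallback entry;
     // e.g. show a "you appear to be offline" notice
     document.title += " (offline)";
   }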


On Tue, 7 Oct 2008, Michael Nordman wrote:
> 
> 1) Foreign entry detection
> 
> The spec points out an optimization when a foreign entry is discovered 
> at cache-selection time, involving marking the entry as foreign at that 
> time so it will get filtered out of future searches during top-level 
> navigation. Another optimization that could be pointed out is to detect 
> foreign'ness upon insertion into the cache.
> 
> Really, it may be clearer if the spec were simply written that way. The 
> behavior exhibited by the algorithms described corresponds with 'detect 
> on insert', but accomplishes that in a less direct fashion.

Done.


> 2) Silent manifest parsing errors
> 
> The spec goes out of its way to indicate that most errors while parsing the
> manifest file should be silently eaten. That can't be an accident. What
> badness is being averted by that behavior? What is trying to be accomplished
> by that behavior?

We want a format that is forward-compatible and convenient to use.

I'm open to other syntaxes; what would you suggest?


> 3) Update algorithm
> 
> The intent is to grab a coherent set of resources that make up a 
> 'version' of the app. No provisions are made to ensure that is what you 
> actually end up with. Say the system starts an update, grabs the 
> manifest file and starts fetching/validating resources. Half way thru, a 
> new manifest file and set of resources lands on the server (or a new 
> server is deployed). You end up with a mixed set of resources on the 
> client.

I've changed the spec to refetch the manifest at the end and verify that 
it's still identical to the first one.
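
Conceptually, the check is something like this (the helper names are made 
up; this is only a sketch of the algorithm, not spec text):

   var before = fetchManifestBytes(manifestURL);   // start of the update
   downloadAndCacheListedResources(before);
   var after = fetchManifestBytes(manifestURL);    // end of the update
   if (after !== before) {
     // the manifest changed while resources were being fetched, so the
     // cache would be a mix of two versions; abandon this attempt and
     // try the update again later
     abandonUpdateAttempt();
   }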


> 4) Why require text/cache-manifest mimetype?
> 
> Presents a small hurdle to get over. What is being accomplished with 
> this requirement?

This is actually just a restatement of HTTP's requirements. If you want 
the Content-Type to be ignored, please contact the HTTP working group and 
have them change the requirements for handling HTTP Content-Type headers. :-)
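
In other words, the manifest has to come back with the right type, e.g. 
(response shown for illustration):

   HTTP/1.1 200 OK
   Content-Type: text/cache-manifest

   CACHE MANIFEST
   ...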


> I was trading mail with somebody using Gears and this came up. The 
> developer was interested in purging based on LRU when a threshold was 
> exceeded. The app works with an unbounded (for all practical purposes) 
> set of resources that could be cached.
> 
> If the 'contract' for these non-explicit entries required them to be 
> purged as quotas are bumped into, that would be ideal for this particular 
> use case. These kinds of semantics could make a lot of sense for a class 
> of apps like Flickr or PicasaWeb or YouTube.
> 
> So they don't expire according to normal http caching rules, and they 
> are used as a fallback in the event of errors, but they are not 
> guaranteed to be there forever unless you stay within a quota.

I've added an allowance for expiring resources. It's pretty open-ended, 
left up to the UA.


On Tue, 7 Oct 2008, Michael Nordman wrote:
>
> Another one...
> 
> 6) The DOMApplicationCache .length and .item(indx) members.
> 
> These two are troublesome in a multi-threaded / multi-process browser. 
> Can we come up with an interface that's more amenable to implementation 
> in non-single-threaded browsers?

The API, as written, actually is thread-safe, but only because there are 
some pretty draconian requirements.

I'm not really sure what to do about this. What kind of API did you have 
in mind?


On Tue, 7 Oct 2008, Maciej Stachowiak wrote:
>
> Don't you need to have some particular version of the application cache 
> loaded in the thread or process that is processing the particular web 
> page using these APIs? It seems to me that the application cache's 
> atomic update semantics effectively require that, since loading needs to 
> keep a consistent view of the application cache regardless of changes 
> caused by other pages, so length and item are not an obstacle.

I think Michael is talking about the add() and remove() API.


On Wed, 8 Oct 2008, Michael Nordman wrote:
>
> Here's the thing I'm trying to avoid in section 5.7.6 where it
> discusses the add(url) method.
> ...
> 8. "Wait for there to be no running scripts, or at least no running
> scripts that can reach an ApplicationCache object associated with the
> application cache with which this ApplicationCache object is
> associated."
> ...
> The same system-wide synchronization has to be applied for the
> remove(url) method.
> 
> The utility of the .length and .item(indx) methods could be provided in 
> such a way that this awkwardness could be avoided.
> 
> Some ideas...
> bool contains(url);
> string[] getItems();

On Wed, 8 Oct 2008, Michael Nordman wrote:
> 
> Another idea, getItems() wouldn't work well with very large collections
> 
> void forEachItem(callback);  // iteration terminates if the callback
> returns false or throws

On Tue, 14 Oct 2008, Michael Nordman wrote:
>
> Another way to address this would be to redefine the semantics of 
> .length and .item(indx) such that the underlying collection was not 
> required to appear immutable till scripts ran to completion. Embrace the 
> fact that the collection is shared across many pages and that it can 
> change at any time. A .lastModifiedDate property could be exposed which 
> would allow pages to detect when a change had occurred.

I went with the first of these proposals.
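
For what it's worth, a sketch of how the revised interface might be used 
(assuming .items gives back an array of URL strings and .hasItem() is a 
simple membership test; photoURL and preloadThumbnail are made up):

   var cache = window.applicationCache;
   if (!cache.hasItem(photoURL))
     cache.add(photoURL);           // dynamic entries, as before
   var urls = cache.items;          // a snapshot; no index/length juggling
   for (var i = 0; i < urls.length; i += 1)
     preloadThumbnail(urls[i]);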

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


