[whatwg] AppCache-related e-mails

Felix Halim felix.halim at gmail.com
Wed Jun 29 02:27:37 PDT 2011


On Thu, Jun 9, 2011 at 3:21 AM, Ian Hickson <ian at hixie.ch> wrote:
> If you're not loading the main page from the cache, what does this gain
> you that regular HTTP caching doesn't?

Suppose the content of the main page changes very often (as on a news site).
In this case, you don't want to cache the main page, since users
want to see the latest main page, not a cached one, when they open
it later.
However, if network connectivity is down, the user should be
presented with the cached main page.

This problem can be solved by having the main page NOT include the
news content, but only a static template.
The news content is then fetched dynamically through XHR and stored in localStorage.
However, this complicates the news site (a major redesign of the
website is necessary).
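
A minimal sketch of that workaround, assuming the static template is the app-cached page and the news is injected by script. `fetchNews` stands in for the XHR call and is assumed to be synchronous and to throw when the network or server is down; `KeyValueStore` mirrors the `getItem`/`setItem` subset of localStorage; `renderNewsPage`, `NEWS_KEY` and the `{{news}}` placeholder are hypothetical names.

```typescript
// Subset of the Web Storage API (localStorage) used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Map-backed stand-in so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const NEWS_KEY = "news-content";   // hypothetical storage key
const PLACEHOLDER = "{{news}}";    // hypothetical marker in the static template
const OFFLINE_NOTICE = "No news available offline.";

// Latest news from the server if reachable; otherwise the last stored copy.
function currentNews(fetchNews: () => string, store: KeyValueStore): string {
  try {
    const fresh = fetchNews();       // assumed to throw when offline
    store.setItem(NEWS_KEY, fresh);  // remember it for offline use
    return fresh;
  } catch {
    return store.getItem(NEWS_KEY) ?? OFFLINE_NOTICE;
  }
}

// Inject the dynamic content into the cached static template.
function renderNewsPage(template: string, fetchNews: () => string, store: KeyValueStore): string {
  const news = currentNews(fetchNews, store);
  // A replacer function avoids special "$" patterns in the news text.
  return template.replace(PLACEHOLDER, () => news);
}
```

In a browser, `window.localStorage` already satisfies `KeyValueStore`; the cost is that every page has to be restructured around such a template.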

It would be far easier if there were an option in the MANIFEST file to
NOT CACHE the main page.
The behavior would then be like regular HTTP caching, but far stronger,
since the rest of the resources (CSS, JS, images, etc.) would never be
re-fetched from the network.
Typical HTTP caching still checks (revalidates) whether resources have been
modified, but with the app cache we can explicitly say that they are not
modified until we change the manifest (and thus its hash).
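
A minimal model of that rule, to show where the round-trips go: resources are served locally with no request while the manifest version is unchanged, and everything listed is re-downloaded when it changes. `ManifestCache`, `sync` and the `fetchResource` callback are hypothetical and only model the semantics; they are not a browser API.

```typescript
// Models the app-cache rule: while the manifest version is unchanged,
// cached resources are served without any network request.
class ManifestCache {
  private version: string | null = null;
  private entries = new Map<string, string>();
  networkRequests = 0;

  constructor(private fetchResource: (url: string) => string) {}

  // Called on page load with the current manifest version (e.g. its hash)
  // and the URLs the manifest lists.
  sync(manifestVersion: string, urls: string[]): void {
    if (manifestVersion === this.version) {
      return; // unchanged manifest: no revalidation of any resource
    }
    // Changed manifest: re-download every listed resource.
    const next = new Map<string, string>();
    for (const url of urls) {
      next.set(url, this.fetchResource(url));
      this.networkRequests++;
    }
    this.entries = next;
    this.version = manifestVersion;
  }

  get(url: string): string | undefined {
    return this.entries.get(url);
  }
}
```

Note that a changed manifest re-downloads every listed file; that all-or-nothing update is exactly the coarse granularity the current design has.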

So, in this case, the HTML5 App Cache can make regular online
websites far faster, as well as provide offline access when the
network (or the server) is down.
This would make the news site feel online when it's online and
still work when it's offline. I don't think the HTTP cache can serve the
content if the network or the server is down.

If the main page is always cached, then the next time users visit
the main page, they will (almost) always see the STALE content of the
main page.
Then, a split second later, the main page refreshes with the most
up-to-date version, which is very annoying to users.


> On Mon, 14 Feb 2011, Felix Halim wrote:
>>
>> I have a use case where it is preferable that the main page is not
>> cached:
>>
>> Suppose you have a main page that changes based on its ID:
>>
>> http://example.com/page.php?id=10
>>
>> The appCache will store each main page with different id in separate
>> cache, which is undesirable! And we DON'T want to cache the main pages,
>> since the content differs significantly (think of it as a forum
>> website).
>
> The idea of the appcache feature is to enable offline usage. If you don't
> want it cached, how is it going to work offline?

It will work offline when the network or the server is down:
in that case, the latest cached main page is shown.

I wasn't very clear when I said "the main page should not be cached".
I meant that we should still keep the main page cached,
but always show the online (non-cached) main page if the network and
the server are alive.


>> The main goal here is NOT to make the page offline, but to cache the
>> resources that the page uses (i.e, .js, .css, images, etc...) that are
>> very likely to be IMMUTABLE (particularly the jQuery.js and jQueryUI
>> css+images that almost every site uses!).
>
> Appcache only adds one feature: The ability to work offline.
>
> Everything else that appcache does is already possible with regular HTTP
> caching.
>
> So if you don't want to work offline, just use regular HTTP caching.


HTTP caching requires server-side changes to the response headers, which
is not an option for developers who have no control over the server.
Also, many servers are misconfigured and do not send the
correct headers, and some proxies may alter them on their way to the
client.

It would be great to be able to specify what to CACHE and what not to
cache in the MANIFEST referenced by the HTML file, no matter what the
HTTP caching headers say!
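
For reference, the existing manifest format already splits resources into sections: entries under `CACHE:` (or before any header) are stored, `NETWORK:` entries always go to the network, and `FALLBACK:` maps a namespace to an offline page. A simplified parser sketch; `parseManifest` is a hypothetical helper, and it skips URL resolution, `SETTINGS:` handling and fallback validation.

```typescript
interface ParsedManifest {
  cache: string[];                   // explicit entries, served from the app cache
  network: string[];                 // online whitelist, always fetched from the network
  fallback: Array<[string, string]>; // [namespace, fallback URL] pairs
}

type Section = "CACHE" | "NETWORK" | "FALLBACK" | "UNKNOWN";

// Simplified: no URL resolution, no validation; unknown sections are skipped.
function parseManifest(text: string): ParsedManifest {
  const lines = text.split(/\r\n|\r|\n/);
  if (!lines[0].startsWith("CACHE MANIFEST")) {
    throw new Error("missing CACHE MANIFEST signature");
  }
  const result: ParsedManifest = { cache: [], network: [], fallback: [] };
  let section: Section = "CACHE"; // entries before any header are CACHE entries
  for (const raw of lines.slice(1)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment
    if (line === "CACHE:" || line === "NETWORK:" || line === "FALLBACK:") {
      section = line.slice(0, -1) as Section;
      continue;
    }
    if (line.endsWith(":")) {
      section = "UNKNOWN"; // e.g. SETTINGS:, ignored in this sketch
      continue;
    }
    const tokens = line.split(/\s+/);
    if (section === "CACHE") {
      result.cache.push(tokens[0]);
    } else if (section === "NETWORK") {
      result.network.push(tokens[0]);
    } else if (section === "FALLBACK" && tokens.length >= 2) {
      result.fallback.push([tokens[0], tokens[1]]);
    }
  }
  return result;
}
```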

Here, the HTML5 App Cache works as a complement for web developers who
cannot set up HTTP caching.

Moreover, some HTTP caching strategies do require a round-trip to the
server (to revalidate), which can add hundreds of milliseconds!
If we specify everything in the manifest file, no such round-trip is ever
necessary.

In fact, we can do even better by not fetching the MANIFEST
itself at all, by including an (optional) hash of the manifest inside
the HTML, like:

<html useManifest="my.manifest" manifestHash="asdfasdfasd">

If the hash is not specified, then my.manifest will always be checked for modifications.
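
A minimal sketch of the decision the browser would make, assuming it records the hash of the manifest it last stored. `useManifest` and `manifestHash` are the attribute names proposed above; `needsManifestFetch` is a hypothetical helper, not part of any spec.

```typescript
// Attributes read from the proposed <html useManifest=... manifestHash=...>.
interface ManifestAttrs {
  useManifest: string;
  manifestHash?: string; // optional: absent means "always check"
}

// Decide whether the manifest must be requested from the server.
// storedHash is the hash of the manifest currently in the app cache,
// or undefined if nothing has been cached yet.
function needsManifestFetch(attrs: ManifestAttrs, storedHash: string | undefined): boolean {
  if (attrs.manifestHash === undefined) {
    return true; // no hash given: check the manifest for modifications
  }
  // Same hash: the cached copy is current, so skip the round-trip entirely.
  return attrs.manifestHash !== storedHash;
}
```

One consequence of this design is that the HTML page carrying the hash must itself come fresh from the server, which fits the network-first main page argued for in this message.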


>> Or I would like to update this file, or any other file I would like to
>> update, on demand.
>
> Not sure what this means.

I think it means that we should be able to selectively update any file
in the manifest,
rather than blindly re-downloading everything when the manifest's hash changes.

The ability to selectively update cached files is very appealing.
If your resources total 5 MB, and you know you only want to update one
small 1 KB file, re-downloading all 5 MB is wasteful.

I believe the way the current App Cache updates everything whenever the
manifest file changes is just too inefficient.
You could argue that it is no worse than HTTP caching, but it can be made far better!
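
One way to get selective updates would be for the manifest to list a per-file hash, so the browser downloads only entries whose hash changed. This is a sketch under that assumption: the current manifest format has no per-file hashes, and `HashedManifest` and `changedFiles` are hypothetical.

```typescript
// Hypothetical manifest entries: URL -> content hash.
type HashedManifest = Map<string, string>;

// Return the URLs that must be downloaded: new files and files whose hash
// differs. Files dropped from the new manifest need no download at all.
function changedFiles(oldM: HashedManifest, newM: HashedManifest): string[] {
  const result: string[] = [];
  newM.forEach((hash, url) => {
    if (oldM.get(url) !== hash) {
      result.push(url);
    }
  });
  return result;
}
```

With the 5 MB / 1 KB example above, only the small file's hash would differ, so only that file would be transferred.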


>> The application cache is very powerful. But it is very disappointing,
>> that it is only useful for static pages. With a little improvement to
>> the Offline Web applications chapter, and of course to the browsers, it
>> would be possible to cache any Content Manager or dynamic page. And that
>> would let the appcache become one of the most powerful things in the
>> world.
>
> HTTP caches already do most of this.

It's far harder to set up HTTP caching properly than to write a simple manifest file.
Even if we set up HTTP caching properly, it may still not work as intended if
there are proxies in between.
HTTP caching is very fragile and unreliable.



>> I could read my Joomla! offline, could update the cached files, if I
>> want to, on a click or if the cache expires. I could let the half of the
>> CMS load from the cache. But for that, the index.php, where the manifest
>> is, has to be updateable. Correct me, if I am wrong. But this is not
>> possible today, the master file can not be influenced. And there is no
>> expiration or a possibility to update or manipulate the cache and even
>> no way to find out which files are cached, what would let me/us have
>> control over the Offline Web application.
>
> I'm not sure I really follow here.
>
> I don't really understand how offline access would work if we're not
> caching the main file...

The main file is still cached, but not shown unless the network or
the server is down.
If the network and the server are alive, the previously cached main
file should not be used (the latest online main file should be used instead).


> On Fri, 1 Apr 2011, Edward Gerhold wrote:
>>
>> The appCache is not ready for storing dynamic data. This could be done
>> by the user by simply pressing a "cache this" button or a link or some
>> other function in a script.
>
> What do you mean by "dynamic data"?

Everything in the App Cache is static.
If the main page changes from time to time, using the App
Cache makes the site experience very ugly:
it will always show the stale version of the main page.
Only when the user refreshes later (usually a few seconds
after the App Cache has been updated) will the user see the latest
version.

This "Dynamic Data" inside the main page is THE MAIN reason many
people DON'T WANT the App Cache to CACHE the main page!

Of course, you could then say that we should separate the "dynamic" from the
"static" and store the "dynamic" part in localStorage / IndexedDB...
However, this is NOT how the majority of current websites, like
forums, blogs, and news sites, were designed!
Separating the dynamic from the static requires a MAJOR OVERHAUL of
the site.
I don't think the world would put in a lot of effort just to make
their sites work offline in a "clean" manner.

The easiest option is to let the App Cache present the online main page
if the network and the server are up,
and show the cached version of the main page if either the network or
the server is down.
Most people will (mistakenly) phrase the above as "DO NOT CACHE
THE MAIN PAGE".
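
The difference between the current behavior and the requested one can be sketched as two loading policies for the main page. This is a simplified model, not the AppCache algorithm: a plain `Map` stands in for the cache, the background update is modeled synchronously, and `fetchPage` is assumed to throw when the network or server is down; `cacheFirst` and `networkFirst` are hypothetical names.

```typescript
// The server: returns the current page, or throws if unreachable.
type Fetcher = () => string;

// Current behavior (simplified): show the cached page at once and update the
// cache in the background; the new version only appears on a later load.
function cacheFirst(cache: Map<string, string>, url: string, fetchPage: Fetcher): string | undefined {
  const shown = cache.get(url);
  try {
    cache.set(url, fetchPage()); // "background" update
  } catch {
    // offline: keep the old copy
  }
  return shown ?? cache.get(url); // first visit: nothing was cached yet
}

// Requested behavior: prefer the live page; use the cached copy only when
// the network or the server is down.
function networkFirst(cache: Map<string, string>, url: string, fetchPage: Fetcher): string | undefined {
  try {
    const fresh = fetchPage();
    cache.set(url, fresh); // keep the latest copy for offline use
    return fresh;
  } catch {
    return cache.get(url);
  }
}
```

After the page changes on the server, `cacheFirst` still returns the old version on the next visit, while `networkFirst` returns the new one and falls back to it later when offline.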





>> The current App Cache design updates the cache to the latest version in
>> the background when the user visits the page for the second time, and then
>> it needs to refresh the page to actually update the display. This is
>> annoying since the user will first see stale data, then a few seconds
>> later, it's updated with a giant refresh (including all the static
>> resources).
>
> You shouldn't store data in the appcache, only logic, otherwise yes, the
> user will always be one version behind.
>
> Note that there is no giant refresh unless the page makes it so.


The page or the user MUST do a giant refresh; otherwise the user does not
see the latest main page!


>> This is because the App Cache is too COARSE grained. It doesn't know
>> what actually changes (which data are static, which data are dynamic).
>
> Right. It uses regular HTTP semantics to update the cache.

And that can be improved!


>> That is another reason why we need pageStorage: to separate the dynamic
>> and the static resources.
>
> Don't we already have enough ways to store data?

The pageStorage quota would be different from the localStorage quota.
The localStorage quota is per domain, while pageStorage would be per page.
One page may have dynamic data entirely unrelated to that of
another page on the same domain.
Their quotas should be separate; otherwise the per-domain localStorage
quota will be too small if there are many pages in that domain.

This would give browsers the option to allocate quota per PAGE rather
than per DOMAIN,
which I think is more reasonable if each PAGE is unique even though
the pages share the same DOMAIN.
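
The two quota policies differ only in the key used for accounting. A minimal sketch, assuming a fixed budget per key; `pageStorage` is the proposal from this thread, while `QuotaTracker`, `PageId`, `perDomain` and `perPage` are hypothetical names, and the byte counts are arbitrary.

```typescript
interface PageId {
  origin: string; // e.g. "http://example.com"
  path: string;   // e.g. "/page.php?id=10"
}

// Accounting key: one shared budget per origin (localStorage today),
// or one budget per page (the proposed pageStorage).
const perDomain = (p: PageId): string => p.origin;
const perPage = (p: PageId): string => p.origin + p.path;

class QuotaTracker {
  private used = new Map<string, number>();

  constructor(private quotaBytes: number, private keyOf: (p: PageId) => string) {}

  // Records the write and returns true if it fits in that key's budget.
  tryStore(page: PageId, bytes: number): boolean {
    const key = this.keyOf(page);
    const current = this.used.get(key) ?? 0;
    if (current + bytes > this.quotaBytes) {
      return false;
    }
    this.used.set(key, current + bytes);
    return true;
  }
}
```

One open design question is that per-page budgets let a single domain claim more total storage by adding pages, so a browser would probably still want some overall cap.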

Felix Halim

