[whatwg] HTML resource packages

Mike Belshe mike at belshe.com
Tue Aug 10 11:40:30 PDT 2010


On Mon, Aug 9, 2010 at 1:40 PM, Justin Lebar <justin.lebar at gmail.com> wrote:

> > Can you provide the content of the page which you used in your
> > whitepaper?
> > (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
>
> I'll post this to the bug when I get home tonight.  But your comments
> are astute -- the page I used is a pretty bad benchmark for a variety
> of reasons.  It sounds like you probably could hack up a much better
> one.
>
> >    a) Looks like pages were loaded exactly once, as per your notes?  How
> > hard is it to run the tests long enough to get to a 95% confidence
> > interval?
>
> Since I was running on a simulated network with no random parameters
> (e.g. no packet loss), there was very little variance in load time
> across runs.
>

I suspect you are right.  Still, it's good due diligence - especially for a
whitepaper :-)  The good news is that if it really is consistent, then it
should be easy...
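
For example, something along these lines (purely a sketch: it assumes you
can script N repeated loads and collect a load time for each, ci95 is a
made-up helper, and the sample numbers are invented):

    // Given load-time samples from repeated runs, report the mean and a
    // 95% confidence interval (normal approximation, so it assumes a
    // reasonably large number of runs, say N >= 8).
    function ci95(samples: number[]): { mean: number; lo: number; hi: number } {
      const n = samples.length;
      const mean = samples.reduce((a, b) => a + b, 0) / n;
      const variance =
        samples.reduce((a, b) => a + (b - mean) * (b - mean), 0) / (n - 1);
      const halfWidth = 1.96 * Math.sqrt(variance / n);
      return { mean, lo: mean - halfWidth, hi: mean + halfWidth };
    }

    // Hypothetical load times in ms from 8 runs of the same page:
    console.log(ci95([1210, 1195, 1222, 1201, 1199, 1230, 1208, 1215]));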



>
> >    d) What did you do about subdomains in the test?  I assume your test
> > loaded from one subdomain?
>
> That's correct.
>
> > I'm betting time-to-paint goes through the roof with resource bundles :-)
>
> It does right now because we don't support incremental extraction,
> which is why I didn't bother measuring time-to-paint.  The hope is
> that with incremental extraction, we won't take too much of a hit.
>

Well, here is the crux then.

What should browsers optimize for?  Should we adopt performance features
which optimize for PLT, for time-to-first-paint, or for something else?  I
have spent a *ton* of time trying to answer this question (as have many
others), and it is just a tough one to answer.

For now, I believe the Chrome/WebKit teams are in agreement that sacrificing
time-to-first-render to decrease PLT is a bad idea.  I'm not sure what the
Firefox philosophy is here?

One thing we can do to better evaluate features is simply to always measure
both metrics.  If both metrics get better, then it is a clear win.  But
without recording both, we don't really know how to evaluate whether a
feature is good or bad.
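
As a rough sketch of the kind of instrumentation I mean (this assumes the
Navigation Timing draft API and IE's vendor-specific msFirstPaint field;
collectMetrics and TrialMetrics are illustrative names, not any shipping
API):

    // Record both metrics for every trial, not just PLT.
    interface TrialMetrics {
      plt: number;        // navigationStart -> loadEventEnd, in ms
      firstPaint: number; // navigationStart -> first paint, in ms (NaN if unknown)
    }

    function collectMetrics(): TrialMetrics {
      const t = performance.timing;
      const plt = t.loadEventEnd - t.navigationStart;
      // First-paint hooks are vendor-specific; IE exposes msFirstPaint
      // on performance.timing, and other UAs would need their own hook.
      const msFirstPaint = (t as any).msFirstPaint as number | undefined;
      const firstPaint = msFirstPaint ? msFirstPaint - t.navigationStart : NaN;
      return { plt, firstPaint };
    }

    window.addEventListener("load", () => {
      // Defer one tick so loadEventEnd has been filled in.
      setTimeout(() => console.log(collectMetrics()), 0);
    });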

Sorry to put you through more work - I am not trying to nix your feature
:-(  I think it is great that you are taking the time to study all of this.

Mike

>
> -Justin
>
> On Mon, Aug 9, 2010 at 1:30 PM, Mike Belshe <mike at belshe.com> wrote:
> > Justin -
> > Can you provide the content of the page which you used in your
> > whitepaper?
> > (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
> > I have a few concerns about the benchmark:
> >    a) Looks like pages were loaded exactly once, as per your notes?  How
> > hard is it to run the tests long enough to get to a 95% confidence
> > interval?
> >    b) As you note in the report, slow start will kill you.  I've verified
> > this so many times it makes me sick.  If you try more combinations, I
> > believe you'll see this.
> >    c) The 1.3MB of subresources in a single bundle seems unrealistic
> > to me.  On one hand you say that it's similar to CNN, but note that
> > CNN has JS/CSS/images, not just thumbnails like your test.  Further,
> > note that CNN pulls these resources from multiple domains; combining
> > them into one domain may work, but it certainly makes the test content
> > very different from CNN.  So the claim that it is somehow
> > representative seems incorrect.  For more accurate data on what
> > websites look like, see
> > http://code.google.com/speed/articles/web-metrics.html
> >    d) What did you do about subdomains in the test?  I assume your test
> > loaded from one subdomain?
> >    e) There is more to a browser than page-load-time.
> > Time-to-first-paint is critical as well.  For instance, in WebKit and
> > Chrome, we have specific heuristics which optimize for time-to-render
> > instead of total page load.  CNN is always cited as a "bad page", but
> > it's really not - it just has a lot of content, both below and above
> > the fold.  When the user can interact with the page successfully, the
> > user is happy.  In other words, I know I can make WebKit's PLT much
> > faster by removing a couple of throttles.  But I also know that doing
> > so worsens the user experience by delaying the time to first paint.
> > So - is it possible to measure both times?  I'm betting time-to-paint
> > goes through the roof with resource bundles :-)
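
To make the slow-start point in (b) concrete, here is the kind of
back-of-the-envelope I have in mind.  It models idealized slow start over
a single cold connection with no loss; slowStartRounds is a hypothetical
helper, and initcwnd = 3 segments / MSS = 1460 bytes are assumptions:

    // Count round trips needed to move `bytes` over one connection:
    // cwnd starts at initCwnd segments and doubles each round trip.
    function slowStartRounds(bytes: number, initCwnd = 3, mss = 1460): number {
      let cwnd = initCwnd;
      let sent = 0;
      let rounds = 0;
      while (sent < bytes) {
        sent += cwnd * mss;
        cwnd *= 2;
        rounds += 1;
      }
      return rounds;
    }

    // A 1.3MB bundle needs ~9 round trips cold, so at a 100ms RTT the
    // last byte is ~900ms out -- versus several parallel connections
    // each ramping up over much smaller files.
    console.log(slowStartRounds(1.3 * 1024 * 1024)); // => 9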
> > If you provide the content, I'll try to run some tests.  It will take
> > a few days.
> > Mike
> >
> > On Mon, Aug 9, 2010 at 9:52 AM, Justin Lebar <justin.lebar at gmail.com>
> > wrote:
> >>
> >> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor
> >> <Simetrical+w3c at gmail.com> wrote:
> >> > If UAs can assume that files with the same path
> >> > are the same regardless of whether they came from a resource package
> >> > or which one, and they have all but a couple of the files cached, they
> >> > could request those directly instead of from the resource package,
> >> > even if a resource package is specified.
> >>
> >> These kinds of heuristics are far beyond the scope of resource
> >> packages as we're planning to implement them.  Again, I think this
> >> type of behavior is the domain of a large change to the networking
> >> stack, such as SPDY, not a small hack like resource packages.
> >>
> >> -Justin
> >>
> >> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor
> >> <Simetrical+w3c at gmail.com> wrote:
> >> > On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar <justin.lebar at gmail.com>
> >> > wrote:
> >> >> I think this is a fair point.  But I'd suggest we consider the
> >> >> following:
> >> >>
> >> >> * It might be confusing for resources from a resource package to
> >> >> show up on a page which doesn't "opt-in" to resource packages in
> >> >> general or to that specific resource package.
> >> >
> >> > Only if the resource package contains a different file from the real
> >> > one.  I suggest we treat this as a pathological case and accept that
> >> > it will be broken and confusing -- or at least we consider how many
> >> > extra optimizations we could make if we did accept that, before
> >> > deciding whether the extra performance is worth the confusion.
> >> >
> >> >> * There's no easy way to opt out of this behavior.  That is, if I
> >> >> explicitly *don't* want to load content cached from a resource
> >> >> package, I have to name that content differently.
> >> >
> >> > Why would you want that, if the files are the same anyway?
> >> >
> >> >> * The avatars-on-a-forum use case is less convincing the more I think
> >> >> about it.  Certainly you'd want each page which displays many avatars
> >> >> to package up all the avatars into a single package.  So you wouldn't
> >> >> benefit from the suggested caching changes on those pages.
> >> >
> >> > I don't see why not.  If UAs can assume that files with the same path
> >> > are the same regardless of whether they came from a resource package
> >> > or which one, and they have all but a couple of the files cached, they
> >> > could request those directly instead of from the resource package,
> >> > even if a resource package is specified.  So if twenty different
> >> > people post on the page, and you've been browsing for a while and have
> >> > eighteen of their avatars cached (this will be common, since a handful
> >> > of people tend to account for most posts in a given forum):
> >> >
> >> > 1) With no resource packages, you fetch two separate avatars (but on
> >> > earlier page views you suffered).
> >> >
> >> > 2) With resource packages as you suggest, you fetch a whole resource
> >> > package, 90% of which you don't need.  In fact, you have to fetch a
> >> > resource package even if you have 100% of the avatars on the page!  No
> >> > two pages are likely to have the same resource package, so you
> >> > can't share cache at all.
> >> >
> >> > 3) With resource packages as I suggest, you fetch only two separate
> >> > avatars, *and* you got the benefits of resource packages on earlier
> >> > pages.  The UA gets to guess whether using resource packages would be
> >> > a win on a case-by-case basis, so in particular, it should be able to
> >> > perform strictly better than either (1) or (2), given decent
> >> > heuristics.  E.g., the heuristic "fetch the resource package if I need
> >> > at least two files, fetch the file if I only need one" will perform
> >> > better than either (1) or (2) in any reasonable circumstance.
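
That last heuristic is simple enough to state in code.  A sketch, with
hypothetical names throughout -- ResourcePackage and planFetch are not any
browser's actual cache logic:

    // Fetch the whole package only when at least two of its files are
    // uncached; otherwise fetch the stragglers individually.
    interface ResourcePackage {
      url: string;
      files: string[]; // paths the package claims to contain
    }

    function planFetch(
      pkg: ResourcePackage,
      isCached: (path: string) => boolean
    ): { fetchPackage: boolean; individualFetches: string[] } {
      const missing = pkg.files.filter((f) => !isCached(f));
      if (missing.length >= 2) {
        return { fetchPackage: true, individualFetches: [] };
      }
      return { fetchPackage: false, individualFetches: missing };
    }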
> >> >
> >> > I think this sort of situation will be fairly common.  Has anyone
> >> > looked at a bunch of different types of web pages and done a breakdown
> >> > of how many assets they have, and how they're reused across pages?  If
> >> > we're talking about assets that are used only on one page (image
> >> > search) or all pages (logos, shared scripts), your approach works
> >> > fine, but not if they're used on a random mix of pages.  I think a lot
> >> > of files will wind up being used on only particular subsets of pages.
> >> >
> >> >> In general, I think we need something like SPDY to really address the
> >> >> problem of duplicated downloads.  I don't think resource packages can
> >> >> fix it with any caching policy.
> >> >
> >> > Certainly there are limits to what resource packages can do, but we
> >> > can wind up closer to the limits or farther from them depending on the
> >> > implementation details.
> >> >