[whatwg] HTML resource packages

Mon Aug 9 22:44:18 PDT 2010

The files I used for the rough benchmarks are available in a tarball
at [1].  Live pages are at [2] and [3].

[1] http://people.mozilla.org/~jlebar/respkg/test/benchmark_files.tgz
[2] http://people.mozilla.org/~jlebar/respkg/test/test-pkg.html
[3] http://people.mozilla.org/~jlebar/respkg/test/test-nopkg.html

-Justin

On Mon, Aug 9, 2010 at 1:40 PM, Justin Lebar <justin.lebar at gmail.com> wrote:
>> Can you provide the content of the page which you used in your whitepaper?
>> (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
>
> I'll post this to the bug when I get home tonight.  But your comments
> are astute -- the page I used is a pretty bad benchmark for a variety
> of reasons.  It sounds like you probably could hack up a much better
> one.
>
>>    a) Looks like pages were loaded exactly once, as per your notes?  How
>> hard is it to run the tests long enough to get to a 95% confidence interval?
>
> Since I was running on a simulated network with no random parameters
> (e.g. no packet loss), there was very little variance in load time
> across runs.
>
>>    d) What did you do about subdomains in the test?  I assume your test
>> loaded from one subdomain?
>
> That's correct.
>
>> I'm betting time-to-paint goes through the roof with resource bundles:-)
>
> It does right now because we don't support incremental extraction,
> which is why I didn't bother measuring time-to-paint.  The hope is
> that with incremental extraction, we won't take too much of a hit.
>
> -Justin
>
> On Mon, Aug 9, 2010 at 1:30 PM, Mike Belshe <mike at belshe.com> wrote:
>> Justin -
>> Can you provide the content of the page which you used in your whitepaper?
>> (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
>> I have a few concerns about the benchmark:
>>    a) Looks like pages were loaded exactly once, as per your notes?  How
>> hard is it to run the tests long enough to get to a 95% confidence interval?
>>    b) As you note in the report, slow start will kill you.  I've verified
>> this so many times it makes me sick.  If you try more combinations, I
>> believe you'll see this.
>>    c) The 1.3MB of subresources in a single bundle seems unrealistic to me.
>>  On one hand you say that it's similar to CNN, but note that CNN has
>> JS/CSS/images, not just thumbnails like your test.  Further, note that CNN
>> pulls these resources from multiple domains; combining them into one domain
>> may work, but certainly makes the test content very different from CNN.  So
>> the claim that it is somehow representative seems incorrect.   For more
>> accurate data on what websites look like,
>> see http://code.google.com/speed/articles/web-metrics.html
>>    d) What did you do about subdomains in the test?  I assume your test
>> loaded from one subdomain?
>>    e) There is more to a browser than page-load-time.  Time-to-first-paint
>> is critical as well.  For instance, in WebKit and Chrome, we have specific
>> heuristics which optimize for time-to-render instead of total page load.
>>  CNN is always cited as a "bad page", but it's really not - it just has a
>> lot of content, both below and above the fold.  When the user can interact
>> with the page successfully, the user is happy.  In other words, I know I can
>> make WebKit's PLT much faster by removing a couple of throttles.  But I also
>> know that doing so worsens the user experience by delaying the time to first
>> paint.  So - is it possible to measure both times?  I'm betting
>> time-to-paint goes through the roof with resource bundles:-)
>> If you provide the content, I'll try to run some tests.  It will take a few
>> days.
>> Mike
>>
>> On Mon, Aug 9, 2010 at 9:52 AM, Justin Lebar <justin.lebar at gmail.com> wrote:
>>>
>>> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
>>> wrote:
>>> > If UAs can assume that files with the same path
>>> > are the same regardless of whether they came from a resource package
>>> > or which, and they have all but a couple of the files cached, they
>>> > could request those directly instead of from the resource package,
>>> > even if a resource package is specified.
>>>
>>> These kinds of heuristics are far beyond the scope of resource
>>> packages as we're planning to implement them.  Again, I think this
>>> type of behavior is the domain of a large change to the networking
>>> stack, such as SPDY, not a small hack like resource packages.
>>>
>>> -Justin
>>>
>>> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
>>> wrote:
>>> > On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar <justin.lebar at gmail.com>
>>> > wrote:
>>> >> I think this is a fair point.  But I'd suggest we consider the
>>> >> following:
>>> >>
>>> >> * It might be confusing for resources from a resource package to show
>>> >> up on a page which doesn't "opt-in" to resource packages in general or
>>> >> to that specific resource package.
>>> >
>>> > Only if the resource package contains a different file from the real
>>> > one.  I suggest we treat this as a pathological case and accept that
>>> > it will be broken and confusing -- or at least we consider how many
>>> > extra optimizations we could make if we did accept that, before
>>> > deciding whether the extra performance is worth the confusion.
>>> >
>>> >> * There's no easy way to opt out of this behavior.  That is, if I
>>> >> explicitly *don't* want to load content cached from a resource
>>> >> package, I have to name that content differently.
>>> >
>>> > Why would you want that, if the files are the same anyway?
>>> >
>>> >> * The avatars-on-a-forum use case is less convincing the more I think
>>> >> about it.  Certainly you'd want each page which displays many avatars
>>> >> to package up all the avatars into a single package.  So you wouldn't
>>> >> benefit from the suggested caching changes on those pages.
>>> >
>>> > I don't see why not.  If UAs can assume that files with the same path
>>> > are the same regardless of whether they came from a resource package
>>> > or which, and they have all but a couple of the files cached, they
>>> > could request those directly instead of from the resource package,
>>> > even if a resource package is specified.  So if twenty different
>>> > people post on the page, and you've been browsing for a while and have
>>> > eighteen of their avatars (this will be common; a handful of people
>>> > tend to account for most posts in a given forum):
>>> >
>>> > 1) With no resource packages, you fetch two separate avatars (but on
>>> > earlier page views you suffered).
>>> >
>>> > 2) With resource packages as you suggest, you fetch a whole resource
>>> > package, 90% of which you don't need.  In fact, you have to fetch a
>>> > resource package even if you have 100% of the avatars on the page!  No
>>> > two pages will be likely to have the same resource package, so you
>>> > can't share cache at all.
>>> >
>>> > 3) With resource packages as I suggest, you fetch only two separate
>>> > avatars, *and* you got the benefits of resource packages on earlier
>>> > pages.  The UA gets to guess whether using resource packages would be
>>> > a win on a case-by-case basis, so in particular, it should be able to
>>> > perform strictly better than either (1) or (2), given decent
>>> > heuristics.  E.g., the heuristic "fetch the resource package if I need
>>> > at least two files, fetch the file if I only need one" will perform
>>> > better than either (1) or (2) in any reasonable circumstance.
>>> >
>>> > I think this sort of situation will be fairly common.  Has anyone
>>> > looked at a bunch of different types of web pages and done a breakdown
>>> > of how many assets they have, and how they're reused across pages?  If
>>> > we're talking about assets that are used only on one page (image
>>> > search) or all pages (logos, shared scripts), your approach works
>>> > fine, but not if they're used on a random mix of pages.  I think a lot
>>> > of files will wind up being used on only particular subsets of pages.
>>> >
>>> >> In general, I think we need something like SPDY to really address the
>>> >> problem of duplicated downloads.  I don't think resource packages can
>>> >> fix it with any caching policy.
>>> >
>>> > Certainly there are limits to what resource packages can do, but we
>>> > can wind up closer to the limits or farther from them depending on the
>>> > implementation details.
>>> >
>>
>>
>
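
The heuristic Aryeh gives as an example in the quoted message ("fetch the
resource package if I need at least two files, fetch the file if I only
need one") can be sketched roughly as below.  This is only an illustration
under stated assumptions, not how any UA implements resource packages: the
names FetchPlan, plan_fetch and package_threshold are hypothetical, and the
sketch assumes, as Aryeh proposes, that a file at a given path is identical
whether or not it came from a package.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class FetchPlan:
    """Hypothetical result type: either fetch the package or individual files."""
    use_package: bool
    individual_urls: Tuple[str, ...]


def plan_fetch(needed: Iterable[str], cached: Set[str],
               package_threshold: int = 2) -> FetchPlan:
    """Decide how to fetch the resources a page needs.

    Assumption (from the thread, not from any spec): a cached file with the
    same path is interchangeable with the copy inside the resource package.
    """
    # Resources the page needs that are not already in the cache.
    missing = tuple(url for url in needed if url not in cached)

    # "Fetch the resource package if I need at least two files."
    if len(missing) >= package_threshold:
        return FetchPlan(use_package=True, individual_urls=())

    # Otherwise fetch the (zero or one) missing file directly.
    return FetchPlan(use_package=False, individual_urls=missing)


# Forum example from the thread: twenty avatars on the page, eighteen cached.
avatars = [f"/avatars/{i}.png" for i in range(20)]
cache = set(avatars[:18])

plan_default = plan_fetch(avatars, cache)                       # threshold 2
plan_higher = plan_fetch(avatars, cache, package_threshold=3)   # threshold 3
plan_all_cached = plan_fetch(avatars, set(avatars))             # nothing missing
```

Note that with exactly two avatars missing, the literal "at least two"
threshold selects the package, while a threshold of three selects two
individual requests as in case (3) of the quoted message.  Where the
cutoff should sit is the kind of tuning a real UA would have to measure.
When everything is cached, the sketch requests nothing at all.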