[whatwg] HTML resource packages

Justin Lebar justin.lebar at gmail.com
Mon Aug 9 13:40:27 PDT 2010


> Can you provide the content of the page which you used in your whitepaper?
> (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)

I'll post this to the bug when I get home tonight.  But your comments
are astute -- the page I used is a pretty bad benchmark for a variety
of reasons.  It sounds like you probably could hack up a much better
one.

>    a) Looks like pages were loaded exactly once, as per your notes?  How
> hard is it to run the tests long enough to get to a 95% confidence interval?

Since I was running on a simulated network with no random parameters
(e.g. no packet loss), there was very little variance in load time
across runs.
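
To see why, here's a tiny fixed-parameter model (my own illustrative numbers,
not the whitepaper's settings): with constant latency and bandwidth and no
loss, nothing in the estimate is random, so repeated runs land on the same
figure.

    # Deterministic transfer-time estimate: fixed RTT and bandwidth, no loss,
    # so every run of the simulated load produces an identical number.
    def load_time_s(total_bytes, rtt_s=0.1, bandwidth_bps=1_000_000, requests=1):
        # One round trip of request latency per request, plus serialization
        # time for the payload; TCP dynamics are deliberately ignored here.
        return requests * rtt_s + (total_bytes * 8) / bandwidth_bps

    print(load_time_s(1_300_000))  # same value on every invocation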

>    d) What did you do about subdomains in the test?  I assume your test
> loaded from one subdomain?

That's correct.

> I'm betting time-to-paint goes through the roof with resource bundles:-)

It does right now because we don't support incremental extraction,
which is why I didn't bother measuring time-to-paint.  The hope is
that with incremental extraction, we won't take too much of a hit.
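
For a sense of what incremental extraction could look like, here is a rough
sketch (my own illustration, assuming a ZIP-style package whose local headers
carry the entry sizes; it is not the actual Gecko code): entries are decoded
as their bytes arrive, so early resources can be handed to the parser and the
cache before the final byte of the package shows up.

    # Rough sketch of streaming ("incremental") extraction from a ZIP-style
    # resource package.  Assumes entries are stored or deflated and that each
    # local header carries the compressed size (no data descriptors), which a
    # packaging tool could guarantee for this use case.
    import struct
    import zlib

    LOCAL_SIG = 0x04034b50                     # "PK\x03\x04"
    LOCAL_HDR = struct.Struct("<IHHHHHIIIHH")  # 30-byte local file header

    def iter_entries(stream):
        """Yield (name, data) for each entry as soon as its bytes arrive."""
        while True:
            hdr = stream.read(LOCAL_HDR.size)
            if len(hdr) < LOCAL_HDR.size:
                return
            sig, _, _, method, _, _, _, csize, _, nlen, xlen = LOCAL_HDR.unpack(hdr)
            if sig != LOCAL_SIG:               # reached the central directory
                return
            name = stream.read(nlen).decode("utf-8")
            stream.read(xlen)                  # skip the extra field
            blob = stream.read(csize)
            if method == 8:                    # deflate
                blob = zlib.decompress(blob, -15)
            yield name, blob

    # e.g.: for name, data in iter_entries(open("resources.zip", "rb")): ...

The idea being that each entry can go into the cache as soon as it is decoded,
rather than only after the whole package has arrived.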

-Justin

On Mon, Aug 9, 2010 at 1:30 PM, Mike Belshe <mike at belshe.com> wrote:
> Justin -
> Can you provide the content of the page which you used in your whitepaper?
> (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
> I have a few concerns about the benchmark:
>    a) Looks like pages were loaded exactly once, as per your notes?  How
> hard is it to run the tests long enough to get to a 95% confidence interval?
>    b) As you note in the report, slow start will kill you.  I've verified
> this so many times it makes me sick.  If you try more combinations, I
> believe you'll see this.  (A rough slow-start model is sketched just after
> this list.)
>    c) The 1.3MB of subresources in a single bundle seems unrealistic to me.
>  On one hand you say that it's similar to CNN, but note that CNN has
> JS/CSS/images, not just thumbnails like your test.  Further, note that CNN
> pulls these resources from multiple domains; combining them into one domain
> may work, but certainly makes the test content very different from CNN.  So
> the claim that it is somehow representative seems incorrect.   For more
> accurate data on what websites look like,
> see http://code.google.com/speed/articles/web-metrics.html
>    d) What did you do about subdomains in the test?  I assume your test
> loaded from one subdomain?
>    e) There is more to a browser than page-load-time.  Time-to-first-paint
> is critical as well.  For instance, in WebKit and Chrome, we have specific
> heuristics which optimize for time-to-render instead of total page load.
>  CNN is always cited as a "bad page", but it's really not - it just has a
> lot of content, both below and above the fold.  When the user can interact
> with the page successfully, the user is happy.  In other words, I know I can
> make webkit's PLT much faster by removing a couple of throttles.  But I also
> know that doing so worsens the user experience by delaying the time to first
> paint.  So - is it possible to measure both times?  I'm betting
> time-to-paint goes through the roof with resource bundles:-)
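
As a rough illustration of point (b), here is a back-of-the-envelope
slow-start model (my own sketch with assumed MSS and initial window, not
numbers from either benchmark): the congestion window starts small and
roughly doubles each round trip, so a single 1.3 MB bundle on a cold
connection spends many RTTs before the last byte arrives.

    # Count the round trips TCP slow start needs to deliver a payload,
    # ignoring loss, delayed ACKs, and receive-window limits.
    MSS = 1460        # assumed bytes per segment
    INIT_CWND = 3     # assumed initial congestion window, in segments

    def rtts_to_transfer(total_bytes, mss=MSS, init_cwnd=INIT_CWND):
        cwnd, sent, rtts = init_cwnd, 0, 0
        while sent < total_bytes:
            sent += cwnd * mss   # one window's worth of data per round trip
            cwnd *= 2            # slow start roughly doubles the window
            rtts += 1
        return rtts

    # ~1.3 MB in one bundle: about 9 round trips on a fresh connection,
    # versus one or two for a small resource that fits in the first window.
    print(rtts_to_transfer(1_300_000))
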
> If you provide the content, I'll try to run some tests.  It will take a few
> days.
> Mike
>
> On Mon, Aug 9, 2010 at 9:52 AM, Justin Lebar <justin.lebar at gmail.com> wrote:
>>
>> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
>> wrote:
>> > If UAs can assume that files with the same path
>> > are the same regardless of whether they came from a resource package
>> > or which one, and they have all but a couple of the files cached, they
>> > could request those directly instead of from the resource package,
>> > even if a resource package is specified.
>>
>> These kinds of heuristics are far beyond the scope of resource
>> packages as we're planning to implement them.  Again, I think this
>> type of behavior is the domain of a large change to the networking
>> stack, such as SPDY, not a small hack like resource packages.
>>
>> -Justin
>>
>> On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor <Simetrical+w3c at gmail.com>
>> wrote:
>> > On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar <justin.lebar at gmail.com>
>> > wrote:
>> >> I think this is a fair point.  But I'd suggest we consider the
>> >> following:
>> >>
>> >> * It might be confusing for resources from a resource package to show
>> >> up on a page which doesn't "opt-in" to resource packages in general or
>> >> to that specific resource package.
>> >
>> > Only if the resource package contains a different file from the real
>> > one.  I suggest we treat this as a pathological case and accept that
>> > it will be broken and confusing -- or at least we consider how many
>> > extra optimizations we could make if we did accept that, before
>> > deciding whether the extra performance is worth the confusion.
>> >
>> >> * There's no easy way to opt out of this behavior.  That is, if I
>> >> explicitly *don't* want to load content cached from a resource
>> >> package, I have to name that content differently.
>> >
>> > Why would you want that, if the files are the same anyway?
>> >
>> >> * The avatars-on-a-forum use case is less convincing the more I think
>> >> about it.  Certainly you'd want each page which displays many avatars
>> >> to package up all the avatars into a single package.  So you wouldn't
>> >> benefit from the suggested caching changes on those pages.
>> >
>> > I don't see why not.  If UAs can assume that files with the same path
>> > are the same regardless of whether they came from a resource package
>> > or which one, and they have all but a couple of the files cached, they
>> > could request those directly instead of from the resource package,
>> > even if a resource package is specified.  So if twenty different
>> > people post on the page, and you've been browsing for a while and have
>> > eighteen of their avatars (this will be common; a handful of people
>> > tend to account for most posts in a given forum):
>> >
>> > 1) With no resource packages, you fetch two separate avatars (but on
>> > earlier page views you suffered).
>> >
>> > 2) With resource packages as you suggest, you fetch a whole resource
>> > package, 90% of which you don't need.  In fact, you have to fetch a
>> > resource package even if you have 100% of the avatars on the page!  No
>> > two pages are likely to have the same resource package, so you
>> > can't share cache at all.
>> >
>> > 3) With resource packages as I suggest, you fetch only two separate
>> > avatars, *and* you got the benefits of resource packages on earlier
>> > pages.  The UA gets to guess whether using resource packages would be
>> > a win on a case-by-case basis, so in particular, it should be able to
>> > perform strictly better than either (1) or (2), given decent
>> > heuristics.  E.g., the heuristic "fetch the resource package if I need
>> > at least two files, fetch the file if I only need one" will perform
>> > better than either (1) or (2) in any reasonable circumstance.
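
A minimal sketch of that heuristic (my own illustration of the rule quoted
above, with invented names; the threshold of two is just the example given
and could be tuned):

    # Cache-aware fetch planning, assuming a file at a given path is
    # identical whether it is served standalone or inside the package.
    def plan_fetch(package_paths, cached_paths, needed_paths):
        missing = needed_paths - cached_paths
        coverable = missing & package_paths
        # "Fetch the resource package if I need at least two files,
        #  fetch the file if I only need one."
        if len(coverable) >= 2:
            return {"package": coverable, "individual": missing - coverable}
        return {"package": set(), "individual": missing}

    avatars = {f"avatar{i}.png" for i in range(20)}
    # One avatar missing: request just that file, skip the package.
    print(plan_fetch(avatars, avatars - {"avatar3.png"}, avatars))
    # Nothing cached yet: take the whole package in one request.
    print(plan_fetch(avatars, set(), avatars))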
>> >
>> > I think this sort of situation will be fairly common.  Has anyone
>> > looked at a bunch of different types of web pages and done a breakdown
>> > of how many assets they have, and how they're reused across pages?  If
>> > we're talking about assets that are used only on one page (image
>> > search) or all pages (logos, shared scripts), your approach works
>> > fine, but not if they're used on a random mix of pages.  I think a lot
>> > of files will wind up being used on only particular subsets of pages.
>> >
>> >> In general, I think we need something like SPDY to really address the
>> >> problem of duplicated downloads.  I don't think resource packages can
>> >> fix it with any caching policy.
>> >
>> > Certainly there are limits to what resource packages can do, but we
>> > can wind up closer to the limits or farther from them depending on the
>> > implementation details.
>> >
>
>


