[whatwg] Zip archives as first-class citizens
ericu at chromium.org
Wed Aug 28 08:54:21 PDT 2013
Again from the right address...
On Wed, Aug 28, 2013 at 8:47 AM, Eric U <ericu at google.com> wrote:
> Without commenting on the other parts of the proposal, let me just
> mention that every time .zip support comes up, we notice that it's not
> a great web archive format because it's not streamable. That is, you
> can't actually use any of the contents until you've downloaded the
> whole file.
> Perhaps some other archive format would be a better fit for the web?
> [Before you respond that it's streamable, please look in the archives
> for the rebuttal.]
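The streaming point comes from where a zip archive keeps its index. The central directory is near the end of the file, and it is found through an End of Central Directory (EOCD) record. The Python sketch below is a simplification: it builds a small archive in memory with the standard `zipfile` module, ignores ZIP64, and assumes the last EOCD signature in the file is the real record. It reads that record to show that a reader needs the tail of the file before it knows where the entries are. It does not reproduce the rebuttal referred to above. Each entry also has a local header in front of it, and streaming readers can try to rely on those headers instead.

```python
import io
import struct
import zipfile

# End of Central Directory (EOCD) signature defined by the ZIP format.
EOCD_SIG = b"PK\x05\x06"
# Size of the fixed part of the EOCD record (an optional comment may follow it).
EOCD_FIXED_SIZE = 22


def read_eocd(data: bytes) -> dict:
    """Find the EOCD record, which sits at the end of the archive.

    The EOCD says where the central directory (the archive's index of
    entries) starts, so a reader needs the tail of the file to use it.
    Simplification: the last occurrence of the signature is assumed to be
    the real record; ZIP64 archives are not handled.
    """
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or len(data) - pos < EOCD_FIXED_SIZE:
        raise ValueError("no end-of-central-directory record found")
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        "<4sHHHHLLH", data[pos:pos + EOCD_FIXED_SIZE])
    return {"eocd_offset": pos, "entries": entries_total,
            "cd_offset": cd_offset, "cd_size": cd_size}


# Build a two-entry archive in memory and see where its index lives.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test.html", '<img src="image.gif">')
    zf.writestr("image.gif", b"GIF89a")
archive = buf.getvalue()

info = read_eocd(archive)
print(info["entries"], len(archive) - info["eocd_offset"])
```

For this archive the EOCD is the final 22 bytes of the buffer, and the central directory ends exactly where the EOCD begins.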
> On Wed, Aug 28, 2013 at 6:32 AM, Anne van Kesteren <annevk at annevk.nl> wrote:
>> A couple of us have been toying around with the idea of making zip
>> archives first-class citizens on the web. What we want to support:
>> * Package a bunch of related resources (e.g. icons) together for a game
>> or an application.
>> * Support self-contained packages, like Flash ads or Flash-based games.
>> Using zip archives for this makes sense as the format has broad tooling
>> support. To lower the adoption cost, no special configuration should be
>> needed. Existing zip archives should be able to fit right in.
>> The above means we need URLs for resources inside zip archives. That is:
>> <img src="... test.zip ... image.gif">
>> should work. As well as
>> <iframe src="... test.zip ... test.html"></iframe>
>> and test.html should be able to contain URLs that reference other
>> resources inside the zip archive.
>> We have thought of three approaches for zip URL design thus far:
>> * Using a sub-scheme (zip) with a zip-path (after !): zip:http://www.example.org/zip!image.gif
>> * Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
>> * Using media fragments: http://www.example.org/zip#path=image.gif
>> High-level drawbacks:
>> * Sub-scheme: requires changing the URL syntax with both sub-scheme
>> and zip-path.
>> * Zip-path: requires changing the URL syntax.
>> * Fragments: fail to work well for URLs relative to a zip archive.
>> Fragments are conceptually the cleanest as the only part of a URL
>> that's supposed to depend on the Content-Type is the fragment.
>> However, if you want to link to an ID inside an HTML resource you'd
>> have to do #path=test.html&id=test, which would require the HTML
>> resource to know that it is contained in a zip archive and to have
>> special processing based on that. And not just HTML; the same goes for
>> every other format that defines its own fragment syntax.
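The fragment drawback shows up with ordinary URL parsing. In this minimal Python sketch, the `#path=...&id=...` convention follows the proposal's example, and the function name `split_fragment_zip_url` is a hypothetical illustration, not a specified API. Only a zip-aware consumer can split the fragment into an entry path and an element ID. A plain HTML consumer sees one opaque fragment.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def split_fragment_zip_url(url: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a fragment-style zip URL into (archive URL, entry path, element id).

    Assumes the hypothetical '#path=<entry>&id=<element>' shape from the proposal.
    """
    parts = urlsplit(url)
    # The archive itself is fetched without any fragment.
    archive_url = parts._replace(fragment="").geturl()
    params = parse_qs(parts.fragment)
    path = params.get("path", [None])[0]
    element_id = params.get("id", [None])[0]
    return archive_url, path, element_id


url = "http://www.example.org/zip#path=test.html&id=test"
print(split_fragment_zip_url(url))
# An HTML document unaware of this convention would see the whole fragment
# and look for an element whose id is literally "path=test.html&id=test".
print(urlsplit(url).fragment)
```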
>> I'm not sure we need to consider the sub-scheme approach if zip-path can
>> work, as sub-scheme is more complex and not very well thought out. E.g.
>> imagine view-source:zip:http://www.example.org/zip!test.html. (I hope we
>> never need to standardize view-source and that it can be restricted to
>> the address bar in browsers.)
>> zip-path makes zip archive packaging by far the easiest. If we use %!
>> as the separator, that would cause a network error in some existing
>> browsers (due to the illegal %), which means the syntax is extensible
>> there, though not backwards compatible.
>> We'd adjust the URL parser to start building a zip-path once %! is
>> encountered. Relative URLs would first check whether there's a zip-path
>> and resolve against that, and use the path otherwise.
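The parsing and resolution rules above could look roughly like the following Python sketch, which rests on several assumptions. The names `ZipURL`, `parse_zip_url` and `resolve` are hypothetical. Only the first `%!` starts the zip-path, and `..` is clamped at the archive root. Absolute references (a leading `/` or a scheme) ignore the zip-path. Queries, fragments and trailing slashes are not handled.

```python
from typing import List, NamedTuple, Optional
from urllib.parse import urljoin

ZIP_SEPARATOR = "%!"  # separator proposed in the thread


class ZipURL(NamedTuple):
    path_url: str            # everything before '%!'; this is what gets fetched
    zip_path: Optional[str]  # path of the entry inside the archive, if any

    def serialize(self) -> str:
        if self.zip_path is None:
            return self.path_url
        return self.path_url + ZIP_SEPARATOR + self.zip_path


def parse_zip_url(url: str) -> ZipURL:
    """Start a zip-path at the first '%!' in the URL."""
    path_url, sep, rest = url.partition(ZIP_SEPARATOR)
    return ZipURL(path_url, rest if sep else None)


def _join_inside_archive(base_zip_path: str, ref: str) -> str:
    """Resolve ref against base_zip_path, never climbing above the archive root."""
    # Directory segments of the entry we resolve against.
    segments: List[str] = [s for s in base_zip_path.split("/")[:-1] if s]
    for seg in ref.split("/"):
        if seg in ("", "."):
            continue
        if seg == "..":
            if segments:
                segments.pop()
            # '..' at the archive root is ignored
        else:
            segments.append(seg)
    return "/".join(segments)


def resolve(ref: str, base: ZipURL) -> ZipURL:
    """Resolve against the zip-path if there is one, otherwise against the path."""
    if base.zip_path is None or ref.startswith("/") or "://" in ref:
        # Assumption: absolute references bypass the zip-path entirely.
        return parse_zip_url(urljoin(base.path_url, ref))
    return ZipURL(base.path_url, _join_inside_archive(base.zip_path, ref))


base = parse_zip_url("http://www.example.org/zip%!dir/test.html")
print(resolve("image.gif", base).serialize())
print(resolve("../../image.gif", base).serialize())
```

A URL without `%!` falls back to ordinary `urljoin` resolution, so existing URLs behave as before.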
>> Fetching would always use the path. If there's a zip-path and the
>> returned resource is not a zip archive it would cause a network error.
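The fetch rule can be sketched as follows, assuming the bytes returned for the path are already in hand. `NetworkError` and `extract_from_fetched` are hypothetical names. The proposal does not say what happens when the entry is missing, so treating that case as a network error is an assumption of this sketch.

```python
import io
import zipfile


class NetworkError(Exception):
    """Hypothetical stand-in for a fetch network error."""


def extract_from_fetched(body: bytes, zip_path: str) -> bytes:
    """Apply the proposed rule to a response body fetched via the path.

    Only the path (the part before '%!') was fetched; if the body is not a
    zip archive, that is a network error. Treating a missing entry as a
    network error too is an assumption, not something the proposal states.
    """
    data = io.BytesIO(body)
    if not zipfile.is_zipfile(data):
        raise NetworkError("resource at path is not a zip archive")
    with zipfile.ZipFile(data) as zf:
        try:
            return zf.read(zip_path)
        except KeyError:
            raise NetworkError(f"no entry named {zip_path!r}") from None
```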
>> As for nested zip archives: Andrea suggested we should support them,
>> but that would require zip-path to be a sequence of paths. I think we
>> never want to allow relative URLs to escape the top-most zip archive.
>> But I suppose we could support nesting in a way that goes one level
>> deeper: "../image.gif" in test.html looks in the enclosing zip, and
>> "../../image.gif" in test.html looks in the enclosing zip as well,
>> because it can never be relative to the path, only to the zip-path.
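The following Python sketch shows one possible reading of the nested case, with zip-path as a list. Each element names an entry inside the archive named by the element before it. `resolve_nested` is a hypothetical helper, and it allows nesting to any depth rather than only one level. A `..` at the root of a nested archive steps out into the enclosing archive. At the root of the top-most archive, `..` is clamped, so relative URLs never escape it.

```python
from typing import List


def resolve_nested(zip_paths: List[str], ref: str) -> List[str]:
    """Resolve a relative reference against a sequence of zip-paths.

    zip_paths[i] names an entry inside the archive named by zip_paths[i - 1];
    the last element is the document doing the resolving. '..' at the root
    of a nested archive steps out into the enclosing archive; '..' at the
    root of the top-most archive is ignored, so a reference never escapes it.
    ref is assumed to end in a file name (no query, fragment or trailing '/').
    """
    names = [p.split("/")[-1] for p in zip_paths]
    # Directory segments at each nesting level.
    dirs = [[s for s in p.split("/")[:-1] if s] for p in zip_paths]
    depth = len(zip_paths) - 1  # start in the innermost archive
    *dir_parts, leaf = ref.split("/")
    for seg in dir_parts:
        if seg in ("", "."):
            continue
        if seg == "..":
            if dirs[depth]:
                dirs[depth].pop()
            elif depth > 0:
                depth -= 1  # leave the nested archive
            # else: already at the top-most archive root, stay there
        else:
            dirs[depth].append(seg)
    # Enclosing levels keep pointing at their nested archive; the current
    # level points at the resolved entry.
    outer = ["/".join(dirs[i] + [names[i]]) for i in range(depth)]
    return outer + ["/".join(dirs[depth] + [leaf])]


print(resolve_nested(["test.zip", "test.html"], "image.gif"))
print(resolve_nested(["test.zip", "test.html"], "../image.gif"))
print(resolve_nested(["test.zip", "test.html"], "../../image.gif"))
```

With test.html at the root of test.zip, "image.gif" stays inside test.zip. Both "../image.gif" and "../../image.gif" land on image.gif in the enclosing archive, which matches the two cases described above.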