[whatwg] Zip archives as first-class citizens
Gordon P. Hemsley
me at gphemsley.org
Wed Aug 28 07:36:14 PDT 2013
On 8/28/13 9:32 AM, Anne van Kesteren wrote:
> We have thought of three approaches for zip URL design thus far:
>
> * Using a sub-scheme (zip) with a zip-path (after !):
> zip:http://www.example.org/zip!image.gif
> * Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
> * Using media fragments: http://www.example.org/zip#path=image.gif
>
> High-level drawbacks:
>
> * Sub-scheme: requires changing the URL syntax with both sub-scheme
> and zip-path.
> * Zip-path: requires changing the URL syntax.
> * Fragments: fail to work well for URLs relative to a zip archive.
>
> Fragments are conceptually the cleanest as the only part of a URL
> that's supposed to depend on the Content-Type is the fragment.
> However, if you want to link to an ID inside an HTML resource you'd
> have to do #path=test.html&id=test which would require adding
> knowledge to the HTML resource that it is contained in a zip archive
> and have special processing based on that. And not just HTML, same
> goes for CSS or JavaScript.
>
> I'm not sure we need to consider sub-scheme if zip-path can work as
> it's more complex and not very well thought out. E.g. imagine
> view-source:zip:http://www.example.org/zip!test.html. (I hope we never
> need to standardize view-source and that it can be restricted to the
> address bar in browsers.)
>
> zip-path makes zip archive packaging by far the easiest. If we use %!
> as separator that would cause a network error in some existing
> browsers (due to an illegal %), which means it's extensible there,
> though not backwards compatible.
>
> We'd adjust the URL parser to build a zip-path once %! is encountered.
> And relative URLs would first look if there's a zip-path and work
> against that, and use path otherwise.
>
> Fetching would always use the path. If there's a zip-path and the
> returned resource is not a zip archive it would cause a network error.
>
> As for nested zip archives. Andrea suggested we should support this,
> but that would require zip-path to be a sequence of paths. I think we
> never want to allow relative URLs to escape the top-most zip archive.
> But I suppose we could support it in a way that
>
> %!test.zip!test.html
>
> goes one level deeper. And "../image.gif" in test.html looks in the
> enclosing zip. And "../../image.gif" in test.html looks in the
> enclosing zip as well because it cannot ever be relative to the path,
> only the zip-path.
>
As the following URLs demonstrate, %! (or %-anything) will likely not
work for ZIP files generated by a script that takes its input from the
query portion of the URL: the zip-path is simply absorbed into the last
query value, and no network error occurs:
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%/example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1?example.png
(And feel free to use that script to try out any other combos.)
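To illustrate the point offline, here is a minimal sketch using Python's
standard urllib.parse rather than the PHP script above (so the server's
exact behavior is an assumption on my part; the helper name query_values
is made up for this example). It shows the intended zip-path ending up
inside the value of the last query parameter:

```python
from urllib.parse import urlsplit, parse_qs

def query_values(url):
    """Parse the query of `url` roughly the way a server-side script might.

    Hypothetical helper: this uses Python's lenient parser, which leaves
    an invalid percent-escape such as "%!" in place instead of failing.
    """
    query = urlsplit(url).query
    return parse_qs(query, keep_blank_values=True)

base = "http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1"
for separator in ("%!", "%/", "?"):
    url = base + separator + "example.png"
    # The "zip-path" is not split off; it stays in the last value.
    print(separator, query_values(url)["spacer"])
```

In each case the script sees file=test.zip plus a spacer value that
still contains the separator and example.png.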
However, since fragments (i.e. anything beginning with '#') are never
sent to the server, what if you modified the URL parser to recognize a
special hash-prefix combo that indicates the path inside the archive?
Then you could avoid the problem of having to make documents aware of
the fact that they're in a ZIP, because the hash-prefix combo would come
before the plain hash that holds the ID.
So, for example:
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1#/example.html#middle
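A rough sketch of how a parser might take such a URL apart, assuming
"#/" is the hash-prefix combo (as in the example above) and that the
next '#' starts the ordinary ID. The function name split_zip_fragment
and the exact splitting rules are hypothetical, not anything specced:

```python
from urllib.parse import urldefrag

def split_zip_fragment(url):
    """Split `url` into (fetch_url, zip_path, element_id).

    Assumption: a fragment starting with "/" names a path inside the
    archive, and a second "#" (if any) introduces the element ID.
    """
    # Everything after the first "#" is never sent to the server.
    fetch_url, fragment = urldefrag(url)
    if not fragment.startswith("/"):
        # Ordinary fragment (or none): no in-archive path.
        return fetch_url, None, fragment or None
    zip_path, _, element_id = fragment[1:].partition("#")
    return fetch_url, zip_path, element_id or None

url = ("http://whatwg.gphemsley.org/url_test.php"
       "?file=test.zip&spacer=1#/example.html#middle")
print(split_zip_fragment(url))
```

The request would go out with the query intact, and only the client
would see example.html as the archive member and middle as the ID.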
Then you could also take the opportunity to spec the #! prefix (and
other hash-combo prefixes) that a lot of sites use nowadays.
--
Gordon P. Hemsley
me at gphemsley.org
http://gphemsley.org/