[whatwg] Archive API - proposal

Andrea Marchesini baku at mozilla.com
Wed Aug 15 04:24:29 PDT 2012


Thanks for your feedback.

When I was implementing the Archive API, my idea was to have a generic archive API and not just a ZIP API.
The current implementation supports only ZIP, but in the future we could add support for more formats.

> This interface is problematic. Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage. This API requires
> that filenames round-trip uniquely, or else files aren't accessible
> at all. For example, if you have two filenames in CP932, "日" and "本",
> but the encoding isn't determined correctly, you may end up with two
> files both with a filename of "??". Either you can't open either
> file, or you can only open one of them. This isn't theoretical; I
> hit ZIP files like this in the wild regularly.

I agree. I was thinking that the default encoding for filenames should be UTF-8.
If a filename is not a valid UTF-8 string, we can fall back to a caller-supplied encoding:

var reader = new ArchiveReader(blob, "Windows-1252");

If decoding with the fallback encoding also fails, that file is excluded from the results.
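
To make the fallback concrete, here is a rough sketch of the decoding step. It is only an illustration and not the actual implementation: decodeFilename is a hypothetical helper, and I am assuming the raw filename bytes are available as a Uint8Array and that a strict TextDecoder (from the Encoding API) does the decoding.

```javascript
// Hypothetical helper, not part of the proposed ArchiveReader API.
// Returns the decoded filename, or null if the entry should be excluded.
function decodeFilename(bytes, fallbackEncoding) {
  // Try strict UTF-8 first, then the caller-supplied encoding (if any).
  const candidates = fallbackEncoding ? ["utf-8", fallbackEncoding] : ["utf-8"];
  for (const encoding of candidates) {
    try {
      // fatal: true makes decode() throw on malformed input instead of
      // silently substituting U+FFFD.
      return new TextDecoder(encoding, { fatal: true }).decode(bytes);
    } catch (e) {
      // Malformed bytes for this encoding, or an unsupported encoding
      // label: move on to the next candidate.
    }
  }
  return null; // nothing decoded cleanly: exclude this entry
}

// "日" encoded as UTF-8 decodes on the first attempt.
const utf8Name = decodeFilename(new Uint8Array([0xe6, 0x97, 0xa5]), "windows-1252");
// A lone 0xE9 is not valid UTF-8, so the fallback encoding is used.
const fallbackName = decodeFilename(new Uint8Array([0xe9]), "windows-1252");
// With no fallback supplied, the same bytes are rejected.
const excluded = decodeFilename(new Uint8Array([0xe9]));
```

One caveat with this sketch: a single-byte encoding such as Windows-1252 maps every byte to some character, so decoding with it never fails. The exclusion case is only reached when no fallback is given or when the fallback is a multi-byte encoding that can reject input.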

> It should be possible to get the CRC32 of files, which ZIP stores in
> the central directory. This both allows the user to perform checksum
> verification himself if wanted, and all the other variously useful
> things about being able to get a file's checksum without having to
> read the whole file.

Can we have a 'generic' archive API that supports CRC32?

Andrea


