[whatwg] Archive API - proposal

Tobie Langel tobie.langel at gmail.com
Tue Aug 14 14:05:27 PDT 2012


On Aug 14, 2012, at 21:21, Glenn Maynard <glenn at zewt.org> wrote:

> (I've reordered my responses to give a more logical progression.)
>
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <baku at mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>>
>
> This interface is problematic.  Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage.  This API requires that
> filenames round-trip uniquely, or else files aren't accessible at all.  For
> example, if you have two filenames in CP932, "日" and "本", but the encoding
> isn't determined correctly, you may end up with two files both with a
> filename of "??".  Either you can't open either file, or you can only open
> one of them.  This isn't theoretical; I hit ZIP files like this in the wild
> regularly.
>
> Instead, I'd recommend that the primary API simply returns File objects
> directly from the ZIP.  For example:
>
> var reader = archive.getFiles();
> reader.onsuccess = function(result) {
>    // result = [File, File, File, File...];
>
>    console.log(result[0].name);
>    // read the file
>    new FileReader().readAsArrayBuffer(result[0]);
> }
>
> This allows opening files without any dependency on the filename.  Since
> File objects are by design lightweight--no decompression should happen
> until you actually read from the file--this isn't expensive and won't
> perform any extra I/O.  All the information you need to expose a File
> object is in the central directory (filename, mtime, decompressed size).
>
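
To make the collision concrete, here is a minimal sketch. The entry objects and their sizes are hypothetical stand-ins for two distinct archive members whose names were both decoded to the same garbage string; nothing here is part of either proposed API. It only shows that a name-keyed lookup loses one of the files, while an index-based list keeps both reachable.

```javascript
// Hypothetical entries: two different files whose names were both
// mis-decoded to "??" (names and sizes are made up for illustration).
const entries = [
  { name: "??", size: 120 },
  { name: "??", size: 4096 },
];

// Name-keyed access: the second entry silently replaces the first.
const byName = new Map();
for (const entry of entries) {
  byName.set(entry.name, entry);
}

// Index-based access: every entry stays reachable whatever its name is.
const reachableByName = byName.size;
const reachableByIndex = entries.length;

console.log(reachableByName, reachableByIndex);
```

With a list of File objects, the caller can still open entries[0] and entries[1] separately even though their names are indistinguishable.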
>> I would like to receive feedback about this. In particular:
>> . Do you think it can be useful?
>> . Do you see any limitation, any feature missing?
>>
>
> It should be possible to get the CRC32 of files, which ZIP stores in the
> central directory.  This both allows the user to perform checksum
> verification himself if wanted, and all the other variously useful things
> about being able to get a file's checksum without having to read the whole
> file.
>
> (I don't think CRC32 checks should be performed automatically, since it's
> too hard for that to make sense when random access is involved.)
>
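
As a rough illustration of user-side verification, the sketch below computes the standard CRC-32 (the reflected 0xEDB88320 polynomial that ZIP uses) over a byte array and compares it with a stored value. The helper names crc32 and matchesExpectedCrc are made up for this example, and the "expected" argument stands in for whatever attribute ends up exposing the central directory's CRC.

```javascript
// Build the 256-entry lookup table for the reflected CRC-32 polynomial.
function makeCrcTable() {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c >>> 0;
  }
  return table;
}

const CRC_TABLE = makeCrcTable();

// Compute the CRC-32 of an array-like of byte values (0..255).
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical helper: compare decompressed bytes against the stored CRC.
function matchesExpectedCrc(bytes, expected) {
  return crc32(bytes) === (expected >>> 0);
}
```

A caller would read the whole file once (for example into an ArrayBuffer wrapped in a Uint8Array) and then call matchesExpectedCrc with the stored value; nothing forces the check on code that only reads part of the file.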
>>  // The ArchiveReader object works with Blob objects:
>>  var archiveReader = new ArchiveReader(file);
>>
>>  // Any request is asynchronous:
>>
>
> The only operation that needs to be asynchronous is creating the
> ArchiveReader itself.  It should parse the ZIP central record before
> returning a result.  Once you've done that you can do the rest
> synchronously, because no further I/O is necessary until you actually read
> data from a file.
>
> This gives the following, simpler interface:
>
> var opener = new ZipOpener(file);
> opener.onerror = function() { console.error("Loading failed"); }
> opener.onsuccess = function(zipFile)
> {
>    // .files is a FileList, representing each file in the archive.
>    if(zipFile.files.length == 0) { console.error("ZIP file is empty"); return; }
>
>    var example_file = zipFile.files[0];
>    console.log("The first filename is", example_file.name,
>                "with an expected CRC of", example_file.expectedCRC);
>
>    // Read from the file:
>    var reader = new FileReader();
>    reader.readAsArrayBuffer(example_file);
>
>    // For convenience, add "getter File? (DOMString name)" to FileList, to
>    // find a file by name.  This is equivalent to iterating through files[]
>    // and comparing .name.  If no match is found, return null.  This could
>    // be a function instead of a getter.
>    var example_file2 = zipFile.files["file.txt"];
>    if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; }
> }
>
> (To fit expectedCRC in there, it would actually need to use a subclass of
> File, not File itself.)
>
> This also eliminates an error condition (no getFile error callback), and
> since .files looks just like HTMLInputElement.files, it can be used
> directly with code written for it.  For example, if you have a function
> "uploadAllFiles(files)", you can pass in either an <input type=file
> multiple>'s .files or a zipFile.files, and they'll both work.

How are nested directories handled in your counter proposal?

--tobie


