[whatwg] Archive API - proposal
Tobie Langel
tobie.langel at gmail.com
Tue Aug 14 14:05:27 PDT 2012
On Aug 14, 2012, at 21:21, Glenn Maynard <glenn at zewt.org> wrote:
> (I've reordered my responses to give a more logical progression.)
>
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <baku at mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>>
>
> This interface is problematic. Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage. This API requires that
> filenames round-trip uniquely, or else files aren't accessible at all. For
> example, if you have two filenames in CP932, "日" and "本", but the encoding
> isn't determined correctly, you may end up with two files both with a
> filename of "??". Either you can't open either file, or you can only open
> one of them. This isn't theoretical; I hit ZIP files like this in the wild
> regularly.
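
A minimal sketch of the collision described above. The decoder here is hypothetical: it stands in for any decoder that guesses the wrong encoding, and simply turns every non-ASCII byte into "?". The raw byte values are arbitrary examples, not taken from a real archive.

```javascript
// Hypothetical lossy decoder: every byte outside ASCII becomes "?".
function decodeLossy(bytes) {
  return Array.from(bytes, (b) => (b < 0x80 ? String.fromCharCode(b) : "?")).join("");
}

// Two distinct raw filenames, as they might sit in a ZIP central directory.
const entries = [
  { rawName: Uint8Array.of(0x93, 0xfa), size: 10 },
  { rawName: Uint8Array.of(0x96, 0xfb), size: 20 },
];

// Name-keyed access: the second entry silently overwrites the first.
const byName = new Map();
for (const entry of entries) {
  byName.set(decodeLossy(entry.rawName), entry);
}

// Index-based access: both entries stay reachable, whatever their names are.
const byIndex = entries.map((entry) => ({
  name: decodeLossy(entry.rawName),
  size: entry.size,
}));

console.log(byName.size);    // 1
console.log(byIndex.length); // 2
```

With name-keyed lookup only one of the two entries can be opened; with a list, both can.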
>
> Instead, I'd recommend that the primary API simply returns File objects
> directly from the ZIP. For example:
>
> var reader = archive.getFiles();
> reader.onsuccess = function(result) {
>     // result = [File, File, File, File...];
>
>     console.log(result[0].name);
>     // read the file (FileReader takes no constructor arguments)
>     new FileReader().readAsArrayBuffer(result[0]);
> }
>
> This allows opening files without any dependency on the filename. Since
> File objects are by design lightweight--no decompression should happen
> until you actually read from the file--this isn't expensive and won't
> perform any extra I/O. All the information you need to expose a File
> object is in the central directory (filename, mtime, decompressed size).
>
>> I would like to receive feedback about this. In particular:
>> . Do you think it can be useful?
>> . Do you see any limitation, any feature missing?
>>
>
> It should be possible to get the CRC32 of files, which ZIP stores in the
> central directory. This lets users perform checksum verification themselves
> if wanted, and enables the other uses of a file's checksum, without having
> to read the whole file.
>
> (I don't think CRC32 checks should be performed automatically, since it's
> too hard for that to make sense when random access is involved.)
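
For reference, a sketch of what user-side verification could look like, assuming the expected CRC is exposed as an unsigned 32-bit number. ZIP uses the common CRC-32 with the reflected polynomial 0xEDB88320; `matchesExpectedCRC` is a hypothetical helper name, not part of any proposal.

```javascript
// Build the 256-entry lookup table for the reflected polynomial 0xEDB88320.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

// Compute the CRC-32 of a byte sequence (e.g. a Uint8Array).
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const b of bytes) {
    crc = CRC_TABLE[(crc ^ b) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: compare decompressed bytes against the stored CRC.
function matchesExpectedCRC(bytes, expectedCRC) {
  return crc32(bytes) === expectedCRC >>> 0;
}

// Standard CRC-32 check value for the ASCII string "123456789".
const sample = new TextEncoder().encode("123456789");
console.log(crc32(sample).toString(16)); // "cbf43926"
```

This needs the whole decompressed file, which is why doing it automatically sits badly with random access.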
>
>> // The ArchiveReader object works with Blob objects:
>> var archiveReader = new ArchiveReader(file);
>>
>> // Any request is asynchronous:
>>
>
> The only operation that needs to be asynchronous is creating the
> ArchiveReader itself. It should parse the ZIP central record before
> returning a result. Once you've done that you can do the rest
> synchronously, because no further I/O is necessary until you actually read
> data from a file.
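
A rough sketch of the one step that needs I/O: locating the end-of-central-directory record at the tail of the archive. It assumes the bytes are already in memory as a Uint8Array (in a browser they would first be read asynchronously from the Blob), ignores ZIP64, and does not validate the trailing comment length. The function and field names are illustrative only.

```javascript
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as a little-endian uint32
const EOCD_MIN_SIZE = 22;          // size of the record with an empty comment

// Scan backwards for the end-of-central-directory record and read its fields.
function readEndOfCentralDirectory(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  for (let pos = bytes.byteLength - EOCD_MIN_SIZE; pos >= 0; pos--) {
    if (view.getUint32(pos, true) === EOCD_SIGNATURE) {
      return {
        entryCount: view.getUint16(pos + 10, true),
        directorySize: view.getUint32(pos + 12, true),
        directoryOffset: view.getUint32(pos + 16, true),
      };
    }
  }
  return null; // not a ZIP, or truncated
}

// The smallest valid ZIP: an empty archive is just the 22-byte record.
const emptyZip = new Uint8Array(22);
emptyZip.set([0x50, 0x4b, 0x05, 0x06]);
console.log(readEndOfCentralDirectory(emptyZip));
```

Once the directory offset and size are known, every entry's name, size, and CRC can be listed without touching the compressed data.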
>
> This gives the following, simpler interface:
>
> var opener = new ZipOpener(file);
> opener.onerror = function() { console.error("Loading failed"); }
> opener.onsuccess = function(zipFile)
> {
>     // .files is a FileList, representing each file in the archive.
>     if(zipFile.files.length == 0) { console.error("ZIP file is empty"); return; }
>
>     var example_file = zipFile.files[0];
>     console.log("The first filename is", example_file.name,
>                 "with an expected CRC of", example_file.expectedCRC);
>
>     // Read from the file (FileReader takes no constructor arguments):
>     var reader = new FileReader();
>     reader.readAsArrayBuffer(example_file);
>
>     // For convenience, add "getter File? (DOMString name)" to FileList, to find
>     // a file by name. This is equivalent to iterating through files[] and
>     // comparing .name. If no match is found, return null. This could be a
>     // function instead of a getter.
>     var example_file2 = zipFile.files["file.txt"];
>     if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; }
> }
>
> (To fit expectedCRC in there, it would actually need to use a subclass of
> File, not File itself.)
>
> This also eliminates an error condition (no getFile error callback), and
> since .files looks just like HTMLInputElement.files, it can be used
> directly with code written for it. For example, if you have a function
> "uploadAllFiles(files)", you can pass in either an <input type=file
> multiple>'s .files or zipFile.files, and both will work.
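
As I read it, the name lookup is just a linear scan over the list. A sketch as a plain function, assuming only that the list is array-like and its items have a .name; `findByName` is a made-up name, and returning the first match when names collide is my assumption, not something the proposal states.

```javascript
// Hypothetical helper mirroring the proposed name getter:
// linear scan over an array-like, null when nothing matches.
function findByName(files, name) {
  for (let i = 0; i < files.length; i++) {
    if (files[i].name === name) return files[i];
  }
  return null;
}

// Plain objects stand in for File objects here.
const files = [{ name: "readme.txt" }, { name: "file.txt" }];

console.log(findByName(files, "file.txt"));    // the second entry
console.log(findByName(files, "missing.txt")); // null
```

Because it only relies on .length and indexed access, the same function would work on an input element's .files as well.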
How are nested directories handled in your counter proposal?
--tobie