[whatwg] Archive API - proposal

Wed Aug 15 04:14:15 PDT 2012

On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard <glenn at zewt.org> wrote:
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <baku at mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>
> This interface is problematic.  Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage.  This API requires that
> filenames round-trip uniquely, or else files aren't accessible t all.

Indeed, in the case of zip files, file names themselves are dangerous
as handles that get past passed back and forth, so it seems like a
good idea to be able to extract the contents of a file inside the
archive without having to address the file by name.

As for the filenames, after an off-list discussion, I think the best
solution is that UTF-8 is tried first but the ArchiveReader
constructor takes an optional second argument that names a character
encoding from the Encoding Standard. This will be known as the
fallback encoding. If no fallback encoding is provided by the caller
of the constructor, "Windows-1252" is set as the fallback encoding.
When it ArchiveReader processes a filename from the zip archive, it
first tests if the byte string is a valid UTF-8 string. If it is, the
byte string is interpreted as UTF-8 when converting to UTF-16. If the
filename is not a valid UTF-8 string, it is decoded into UTF-16 using
the fallback encoding.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/