[whatwg] Drag-and-drop folders/files support with directory structure using DirectoryEntry
ericu at google.com
Wed Nov 16 15:58:14 PST 2011
On Wed, Nov 16, 2011 at 2:33 PM, Daniel Cheng <dcheng at chromium.org> wrote:
> I'm trying to better understand the use case for DataTransfer.entries.
> Using the example you listed in your first post, if I dragged those folders
> into a browser, I'd expect to see File objects with the following names in
> It seems like with that, a web app could implement a progress meter and
> handle subdirectories easily while using workers. What does the FileSystem
> API provide on top of that?
There's no chance to set up a progress meter--when you ask for
dataTransfer.files, the browser must give you the whole list. If it's
lazily-implemented, and the list is long [or comes from a slow network
filesystem, etc.], your script will lock up until the browser finishes
its depth-first search. If it's eagerly-implemented, that pause will
happen between when the user drops the files and when you get the drop event.
What you would see in Kinuko's proposal would depend on what you'd
dragged in. Kinuko, please correct me if I'm wrong here:
Did you just drag in your entire Photos folder [which happened to
contain 3 subdirs and 7 files]?
Then entries would hold a single DirectoryEntry, representing Photos.
Did you individually select and drag in 3 folders [trip, halloween,
tokyo] which all happened to be in Photos?
Then entries would hold 3 DirectoryEntries, one for each selected folder.
Did you select each of the 7 files?
Then entries would hold 7 FileEntries.
So if you dragged in directories, you could put up a progress meter
while iterating down through them to discover subdirectories and files.
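Such a traversal with a progress meter might look like the following. This is a sketch against mock objects, assuming the proposed entries attribute exposes objects shaped like the FileSystem API's Entry (an isDirectory flag and a batched, callback-based directory reader); the mocks stand in for a real drop.

```javascript
// Minimal mocks standing in for FileEntry/DirectoryEntry (hypothetical
// shapes modeled on the FileSystem API; not any browser's real objects).
function fileEntry(name) {
  return { isDirectory: false, name };
}
function dirEntry(name, children) {
  return {
    isDirectory: true,
    name,
    createReader() {
      let done = false;
      return {
        // readEntries yields the children once, then an empty batch,
        // mirroring the FileSystem API's batched reader contract.
        readEntries(cb) {
          const batch = done ? [] : children;
          done = true;
          cb(batch);
        }
      };
    }
  };
}

// Walk entries depth-first, reporting progress after each file is found.
// Because each directory is read incrementally, a page could update a
// progress meter between batches instead of blocking on one big list.
function traverse(entries, onFile, onDone) {
  let pending = entries.slice();
  (function next() {
    if (pending.length === 0) return onDone();
    const entry = pending.shift();
    if (entry.isDirectory) {
      entry.createReader().readEntries(batch => {
        pending = batch.concat(pending);
        next();
      });
    } else {
      onFile(entry);
      next();
    }
  })();
}

// Example: dragging in a single "Photos" folder yields one DirectoryEntry.
const photos = dirEntry('Photos', [
  dirEntry('trip', [fileEntry('1.jpg'), fileEntry('2.jpg')]),
  dirEntry('halloween', [fileEntry('3.jpg')]),
  fileEntry('cover.jpg'),
]);

const seen = [];
traverse([photos], f => seen.push(f.name), () => {
  console.log(`done: ${seen.length} files`);
});
```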
> Also, if a page caches a DirectoryEntry from entries, does that mean it can
> continuously poll the DirectoryEntry to see if the contents have changed to
> contain something interesting? That seems undesirable.
That remains to be decided.
> On Wed, Nov 16, 2011 at 10:21, Glenn Maynard <glenn at zewt.org> wrote:
>> On Wed, Nov 16, 2011 at 3:42 AM, Jonas Sicking <jonas at sicking.cc> wrote:
>> > > That requires a full directory traversal in advance to find all of the
>> > > files, though; the tree could be very large.
>> > You need to do that anyway to implement the .files attribute, no?
>> .files shouldn't recursively include all files inside directories. (If you
>> actually select tens of thousands of files and drag them, then yes, but in
>> most cases when you have that many files, they're split into directories
>> and you don't normally drag them individually.)
>> On Wed, Nov 16, 2011 at 9:59 AM, Kinuko Yasuda <kinuko at chromium.org> wrote:
>> > The unsandboxed storage and actual data don't belong to an origin, but
>> > the 'origin-specific' concept can be applied to the filesystem
>> > namespace.
>> > I haven't thought about workers cases deeply yet, but am thinking that
>> > we should prohibit access to the dropped folders from the other pages
>> > than the one that received the drop event.
>> Access to a file should just be limited by whoever has an Entry object
>> pointing at it. The Entry object is essentially a token granting access to
>> its associated file(s).
>> > As for the entry URLs, I'm planning to make the URLs to the dropped
>> > files and the filesystem
>> > namespace (that only contains the dropped files) expire when the page
>> > goes away, hoping this would largely simplify the lifetime and
>> > security issues.
>> I don't think it's possible to do this correctly, because URLs created with
>> toURL have no equivalent to revokeObjectURL. A long-running page has no
>> way to avoid "leaking" these references until the page exits. Adding a
>> revoke method for toURL would essentially turn it into URL.createObjectURL.
>> Needing to revoke URLs when dealing with worker communication also makes it
>> very hard for users to get it right. For example, suppose a Window sends a
>> toURL-generated URL to a Worker. How do you ensure that the URL is revoked
>> after the worker has received it and finished converting it back to an
>> Entry? The Worker might be killed (eg. due to CPU quotas) at any time,
>> making avoiding resource leaks very hard.
>> These are just the usual problems with manual resource management, which
>> should be avoided if at all possible. We already have a mechanism that
>> cleanly avoids all of this, with structured clone and File.
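To make the asymmetry concrete, here is a toy model of the two URL lifetimes. The registry, URL formats, and function bodies are invented for illustration; this is not how any browser implements these APIs.

```javascript
// Toy model: createObjectURL pins the underlying resource until
// revokeObjectURL releases it, while toURL (as discussed) has no
// release step at all.
const registry = new Map();
let counter = 0;

function createObjectURL(resource) {
  const url = `blob:fake/${counter++}`;
  registry.set(url, resource);   // resource is now pinned
  return url;
}

function revokeObjectURL(url) {
  registry.delete(url);          // explicit release
}

function toURL(resource) {
  const url = `filesystem:fake/${counter++}`;
  registry.set(url, resource);   // pinned with no way to release
  return url;
}

// A long-running page that hands out blob URLs can release them...
const a = createObjectURL({ name: 'photo.jpg' });
revokeObjectURL(a);

// ...but every toURL call keeps its entry alive for the page's lifetime.
for (let i = 0; i < 3; i++) toURL({ name: `file${i}` });
console.log(registry.size); // grows without bound as the page runs
```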
>> > > Off-hand, the main issue that directly affects reading is that most
>> > > non-Windows filesystems can store filenames which can't be represented
>> > by a
>> > > DOMString, such as invalid codepoints (most commonly mismatched
>> > encodings).
>> > How do they appear in File.name in the existing .files approach?
>> I don't have a Linux browser to check. I'm guessing it won't inform us
>> much here, since it didn't have to worry about general file access.
>> > A naive solution in the filesystem approach would be silently ignoring
>> > such files (probably bad) or having in-memory path mapping (would be
>> > slightly better). For limited read-only drag-and-drop cases we
>> > wouldn't need to think about remapping and the mapping could just go
>> > away when the page goes away, so hopefully implementing such mapping
>> > wouldn't be that hard.
>> There are probably some cases that we'll just have to accept will never
>> work perfectly, and design with that in mind.
>> To take a common case, suppose a script does the following, a commonplace
>> method for safe file overwriting (relatively safe; the needed flush operations
>> don't exist here):
>> 1. Create a file with the name filename + ".new".
>> 2. Write the new file contents to the file.
>> 3. Rename filename + ".new" to filename, overwriting the original file.
>> This is a useful case: it's real-world--I've done this countless times--and
>> it's a case where unrepresentable filenames affect both reading and
>> writing, plus the auxiliary operation of renaming.
>> I suppose the mapping approach could work here. Associate the mapping with
>> the DirectoryEntry containing it, from invalid filenames to generated
>> filenames. Then, if the invalid filename is "X", and the DOMString mapping
>> is "MAPPING1", then this would first create the literal filename
>> "MAPPING1.new", followed by renaming it to the original "invalid" filename "X".
>> (In particular, though, I think it should not be possible to create *new*
>> garbage filenames on people's systems, that didn't exist to begin with.
>> That is, it should map to the filenames that really exist, not just strings.)
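The mapping idea above can be sketched as a per-directory table. NameMapping, the alias format, and the byte-array representation of native names are all invented for illustration; no spec defines them.

```javascript
// Sketch: native filenames with no valid DOMString form (modeled here as
// byte arrays) get stable generated aliases, resolvable in both directions.
class NameMapping {
  constructor() {
    this.aliases = new Map();  // native bytes (stringified) -> alias
    this.natives = new Map();  // alias -> native bytes
    this.next = 1;
  }
  // Return a stable DOMString alias for an unrepresentable native name.
  aliasFor(nativeBytes) {
    const key = nativeBytes.join(',');
    if (!this.aliases.has(key)) {
      const alias = `MAPPING${this.next++}`;
      this.aliases.set(key, alias);
      this.natives.set(alias, nativeBytes);
    }
    return this.aliases.get(key);
  }
  // Resolve an alias back to the real on-disk name. Unmapped names like
  // "MAPPING1.new" resolve to nothing and would be created literally,
  // so only filenames that really exist ever come out of the table.
  nativeFor(name) {
    return this.natives.get(name);
  }
}

const m = new NameMapping();
const invalid = [0x66, 0x6f, 0xff];   // bytes that aren't valid UTF-8
const alias = m.aliasFor(invalid);    // "MAPPING1", stable across calls
console.log(alias, m.nativeFor(alias));
```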
>> This is complex, though, and leads to new questions, like how long the
>> mappings last if the underlying file is deleted. As a data point, note
>> that most Windows applications are unable to access files whose filenames
>> can't be represented in the current ANSI codepage. That is, if you're on a
>> US English system, you can't access filenames with Japanese in them.
>> (Unicode applications can, but tons of applications in Windows aren't
>> Unicode; Windows has never made it simple to support Unicode.) If users
>> find that reasonable, it might not be worth all this for the even rarer
>> case of illegal codepoints in Linux.
>> > Yup, the writing side would have tougher issues, and that's why I started
>> > this proposal only with read-only scenarios. (I agree that it'd be
>> > good to give another thought about unsandboxed writing cases though)
>> For what it's worth, I think the only sane approach here is an isolated
>> break from attempting to make everything interoperable, and allow the
>> platform's limitations to be visible. (That is, fail file creation if the
>> path depth or filename length is too long on the platform; succeed with
>> file creation even if it would fail on a different platform, and so on.) I
>> think this is just inherent to allowing this sort of access to real
>> filesystems, and trying to avoid it just causes other, stranger problems.
>> (For example, if you prevent creating filenames in Linux which are illegal
>> in Windows, then things get strange if an "illegal" filename already exists
>> on a filesystem where it's not actually disallowed.)
>> On Wed, Nov 16, 2011 at 12:01 PM, Eric U <ericu at google.com> wrote:
>> > While the URL format for non-sandboxed files has yet to be worked out,
>> > I think we need toURL to work no matter where the file comes from.
>> > It's already the case that an Entry can expire if the underlying file
>> > is deleted or moved;
>> But there's no revocation mechanism for toURL URLs.
>> Also, if toURL URLs to non-sandboxed storage expire with the context they
>> were created in (which they would have to, I think), it loses a whole category
>> of use cases covered by structured clone: the ability to persist an access
>> token. For example, the spec allows storing a File within a History
>> state. That allows history navigation to restore its state properly: if
>> the user opened a local picture into an image viewer app, navigating
>> through history can correctly show the files in older history states, and
>> even restore correctly through browser restarts and session restores. The
>> same should apply to Entry and DirectoryEntry.
>> (Nobody implements this yet, as far as I know, but I hope it'll happen
>> eventually. It's a limitation today, and it'll become a more annoying one
>> as local file access mechanisms like this one are fleshed out.)
>> Also, if non-sandboxed toURL URLs are same-origin only, then that also
>> loses functionality that structured cloning allows: using Web Messaging to
>> pass an access token to a page with a different origin. (This is much
>> safer than allowing cross-origin use of the URLs, since it's far easier to
>> accidentally expose a URL string than to accidentally transfer an object.)
>> File API has already solved all of this by using structured clone. I think
>> it makes a lot of sense to follow its lead.
>> Glenn Maynard