[whatwg] Fakepath revisited

Mon Sep 7 11:44:39 PDT 2009

Oops... the following was meant to be a "reply to all", but I hit
"reply" instead, so here goes a copy for the list:

On Mon, Sep 7, 2009 at 8:43 PM, Eduard Pascual<herenvardo at gmail.com> wrote:
> On Mon, Sep 7, 2009 at 5:10 PM, Tab Atkins Jr. <jackalmage at gmail.com> wrote:
>>
>> On Mon, Sep 7, 2009 at 3:24 AM, Alex Henrie<alexhenrie24 at gmail.com> wrote:
>> > Expecting developers to hack out a substring at all will only lead to
>> > more bad designs. For example, Linux and Mac OS allow filenames to
>> > contain backslashes. So if the filename was "up\load.txt" then
>> > foo.value would be "C:\fakepath\up\load.txt" which could easily be
>> > mistaken for "load.txt". Fakepath will actually encourage developers
>> > to fall into this trap, which just goes to show that it is not a
>> > perfect solution.
>>
>> Well, no, not really.  If they're hacking out a substring, they'll
>> *hack out a substring*, since the prefix is of a known fixed length.
>> Just lop off the first 12 characters, and whatever's left is your
>> filename.  Splitting on "\" is just plain silly in this instance.
>>
>> ~TJ
>
> That wouldn't work.
> There is a significant number of browsers that include the full path
> in the value. So web authors would need to do *a lot* of guesswork
> if they are to hack a substring out of such a value. They have to figure
> out whether they'll be getting a plain file name, a file name with a
> path, or a fakepath, and treat each case separately. If a site tries
> to just substring(12), it will break on any non-HTML5 browser (except
> in the corner case where the value contains a full path and the path
> is exactly 12 characters long). If they try to split on \, they will
> break when a file on a non-Windows system contains that character.
>
> To make this more concrete, imagine the following scenarios:
>
> A file named "up\load.txt" (on a non-Windows OS) is selected in an
> HTML5 browser. We get a value="C:\fakepath\up\load.txt".
> A file named "load.txt", located at "C:\fakepath\up\", is selected in
> a browser that includes the full path. We get a
> value="C:\fakepath\up\load.txt".
> Two different file-names end up yielding the same value string. So,
> basically, it is impossible to reliably recover the name of the file
> from the "value" string alone: there will be ambiguous cases. While
> the examples above may seem like corner cases, they are only intended
> to illustrate the ambiguity issue.
>
> Ok, so some (horribly-designed) sites break without fakepath. Since
> the HTML5 spec is so fond of including explicit algorithms, is there
> any reliable algorithm that web authors can use to recover the actual
> filename? (Without having to assume that everybody switches
> immediately to HTML5-compliant browsers, of course.) If there isn't,
> then every other site (including all the decently-designed ones) that
> needs or uses the filename would break. What would be the point of
> keeping compatibility with some bad sites if it would break many good
> sites?
>
> Regards,
> Eduard Pascual
>
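
To make the ambiguity above concrete, here is a minimal JavaScript
sketch of the two extraction strategies discussed in this thread. The
helper names (nameByFixedOffset, nameByLastBackslash) and the sample
path "C:\docs\" are hypothetical, made up purely for illustration; they
come from neither the spec nor any browser.

```javascript
// The fixed prefix discussed in the thread; it is 12 characters long.
const FAKEPATH_PREFIX = "C:\\fakepath\\";

// Strategy 1 (hypothetical helper): lop off the first 12 characters,
// assuming the value always starts with the fakepath prefix.
function nameByFixedOffset(value) {
  return value.substring(FAKEPATH_PREFIX.length);
}

// Strategy 2 (hypothetical helper): keep whatever follows the last
// backslash; if there is none, the whole value is returned.
function nameByLastBackslash(value) {
  return value.substring(value.lastIndexOf("\\") + 1);
}

// The same string can come from two different files:
//  - "up\load.txt" selected in a browser that prepends the fakepath
//  - "load.txt" in C:\fakepath\up\ in a browser that sends the full path
const ambiguous = "C:\\fakepath\\up\\load.txt";

console.log(nameByFixedOffset(ambiguous));   // up\load.txt (wrong for the second file)
console.log(nameByLastBackslash(ambiguous)); // load.txt (wrong for the first file)

// Values from browsers that do not prepend the fakepath:
console.log(nameByFixedOffset("load.txt"));           // empty string
console.log(nameByFixedOffset("C:\\docs\\load.txt")); // .txt
```

Neither helper can tell the two origins of the ambiguous value apart,
since both receive exactly the same string.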