[whatwg] multipart/form-data filename encoding: unicode and special characters
julian.reschke at gmx.de
Wed May 2 04:43:25 PDT 2012
On 2012-05-02 13:05, Evan Jones wrote:
> On May 1, 2012, at 22:38 , Ashley Sheridan wrote:
>> The Webkit method looks the better of the two with regard to how
>> server-side languages might interpret it, but it would need work to
>> ensure that everything that should be escaped is escaped, and that the
>> server unescapes exactly what it should, and does so correctly.
> The problem is that currently I am unable to correctly "round trip" an uploaded file name. I would like users to upload a file, and be able to later download the file with the *exact same* file name. If you follow the specifications, this is not possible. Firefox is closer to the MIME RFCs (which specify backslash quoting in quoted-strings), but apparently that will break IE6, 7, and 8:
> Webkit's %-escaping behaviour is *not* part of the referenced MIME RFCs (which specify either backslash quoting in quoted-strings, base64 encoding, or %-escaping in special "filename*=" parameters). Thus, if this is the "right answer," it should be specified somewhere. I'm assuming that this needs to be in the HTML5 spec, since HTTP calls this the "body" of the POST and declares that it is outside the HTTP specification.
> Webkit's escaping is also flawed (see bug 62107 above). File names that contain %-escapes (e.g. my%22file.txt, admittedly very rare) will get mangled, because after escaping there is no difference between my%22file.txt and my"file.txt.
> Currently, I need to detect the browser in order to figure out what kind of unescaping to apply to the file name, and even then in some cases I can't figure out what the right file name is. Webkit claims this is a specification bug, so I'm hoping someone here might tell me if this is the case, and if so where can I file bugs, create test cases, etc?
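The collision described in the quoted message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: webkit_style_escape is a hypothetical helper that models nothing more than the behaviour reported above (a double quote becomes %22 while a literal "%" is left alone), and backslash_quote / backslash_unquote are hypothetical helpers for MIME quoted-string style escaping, not any browser's actual code.

```python
def webkit_style_escape(name: str) -> str:
    # Assumption: only the reported behaviour is modelled here, i.e. a
    # double quote becomes %22 and a literal "%" is passed through as-is.
    return name.replace('"', "%22")


def backslash_quote(name: str) -> str:
    # quoted-string style: escape the backslash first, then the quote.
    return name.replace("\\", "\\\\").replace('"', '\\"')


def backslash_unquote(value: str) -> str:
    # Reverse of backslash_quote: a backslash makes the next character literal.
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "\\")  # a trailing lone backslash is kept
        out.append(ch)
    return "".join(out)


names = ['my"file.txt', "my%22file.txt"]

# Both names escape to the same string, so the server cannot tell them apart.
escaped = [webkit_style_escape(n) for n in names]
collision = escaped[0] == escaped[1]

# Backslash quoting is reversible for both names.
round_trip_ok = all(backslash_unquote(backslash_quote(n)) == n for n in names)
```

Under these assumptions the %-escaped forms collide while the backslash-quoted forms round-trip, which is the ambiguity the server cannot resolve without knowing which browser sent the request.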
I did spend a considerable amount of time with Content-Disposition, the
*response* header field (resulting in RFC 6266 and
However, this has little to do with the representation in form uploads.
If browser implementers want to try something new that will not affect
the old code paths, supporting the encoding defined in RFC 5987 might be
the right thing to do (yes, it's ugly, but it's unambiguous).
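As a rough sketch of what the RFC 5987 encoding looks like, the following Python uses urllib.parse to build and parse an ext-value of the form charset'language'value. The helper names encode_ext_value and decode_ext_value are hypothetical, the language tag is left empty, and whether any browser would send such a value in a form upload is an assumption, not current behaviour.

```python
from urllib.parse import quote, unquote

# Punctuation that RFC 5987 allows unescaped (attr-char) in addition to
# letters and digits; every other octet is percent-encoded.
ATTR_CHAR_EXTRAS = "!#$&+-.^_`|~"


def encode_ext_value(name: str) -> str:
    # Hypothetical helper: UTF-8 charset, empty language tag.
    return "UTF-8''" + quote(name, safe=ATTR_CHAR_EXTRAS, encoding="utf-8")


def decode_ext_value(value: str) -> str:
    # Hypothetical helper: split charset'language'value, then undo the
    # percent-encoding using the declared charset.
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


quoted_name = encode_ext_value('my"file.txt')     # UTF-8''my%22file.txt
percent_name = encode_ext_value("my%22file.txt")  # UTF-8''my%2522file.txt
```

Because "%" itself is always escaped as %25, the two file names from the earlier example no longer collide, and decode_ext_value recovers each one exactly.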
Best regards, Julian