[whatwg] multipart/form-data filename encoding: unicode and special characters
evanj at csail.mit.edu
Tue May 1 18:12:36 PDT 2012
I am not an experienced web standards wonk, so please forgive me if I'm making a mistake here.
When uploading files whose names contain special characters, it appears to be unspecified how those file names should be escaped. As a result, WebKit (Safari/Chrome) handles these filenames one way, while Firefox handles them another. I'm implementing the server side of this equation, and it is unclear to me what I should be doing. Am I missing something? WebKit even has a bug on this issue that states "I suggest working with WHATWG or HTML WG to get something specified in HTML5, and getting browsers converge on that." Is anyone working on this?
To reproduce, create a file named: bàz'\"hi%22.txt e.g. using the Unix command: touch bàz\'\\\"hi%22.txt
Firefox (13.0 beta on Mac) sends the following header, backslash-escaping the double quote but not escaping the backslash:
Content-Disposition: form-data; name="somefile"; filename="bàz'\\"hi%22.txt"
WebKit (latest nightly r115711 on Mac) sends the following header, %-escaping the double quote but doing nothing to the literal %:
Content-Disposition: form-data; name="somefile"; filename="bàz'\%22hi%22.txt"
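To make the difference concrete, here is a minimal Python sketch of the two escaping schemes as observed in the headers above. The function names firefox_style and webkit_style are hypothetical labels for this illustration, not anything taken from browser source code, and the sketch only models the handling of the double quote seen in this one test.

```python
def firefox_style(filename: str) -> str:
    # Observed Firefox behaviour: backslash-escape double quotes,
    # leave backslashes and percent signs untouched.
    return filename.replace('"', '\\"')


def webkit_style(filename: str) -> str:
    # Observed WebKit behaviour: %-escape double quotes,
    # leave backslashes and literal percent signs untouched.
    return filename.replace('"', '%22')


def header(escaped: str) -> str:
    # Build the Content-Disposition line around an already-escaped filename.
    return f'Content-Disposition: form-data; name="somefile"; filename="{escaped}"'


# The test file from above: b, à, z, ', \, ", h, i, %, 2, 2, .txt
name = 'bàz\'\\"hi%22.txt'

print(header(firefox_style(name)))
print(header(webkit_style(name)))

# A different file whose name already contains a literal %22 where the
# test file has a double quote: WebKit-style escaping sends the same bytes.
other = "bàz'\\%22hi%22.txt"
print(webkit_style(name) == webkit_style(other))
```

Under the WebKit-style scheme the two distinct names produce identical parameter values, so a server cannot tell a literal %22 from an escaped double quote. Under the Firefox-style scheme the unescaped backslash means a server that applies quoted-string unescaping will read `\\"` as a backslash followed by the closing quote.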
THE SPECS: HTML5 states:
Encode the (now mutated) form data set using the rules described by RFC 2388. […] File names […] must use the character encoding selected above, though the precise name may be approximated if necessary (e.g. […]). User agents must not use the RFC 2231 encoding suggested by RFC 2388.
… this seems contradictory: encode using RFC 2388, but do not use the encoding suggested by that RFC. Worse, no browser actually follows the RFC (e.g. they all use UTF-8 encoded parameter values), so that doesn't seem like the right answer. Is there a way out of this mess?
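For comparison, the following Python sketch contrasts the RFC 2231 extended-parameter form with the raw encoded form that browsers send in practice. It assumes the form's selected character encoding is UTF-8; the helper names rfc2231_param and raw_param are hypothetical, and raw_param deliberately ignores the quote-escaping question discussed earlier.

```python
from urllib.parse import quote


def rfc2231_param(name: str, value: str) -> str:
    # RFC 2231 extended parameter: name*=charset'language'percent-encoded octets.
    # The language field is left empty here.
    return f"{name}*=UTF-8''{quote(value, safe='')}"


def raw_param(name: str, value: str) -> bytes:
    # What browsers do instead: put the value inside a quoted string and
    # encode the whole thing in the form's encoding (assumed UTF-8 here).
    return f'{name}="{value}"'.encode("utf-8")


print(rfc2231_param("filename", "bàz.txt"))
print(raw_param("filename", "bàz.txt"))
```

The first form is pure ASCII and self-describing, since it names its charset. The second relies on the server already knowing which encoding the form was submitted in.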