[whatwg] [WF2] action="mailto:" - encoding spaces

Michael A. Puls II shadow2531 at gmail.com
Tue Dec 2 01:34:44 PST 2008

On Tue, 02 Dec 2008 02:48:15 -0500, Ian Hickson <ian at hixie.ch> wrote:

> On Wed, 29 Oct 2008, Michael A. Puls II wrote:
>> On Wed, 29 Oct 2008 03:42:17 -0400, Ian Hickson <ian at hixie.ch> wrote:
>> > On Wed, 29 Oct 2008, Michael A. Puls II wrote:
>> > >
>> > > What about the method="POST" case where the query string is kept?
>> > >
>> > > <form action="mailto:?subject=1+2" method="POST">
>> > >     <input type="text" name="body" value="1+2">
>> > >     <input type="text" name="other" value="1 2">
>> > >     <input type="submit">
>> > > </form>
>> > >
>> > > When submitting that, I expect to see:
>> > >
>> > > mailto:?subject=1%2B2&body=body%3D1%252B2%26other%3D1%25202
>> > >
>> > > submitted to the mail client.
>> > >
>> > > The current POST section seems to say that this would be submitted
>> > > instead:
>> > >
>> > > mailto:?subject=1+2&body=body%3D1%252B2%26other%3D1+2
>> > >
>> > > In other words, I think spaces in values should be emitted as %20
>> > > for POST too and in the case there's a query string present in the
>> > > action attribute for POST, any + in the hvalues of the query string
>> > > should be normalized to %2B (to be consistent with a + inside a form
>> > > control's value that gets converted to %2B)
>> >
>> > The idea is that the same thing as would be posted to an HTTP server
>> > is what is sent using the e-mail body, so I think we'd want the exact
>> > same "+" behavior as normally.
>> O.K., but in the case of the + that's in the mailto URI in the action
>> attribute, the author means a '+' and not a space (they're allowed to be
>> left in raw form in a mailto URI). If it gets sent to a server, the +
>> will be treated as a space, which is not what is intended.
> I actually can't find where it is defined that the + in an HTTP URI
> represents a space. (I can find where it says that a space is to be
> converted into a +, but not the other way around.)
> My understanding, though, is that the convention that + represents a space
> is not part of the URI syntax, but part of the syntax of the format used
> to encode the data into the URI, which for HTTP URIs is generally
> application/x-www-form-urlencoded. But nothing stops this format from
> being used elsewhere, e.g. in the body of an e-mail or a POST submission.
>> The workaround is of course for the author to make sure to encode that +
>> as %2B (or never use anything but action="mailto:" even for POST). But,
>> for good measure, it seems like the UA should fix that if the + will
>> ever end up in an HTTP URI.
> I don't follow.
>> Of course right now, browsers only pass the data as a mailto URI to an
>> email program, so the + from the query string will be a + and come out
>> fine in the compose window. As for spaces in form control values coming
>> out as + (for POST) in a program's body field, that's not as big of a
>> deal as there's no use-case to *see* any of the data *like that* anyway.
>> But it does seem incorrect to encode mailto spaces as + though.
> I don't follow.
>> However, if, for POST, everything after 'mailto:' in the action
>> attribute was dropped (like get) and all you ever had was
>> mailto:?body=encoded_stuff that was POSTed, then the spec could say that
>> the value you might see in the body field represents *HTTP* url encoded
>> data.
> We can't drop everything, because then you'd lose the Subject: line, etc.
>> Or, the spec could say that if the protocol in the action attribute is
>> mailto:, +s in the action attribute have to be encoded as %2B and spaces
>> in the action attribute have to be encoded as %20. Then, the validator
>> can catch that and the spec can say (for POST), that the body hvalue
>> that gets generated from the form represents *HTTP* form data. Then,
>> it'll be clear why +s in the value are represented as + instead of %20.
> I don't follow here either.
>> Or, if it's O.K. for a UA's URI normalizer/resolver to take
>> action="mailto:?subject=1+2 3" and normalize that to
>> action="mailto:?subject=1%2B2%203" for use with the form's .action
>> getter, I guess that might solve it too.
> I think we may be talking at cross-purposes... which requirements in the
> spec are you referring to?

I'll try to explain more.

Consider this form:

<form action="mailto:?subject=1+2" method="POST">
    <input type="submit" value="Compose">
</form>

(which contains a valid mailto URI meaning that "1+2" should be the value of the subject)

Imagine that your browser supports setting the default mailto URI handler to Gmail (a web-based client that uses *http* URIs).

If you submit that form, you'd get <https://mail.google.com/mail/?compose=1&view=cm&fs=1&su=1+2>, which, if you try it, you'll see puts "1 2" instead of "1+2" in the subject field.

Basically, HTML is trying to say that "+" is equal to " ", but mailto URI hnames and hvalues are not application/x-www-form-urlencoded. They're close, but have fewer reserved characters.

So, the browser has to convert the + to %2B before submitting (because the value will end up in an http URI, in this case) to Gmail so the correct value ends up in the subject field. (This isn't a problem for non-web-based email clients because they don't treat hnames and hvalues in mailto URIs as application/x-www-form-urlencoded).
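A rough sketch of the conversion I mean, for illustration only. The function name gmailComposeUrl is made up, and the only Gmail parameter it assumes is the "su" one seen in the URL above. It reads the subject hvalue the RFC2368 way (only %XX escapes are decoded, '+' stays '+') and then re-encodes it for the http URI:

```javascript
// Hypothetical helper: not from any spec or browser.
// Builds the Gmail compose URL shown above from a mailto URI's subject.
function gmailComposeUrl(mailtoUri) {
    const q = mailtoUri.indexOf("?");
    const query = q === -1 ? "" : mailtoUri.slice(q + 1);
    let subject = "";
    for (const pair of query.split("&")) {
        const eq = pair.indexOf("=");
        const hname = eq === -1 ? pair : pair.slice(0, eq);
        const hvalue = eq === -1 ? "" : pair.slice(eq + 1);
        if (decodeURIComponent(hname).toLowerCase() === "subject") {
            // RFC2368 reading: '+' is a literal '+', only %XX is decoded.
            subject = decodeURIComponent(hvalue);
        }
    }
    // encodeURIComponent emits %2B for '+' and %20 for a space.
    return "https://mail.google.com/mail/?compose=1&view=cm&fs=1&su=" +
        encodeURIComponent(subject);
}

console.log(gmailComposeUrl("mailto:?subject=1+2"));
// https://mail.google.com/mail/?compose=1&view=cm&fs=1&su=1%2B2
```

Note that decodeURIComponent throws on malformed escapes, so a real handler would need to guard against that.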

If you specify action="mailto:?subject=1%2B2", you avoid the problem, and there are ways that Gmail could avoid it too. Yet the underlying problem will still be there.

So, basically, the problem is, RFC2368 says '+' is a '+' and a space is "%20". HTML says that for the value in action="", '+' is a space, "%20" is a space and "%2B" is a '+'. 
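To make the two readings concrete, here is a small sketch. Both function names are my own, purely for illustration. The first decodes a value the RFC2368 way, the second the application/x-www-form-urlencoded way:

```javascript
// Hypothetical helpers: the names are illustrative, not from either spec.

// RFC2368 reading: only %XX escapes are special; '+' is a literal '+'.
function decodeMailtoValue(value) {
    return decodeURIComponent(value);
}

// application/x-www-form-urlencoded reading: '+' means space,
// then %XX escapes are decoded.
function decodeFormValue(value) {
    return decodeURIComponent(value.replace(/\+/g, " "));
}

for (const raw of ["1+2", "1%202", "1%2B2"]) {
    console.log(raw, "|", decodeMailtoValue(raw), "|", decodeFormValue(raw));
}
// 1+2 | 1+2 | 1 2
// 1%202 | 1 2 | 1 2
// 1%2B2 | 1+2 | 1+2
```

Only the raw '+' is read differently, which is why encoding it as %2B is safe under both readings.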

So, when you put a mailto URI in the action attribute, you have a conflict of specs. If application/x-www-form-urlencoded gets priority over RFC2368 in this case, that's fine. I just think it needs to be spelled out in the HTML spec more.

Personally, I'd suggest the UA should do this for action="mailto:":

if (form.action.search(/mailto:/i) == 0 && form.method == "post") {
    form.action = form.action.replace(/\+/g, "%2B");
}

Then things will come out fine in both web-based and non-web-based clients, even if the author of the markup didn't know they had to convert the regular mailto URI to an HTML action attribute mailto URI.
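The same suggestion as a standalone function that works on plain strings, so it can be tried outside a browser. This is just a sketch; the name escapePlusForMailtoPost is hypothetical, and it assumes the method is passed in as a string in any case:

```javascript
// Hypothetical helper: escapes raw '+' in a mailto action for POST forms,
// leaving every other action/method combination untouched.
function escapePlusForMailtoPost(action, method) {
    const isMailto = /^mailto:/i.test(action);
    const isPost = method.toLowerCase() === "post";
    return isMailto && isPost ? action.replace(/\+/g, "%2B") : action;
}

console.log(escapePlusForMailtoPost("mailto:?subject=1+2", "POST"));
// mailto:?subject=1%2B2
```

An action that already uses %2B passes through unchanged, so running it twice does no harm.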

