[whatwg] Web Address and its escape
NARUSE, Yui
naruse at airemix.jp
Wed Sep 9 07:33:39 PDT 2009
Anne van Kesteren wrote:
> On Tue, 08 Sep 2009 21:40:22 +0200, NARUSE, Yui <naruse at airemix.jp> wrote:
>> First is about 4.10.16.4 URL-encoded form data.
>> http://www.whatwg.org/specs/web-apps/current-work/#application/x-www-form-urlencoded-encoding-algorithm
>>
>>
>> In this algorithm at 6.2.1,
>> "SP, *, -, ., 0 .. 9, A .. Z, _, a .. z" is not escaped.
>> But many other specs which use application/x-www-form-urlencoded refers
>
> Which other specifications?
The following specifications (sorry, some of them cite earlier RFCs):
XForms 1.0
http://www.w3.org/TR/xforms/#serialize-urlencode
"then non-ASCII and reserved characters (as defined by [RFC 2396] as
amended by subsequent documents in the IETF track) are escaped"
-> so, in effect, RFC 3986
HTML 4
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
"reserved characters are escaped as described in [RFC1738]"
RFC1738 http://www.faqs.org/rfcs/rfc1738.html
unreserved = alpha | digit | safe | extra
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
TAG Finding
http://www.w3.org/2001/tag/doc/whenToUseGet.html#i18n
"refer to section 2.1 of [RFC2396]."
RFC2396 http://www.faqs.org/rfcs/rfc2396.html
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
WSDL 2.0
http://www.w3.org/TR/wsdl20-bindings/#_http_x-www-form-urlencoded
"Replacement values falling outside the range (ALPHA and DIGIT below are defined
as per [IETF RFC 4234]): ALPHA | DIGIT | "-" | "." | "_" | "~" | "!" |
"$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" | "=" | ":" | "@",
MUST be percent-encoded."
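
To make the differences easier to see, here is a minimal sketch in Python that compares the unescaped punctuation of each grammar quoted in this thread against the set in the HTML5 draft. The dictionary keys and the function name compare_with_draft are labels made up for this illustration; the character sets are copied from the quoted grammars, with alphanumerics left out because every set includes them.

```python
# Punctuation that each grammar quoted in this thread leaves unescaped.
# Alphanumerics are omitted because all of the sets include them.
UNESCAPED_PUNCT = {
    "HTML5 draft": set("*-._"),  # SP is handled separately (sent as "+")
    "RFC 1738 unreserved": set("$-_.+!*'(),"),
    "RFC 2396 unreserved": set("-_.!~*'()"),
    "RFC 3986 unreserved": set("-._~"),
    "WSDL 2.0": set("-._~!$&'()*+,;=:@"),
}


def compare_with_draft(name):
    """Return (only in the draft, only in the named set), each sorted."""
    draft = UNESCAPED_PUNCT["HTML5 draft"]
    other = UNESCAPED_PUNCT[name]
    return sorted(draft - other), sorted(other - draft)


for name in UNESCAPED_PUNCT:
    if name != "HTML5 draft":
        print(name, compare_with_draft(name))
```

For RFC 3986 the result is that only "*" is draft-only and only "~" is RFC-only, which is the mismatch asked about in the original mail.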
>> URI's unreserved. And it in RFC3986 is
>> unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
>> Why ~ is escaped and * is not escaped?
>
> What do browsers do?
IE8
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: *-.@_
Firefox 3.5
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: *-._
Chrome2
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: *-._
Opera9
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: -._
Hmm, Firefox and Chrome follow the draft; IE additionally leaves @ unescaped, and Opera additionally escapes *.
If this spec wants to stay on the safe side, * could also be escaped.
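
As a rough illustration of how the choice of unescaped characters changes the result, here is a small Python sketch of a form encoder with a configurable set. The function name form_urlencode and the hard-coded UTF-8 encoding are assumptions for this example only; a real form submission uses the encoding selected for the form, and this is not a copy of any browser's code.

```python
import string

ALNUM = string.ascii_letters + string.digits


def form_urlencode(text, unescaped_punct):
    """Encode text byte by byte (UTF-8 assumed for this sketch).

    A space becomes "+"; ASCII letters, digits and the characters in
    unescaped_punct are kept; every other byte becomes %HH.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch == " ":
            out.append("+")
        elif byte < 0x80 and (ch in ALNUM or ch in unescaped_punct):
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


# Unescaped punctuation as observed in the results above.
print(form_urlencode("a b*~@", "*-._"))   # draft, Firefox 3.5, Chrome 2
print(form_urlencode("a b*~@", "*-.@_"))  # IE8
print(form_urlencode("a b*~@", "-._"))    # Opera 9
print(form_urlencode("a b*~@", "-._~"))   # RFC 3986 unreserved, for comparison
```

The encoder is written by hand here because a library helper such as urllib.parse.quote_plus always keeps a fixed set of characters unescaped, so it cannot reproduce a set that escapes "~".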
>> Third is about Web addresses in HTML 5. (this spec is also this ML?)
>> http://www.w3.org/html/wg/href/draft
>
> You want public-iri at w3.org or public-html at w3.org for that draft.
Thanks, I'll send it there.
--
NARUSE, Yui <naruse at airemix.jp>