[whatwg] Web Address and its escape

Ian Hickson ian at hixie.ch
Mon Sep 14 18:12:34 PDT 2009


On Wed, 9 Sep 2009, NARUSE, Yui wrote:
> 
> The first is about 4.10.16.4, URL-encoded form data.
> http://www.whatwg.org/specs/web-apps/current-work/#application/x-www-form-urlencoded-encoding-algorithm
> 
> In step 6.2.1 of this algorithm,
> "SP, *, -, ., 0 .. 9, A .. Z, _, a .. z" are not escaped.
> But many other specs that use application/x-www-form-urlencoded refer to
> the URI "unreserved" set, which RFC 3986 defines as
>    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
> Why is ~ escaped while * is not?

No idea, but that's what browsers do.
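
To make the difference concrete, here is a minimal Python sketch, not taken
from the spec. form_urlencode_bytes is a hypothetical helper that follows the
unescaped set quoted above; writing SP as "+" is an assumption of the sketch.
It is compared with urllib.parse.quote, whose never-quoted set (letters,
digits, "_", ".", "-", "~" in Python 3.7 and later) matches the RFC 3986
unreserved characters.

```python
from urllib.parse import quote

# Characters left unescaped per the set quoted above: * - . 0-9 A-Z _ a-z
# (SP is handled separately; writing it as "+" is an assumption of this sketch).
FORM_SAFE = frozenset(
    b"*-._"
    b"0123456789"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
)

def form_urlencode_bytes(data: bytes) -> str:
    """Hypothetical helper: escape bytes using the form-data unescaped set."""
    out = []
    for byte in data:
        if byte == 0x20:
            out.append("+")
        elif byte in FORM_SAFE:
            out.append(chr(byte))
        else:
            # Two uppercase hexadecimal digits after the percent sign.
            out.append(f"%{byte:02X}")
    return "".join(out)

sample = "a~b*c"
form_result = form_urlencode_bytes(sample.encode("utf-8"))  # "~" escaped, "*" kept
rfc_result = quote(sample, safe="")                         # "~" kept, "*" escaped
```

With this sample, the form-data rules escape "~" and keep "*", while the
RFC 3986 rules do the opposite.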


> The second is also about step 6.2.1 of URL-encoded form data,
> which says:
> > the string a U+0025 PERCENT SIGN character (%) followed by two
> > characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE
> > (9) and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z
> But hexadecimal digits are 0-9 and A-F,
> so the range should end at "U+0046 LATIN CAPITAL LETTER F".

Oops, thanks. Fixed.
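
As a small illustration of the corrected range, each escape is "%" followed
by two characters from 0-9 or A-F. A sketch in Python, where ESCAPE and
is_escape are names chosen here for illustration only:

```python
import re

# "%" followed by exactly two uppercase hexadecimal digits (0-9, A-F).
ESCAPE = re.compile(r"%[0-9A-F]{2}")

def is_escape(text: str) -> bool:
    """Return True if text is a single well-formed percent escape."""
    return ESCAPE.fullmatch(text) is not None

# Formatting a byte with "02X" always yields two digits in that range.
encoded_tilde = f"%{0x7E:02X}"
```

Under this pattern "%7E" is accepted, while "%7Z" is rejected because Z is
not a hexadecimal digit.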


> The third is about Web addresses in HTML 5. (Is this spec also discussed on this ML?)
> http://www.w3.org/html/wg/href/draft
> 
> In section 2, Parsing Web addresses, step 2 ("Percent-encode all non-URI
> characters in w") percent-encodes many characters, including U+0025 PERCENT SIGN.
> But under this spec, if a Web address w is an already-escaped URL,
> this process double-escapes those characters.
> 
> For example, if w is http://www.example.org/D%C3%BCrst,
> then after step 2, w becomes http://www.example.org/D%25C3%25BCrst.
> And at step 5, w is broken.

Please send this feedback to Larry Masinter <masinter at adobe.com>.
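
The double-escaping described above can be reproduced with a short Python
sketch. It uses urllib.parse.quote as a stand-in for step 2 of the draft,
which is an assumption: the draft's exact character set may differ. ":" and
"/" are kept safe so that only "%" is affected in this example.

```python
from urllib.parse import quote, unquote

w = "http://www.example.org/D%C3%BCrst"

# Stand-in for step 2: "%" is not in the safe set, so it becomes "%25".
step2 = quote(w, safe=":/")

# Decoding once only gets back the escaped form, not the intended character.
decoded_once = unquote(step2)

# A second decode is needed to reach the character the author meant.
decoded_twice = unquote(decoded_once)
```

Here step2 matches the D%25C3%25BCrst form from the example, and a single
decode returns the original escaped address rather than the intended name.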

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


