[whatwg] URL query component

Fri Apr 20 05:52:53 PDT 2012

On Fri, 20 Apr 2012 14:37:10 +0200, And Clover <and-py at doxdesk.com> wrote:
> On 2012-04-20 09:15, Anne van Kesteren wrote:
>> Currently browsers differ for what happens when the code point cannot  
>> be encoded.
>> What Gecko does [?%C2%A3] makes the resulting data impossible to  
>> interpret.
>> What WebKit does [?%26%23163%3B] is consistent with form submission. I  
>> like it.
>
> I do not! It makes the data impossible to recover just as Gecko does...  
> in fact worse, because at least Gecko preserves ASCII. With the WebKit  
> behaviour it becomes impossible to determine from a pure ASCII string  
> '&#163;' whether the user really typed '&#163;' or '£' into the input  
> field.

You have the same problem with Gecko's behavior and multi-byte encodings.  
That's actually worse, since an erroneous three-byte sequence will throw  
multi-byte decoders off.
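
For illustration, here is a minimal Python sketch of the two fallbacks quoted above. It is not either browser's actual code: the function names gecko_style and webkit_style are hypothetical, ISO-8859-2 is assumed as an example document encoding that has no '£', and urllib.parse.quote stands in for the browser's percent-encoder.

```python
from urllib.parse import quote


def gecko_style(text: str, encoding: str) -> str:
    """Assumed model of the Gecko result: a code point the document
    encoding cannot represent is emitted as its UTF-8 bytes."""
    parts = []
    for ch in text:
        try:
            raw = ch.encode(encoding)
        except UnicodeEncodeError:
            raw = ch.encode("utf-8")  # fallback for unencodable code points
        parts.append(quote(raw, safe=""))
    return "".join(parts)


def webkit_style(text: str, encoding: str) -> str:
    """Assumed model of the WebKit result: an unencodable code point
    becomes a decimal character reference, which is then percent-encoded."""
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    return quote(raw, safe="")


enc = "iso-8859-2"  # example legacy encoding without U+00A3

print(gecko_style("£", enc))   # %C2%A3
print(webkit_style("£", enc))  # %26%23163%3B

# Two different inputs, one output: the receiver cannot tell them apart.
print(webkit_style("&#163;", enc) == webkit_style("£", enc))  # True
print(gecko_style("ÂŁ", enc) == gecko_style("£", enc))        # True
```

Under these assumptions both fallbacks lose information: the WebKit-style output collides with a literally typed '&#163;', and the Gecko-style bytes C2 A3 are also a valid ISO-8859-2 encoding of 'ÂŁ'.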

> It has the advantage of consistency with the POST behaviour, but that  
> behaviour is an unpleasant legacy hack which encourages a  
> misunderstanding of HTML-escaping that promotes XSS vulns. I would not  
> like to see it spread any further than it already has.

It's both GET and POST. So really the only difference here is manually  
constructed URLs.

Also, I think we should flag all non-utf-8 usage. This is mostly about  
deciding behavior for legacy content, which will already be broken if it  
runs into this minor edge case.

-- 
Anne van Kesteren
http://annevankesteren.nl/