[whatwg] Question about document.referrer (and document.URL, document.location.href) when IDN domains are in use

Thu Sep 12 02:56:32 PDT 2013

On Wed, Sep 11, 2013 at 7:21 PM, Ian Hickson <ian at hixie.ch> wrote:
> Surely the consistency of the API matching the input is more important
> than the consistency of the API _not_ matching the input...

The input will be mangled anyway. E.g. domain label separators are
normalized to ".". And all kinds of other parts of the URL undergo
normalization.
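
As a rough illustration of the separator normalization, here is a
minimal sketch using Python's built-in "idna" codec. That codec
implements IDNA 2003 and is only a stand-in for what a browser does;
normalize_host is a hypothetical helper name.

```python
# Sketch only: Python's stdlib "idna" codec follows IDNA 2003 (RFC 3490),
# which is an assumption here and may differ from browser behaviour.
def normalize_host(host: str) -> str:
    """Return the ASCII form of a host name (hypothetical helper)."""
    return host.encode("idna").decode("ascii")


# U+3002 IDEOGRAPHIC FULL STOP is accepted as a label separator
# and is emitted as a plain "." in the result.
print(normalize_host("example\u3002com"))  # example.com
```

So the string handed back already differs from what was typed, even
before any Punycode is involved.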

>> It means the entire URL is effectively a byte sequence.
>
> I don't know what you mean here.

No code point is higher than U+007F. And given the way HTTP operates on
URLs, and the way we extract data from data URLs, making it a byte
sequence might not be such a bad idea...
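
To make the "byte sequence" point concrete, a small sketch assuming
the host is Punycoded and the path is percent-encoded as UTF-8.
to_ascii_url is a hypothetical helper, not something from the URL
spec, and the stdlib "idna" codec (IDNA 2003) stands in for ToASCII.

```python
from urllib.parse import quote


def to_ascii_url(scheme: str, host: str, path: str) -> bytes:
    """Serialize URL parts so every code point is <= 0x7F (hypothetical helper)."""
    # Host: IDNA 2003 ToASCII via the stdlib codec (an assumption).
    ascii_host = host.encode("idna").decode("ascii")
    # Path: percent-encode the UTF-8 bytes of any non-ASCII character.
    ascii_path = quote(path, safe="/")
    # Encoding as ASCII would raise if anything above 0x7F were left.
    return f"{scheme}://{ascii_host}{ascii_path}".encode("ascii")


# "b\u00fccher" is "bücher", "caf\u00e9" is "café".
url = to_ascii_url("http", "b\u00fccher.example", "/caf\u00e9")
print(url)  # b'http://xn--bcher-kva.example/caf%C3%A9'
```

Once serialized like this, the result can be treated as bytes without
losing information.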

>> And it's very clear what the DNS lookup will be.
>
> Why do you think people care more about that than about the URL matching
> what they wrote in the markup?

It won't match that anyway.

>> And given that they keep insisting on changing what certain code points
>> map to over in IETF-land (with limited support from browser vendors :/),
>> it seems safer too.
>
> I don't understand what is safer. Surely if the punycoding step keeps
> changing, it's less safe, since it'll mean that the results will change
> without the author expecting it. If we don't punycode in the API, then the
> result will be the same regardless of the punycode step.

It depends on what you do with the result I suppose.
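
For a feel of what a changing code point mapping looks like, a sketch
with Python's stdlib "idna" codec, which implements the IDNA 2003
rules. idna2003_host is a hypothetical helper name. Under IDNA 2003,
U+00DF is mapped to "ss" rather than kept, so the ASCII form depends
on which mapping rules the implementation applies.

```python
def idna2003_host(host: str) -> bytes:
    """ASCII form of a host under IDNA 2003 rules (stdlib codec, hypothetical helper)."""
    return host.encode("idna")


# "stra\u00dfe" is "straße"; IDNA 2003 maps U+00DF to "ss" before Punycode,
# so no "xn--" label is produced at all.
print(idna2003_host("stra\u00dfe.example"))  # b'strasse.example'
```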

https://groups.google.com/a/chromium.org/forum/?fromgroups=#!topic/blink-dev/fBsVRcEOTWM
seems relevant.

http://url.spec.whatwg.org/ defines the host as ASCII at the moment. The
other reason, which I just remembered, is that ToASCII can fail, and at
that point we want to return failure for the URL. I suppose we could
run ToASCII and then ToUnicode...
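
A minimal sketch of that ToASCII-then-ToUnicode idea, again assuming
Python's stdlib "idna" codec (IDNA 2003) as a stand-in for whatever
the URL spec ends up requiring. Both function names are hypothetical,
and None stands in for "return failure for the URL".

```python
from typing import Optional


def to_ascii(host: str) -> Optional[str]:
    """Run ToASCII; return None on failure (hypothetical helper)."""
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        # E.g. a label that is empty or longer than 63 octets.
        return None


def to_ascii_then_unicode(host: str) -> Optional[str]:
    """Validate with ToASCII, then convert back with ToUnicode."""
    ascii_host = to_ascii(host)
    if ascii_host is None:
        return None  # propagate failure for the whole URL
    return ascii_host.encode("ascii").decode("idna")


# "B\u00dcCHER" is "BÜCHER"; the round trip yields the normalized Unicode form.
print(to_ascii("B\u00dcCHER.example"))               # xn--bcher-kva.example
print(to_ascii_then_unicode("B\u00dcCHER.example"))  # bücher.example
# A 70-character non-ASCII label is too long once Punycoded, so ToASCII fails.
print(to_ascii_then_unicode("\u00fc" * 70 + ".example"))  # None
```

That way the failure check still happens, but the API could hand back
the Unicode form.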

-- 
http://annevankesteren.nl/