[whatwg] Question about document.referrer (and document.URL, document.location.href) when IDN domains are in use

Ian Hickson ian at hixie.ch
Tue Sep 10 12:54:31 PDT 2013


On Fri, 12 Jul 2013, Boris Zbarsky wrote:
> On 7/12/13 2:15 PM, Ian Hickson wrote:
> > >
> > > The "document's referrer" is not really defined anywhere in a useful 
> > > way that I can find.
> >
> > What's not useful about the way it's defined? It's set to a specific 
> > string.
> 
> I couldn't find where it was normatively set to anything.

It's normatively set in two places. For the about:blank Document, it's set 
(if necessary) here:

# If the browsing context has a creator Document, then the browsing 
# context's Document's referrer must be set to the address of that creator 
# Document at the time of the browsing context's creation.
 -- http://whatwg.org/html#windows

For regular Documents, it's set in the "Creating a new Document object" 
steps:

# Set the document's referrer to the address of the resource from which 
# Request-URIs are obtained as determined when the fetch algorithm 
# obtained the resource, if that algorithm was used and determined such a 
# value; otherwise, set it to the empty string.
 -- http://whatwg.org/html#create-a-document-object

These seem like the logical places for it to be set. Am I missing 
something?
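Below is a minimal Python sketch of the two quoted steps. The `Document` class and the function names are hypothetical stand-ins, not spec or browser APIs. Like the quoted text, it copies the address string unchanged and never converts a host to Punycode. Copying the string when the Document is created is also what captures the creator's address "at the time of the browsing context's creation".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    # Hypothetical model: only the two fields this thread cares about.
    address: str
    referrer: str = ""


def new_about_blank_document(creator: Optional[Document]) -> Document:
    """about:blank rule: referrer is the creator Document's address, if any."""
    doc = Document(address="about:blank")
    if creator is not None:
        # Strings are immutable, so later changes to creator.address
        # do not affect the value captured here.
        doc.referrer = creator.address
    return doc


def new_fetched_document(address: str, referrer_source: Optional[str]) -> Document:
    """Regular Documents: referrer_source (hypothetical) stands in for the
    address the fetch algorithm determined, or None if it determined none."""
    return Document(
        address=address,
        referrer=referrer_source if referrer_source is not None else "",
    )
```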


> > > In cases when the hostname is non-ASCII, the Referer header will 
> > > have it encoded in punycode.
> > 
> > Is that defined anywhere?
> 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.36, which 
> defines it syntactically as a URI. That means that if you have an IRI, 
> you have to convert it to a URI before putting it in there.

That's normatively imported by the HTML spec's "fetch" algorithm, so it's 
the case per HTML too. Specifically, HTML just says to generate the 
Referer header's value "as required by HTTP" using a particular URL as 
the input.
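Here is a rough Python illustration of what "as required by HTTP" implies when the address is an IRI. `iri_to_uri` is a hypothetical helper. It uses Python's built-in `idna` codec (IDNA 2003), which may differ from a given browser's IDN processing for some characters. For brevity it also ignores userinfo and IPv6 literals. The host becomes Punycode, non-ASCII path and query characters become UTF-8 percent-escapes, and the fragment is dropped.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI suitable for a Referer value.

    Simplified sketch: ignores userinfo and IPv6 literals.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Each non-ASCII label becomes an "xn--" Punycode label;
    # ASCII labels pass through unchanged.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    # Non-ASCII characters become UTF-8 percent-escapes; existing "%XX"
    # escapes and URI delimiters are left alone.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="/%:@!$&'()*+,;=?")
    # RFC 2616 section 14.36: the Referer URI MUST NOT include a fragment.
    return urlunsplit((parts.scheme, netloc, path, query, ""))


print(iri_to_uri("http://bücher.example/straße?q=ü#frag"))
# http://xn--bcher-kva.example/stra%C3%9Fe?q=%C3%BC
```

An address that is already ASCII, such as `https://example.com:8080/a/b?x=1#top`, comes out unchanged apart from the dropped fragment.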


> > That's correct per spec (assuming the punycoding is required 
> > anywhere). The latter two are set separately from document.referrer:
> > 
> >     http://whatwg.org/html/#set-the-document's-address
> 
> The thing is, people are comparing origins from postMessage to origins 
> from document.referrer.  See 
> <https://bugzilla.mozilla.org/show_bug.cgi?id=852796#c6>.  Also see 
> <https://bugzilla.mozilla.org/show_bug.cgi?id=720331>.

Well, then they'll be broken, I guess. (They'll break safe, though.)
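Suppose one API exposes the Unicode form of an origin and another exposes the Punycode form. A plain string comparison then rejects a pair that really is same-origin, so the page fails closed rather than open. The following Python sketch normalizes both sides before comparing. `ascii_origin` and `same_origin` are hypothetical helpers, not the HTML spec's origin serialization. Only the http/https default ports are handled.

```python
from urllib.parse import urlsplit

# Hypothetical: only the two schemes relevant to this thread.
DEFAULT_PORTS = {"http": 80, "https": 443}


def ascii_origin(url: str) -> str:
    """Return scheme://host[:port] with the host in ASCII (Punycode) form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    origin = f"{parts.scheme}://{host}"
    # Omit the port when it is the scheme's default, as origin strings do.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        origin += f":{parts.port}"
    return origin


def same_origin(a: str, b: str) -> bool:
    """Compare two URLs or origin strings after normalizing their hosts."""
    return ascii_origin(a) == ascii_origin(b)


referrer = "https://bücher.example/page"          # Unicode form
message_origin = "https://xn--bcher-kva.example"  # Punycode form
print(referrer.startswith(message_origin))        # False: naive check fails
print(same_origin(referrer, message_origin))      # True
```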


> Also, as a note, nothing above makes it particularly clear that "the URL 
> that was originally to be fetched" is not already punycode...  Ah, well.

It might be; it depends on what the URL is.


> > If other browsers don't match this, file bugs on them. :-)
> 
> <shrug>.  It probably won't do much good, but:
> 
> http://code.google.com/p/chromium/issues/detail?id=259920&thanks=259920&ts=1373653828
> 
> https://bugs.webkit.org/show_bug.cgi?id=118611

Thanks.


On Fri, 12 Jul 2013, Adam Barth wrote:
> 
> I don't think we're likely to change this behavior.  We always use 
> punycode for URLs except in the location bar.

Why?


On Fri, 12 Jul 2013, Boris Zbarsky wrote:
> On 7/12/13 2:40 PM, Adam Barth wrote:
> > Why not change Firefox to use punycode in window.location?
> 
> If nothing else because that seems user-hostile (both to web developers 
> examining location values and users who are shown document.URL or 
> location.href in web pages).

Yeah...


On Fri, 12 Jul 2013, Anne van Kesteren wrote:
> 
> But then we shouldn't garble pathname either and we do because we have 
> to. So I'm not sure that line of reasoning makes sense. I do think we 
> should offer some kind of conversion utility between the two.

It is unfortunate that resolving URLs does that, it's true. But just 
because we're constrained here, why should we mess up domains also?
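Here is a minimal sketch of the kind of conversion utility Anne suggests, covering only the host. `host_to_ascii` and `host_to_unicode` are hypothetical names. They wrap Python's built-in `idna` codec, which implements IDNA 2003. Because of that, they can disagree with browsers' IDN processing for some characters; for example, IDNA 2003 maps "ß" to "ss".

```python
def host_to_ascii(host: str) -> str:
    """Unicode (display) host -> ASCII host with "xn--" Punycode labels."""
    return host.encode("idna").decode("ascii")


def host_to_unicode(host: str) -> str:
    """ASCII host with "xn--" labels -> Unicode (display) host."""
    return host.encode("ascii").decode("idna")


print(host_to_ascii("bücher.example"))           # xn--bcher-kva.example
print(host_to_unicode("xn--bcher-kva.example"))  # bücher.example
```

Plain ASCII labels pass through both functions unchanged.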

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

