[whatwg] Navigation and history traversal issues

Tue Sep 18 17:18:11 PDT 2012

On Tue, 12 Jun 2012, James Graham wrote:
>
> In particular, what stops such navigations from re-triggering the unload 
> handler, and thus starting yet another navigation?

I've updated the spec to have guards in place for 'pagehide' and 'unload'.

(Not yet 'beforeunload'. Should we do that too?)
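
For illustration only, here is a rough sketch of the kind of guard I mean. It is a toy model and not spec text: ToyDocument, unloadInProgress and the other names are made up for this example, and it assumes that any navigation away from a document goes through that document's unload step.

```javascript
// Toy model (hypothetical names, not spec text): a document whose
// 'unload' handler itself tries to navigate.
class ToyDocument {
  constructor(onUnload) {
    this.onUnload = onUnload;       // stand-in for an 'unload' listener
    this.unloadInProgress = false;  // the guard flag
    this.unloadCount = 0;           // how many times 'unload' has fired
    this.requested = [];            // URLs that navigations asked for
  }

  unload() {
    if (this.unloadInProgress) return; // guard: never fire 'unload' again
    this.unloadInProgress = true;
    this.unloadCount += 1;
    this.onUnload(this);
  }

  navigate(url) {
    this.requested.push(url);
    this.unload(); // navigating away unloads the document first
  }
}

// The handler starts another navigation, which would otherwise
// re-trigger the handler, and so on.
const doc = new ToyDocument((d) => d.navigate("/from-unload"));
doc.navigate("/first");
console.log(doc.unloadCount); // 1
```

Without the flag check at the top of unload(), the handler above would recurse without end.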

> It looks like the spec tries to make a distinction between navigations 
> that are cross-origin and those that are not (step 4 in the "navigating 
> across documents" algorithm); I'm not sure why this inconsistency is 
> desirable rather than using the cross-origin approach always.
> 
> Based on some tests ([1]-[5]), WebKit seems to always cancel the 
> navigation in the unload handler, Opera seems to always carry out 
> the navigation in the unload handler, and Gecko seems to follow WebKit 
> in the cross-origin case and Opera in the same-origin case. In all cases 
> the unload handler is only called once.
> 
> [1] http://hoppipolla.co.uk/tests/navigation/003.html
> [2] http://hoppipolla.co.uk/tests/navigation/004.html
> [3] http://hoppipolla.co.uk/tests/navigation/005.html
> [4] http://hoppipolla.co.uk/tests/navigation/006.html
> [5] http://hoppipolla.co.uk/tests/navigation/007.html

On Tue, 12 Jun 2012, Boris Zbarsky wrote:
> 
> For what it's worth, we initially tried to do what you say WebKit does 
> but ran into web compat issues.  See 
> https://bugzilla.mozilla.org/show_bug.cgi?id=371360 for the original bug 
> where we blocked all navigation during unload and 
> https://bugzilla.mozilla.org/show_bug.cgi?id=409888 for the bug where we 
> changed to the current behavior.  I believe the spec says what it says 
> based on our implementation experience here...

Yeah, the spec's behaviour is intentional here. The error in the spec was 
just that it still fired unload again. I've fixed that.

On Wed, 13 Jun 2012, James Graham wrote:
> 
> That seems to be true. On the other hand it appears that Gecko will 
> still respect navigation from unload even if the unload was triggered by 
> explicit user interaction (e.g. by editing the address bar), as long as 
> all the origins match, so you can end up at a different page to the one 
> you expected. That is very surprising behaviour (although I see that you 
> can argue that it is possible in other ways).

When it's same origin, you really have no way to know what's going on. The 
page could trivially pushState() a continuously changing URL, for example, 
and could serve random files from the server for any URL.
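
A minimal sketch of the pushState() point, for what it's worth. churnUrl and fakeHistory are names invented for this example; in a browser you would pass window.history, and the stand-in object is only there so the snippet runs outside one.

```javascript
// Sketch: same-origin script can keep rewriting the URL it presents.
// `historyLike` is anything with pushState(state, title, url),
// e.g. window.history in a browser.
function churnUrl(historyLike, count) {
  for (let i = 0; i < count; i++) {
    historyLike.pushState(null, "", "/page-" + i);
  }
}

// Hypothetical stand-in for window.history that just records the URLs.
const fakeHistory = {
  urls: [],
  pushState(state, title, url) {
    this.urls.push(url);
  },
};

churnUrl(fakeHistory, 3);
console.log(fakeHistory.urls); // ["/page-0", "/page-1", "/page-2"]
```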

On Thu, 14 Jun 2012, James Graham wrote:
> On 06/13/2012 11:18 PM, Ian Hickson wrote:
> > On Fri, 20 Apr 2012, Henri Sivonen wrote:
> > > > 
> > > > * Should window.stop() really not abort the parser like the spec 
> > > > seems to suggest?
> > > 
> > > Looks like Opera is alone with the non-aborting behavior. The spec 
> > > is wrong.
> > 
> > Can you elaborate on this? How can you tell?
> 
> I presume the TC is something like
> 
> <!doctype html>
> Before stop
> <script>
> window.stop()
> </script>
> After stop
> 
> Only Opera displays "After stop" here. We are planning to change this 
> behaviour, so that window.stop() is much more like the "abort the 
> document" algorithm (I haven't yet closely studied how this interacts 
> with the readyState and other things that Henri has been looking at).

The spec now clearly requires the parser-stopping behaviour.
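
As a toy illustration of the parser-stopping behaviour in James's test case (this is not how any real parser is structured; toyParse and the `win` object are hypothetical stand-ins):

```javascript
// Toy parser: each item is either text to display or a script to run.
// A script receives a stand-in for `window` whose stop() halts parsing.
function toyParse(items) {
  const displayed = [];
  let stopped = false;
  const win = {
    stop() {
      stopped = true;
    },
  };
  for (const item of items) {
    if (stopped) break; // parser-stopping: nothing after stop() is processed
    if (typeof item === "function") {
      item(win);
    } else {
      displayed.push(item);
    }
  }
  return displayed;
}

const shown = toyParse(["Before stop", (w) => w.stop(), "After stop"]);
console.log(shown); // ["Before stop"]
```

The non-aborting behaviour described for Opera would correspond to dropping the `if (stopped) break;` line.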

See also this bug where I'm tracking an issue with the word "cancel":
   https://www.w3.org/Bugs/Public/show_bug.cgi?id=16801

On Fri, 15 Jun 2012, James Graham wrote:
> 
> FWIW I think the conceptually simplest solution here is for aborting the 
> document to go through "The End", so that defer scripts are run, 
> DOMContentLoaded and load events fire, and the readyState changes in the 
> normal way. This isn't quite like the behaviour of Gecko or WebKit 
> today, but is spec-wise easy to understand, and hopefully no one is 
> relying too much on specific behaviour of window.stop().

Aborting a document happens for many reasons other than stop(). For 
example, document.open(), navigation, the user hitting "STOP", going 
back() in history, etc. In particular, "The End" can block on network, so 
we definitely don't want to require that UAs do that when you close a tab, 
for example.

On Wed, 15 Aug 2012, Glenn Maynard wrote:
>
> Should this alert on initial load?
> 
> <!doctype html><body onpopstate="alert('xxx')">
> 
> [1] says "After creating the Document object, but before any script 
> execution, certainly before the parser stops, the user agent must update 
> the session history with the new page."  That invokes [2] "update the 
> session history with the new page", which invokes [3] "Traverse the 
> history to the new entry", which fires popstate in step 14.
> 
> However, "After creating the Document object, but before any script 
> execution" seems like it could happen before or after the <body> element 
> has been parsed, so the alert may or may not happen.

Yeah, this is an oversight as specced. Fixed.

On Sun, 16 Sep 2012, Justin Lebar wrote:
>
> Suppose an attack page evil.html controls a separate frame F (e.g. 
> evil.html frames F, evil.html opened F as a popup window, or vice 
> versa).
> 
> We discovered that if evil.html causes F to
> 
>   1. load a.html
>   2. start loading b.html
>   3. load a.html#h
> 
> then step (3) cannot cancel the load of b.html.  That is, the final
> session history from this sequence must be either
> 
>   a.html  <-- oldest
>   a.html#h
>   b.html  <-- current
> 
> or
> 
>   a.html <-- oldest
>   b.html <-- current.
> 
> All browsers I tested gave one of the above two results.
>
> Doing anything else breaks the web (we shipped this in Firefox Nightly 
> and people were unable to log into ingdirect.com, for example).  I 
> didn't investigate too thoroughly, but I believe what happens is, some 
> sites use a link with href "#" and then navigate themselves in the 
> link's onclick handler, without cancelling the click event.  In that 
> case, we do precisely steps 1-3 above.
>
> As I read the spec, browsers are supposed to cancel the load of b.html 
> in step 3 above.  In the navigation algorithm [1], step 6 explicitly 
> cancels the load of b.html, because the load of b.html has not matured.  
> So if I understand correctly, the spec is dictating behavior that we 
> know won't work and that no browser implements.
> 
> The presence of steps 6 and 8 in the algorithm suggest that the spec is 
> already trying to walk this line, so maybe I misunderstand what's going 
> on, either in my tests or in the spec.

The existing text in step 4 of the spec is attempting to prevent a page from 
having you click on a link to <a href="http://paypal.com/"> and, in its 
unload handler, changing that to a location.href="http://paypa1.com/" 
navigation, or something similar but with the user typing in the location 
bar and the page hijacking that navigation.

If it turns out that you can't ever block a cross-origin navigation, 
though, that's a lot easier to fix. :-)
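
Roughly, the check that step 4 is trying to make could be sketched like this. The function names are invented for the example, and comparing URL origins this way is a simplification of the spec's notion of origin:

```javascript
// Sketch of the origin comparison; names are hypothetical.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// pendingUrl: where the navigation that triggered the unload is headed.
// newUrl: where script running in the unload handler wants to go instead.
function allowNavigationDuringUnload(pendingUrl, newUrl) {
  return sameOrigin(pendingUrl, newUrl);
}

console.log(allowNavigationDuringUnload("http://paypal.com/", "http://paypa1.com/")); // false
console.log(allowNavigationDuringUnload("http://example.com/a", "http://example.com/b")); // true
```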

It's not that simple though. Browsers agree on this page that we should go 
to the second of the two cross-origin navigations (replace "false" with 
"1" in the script to run the test):

   http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1778

This one too (frame nav):

   http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1780

So this is presumably specific to fragment identifiers. And sure enough, 
when we change the latter one above to changing to a fragment identifier, 
it works as you describe:

   http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1782

(Things aren't so simple in this example (same-page nav):

   http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1784

...where Firefox no longer exhibits the restraint we're looking for here, 
but Chrome and Opera still do.)

Anyway, yeah, looks like step 6 is just bogus. I've removed it. This now 
means that fragment identifier navigations just happen without screwing 
around with ongoing loads.
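
To make the resulting behaviour concrete, here is a toy model of Justin's three steps. ToyBrowsingContext and its methods are invented for this sketch; real session history is considerably more involved.

```javascript
// Toy model (hypothetical names): a fragment navigation adds an entry
// but leaves any in-flight cross-document load alone.
class ToyBrowsingContext {
  constructor() {
    this.entries = []; // session history, oldest first
    this.current = -1; // index of the current entry
    this.pending = null; // URL of a load that has started but not committed
  }

  commit(url) {
    // Drop any forward entries, then append and make current.
    this.entries = this.entries.slice(0, this.current + 1);
    this.entries.push(url);
    this.current = this.entries.length - 1;
  }

  load(url) {
    this.commit(url); // a load that completes immediately
  }

  startLoading(url) {
    this.pending = url; // begins, but is not yet in session history
  }

  navigateToFragment(fragment) {
    const base = this.entries[this.current].split("#")[0];
    this.commit(base + fragment); // note: this.pending is left untouched
  }

  finishPendingLoad() {
    if (this.pending !== null) {
      const url = this.pending;
      this.pending = null;
      this.commit(url);
    }
  }
}

const frame = new ToyBrowsingContext();
frame.load("a.html"); // 1. load a.html
frame.startLoading("b.html"); // 2. start loading b.html
frame.navigateToFragment("#h"); // 3. load a.html#h
frame.finishPendingLoad();
console.log(frame.entries); // ["a.html", "a.html#h", "b.html"]
```

This gives the first of the two session histories Justin lists, with b.html as the current entry.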

> == Issue #2 ==
> 
> Suppose again that evil.com controls a frame F, and evil.com causes F to
> 
>   1. load a.html
>   2. load a.html#h
>   3. start loading b.html
>   4. go back
> 
> When we go back, we traverse the history [2] from a.html#h to a.html.
> Per the spec, this doesn't cancel the load of b.html.
> 
> This caused a problem for us in Firefox because we create a session 
> history entry for b.html at the beginning of step 3 and insert it after 
> the current one.  Then, when the load of b.html completes, we use 
> whichever session history entry happens to be after the current one, 
> assuming that it was the session history entry we created earlier. [...]
> 
> The fix for this bug is not as simple as merely ensuring that the 
> session history entry's URL matches the document's URL.  Due to hash 
> navigations and pushstate, these URLs may not match even when we're 
> behaving correctly.
> 
> We fixed this bug by cancelling the load of b.html when you go back. 
> This matches Chrome's behavior in my tests [3].
> 
> Notice that this means we're cancelling an outstanding network load due 
> to a synchronous same-document load, which I said in part 1 breaks the 
> web.  But based on the (lack of) feedback we've received from our test 
> audience, it seems that cancelling the load of b.html does /not/ break 
> the web if the navigation from a.html to a.html#h is a history 
> navigation.
> 
> The right thing to do is probably to load b.html after a.html, so the 
> final session history is
> 
>   a.html <-- oldest
>   b.html <-- current.
> 
> I /think/ this is what the spec says should happen, but I'm not sure. 
> But matching the spec here would be difficult in our current 
> architecture, and anyway wouldn't match the one other browser I was able 
> to test, so perhaps the spec should be changed to match.

The way the spec is written, if I'm not mistaken, you only create the new 
session history entry when you're ready to make it active. So I don't 
think the spec has the problem you ran into; as you describe, it just 
works.

However, if it doesn't match browsers, that's of little comfort.

I've changed the spec so that traversing the history by a delta always 
cancels any pending navigations unless you're in the middle of an unload, 
in which case it just aborts the algorithm entirely.

I've also made back()/forward()/go() not work during the document's unload 
handler, since that could be used for griefing. I'm tempted to disable it 
entirely for all docs a la alert(), but I've no idea if that's 
Web-compatible and I suspect not.
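
A rough sketch of the traversal rule described above, as a toy model. ToySessionHistory and its fields are made up for the example, and what happens to a pending navigation when the delta is out of range is my assumption here, not something the text above settles.

```javascript
// Toy model (hypothetical names) of "traverse the history by a delta".
class ToySessionHistory {
  constructor(entries, current) {
    this.entries = entries; // oldest first
    this.current = current; // index of the current entry
    this.pending = null; // URL of a navigation that has not committed yet
    this.unloading = false; // true while the document's unload handler runs
  }

  startLoading(url) {
    this.pending = url;
  }

  // Returns true if the traversal happened.
  traverseByDelta(delta) {
    if (this.unloading) return false; // mid-unload: abort, touch nothing
    const target = this.current + delta;
    // Assumption: an out-of-range delta is ignored without cancelling.
    if (target < 0 || target >= this.entries.length) return false;
    this.pending = null; // cancel any pending navigation
    this.current = target;
    return true;
  }
}

// Justin's Issue #2: a.html, a.html#h, start loading b.html, go back.
const h = new ToySessionHistory(["a.html", "a.html#h"], 1);
h.startLoading("b.html");
console.log(h.traverseByDelta(-1)); // true
console.log(h.pending); // null

// Same steps, but back() is called from inside an unload handler.
const u = new ToySessionHistory(["a.html", "a.html#h"], 1);
u.startLoading("b.html");
u.unloading = true;
console.log(u.traverseByDelta(-1)); // false
console.log(u.pending); // "b.html"
```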

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'