[whatwg] onclose events for MessagePort

Fri Dec 6 17:22:35 PST 2013

On Tue, 1 Oct 2013, Ehsan Akhgari wrote:
> On Tue, Oct 1, 2013 at 1:45 PM, Ian Hickson <ian at hixie.ch> wrote:
> > On Tue, 1 Oct 2013, Boris Zbarsky wrote:
> > > On 10/1/13 12:58 PM, Ian Hickson wrote:
> > > > If the browser crashes, it's not going to be able to send messages 
> > > > anyway
> > >
> > > This concept of "the browser"...
> > >
> > > The situation being considered here is that you have two web pages 
> > > in two different OS-level processes that have a MessageChannel 
> > > between them.  One of the OS-level processes crashes (and hence 
> > > obviously can't send any more messages, as you note).  The other one 
> > > would like to know that its counterparty is gone.  How should it 
> > > find out?  That's the problem I think Ehsan is trying to solve.
> >
> > Crashing is non-conforming. Having features explicitly to handle the 
> > non-conforming case is a bit weird. :-)
> 
> Well, I'm not talking about content intentionally crashing.  Out of 
> memory crashes are an example of the kind of crash which you can't 
> really prevent.

Theoretically you can, but point taken.

I've added a feature to MessagePort that causes an 'error' event to be 
fired at a MessagePort if its other side is killed prematurely.
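For illustration, a page that wants to react to this could listen for the event roughly as follows. This is a minimal sketch that assumes the 'error' event behaves exactly as described above. The event name and its semantics may differ in later spec drafts and in implementations, and `watchCounterparty` and `onGone` are hypothetical names.

```javascript
// Sketch only: assumes an 'error' event is fired at a MessagePort when the
// other side is killed prematurely (e.g. a process crash), as described
// above. watchCounterparty and onGone are hypothetical names.
function watchCounterparty(port, onGone) {
  let gone = false;
  const listener = () => {
    if (gone) return; // report the loss only once
    gone = true;
    onGone();
  };
  port.addEventListener("error", listener);
  // Return a function that detaches the listener again.
  return () => port.removeEventListener("error", listener);
}
```

Because such a listener is not meant to keep the port alive, it can be attached unconditionally to any port whose counterparty matters to the page.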

It doesn't fire when a port's owner document is navigated away from, 
because that would prevent the page from being bfcached, which we try hard 
to avoid. For pages it's reasonably easy to just hook into the document's 
onunload handler and send a message to all ports then.
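A minimal sketch of that unload approach, assuming the page keeps its own list of ports. The `{ type: "goodbye" }` message and the helper names are hypothetical conventions that both ends would have to agree on.

```javascript
// Sketch only: the page tracks its ports and, from an unload handler,
// posts a goodbye message on each. The message shape and helper names
// are hypothetical.
const openPorts = new Set();

function trackPort(port) {
  openPorts.add(port);
  return port;
}

function sayGoodbyeToAll(ports) {
  for (const port of ports) {
    port.postMessage({ type: "goodbye" });
  }
}

// In a page, `target` would be `window`.
function installUnloadHook(target, ports) {
  target.addEventListener("unload", () => sayGoodbyeToAll(ports));
}
```

In a page this would be wired up with `installUnloadHook(window, openPorts)`, calling `trackPort()` on each port as it is created or received.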

It doesn't fire when a port's owner worker terminates, because it seems 
weird to fire it sometimes but not others when the owner closes. For 
dedicated workers there is unfortunately no unload handler, because that 
would similarly prevent the bfcache... Not sure what the right thing to do 
is there. We don't want to expose the bfcache eviction model either.

It doesn't fire for port.close() because that would let you probe GC 
behaviour, and you can easily just send a message first anyway.
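"Send a message first" could look something like this minimal sketch. The `{ type: "close" }` control message and both helper names are hypothetical; the only real API calls are `postMessage()` and `close()`.

```javascript
// Sketch only: tell the other side the port is being closed on purpose,
// then close it. The { type: "close" } message is a hypothetical
// convention shared by both ends of the channel.
function closeGracefully(port) {
  port.postMessage({ type: "close" });
  port.close();
}

// Receiving side: separate the control message from ordinary data.
function makeMessageHandler(port, onData, onPeerClosed) {
  return (event) => {
    if (event.data && event.data.type === "close") {
      port.close(); // the peer will not send anything more
      onPeerClosed();
      return;
    }
    onData(event.data);
  };
}
```

On the receiving end this would be installed as `port.onmessage = makeMessageHandler(port, handleData, handlePeerClosed)`, with both callbacks supplied by the page.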

> > How often are we really expecting one tab to be talking to another 
> > tab, and then that tab crashes, and the author was able to prepare 
> > code to handle that gracefully, in a production environment?
> 
> That is not the only use case.  Another use case which is currently more 
> interesting for us in Firefox OS is to know when the connection to 
> another application such as the music player is dead because the other 
> side suffered an OOM crash for example, so that we can update the UI 
> letting the user know that the music is no longer playing.

I understand that in practice this does occur, but it still seems like the 
right solution would be for the music app here to gracefully report that 
it has been asked by the operating system to shut down, and to then do so. 
And even then, that's presumably not what the user really wanted...

On Thu, 10 Oct 2013, David Barrett-Kahn wrote:
>
> This is a feature we've long wanted for Google Docs, but not for the 
> most obvious reason.  We have a situation where more than one tab can be 
> visualizing the same document.  Under those conditions, we have a 
> requirement that one of the documents hold a lock which entitles it to 
> do things with the locally stored version of the document which other 
> tabs and workers in the system cannot.  Implementing this locking system 
> is painful and inefficient.  One of the main reasons is that it's 
> impossible to get a notification in the shared worker (where the lock 
> 'lives') when the lock-holding tab has closed.  We would use a message 
> port onclose event for this, reducing the complexity of our current 
> system (which involves polling loops and other very undesirable things) 
> tremendously.

This feature won't help you there, since this is a case where the port is 
going away gracefully.

Is there any reason you can't just use onunload handlers to send a message 
to the shared worker when the page goes away?
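For example, the shared worker could keep the lock state keyed by port and rely on each tab posting `{ type: "release" }` from its unload handler. This is a minimal sketch. `LockManager`, the message types, and the "granted" reply are all hypothetical, and a tab that dies without running its unload handler would still hold the lock.

```javascript
// Sketch only: lock bookkeeping inside a shared worker. Assumes tabs send
// { type: "acquire" } to ask for the lock and { type: "release" } from
// their unload handler. All names are hypothetical.
class LockManager {
  constructor() {
    this.holder = null; // port of the tab that currently holds the lock
    this.waiting = []; // ports queued for the lock, in request order
  }

  handle(port, message) {
    if (message.type === "acquire") {
      if (this.holder === null) {
        this.grant(port);
      } else if (this.holder !== port && !this.waiting.includes(port)) {
        this.waiting.push(port);
      }
    } else if (message.type === "release") {
      // Drop the port from the queue whether or not it holds the lock.
      this.waiting = this.waiting.filter((p) => p !== port);
      if (this.holder === port) {
        this.holder = null;
        const next = this.waiting.shift();
        if (next) this.grant(next);
      }
    }
  }

  grant(port) {
    this.holder = port;
    port.postMessage({ type: "granted" });
  }
}
```

In the worker's `onconnect` handler, each new port's `onmessage` would then call `locks.handle(port, event.data)` on a single shared `LockManager` instance.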

> Generally speaking it's just a disappointing moment when you discover 
> message ports don't have this capability and you have to jury rig an 
> unreliable workaround, like so many times before.  A communications 
> channel with no 'is connected' status on it is just... not finished.

It originally did have such a mechanism, actually, but we removed it to 
avoid exposing GC. So I guess it was "definished". :-)

> Why is revealing when garbage collection happens such a terrible thing 
> anyway?  Java does it...

It dramatically reduces the ability for multiple implementations to have 
widely varying GC models.

On Thu, 10 Oct 2013, David Barrett-Kahn wrote:
>
> On GC being a source of cross-browser difficulty: I think you can fix 
> that by stating in the messageport spec when we guarantee to implicitly 
> close the connection (when its host page closes) and when we provide no 
> guarantees (when it loses all its references).

We don't guarantee it when the host page is navigated (assuming that's 
what you mean by "closes"), because the user could go back to the page, in 
which case, if the page didn't use onunload="" and various other things, 
it's possible for the page to just be reopened unmodified, with literally 
the same live DOM, and for the ports to continue to work.
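A page that wants to keep its peers informed without an unload handler could use the pagehide and pageshow events and their `persisted` flag. This is a minimal sketch. The "suspended", "resumed" and "goodbye" message types are hypothetical conventions between the two ends, not anything the spec defines.

```javascript
// Sketch only: broadcast page lifecycle changes to peers. persisted is
// true when the page may later be restored from the bfcache. The message
// types are hypothetical.
function installPageLifecycleHooks(target, ports) {
  const broadcast = (type) => {
    for (const port of ports) port.postMessage({ type });
  };
  target.addEventListener("pagehide", (event) => {
    broadcast(event.persisted ? "suspended" : "goodbye");
  });
  target.addEventListener("pageshow", (event) => {
    // Only a restore from the bfcache counts as a resume.
    if (event.persisted) broadcast("resumed");
  });
}
```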

> On people relying on GC timing: Those people are being silly and deserve 
> what they get, as they do in Java.  Using destructors in that language 
> is very nearly always a bad idea, but they still put them there and it 
> was fine.

Yeah. Unfortunately, that line of reasoning doesn't work on the Web. We're 
not avoiding exposing GC behaviour because we want to prevent authors from 
shooting themselves in the foot; we're avoiding exposing GC behaviour 
because we want to prevent authors from shooting _us_ in the foot. :-)

If we expose GC, and a browser gets popular, it's likely that pages will 
unwittingly end up dependent on that browser's GC model, and then all 
other browsers get stuck implementing that GC model.

This is the same reason why all browsers now implement the same crazy 
parsing with all its wacky quirks that everyone says makes us look dumb.

> I guess I think people who misinterpret the spec and do things which are 
> obviously a bad idea are only to a limited extent our problem.

It's not them who are our problem, it's their popular pages later.

> The web needs to become a place where serious, large applications can be 
> written, it's not going to get there if the standard we set ourselves 
> for the APIs is "they can't possibly misuse this even if they've read no 
> documentation and are just guessing".

The Web will also not become a place where "serious, large applications" 
can be written if we end up constrained along every axis, unfortunately. 
(For example, if we can no longer innovate the GC model.)

On Wed, 9 Oct 2013, Ehsan Akhgari wrote:
> 
> So with that behind us, how about we add an explicit event to be fired 
> when the other side of a message channel gets destroyed in a 
> catastrophic way which is not observable from the web content code 
> running on that side, such as a process crash for example?

The spec now fires an 'error' event in this case.

> The basic idea behind why this more restricted proposal is useful is 
> that barring the catastrophic failure case, applications can detect the 
> other cases why further communication may be impossible (such as 
> navigating away from the page) themselves and notify the other side of 
> the channel as desired -- it is only the catastrophic termination case 
> which is not detectable from content script.

The case of workers is pretty hard to handle well (without multiple ports 
just for this purpose) as well, unfortunately, but I don't see a solution.

On Wed, 9 Oct 2013, Jonas Sicking wrote:
> 
> But is there a reason that we couldn't also fire the event if the other 
> side is forcefully terminated through a navigation or a 
> Worker.terminate() call?

It would either prevent bfcache usage, or, if we did it only when the 
worker was reaped while in the bfcache, it would expose the bfcache 
eviction model (and likely wouldn't be sufficiently reliable anyway).

On Thu, 10 Oct 2013, Andrew Wilson wrote:
> 
> I still have the concerns I expressed earlier about figuring out who the 
> owner is of the port in the case where you've passed a reference around 
> to multiple contexts.

The spec defines the owner in (hopefully) rigorous detail.

> What does "other side is forcefully terminated" mean in the case where 
> you may have multiple iframes with references to the same port?
> 
> i.e. if my iframe does this:
> 
> channel = new MessageChannel();
> window.parent.port = channel.port1;
> sharedWorker.port.postMessage("port", [channel.port2]);
> window.location.href = "<some other url>"
> 
> What happens?

The owner of port1 is the window that was navigated (and it is now in the 
bfcache, probably).
The owner of port2's clone is the shared worker.

> Does the sharedWorker get channeldropped on its cloned port?

No messages or events are sent in this example other than the explicit 
one (and the events around navigation, like pagehide).

> I suspect this would be confusing to developers, who might otherwise 
> expect that merely handing a reference to port to its same-origin parent 
> would be sufficient to keep it alive.

I agree that it's not intuitive, but you need an explicit owner to 
determine which Document is responsible for the tasks that are sent on the 
event loop. Otherwise, if a page had a port and the page was navigated 
away from and then returned to, it might have missed all the events, or 
the events might have been fired while the scripts in that document 
weren't able to run properly (since the document isn't fully active).

On Thu, 10 Oct 2013, Ehsan Akhgari wrote:
> 
> The current spec doesn't mention what happens in the case of navigation 
> in the owner for a port as far as I can tell.  But I consider that a bug 
> in the spec -- navigation _should_ disentangle ports.

That would break the bfcache.

On Thu, 10 Oct 2013, Jonas Sicking wrote:
> 
> While technically possible for a webpage to handle ports that were 
> passed to a worker and send a signal before the worker is terminated, it 
> is really hard.
> 
> First off it means that you have to create a separate MessageChannel 
> just for the close-signal. You can't get the worker to send the 
> message without first finishing both the currently running task, and 
> also processing all the tasks on the worker's task queue. This would 
> defeat the whole purpose of terminate(). So you need to keep a separate 
> channel specifically to send the close message.
> 
> Second, you need to track all the ports that are owned by a specific 
> worker so that you know which channels to send a close message for.
> 
> Third, since the close message comes from a separate channel than other 
> messages, it means that you have to deal with races. When you get a 
> message from the separate channel that the main channel is dying, there 
> might still be messages in the pipe for the main channel. But there is no 
> way to know when you got the last one. Timeouts are probably the only 
> way, and that's obviously racy/slow.
> 
> In short: The pain! It is burning!

Yeah, I agree. Not sure what to do about it though.
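For illustration, the bookkeeping in Jonas's second point might look like this minimal sketch. It assumes that, for every channel whose end is handed to a worker, the page also creates a separate control channel, gives one end to the peer, and keeps the other end itself. `WorkerPortRegistry` and the `{ type: "peer-terminated" }` message are hypothetical. The sketch does nothing about the race in his third point: messages the worker posted before termination can still arrive after the notice.

```javascript
// Sketch only: per-worker record of the page's ends of the control
// channels, so each peer can be notified just before terminate().
// All names are hypothetical.
class WorkerPortRegistry {
  constructor() {
    this.byWorker = new Map(); // worker -> Set of control ports
  }

  register(worker, controlPort) {
    if (!this.byWorker.has(worker)) this.byWorker.set(worker, new Set());
    this.byWorker.get(worker).add(controlPort);
  }

  // Notify every peer, then terminate; returns how many were notified.
  terminate(worker) {
    const ports = this.byWorker.get(worker) || new Set();
    for (const port of ports) port.postMessage({ type: "peer-terminated" });
    this.byWorker.delete(worker);
    worker.terminate();
    return ports.size;
  }
}
```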

> It occurs to me that all of the proposals here do expose some amount 
> of GC behavior. Even a "channeldropped" message which is sent only when 
> the other side crashes exposes GC behavior. If GC happens to run before 
> the crash and then collect the MessageChannel ports, then no channel 
> exists at the time of crash, and thus no event is sent. However if the 
> GC runs later, or if it doesn't successfully collect the MessageChannel 
> ports, then the "channeldropped" event does fire.

One would hope, however, that crashes can't be caused deterministically, 
and thus this doesn't actually let you depend on GC behaviour. Even if you 
could, the only way that this GC exposure would lead to a compatibility 
constraint is if the page also depended on triggering a crash to work 
correctly. There's crazy stuff on the Web, but that would be pushing the 
envelope on crazy. I'm also pretty confident that browser vendors would be 
far more willing than usual to break compatibility with a page that relied 
both on triggering crashes in a specific way and on the resulting GC 
behaviour being exposed in a specific way.

> One solution which I think would never expose GC behavior is to simply 
> have a property on the MessagePort which indicates if the owner of the 
> other side has been killed or navigated away from. No event would fire 
> as that property changes value.
>
> Since it's a property, it can only be read if a reference to the 
> MessagePort is being held. As long as such a reference exists neither 
> side of the MessageChannel can be GCed.

Well, we could make the 'error' event handler also block GC, but that 
seems like it would be unfortunate. It would be a high memory cost for a 
hopefully very rarely-triggered feature.

(If we don't do that, then it's just a property with polling, and a 
property with polling isn't really any better than an event, as far as 
hiding GC behaviour goes. A property with polling is essentially how 
PortCollection exposes GC behaviour.)

On Fri, 11 Oct 2013, Andrew Wilson wrote:
> 
> Interesting. Section 5.3.1 of the MessagePort spec

(BTW, I really recommend using the WHATWG HTML spec rather than the 
MessagePort spec, especially the version on the TR/ page. The W3C version 
on the TR/ page is about 18 months old, which in this space is _ancient_.)

> states that the ports should only be GC'd if there are no references to 
> either side and if there are no pending messages for either port 
> (basically, meaning that neither port is reachable), but this 
> "channeldropped" API provides a new way of access.

In what sense?

If the port is GC'ed before the other side is destroyed, then you miss the 
message, but so what?

If the port is not GC'ed before the other side is destroyed, but could 
have been, then it exposes GC behaviour, but that seems like a very minor 
concern in this case, as noted in my response to Jonas above.

If the port is not GC'ed before the other side is destroyed, and could not 
have been, then it's the same as if you had a reference, so it seems like 
a non-issue.

On Fri, 11 Oct 2013, Anne van Kesteren wrote:
> On Fri, Oct 11, 2013 at 9:38 AM, Andrew Wilson <atwilson at google.com> 
> wrote:
> > *"or while there exists an event listener on either port for the 
> > channeldropped event."*
> 
> Once you do that you basically rely on the developer to handle GC and 
> you'll end up with memory leaks instead.

Yeah.

On Fri, 11 Oct 2013, Andrew Wilson wrote:
>
> Agreed. I'm just pointing out that language/behavior like this is 
> basically required if you're going to support a channeldropped event 
> that can be spontaneously generated even on ports that have no live 
> external references.

Presumably the only case that this can matter is if the event is 
dispatched. If it gets GC'ed before then, then it didn't matter. I've made 
sure the spec says that while the event (any event, actually) is queued, 
the object can't go away.

On Fri, 18 Oct 2013, Jonas Sicking wrote:
> 
> I thought the proposal was to not fire "channeldropped" when the channel 
> is GCed. Thus allowing channels with both "message" and "channeldropped" 
> event listeners on either side to still be GCed. Is that correct?

It's what the spec says.

> If so, that exposes GC behavior. If at some point both pages that hold 
> on to an endpoint of a message channel drop their references the channel 
> can get GCed. If it is GCed no events fire.
> 
> However if the page holding on to either port crashes before the GC 
> happens, then the "channeldropped" event is fired on the other port.
> 
> Hence the timing of the GC affects whether "channeldropped" is fired. 
> Hence GC behavior is exposed.

The GC behaviour is exposed _if one of the sides crashes_. As noted above, 
that's a case where even if we do expose GC behaviour, we're not likely to 
really constrain ourselves.

On Mon, 21 Oct 2013, Andrew Wilson wrote:
>
> Makes sense, although I'm a bit fuzzy about the rules around 
> MessagePorts and window navigation (for example, if I navigate a window, 
> is all content in that window now shutdown/discarded, even though I 
> could in theory get back to the window by immediately clicking "back")?

No, nothing is discarded, because of, as you say, the bfcache.

On Mon, 21 Oct 2013, Ehsan Akhgari wrote:
> 
> I think we may need to mandate that a "channeldropped" event is fired 
> when you register a handler on a port with the other side having already 
> crashed.

That would be very weird behaviour for an event.

But as designed, I think it works ok to just always hook the listener if 
you need it, since it doesn't prevent GC. So this is probably a non-issue.

On Tue, 22 Oct 2013, Jonas Sicking wrote:
>
> So we could have:
> 
> interface MessagePort {
>   ...
>   Promise pin();
>   void unpin(optional any value);
> };
> 
> Rather than firing channeldropped we reject any promise returned from 
> pin(). Once the caller receives an expected answer he/she calls unpin() 
> which resolves the promise using whatever value is passed in and so the 
> port becomes GCable again.
> 
> When pin() is called again after the unpin call we create a new promise 
> which again prevents the port from getting GCed.
> 
> We could even expose a failAndUnpin function which rejects the promise. 
> This could be useful to enable the page to implement timeouts etc.

This seems like a rather elaborate API that's easy to misuse...
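To make the proposed semantics concrete, here is a hypothetical userland model of how the promise returned by pin() would settle. Script cannot actually stop a port from being garbage-collected, so this models only the promise behaviour. `PinnedPort` and `peerGone()` are invented names, and having pin() return the same promise while already pinned is an assumption the proposal does not address.

```javascript
// Hypothetical model of the proposed pin()/unpin()/failAndUnpin()
// semantics. It cannot block GC; it only shows how the promise settles.
class PinnedPort {
  constructor() {
    this.pending = null; // { resolve, reject } for the current pin()
    this.promise = null;
  }

  // Assumption: calling pin() while already pinned returns the same promise.
  pin() {
    if (!this.pending) {
      this.promise = new Promise((resolve, reject) => {
        this.pending = { resolve, reject };
      });
    }
    return this.promise;
  }

  unpin(value) {
    this.settle((p) => p.resolve(value));
  }

  failAndUnpin(reason) {
    this.settle((p) => p.reject(reason));
  }

  // Stands in for the case that would otherwise fire "channeldropped".
  peerGone() {
    this.failAndUnpin(new Error("other side of the channel is gone"));
  }

  settle(fn) {
    if (!this.pending) return; // not pinned: nothing to settle
    const p = this.pending;
    this.pending = null;
    fn(p);
  }
}
```

In the model, Jonas's example amounts to calling `pinned.unpin(e.data)` from the port's message handler, while `pinned.pin().then(...)` waits for that reply or for a rejection if the other side disappears.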

On Tue, 22 Oct 2013, Jonas Sicking wrote:
> 
> As the API stands in the proposal above you could write code like:
> 
> port.postMessage({ doStuff: "using-this-data" });
> port.onmessage = e => { port.unpin(e.data); };
> port.pin().then(d => doAsync(d)).then(...);
> 
> Which is great.

I dunno how great it is. It's not hugely readable. I can't tell what the 
heck it's doing. :-)

On Wed, 2 Oct 2013, Andrew Wilson wrote:
> 
> I don't have any objections to adding some kind of close event to detect 
> cases where an owner goes away

I haven't done this. If this is something we do want to do, then I can 
spec it (the spec in fact used to do something like it), but it wasn't 
clear to me from this thread that this was the immediate need, and given 
the risks involved, I wanted to avoid scope creep.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'