[whatwg] Combining the DedicatedWorker and SharedWorker interfaces

Thu Nov 13 16:59:49 PST 2008

On Thu, 13 Nov 2008, Alexey Proskuryakov wrote:
> 
> I like that it doesn't tie Worker and MessagePort lifetimes too closely, 
> which means that it has a higher chance of being paradox-free.

Could you elaborate on this? I'm not sure I understand. What paradoxes 
might exist?

On Thu, 13 Nov 2008, Jonas Sicking wrote:
>
> The main two things that people seem to dislike in the current draft are
> 1. The many communication mechanisms.
> 2. Different APIs for shared and dedicated workers.
> 
> I've said before that I don't really think 1 is true. There is currently 
> one communication mechanism (postMessage/onmessage) and one connection 
> mechanism (onconnect). The communication mechanism (postMessage/ 
> onmessage) does come in two flavors, though, since for shared workers and
> dedicated workers you call the functions on different objects.

I agree that there is only one real communication mechanism, and I don't 
understand the problem with that.

The onconnect mechanism is unfortunate, but I don't see any other good way 
to get the ports into the shared worker case. Certainly we don't want to 
force the dedicated worker users to have to use it, IMHO.

> 2 I think is as much of a feature as a bug. Dedicated workers are by 
> nature simpler since there is a one-to-one relationship between browsing 
> context and worker rather than a many-to-one. So by having different 
> APIs we can allow the dedicated worker API to be simpler. That said, I 
> do agree that it is unfortunate that the mechanisms are different.

Agreed, somewhat.

> So, here are some concrete proposals for a few changes we can make, and 
> comments I've heard/made about them. The changes refer to the *current
> draft*, so please check the behavior defined there.
> 
> * Remove startConversation
> Details:
> Simply remove the startConversation function on all interfaces where it
> is defined. Since it doesn't define any new events no other changes
> are needed.

Done, for now. We might bring it back later, but it seemed simpler to 
remove it since it was causing confusion.

> * Make the external API for shared workers that of the current dedicated worker
> Details:
> Move the postMessage/onmessage functions from the SharedWorker.port
> object to the SharedWorker object. The SharedWorker would act as a
> MessagePort that is entangled with the port that is provided to the
> SharedWorkerGlobalScope through the already specified 'connect' event that
> is fired when a SharedWorker is created.
> 
> Comments:
> The result of this would be that shared workers and dedicated workers
> have exactly the same API to the outside world,
> except that dedicated workers have a terminate() function (formerly
> known as close(), changed in the latest version of the spec).

Right.

> [...] Hixie expressed some dislike about the fact that we'd end up with 
> MessagePort entangled with something that isn't a MessagePort. This can 
> result in ugliness if the MessagePort is passed out outside the
> SharedWorker, and then passed on anywhere. A page could create a setup 
> where calling postMessage on a SharedWorker object actually resulted in 
> onmessage being called inside another window rather than inside a worker 
> global scope.

Right. There's also the problem with whether to expose .close(), .start(),
etc, on the SharedWorker object -- i.e. whether to flatten the whole
MessagePort API in, or whether to only expose some parts -- in the latter
case, we'd have a weird asymmetry where e.g. the worker could call
.close(), but the SharedWorker wouldn't get the close event, etc.

I really don't like this.

> I don't really think this is a big deal though, you have a very
> similar situation today where calling postMessage on a
> SharedWorker.port object can do exactly the same thing.

We could change .port to .getPort(), or .connect(), and have it return a 
new port. That would remove the artificial link between the SharedWorker 
object and the MessagePort. Would that work for people?
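
Roughly the shape I have in mind, as a sketch only: ToyPort, entangledPair
and ToySharedWorker below are made-up names, and the ports are a synchronous
toy model standing in for real MessagePorts, not the actual API.

```javascript
// Toy, synchronous stand-in for a message port (not the real MessagePort).
class ToyPort {
  constructor() {
    this.other = null;      // the port this one is entangled with
    this.onmessage = null;
  }
  postMessage(data) {
    // Deliver straight to the entangled port's handler, if it has one.
    if (this.other && this.other.onmessage) this.other.onmessage({ data });
  }
}

function entangledPair() {
  const a = new ToyPort();
  const b = new ToyPort();
  a.other = b;
  b.other = a;
  return [a, b];
}

// Hypothetical SharedWorker-like object exposing connect() instead of .port.
class ToySharedWorker {
  constructor(workerScope) {
    this.scope = workerScope;
  }
  connect() {
    const [outside, inside] = entangledPair();
    // Fire 'connect' on the worker side with the inner port.
    if (this.scope.onconnect) this.scope.onconnect({ port: inside });
    return outside; // a new port on every call
  }
}

// Worker side: answer on whichever port the message arrived on.
const scope = { onconnect: null };
let connections = 0;
scope.onconnect = (event) => {
  const id = ++connections;
  event.port.onmessage = (e) =>
    event.port.postMessage(`conn ${id} got ${e.data}`);
};

// Page side: two independent ports from the same worker object.
const worker = new ToySharedWorker(scope);
const p1 = worker.connect();
const p2 = worker.connect();
const replies = [];
p1.onmessage = (e) => replies.push(e.data);
p2.onmessage = (e) => replies.push(e.data);
p1.postMessage("a");
p2.postMessage("b");
```

Each connect() call hands back a fresh port, so nothing ties one particular
port to the SharedWorker object itself.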

> * Make dedicated workers receive a 'connect' event when they are created
> Details:
> Make the internal communication API for a dedicated worker exactly
> that of what a shared worker is currently specced as. This means
>  - Once a dedicated worker is instantiated, automatically fire a
>    'connect' event which contains a MessagePort object (accessible
>    through event.port).
>  - Make the Worker object entangled with this MessagePort.
>  - Remove the postMessage/onmessage functions from DedicatedWorkerGlobalScope
> 
> Comments:
> I don't feel super strongly about this. From a purely dedicated worker 
> perspective this doesn't really add any value but rather just 
> complexity. Everyone using dedicated workers will have to set up a dummy 
> function that just listens for a 'connect' event and sets a global port 
> variable. The upside is that combined with the above change it makes the 
> API for dedicated and shared workers exactly the same.

This is what we used to have. People didn't like it, so much so that we 
ended up calling a small meeting with the people who'd spoken up on it, 
and the spec changed away from this model.

The current model makes dedicated workers much simpler to author with. 
With the connect case, you end up with multiple nested lambdas, which is 
really ugly.

I thus really don't want to go back there.
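
To illustrate the nesting, here is a sketch of the worker-side code in both
models. makeEndpoint is a made-up stand-in for a worker global scope or a
port, so that the two styles can be compared outside a real worker; it is not
part of any API.

```javascript
// Stand-in for anything with onmessage/postMessage (a worker global scope
// or a port). postMessage just records what would have been sent.
function makeEndpoint() {
  return {
    onmessage: null,
    onconnect: null,
    sent: [],
    postMessage(data) { this.sent.push(data); },
  };
}

// Current draft: the dedicated worker talks through its global scope.
function currentModel(self) {
  self.onmessage = (event) => {
    self.postMessage(event.data * 2);
  };
}

// 'connect' model: an outer handler exists only to obtain the port,
// and the real work sits in a handler nested inside it.
function connectModel(self) {
  self.onconnect = (connectEvent) => {
    const port = connectEvent.port;
    port.onmessage = (event) => {
      port.postMessage(event.data * 2);
    };
  };
}

// Drive the current model: one message in, one reply recorded.
const scopeA = makeEndpoint();
currentModel(scopeA);
scopeA.onmessage({ data: 21 });

// Drive the connect model: connect first, then the same message.
const scopeB = makeEndpoint();
const portB = makeEndpoint();
connectModel(scopeB);
scopeB.onconnect({ port: portB });
portB.onmessage({ data: 21 });
```

Both do the same work; the second just needs an extra layer before the
author gets to write the handler they actually care about.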

> * Add a connect() method to Worker and/or SharedWorker
> There has been lots of talk about this, but I'm still confused as to
> what the exact proposals are due to lack of details. But here is my
> interpretation
> Details:
>  - Make instantiating a SharedWorker *not* fire a 'connect' event 
>    automatically.
>  - Remove the .port property from SharedWorker
>  - Remove the postMessage/onmessage functions from Worker and
>    DedicatedWorkerGlobalScope
>  - Add an onconnect property on WorkerGlobalScope
>  - Add a connect() method on AbstractWorker. The function fires a
>    'connect' event on the WorkerGlobalScope, the event has a .port
>    property which is a MessagePort. This MessagePort is entangled with
>    another MessagePort which is the value returned from the connect()
>    function.
> 
> Comments:
> Compared to just doing the other above proposals I think this adds 
> needless complexity for value that I don't quite see. If you want to
> have several 'conversations', i.e. several separate MessagePorts, with a
> dedicated worker you can use postMessage and |new MessageChannel| (or
> the startConversation shorthand) to accomplish that. If you want several
> conversations with a shared worker you can do the same thing, or even
> call |new SharedWorker| multiple times.

I wouldn't mind doing this just for the shared case, to further separate 
the port from the SharedWorker object, but I don't see any good reason to 
do this for the dedicated worker case.

On Thu, 13 Nov 2008, Aaron Boodman wrote:
> 
> This is true, the worst I can think of happening as a result of the API 
> Mozilla is planning on shipping would be "frustrating for developers" or
> "frustrating for implementors" as more features are added that don't fit
> well.

I'm not convinced that the current API will result in that problem, and I 
think that the proposed APIs are no better, frankly.

> Here are my preferences on changes, in descending order:
> 
> > * Add a connect() method to Worker and/or SharedWorker
> > There has been lots of talk about this, but I'm still confused as to
> > what the exact proposals are due to lack of details. But here is my
> > interpretation
> > Details:
> >  - Make instantiating a SharedWorker *not* fire a 'connect' event automatically.
> >  - Remove the .port property from SharedWorker
> >  - Remove the postMessage/onmessage functions from Worker and
> > DedicatedWorkerGlobalScope
> >  - Add an onconnect property on WorkerGlobalScope
> >  - Add a connect() method on AbstractWorker. The function fires a
> > 'connect' event on the WorkerGlobalScope, the event has a .port
> > property which is a MessagePort. This MessagePort is entangled with
> > another MessagePort which is the value returned from the connect()
> > function.
> >
> > Comments:
> > Compared to just doing the other above proposals I think this adds
> > needless complexity for value that I don't quite see. If you want to
> > have several 'conversations', i.e. several separate MessagePorts, with
> > a dedicated worker you can use postMessage and |new MessageChannel|
> > (or the startConversation shorthand) to accomplish that. If you want
> > several conversations with a shared worker you can do the same
> > thing, or even call |new SharedWorker| multiple times.
> 
> I think this is the best API because it offers the most functionality 
> with the smallest area.

I don't mind doing this for SharedWorkers if people want, but I don't see 
a reason to do this for dedicated workers.

Would that work? Or is having them be the same the goal for you?

> I also like that the API for dedicated and shared workers is identical 
> because it means that once you learn to use dedicated workers, you
> already know how to use shared workers.

I think the fact that it makes you think that is an argument _against_ 
doing this. There is a huge difference between what you'd need to do for a 
dedicated worker (one onconnect event, in this model) and what you'd need 
to do for shared workers (many connections). Misleading people into 
thinking you can take a dedicated worker and have it just work in the 
shared case is IMHO bad language design.
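
A sketch of the kind of mistake I mean, assuming worker code that stashes its
port in a single variable as suggested earlier in the thread. makePort and
installNaiveWorker are made-up names, and the ports only record what each
client would receive.

```javascript
// Toy worker-side port: postMessage records what that client would receive.
function makePort() {
  return {
    onmessage: null,
    outbox: [],
    postMessage(data) { this.outbox.push(data); },
  };
}

// Worker code written with the dedicated case in mind: one global port.
function installNaiveWorker(self) {
  let port = null;
  self.onconnect = (event) => {
    port = event.port; // overwritten by every later connection
    event.port.onmessage = (message) => {
      port.postMessage("echo " + message.data);
    };
  };
}

const workerScope = { onconnect: null };
installNaiveWorker(workerScope);

// Shared case: two clients connect to the same worker.
const first = makePort();
const second = makePort();
workerScope.onconnect({ port: first });
workerScope.onconnect({ port: second });

// The first client sends a message; the reply goes to the wrong client.
first.onmessage({ data: "from first" });
```

With a single connection this works; with two, the first client's reply lands
in the second client's outbox and the first client hears nothing.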

> If we don't make the above change, I think that we should remove
> startConversation().

Gone.

On Thu, 13 Nov 2008, Jonas Sicking wrote:
> 
> Comparing to doing the other set of changes the differences in API are 
> as follows:
> 
>  - Add a 'connect()' method on AbstractWorker which must be called every
>    time after instantiating a worker.
>  - The postMessage/onmessage functions are moved from the worker object
>    to the port object.
> 
> To me this seems like strictly a bigger API. As far as functionality goes the
> differences are as follows:
> 
>  - You have to deal with two separate objects, the port and the worker.
>  - You can create multiple communication channels with a worker by
>    calling connect() multiple times.
> 
> Only the second thing here seems like a win.

I'm not convinced it's that much of a win, either.

> And only for dedicated workers since for shared workers you can simply 
> call |new SharedWorker| multiple times if you want multiple 
> communication channels.
> 
> So it seems to me like the pros and cons fall out as:
>  Pros:
>    - Easier to create multiple connection channels to dedicated workers
>  Cons:
>    - Bigger API (an extra connect() function)
>    - More code required (an extra call to connect() required)
>    - More objects (port and worker)
> 
> To me the cons outweigh the pros here. Is there a goal with connect() 
> that I'm missing?

I agree.

> At this point I have to ask what the problem you are trying to solve is? 
> What is wrong with the current spec as is? I.e. rather than talking 
> about various proposals, maybe we need to align the goals first as we 
> might be talking past each other.

I agree that this would be useful (and necessary) to understand.

On Fri, 14 Nov 2008, Alexey Proskuryakov wrote:
> 
> For the sake of completeness, a connect/startConversation method on a 
> worker really should automatically open the receiving port - this is 
> what examples posted so far implied, and it would cause a lot of 
> aggravation if it didn't. I know I'm often forgetting to open the port 
> when writing my tests, and it's not a very easy mistake to spot.

What do you mean by "open the port"? Do you mean calling start()? If so, 
that should happen automatically when you set onmessage the first time, 
per spec.
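
As I understand the draft, the difference is between addEventListener() and
the onmessage attribute. A toy model of the queueing, not the real
MessagePort (QueuedPort and deliver() are made up for illustration):

```javascript
// Toy model of a port whose incoming messages queue up until it is started.
class QueuedPort {
  constructor() {
    this._handler = null;
    this._listeners = [];
    this._queue = [];
    this._started = false;
  }
  set onmessage(fn) {
    this._handler = fn;
    this.start(); // implicit start; harmless if the port is already started
  }
  get onmessage() {
    return this._handler;
  }
  addEventListener(type, fn) {
    // Registers the listener but does NOT start the port.
    if (type === "message") this._listeners.push(fn);
  }
  start() {
    this._started = true;
    this._flush();
  }
  // Called by the "other side" in this toy model.
  deliver(data) {
    this._queue.push(data);
    if (this._started) this._flush();
  }
  _flush() {
    while (this._queue.length > 0) {
      const event = { data: this._queue.shift() };
      if (this._handler) this._handler(event);
      for (const listener of this._listeners) listener(event);
    }
  }
}

// addEventListener alone: the message sits in the queue until start().
const viaListener = new QueuedPort();
const seenA = [];
viaListener.addEventListener("message", (e) => seenA.push(e.data));
viaListener.deliver("hello");
const beforeStart = seenA.length; // nothing delivered yet
viaListener.start();

// Setting onmessage: the queued message is delivered immediately.
const viaSetter = new QueuedPort();
const seenB = [];
viaSetter.deliver("early");
viaSetter.onmessage = (e) => seenB.push(e.data);
```

If your tests use addEventListener(), that would explain having to call
start() by hand.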

> Besides API usability that we've already discussed back and forth 
> without reaching an agreement, I'm very much concerned about the current 
> spec being implementable in its current form. It has a lot of notions 
> and algorithms that are not correctly defined. For a randomly picked 
> example:
> 
> -----------------------------
> Each WorkerGlobalScope worker global scope has a list of the worker's 
> ports, which consists of all the MessagePort objects that are entangled 
> with another port and that have one (but only one) port owned by worker 
> global scope. This list includes all the MessagePort objects that are in 
> events pending in the queue of events, as well as the implicit 
> MessagePort in the case of dedicated workers. 
> -----------------------------
> 
> In an async processing model, there is simply no way for the receiver to 
> have a list of all objects that were posted to it - it's exactly the 
> reason for the existence of the queue that events are delivered 
> asynchronously and cannot be peeked before being delivered. For example, 
> in a multi-process implementation, these events may still be across a
> process boundary.

It actually doesn't really matter if there is something that has been 
posted but not yet received, because that is indistinguishable (as far as 
I can tell) from the case of the worker having shut down a split second 
before that object was posted.

> Also (from HTML5):
> -----------------------------
> Each MessagePort object can be entangled with another (a symmetric
> relationship).
> -----------------------------
> 
> It is not possible to have a symmetric relationship in an asynchronous 
> messaging model - we need a multi-step entangling/unentangling protocol,
> so the relationship is necessarily asymmetric. One can't freeze another 
> process (or really, even another thread) to change something in it 
> synchronously.

The above is not a requirement, it's just a description of the concept. I 
don't think anything actually depends on it being symmetric; all the parts 
that actually entangle ports have (or, are intended to have, maybe I 
missed some) pretty well-defined synchronisation points. For example, any 
method that entangles two ports blocks until both threads are synchronised 
and entangled.

(The spec is somewhat implicit about this, but the intent is that workers 
really be implemented either as two system threads, one doing 
communication and one running the JS, or by one system thread that runs 
the JS in an interruptible fashion. In particular, doing something that 
synchronises with a worker isn't expected to have to wait for that worker 
to finish running its current JS.)

> Some instances of implied synchronous thinking can be corrected rather 
> easily, but not all of them. So, I do not really see how anyone can 
> claim to implement the spec, or even a subset of it, at this point.

Do you have any specific examples of what can't be implemented?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'