[whatwg] Workers feedback
Ian Hickson
ian at hixie.ch
Thu Nov 13 15:30:47 PST 2008
I haven't written a summary of changes because this is a rather involved
issue and I'd like everyone who has an opinion to actually read this.
I missed a few e-mails sent in the last few hours in this reply, as I
started this yesterday. I'll read and respond to those in a bit.
On Thu, 28 Aug 2008, Jonas Sicking wrote:
>
> The spec currently says:
>
> Once the WorkerGlobalScope's closing flag is set to true, the queue must
> discard anything else that would be added to it. Effectively, once the
> closing flag is true, timers stop firing, notifications for all pending
> asynchronous operations are dropped, etc.
>
> Does this mean that anything already on the queue will remain there? Or
> will it be dropped? It sounds like it will remain, but it's somewhat
> unclear.
I've added a parenthetical clarifying this.
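As a rough illustration of the behaviour under discussion, here is a toy
model of such a queue. It is not the spec's algorithm. It assumes that
events already queued when the closing flag is set still run (Jonas's
reading above) and that anything added afterwards is discarded. The names
createToyQueue, add, close and drain are made up for this example.

```javascript
// Toy model of a worker's event queue; all names here are hypothetical.
// Assumption: tasks queued before close() still run, later ones are dropped.
function createToyQueue() {
  var closing = false; // stands in for the "closing flag"
  var pending = [];
  return {
    add: function (task) {
      if (closing) return false; // discarded: the closing flag is set
      pending.push(task);
      return true;
    },
    close: function () {
      closing = true;
    },
    drain: function () {
      // Run whatever was accepted, in order, and collect the results.
      var results = [];
      while (pending.length > 0) {
        results.push(pending.shift()());
      }
      return results;
    }
  };
}

var queue = createToyQueue();
queue.add(function () { return "timer 1"; });
queue.close();
var accepted = queue.add(function () { return "timer 2"; });
var ran = queue.drain();
console.log(accepted, ran);
```

Here "timer 2" is rejected because it arrives after the flag is set, while
"timer 1" still runs.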
> In general I think the three shutdown mechanisms that exist are somewhat
> messy:
>
> * Kill a worker
> * Terminate a worker
> * WorkerGlobalScope.close()
>
> It seems excessive with 3 possible shutdown sequences, but more
> importantly the differences between them seem unnecessarily big. Mostly
> for users, but to a small extent also for implementors. Currently the
> situation is as follows:
>
>
>                           | Abort    | Processes | Fires    | Fires
>                           | current  | more      | close on | close on
>                           | script   | events    | scope    | tangled
>                           |          |           |          | ports
> -----------------------------------------------------------------------
> WorkerGlobalScope.close() | No       | Maybe[2]  | Yes      | Yes
> Kill a worker             | Maybe[1] | Maybe[1]  | Maybe[1] | No
> Terminate a worker        | Yes      | No        | Yes      | No
> -----------------------------------------------------------------------
>
> [1] Implementation dependent. Presumably depends on how much patience
> the implementation thinks its users have.
>
> [2] Depends on if the event has been placed in the queue yet or not,
> somewhat racy.
There are other ways to kill the worker:
  The worker is orphaned    | Maybe[1] | Yes       | Yes      | No
  The browser dies          | Yes      | No        | No       | No
Also, your "No" in the top-left cell is really a Maybe[1], since if the
script doesn't stop then it'll trigger the Kill algorithm.
> This seems excessively messy. The number of differences in the columns
> and the number of maybes seems bad. I propose the following:
>
> * Remove the "Kill a worker" algorithm and use "Terminate a worker"
> everywhere it is used.
I strongly disagree with that. The whole point of having a distinction is
that we don't want scripts just being killed willy-nilly when the user
navigates away from the page. Scripts in the page itself aren't
terminated, why would we want such drastic behaviour in the threads?
> * Make WorkerGlobalScope.close() not process any more events. I.e. make
> setting the 'closing flag' to true always clear out all events except a
> single close event.
Again, this seems bad as it would mean that if you navigated away from a
page that happened to use a worker, you could get data loss.
> * Always fire close on tangled ports. In many cases this will be a no-op
> since we're doing it in workers that are being closed. However if the
> port is in another window or a shared worker this might not be the case.
I thought we weren't doing this because it exposed the details of garbage
collection?
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> * We have discussed having onerror expose runtime script errors that
> happened inside the worker. I don't think this makes sense for shared
> workers, so I propose that it be spec'd to only expose load errors.
> Script errors can still be exposed via a global onerror property inside
> the worker, and they can still be reported to the error console. I don't
> think having script errors that happened inside a worker be exposed
> outside it is that useful (load errors are useful, though).
Right now only load errors are reported.
I'll wait til the API is more stable before exposing script errors and the
like at all (whether on a global onerror or whatever). It is noted as an
XXX issue in the spec source.
On Thu, 28 Aug 2008, Jonas Sicking wrote:
>
> Why is importScripts imposing a same origin restriction? This won't
> increase security in any way since cross-origin scripts can always be
> loaded from the main thread. I think cross-site loading is fairly common
> exactly for the case that importScripts, which is loading libraries.
I don't recall the precise reason, but I seem to recall that concerns over
specific attack vectors were what caused us to restrict this.
> Also, the spec doesn't seem clear on what to do if compiling a script
> fails. I think some sort of exception should be thrown, probably the
> same one that is thrown if eval() is given a non-compiling script.
Done.
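For reference, the eval() behaviour Jonas mentions looks like the sketch
below. It only demonstrates what eval() does with script text that does not
compile. Whether importScripts() throws exactly the same exception type is
up to the spec text, which is not quoted in this message.

```javascript
// eval() on source text that fails to compile throws a SyntaxError.
var caught = null;
try {
  eval("var = ;"); // deliberately malformed script text
} catch (e) {
  caught = e;
}
console.log(caught instanceof SyntaxError);
```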
On Tue, 30 Sep 2008, Alexey Proskuryakov wrote:
>
> I've been trying to understand the difference between SharedWorker and
> DedicatedWorker interfaces. Besides the ability to pick an existing
> worker by name, are there any other semantic differences? I may be
> missing something, but it looks like a single Worker interface with an
> optional name parameter to constructor would work just as well.
That's what we used to have. It was changed because having shared and
dedicated workers be too similar was considered bad API design. To be
honest I somewhat agree with the position that says they should be
distinct. I don't understand the benefit of making them the same.
On Thu, 9 Oct 2008 kevin.hakanson at thomsonreuters.com wrote:
>
> 1.1.3 Worker used for backgroud I/O
> backgroud (should be background)
>
> 1.1.4 Shared workers
> idependently (should be independently)
>
> 2. Infrastructure
> multple (should be multiple)
Thanks, fixed.
On Thu, 30 Oct 2008, Jonas Sicking wrote:
>
> Only the globalscope is specified to implement EventTarget, the actual
> Worker should too.
Fixed.
On Mon, 3 Nov 2008, Jonas Sicking wrote:
>
> For future compat it would be good to expose to workers information on
> what browser is currently being used. This can be used to work around
> bugs and lack of features.
>
> In a 'normal' window context the navigator object exposes a set of
> properties, such as userAgent, that can be used for this purpose. I
> suggest we add something similar to the worker context. The HTML5 spec
> defines the following:
>
> interface Navigator {
>   // client identification
>   readonly attribute DOMString appName;
>   readonly attribute DOMString appVersion;
>   readonly attribute DOMString platform;
>   readonly attribute DOMString userAgent;
>
>   // ... other things not related to identifying the UA
> };
>
> I'm not sure how stable this part of the HTML5 spec is, (I know firefox
> exposes a whole host of more properties), but it seems like a good set
> to start with. We should probably keep the two in sync if the window
> context Navigator object changes in the future.
>
> Orthogonally, it seems like at least the 'onLine' boolean on the
> Navigator interface would be useful too, and could be exposed at the
> same place.
Done.
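A small sketch of how a script might read these values, assuming the
attributes listed above (plus onLine) are what ends up exposed. The helper
describeClient is hypothetical, and the sample object uses made-up values
so that the sketch runs outside a browser. In a worker one would
presumably pass the worker's own navigator object instead.

```javascript
// Hypothetical helper: collect the client-identification attributes quoted
// above, plus onLine, from any Navigator-like object.
function describeClient(nav) {
  const keys = ["appName", "appVersion", "platform", "userAgent", "onLine"];
  const summary = {};
  for (const key of keys) {
    // Missing attributes are reported as undefined rather than throwing.
    summary[key] = key in nav ? nav[key] : undefined;
  }
  return summary;
}

// Stand-in object with invented values, used only to exercise the helper.
const sample = describeClient({
  appName: "Example",
  userAgent: "Example/1.0",
  onLine: true
});
console.log(sample);
```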
On Wed, 12 Nov 2008, Dmitry Titov wrote:
>
> 1. The sample code looks as
> if setTimeout/clearTimeout/setInterval/clearInterval should be available to
> Workers (as methods of WorkerUtils?) but they are not explicitly specified
> on any interface. Should they be there?
Added.
> 2. It seems workers should be able to create workers (including creating
> 'themselves' in case of SharedWorker). It is especially useful for a
> SharedWorker to create dedicated workers - since one of the popular
> scenarios for it is likely a "state container" that communicates with UI
> pages while using dedicated workers to do other operations. Will spec
> include this?
Added.
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> * Similarly, I'd like to rename startConversation() to connect(). I
> think this aligns nicely with the onconnect event (connect() should also
> trigger a 'connect' event inside the worker).
On Tue, 4 Nov 2008, Jonas Sicking wrote:
>
> I'm fine with removing things like 'startConversation' and the implicit
> call to connect() on shared workers. 'startConversation' isn't really a
> new communication mechanism, but rather a convenience method on top of
> postMessage.
I removed startConversation() from the specs altogether, since it was
causing too much confusion. It was only a convenience method (it was
exactly equivalent to creating a new MessageChannel followed by calling
postMessage with one of the new ports), so this doesn't change anything
about the actual API.
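To make the equivalence concrete, here is a minimal sketch. ToyPort and
createToyChannel are hypothetical, synchronous stand-ins for MessagePort
and MessageChannel so that the example runs anywhere, and
startConversationSketch is an illustrative name, not the removed method's
actual signature.

```javascript
// Hypothetical stand-in for a MessagePort. Messages are queued until a
// handler is assigned, then delivered synchronously.
class ToyPort {
  constructor() {
    this.peer = null;
    this.inbox = [];
    this.handler = null;
  }
  set onmessage(fn) {
    this.handler = fn;
    this.flush(); // deliver anything that arrived before the handler was set
  }
  get onmessage() {
    return this.handler;
  }
  postMessage(data, port) {
    this.peer.inbox.push({ data: data, port: port || null });
    this.peer.flush();
  }
  flush() {
    while (this.handler && this.inbox.length > 0) {
      this.handler(this.inbox.shift());
    }
  }
}

// Hypothetical stand-in for "new MessageChannel()".
function createToyChannel() {
  const port1 = new ToyPort();
  const port2 = new ToyPort();
  port1.peer = port2;
  port2.peer = port1;
  return { port1, port2 };
}

// The convenience method written out as the two steps it stood for:
// create a channel, then post one of its ports along with the message.
function startConversationSketch(target, message) {
  const channel = createToyChannel();
  target.postMessage(message, channel.port2);
  return channel.port1;
}

// Receiving side: reply over whatever port arrives with the message.
const link = createToyChannel();
link.port2.onmessage = (e) => {
  e.port.postMessage("pong to " + e.data);
};

const replies = [];
const conversation = startConversationSketch(link.port1, "ping");
conversation.onmessage = (e) => replies.push(e.data);
console.log(replies);
```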
I've snipped reference to startConversation from the following feedback,
as noted below, to avoid confusion over this.
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> * I think it was an interesting idea to have separate interfaces for
> Dedicated and Shared workers, but in the end I don't think there's
> enough difference between the two cases to justify it. I'd rather have
> the total API surface be smaller, and generalize concepts as much as
> possible. So...
> - We should remove all the postMessage/onmessage stuff from
> DedicatedWorker, and just use the port convenience property.
> - We should move onconnect() up into WorkerGlobalScope
On Tue, 30 Sep 2008, Aaron Boodman wrote:
>
> That is what we are debating here. Logically, there is a concept of a
> "shared worker", which can be referenced from multiple pages in the same
> origin. There is a debate about how much the interface between shared
> workers and dedicated workers should be different.
>
> I'm of the opinion that there should be as little difference as
> possible, to lower the amount of API to learn. Therefore in my preferred
> proposal, the only difference between SharedWorker and DedicatedWorker
> is that the latter has a close() method. It doesn't make sense to allow
> one user of a SharedWorker to close something others are depending on.
>
> Jonas is in favor of making a particular use case of DedicatedWorker as
> simple to use as possible. This requires extra API on DedicatedWorker
> that doesn't make sense for SharedWorker.
On Tue, 30 Sep 2008, Alexey Proskuryakov wrote:
>
> I'm not sure it's so good in the case of dedicated workers either, as
> they can be used from other contexts via additional message ports. The
> close() method could just close the default port.
>
> Both shared and dedicated workers have to maintain a strong reference to
> a context that created them, just to have a context to execute event
> listeners in. So, they are basically the same as far as implementation
> is concerned AFAICT.
On Tue, 30 Sep 2008, Aaron Boodman wrote:
>
> Sure, but in order for that to have happened, whoever created the worker
> in the first place must have done it on purpose. The original worker
> instance is anonymous. If the creator of that worker decides to share
> it, that's fine, but it's more like cooperative sharing.
>
> close() was added so that you could forcibly kill a worker. For example,
> if you are searching a large set with many workers, you may want to kill
> them once one finds a match.
>
> The same could be achieved by just setting all the ports to null and
> waiting for GC, but:
>
> a) GC might not be for awhile, which is wasteful
> b) It is hard to track where all the ports went
>
> So I think it is useful to have a conceptual difference between workers
> that are 'dedicated' and those that are 'shared'.
On Tue, 30 Sep 2008, Alexey Proskuryakov wrote:
>
> Hmm... So this is more about how you use the interface, not what the
> object behind it is. If one chooses to never call close() on a shared
> worker (or, say, sets myWorker.close to null right after invoking
> constructor), it becomes indistinguishable from a dedicated worker.
>
> Hiding close() possibly sounds more like something a high-level
> framework may want to do to enforce a certain design pattern than a core
> feature.
On Tue, 30 Sep 2008, Aaron Boodman wrote:
>
> I could see that too. When all the parties accessing a shared worker are
> from the same origin (as they are today) it is less of an issue. You can
> probably assume that they know not to close() the worker.
On Thu, 6 Nov 2008, Alexey Proskuryakov wrote:
> On Nov 6, 2008, at 2:18 AM, Jonas Sicking wrote:
> >
> > A shared worker is shared between all scripts on a single site[*] that
> > instantiates a worker with the same name. I.e. where the second
> > argument to the constructor is the same. (Don't remember what happens
> > if the second argument is the same as an existing worker, but the
> > first is not, check with the spec).
>
> Sure, that part is clear - but it's only about the behavior of the
> object's constructor, not the object itself! It alone doesn't warrant
> having a separate interface.
>
> As an example from another area, see mmap(2) function - you can pass
> MAP_ANON or MAP_FILE via its flags to achieve similar results. Note also
> that it has a number of other options. If we create a separate interface
> for every Worker isolation level needed (both inside and outside), we'll
> soon end up with PrivateWorker, SharedDataWorker and who knows what
> else.
We have different objects for shared and dedicated workers for a multitude
of reasons:
1. Shared workers have a name.
2. Dedicated workers have a method to terminate() them.
3. The two have different communication needs (see below for detail).
4. It allows us to have clearly named constructors, which makes for
more self-documenting code.
5. It's less confusing to authors if the two concepts are distinct,
since they have such different use cases and use patterns.
The reasons for mixing them back together (as they used to be in the
original proposal) seem very unclear to me.
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> * I think onclose makes sense on Port instead of on Worker. The other
> side of a Port can close out from under you, even if it is a window.
MessagePorts do have an onclose; it's separate from the onclose of the
Worker objects.
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> * Ojan brought this up earlier, but I don't think there should be
> anything added to the global scope of workers except a single 'self'
> object, which implements all the APIs that are available there.
I did this once, and was immediately told to undo it, so I'm reluctant to
make this change again.
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> * I think it would be a nice convenience to have an onmessage event
> inside workers that receives all messages sent to any port in the
> worker.
On Tue, 30 Sep 2008, Alexey Proskuryakov wrote:
>
> Creating/accessing a shared worker could also create a default port for
> use with Worker::postMessage, while all messages from such would be
> forwarded to WorkerGlobalScope::onmessage perhaps? Closing such a port
> wouldn't kill the worker thread, of course.
What's the use case for a global watching point like this? This seems very
odd.
On Mon, 3 Nov 2008, Aaron Boodman wrote:
>
> My biggest issue with the proposal as currently drafted is that there
> are so many different ways to send and receive messages. I think this
> overcomplicates the proposal for both developers and implementors.
>
> For dedicated workers, you can either send single messages using the
> Worker object directly, like in Gears:
>
>   var worker = new Worker("foo.js");
>   worker.postMessage("ping");
>
> [... snipped startConversation, which is equivalent to sending
> MessagePorts manually ...]
>
> Note that the worker has to know ahead of time which API the callers
> will use since the way that it replies is different depending on that.
> If the caller used Worker.postMessage(), the worker should reply like
> this:
>
>   onmessage = function() {
>     postMessage("pong");
>   }
>
> ... but if the caller [posted a port to talk over], then the worker
> should reply like this:
>
>   onmessage = function(e) {
>     e.port.postMessage("pong");
>   }
>
> * Workers have to know what interface was used to send them messages. If
> the page using a worker decide to start using a more powerful send API,
> the worker must also be upgraded. You can already see examples of this
> problem in the samples at the beginning of the draft. They are marked
> with the comments "// support being used as a shared worker as well as a
> dedicated worker".
This is just like if the client sent "PING" instead of "ping". It's not a
different mechanism, it's just a different way of using the mechanism.
Sure, you have to set up a convention for messages for your worker. I
don't see this as a problem.
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> Thinking about this some more, having the "port" convenience properties
> gets confusing when there are multiple clients sending messages, and
> doesn't make a whole lot of sense with shared workers.
>
> I think we should just get rid of these. It only adds one line of code
> to the simple case.
On Mon, 15 Sep 2008, Chris Prince wrote:
>
> I like this a lot. +1 to making connect() always explicit. Implicit
> creation of ports led to many rough edges.
On Mon, 15 Sep 2008, Chris Prince wrote:
>
> I think your proposal nearly works for window.postMessage() too. If you
> move 'onconnect' and 'connect()' into a MessageReceiver interface
> [better name TBD], and make Worker and Window both inherit from
> MessageReceiver, do you end up with a unified messaging model?
On Wed, 24 Sep 2008, Aaron Boodman wrote:
>
> In the current design, there are three separate mechanisms to connect to
> and communicate with a worker:
>
> a) DedicatedWorker::postMessage() + DedicatedWorkerGlobalScope::onmessage
>
> [... snipped startConversation ...]
>
> c) new SharedWorker() + SharedWorkerGlobalScope::onconnect
>
> I would like to combine all of these into one common mechanism:
>
> - Create a worker using either new Worker() or new SharedWorker()
> - Call connect() to get a channel
> - Inside the worker, listen for onconnect, then receive messages using
> the port's onmessage event
On Tue, 30 Sep 2008, Aaron Boodman wrote:
>
> SharedWorkers are by definition meant to be used from multiple pages, so
> a developer will usually not use the default port since it would only
> work for the first client and not any other. If the developer only
> intended for there to be one client, he would just use DedicatedWorker.
On Mon, 3 Nov 2008, Aaron Boodman wrote:
>
> SharedWorkers require a third, completely different API to send messages:
>
>   var w = new SharedWorker("foo.js", "foo");
>   w.port.postMessage("ping");
>   w.port.onmessage = function(e) {};
>
> The interface to receive messages in a SharedWorker is also special:
>
>   onconnect = function(e) {
>     e.port.onmessage = function(e) {
>       e.port.postMessage("pong");
>     }
>   }
>
> This lack of generality bothers me on an aesthetic level, but I also
> think it has the following real problems:
>
> * Having different interfaces for each use case means that each new
> feature has to be added to each interface separately. [snip
> misunderstanding over startConversation]
>
> * Having multiple interfaces probably increases the chance of developers
> misunderstanding and using the wrong tool for the job. [snip example
> involving startConversation]
>
> * More API for developers to learn and implementors to build.
>
> I think that these issues can all be addressed by simplifying and
> combining the various APIs. This will make the simplest examples of
> workers require slightly more code, but I think it is much simpler and
> more elegant.
>
> Here is how it would work:
>
> * Get rid of the DedicatedWorker interface.
>
> * Add [a method called] "connect()" and make the onconnect event fire
> inside the worker each time it is called.
>
> Here's an example in code:
>
>   // dedicated workers (outside)
>   var worker = new Worker("foo.js");
>   var port = worker.connect();
>   port.onmessage = function() { }
>   port.postMessage("ping");
>
>   // dedicated workers (inside)
>   onconnect = function(e) {
>     e.port.onmessage = function(e) {
>       e.port.postMessage("pong");
>     }
>   }
>
> Shared workers are exactly the same except the constructor is
> SharedWorker("foo.js", "foo");
>
> Note that I do not think it is necessary to implement this all at
> once. For one, the SharedWorker constructor could easily be punted for
> future releases.
On Tue, 4 Nov 2008, Jonas Sicking wrote:
>
> My main concern is that I think this makes the most simple use case a
> bit too complicated. In the case when you have a dedicated worker that
> you want to offload some calculations to, you need quite a bit of code
> to set up that communication.
>
> With the current API you'd do the following:
>
> main.js:
>   w = new Worker('worker.js');
>   w.postMessage(17);
>   w.onmessage = function(e) {
>     answer = e.data;
>   }
>
> worker.js:
>   function heavyCalculation(inValue) {
>     ...
>   }
>   onmessage = function(e) {
>     postMessage(heavyCalculation(e.data));
>   }
>
>
> With the proposed API:
>
> main.js:
>   w = new Worker('worker.js');
>   p = w.connect();
>   p.postMessage(17);
>   p.onmessage = function(e) {
>     answer = e.data;
>   }
>
> worker.js:
>   function heavyCalculation(inValue) {
>     ...
>   }
>   onconnect = function(e) {
>     e.port.onmessage = function(e2) {
>       e.port.postMessage(heavyCalculation(e2.data));
>     }
>   }
>
> This complexity I feel is extra bad since I suspect the simple case is
> going to be the common case (I know we disagree there). I especially
> dislike the fact that you have to wait for two events, first a 'connect'
> event and then the actual message event. This seems overly complex for
> the simple case of simply wanting to use a single communication channel
> with a dedicated worker. And even though there isn't that much more code
> in my example above, it took significantly more effort to get it right
> given the nested two handlers that were needed.
>
> So I think we should keep the simple case of a dedicated worker and a
> single communication channel as simple as possible. This means that I
> think we should keep postMessage/onmessage available on the dedicated
> worker directly, as well as the dedicated worker global scope.
>
> As an added bonus this keeps things very similar to message passing
> between windows.
On Tue, 4 Nov 2008, Jonas Sicking wrote:
>
> So Hixie brought up a good point on IRC, there really is only one
> communication mechanism, which is postMessage/onmessage.
>
> I'd note that [...] all proposals have two 'communication' mechanisms:
> postMessage and connect.
>
> With Aaron's proposal you have to use both mechanisms for both a shared
> worker and a dedicated worker: first call connect() and then call
> postMessage(). If we keep postMessage on the dedicated worker the only
> difference is that for a dedicated worker you skip the connect() and go
> directly to postMessage.
On Thu, 6 Nov 2008, Alexey Proskuryakov wrote:
>
> Something that seems missing from these discussions is how the API
> changes affect semantics of worker behavior, which makes it hard to
> compare proposals. For example, having some port singled out as an
> attribute of Worker (or as an implicit hidden attribute, used by methods
> defined on Worker itself) sorta implies that closing it should close
> other ports and dispose of the worker global scope soon. Similarly,
> having separate interfaces for Worker and SharedWorker implies that
> there is some fundamental difference in their behavior - a difference
> that eludes me so far.
>
> It would seem that we could make better progress if we had a list of
> requirements that we want to have fulfilled (e.g., should it be possible
> to easily manage worker lifetime manually by closing a dedicated port?
> or should workers strive to remain available for as long as any client
> has an open port, which helps write reliable code?).
>
> I really don't think that the desire to save one line in a 500-line
> program should affect the design too much. Convenience methods can be
> added later, but over-engineered parts cannot be removed later. And at
> this point, I believe that we are mostly speculating about use cases
> (probably except for Google Gears folks, who are shipping a similar API,
> and get developer feedback already).
On Wed, 5 Nov 2008, Jonas Sicking wrote:
> Ian Hickson wrote:
> > > * Remove the port property from the SharedWorker interface and give
> > > it a postMessage and onmessage just like dedicated workers have.
> >
> > I really don't like this. With (Dedicated)Worker it makes sense
> > because both sides bury the underlying message channel and ports and
> > so things like closing the port, or whether the port is active, are
> > hidden on both sides. But with SharedWorker, if we only bury it on one
> > side, there is a lack of symmetry that IMHO is going to lead to all
> > kinds of issues and confusion. I really don't like that. If people
> > start sending one side's pipe down another channel, we can end up with
> > a situation where a SharedWorker object really represents a port that
> > has nothing to do with the worker anymore.
>
> It's not really that different from what you have today where a
> myWorker.port object can send messages to something that isn't a worker
> at all.
>
> It also removes the issue where the .port property on a shared worker is
> readonly but dead.
I believe that the idea that the API for shared and dedicated workers
should be the same is misguided. The spec used to make the two cases
identical. The result was confusion, and the dedicated case was much more
complex than necessary.
Shared workers and dedicated workers are fundamentally different and have
different needs, and we should expose these needs in ways optimised for
the two cases.
The basic need is that dedicated workers be able to have a two-way
communication channel with their creators, and shared workers be able to
have a two-way communication with each user of the worker.
Assuming we use MessagePorts on each end for communication, then we have
to get the ports to each end somehow.
On the outside, we can expose a port on the worker object:
   var worker = new Worker(url);
   worker.port...

   var worker = new SharedWorker(url, name);
   worker.port...
...and on the inside we can have either the port provided through an
event:
   // case A
   var port;
   onconnect = function(e) {
     port = e.port;
   };
...or accessible in general:
   self.port // case B
The latter (B) doesn't work for shared workers, since there are multiple
ports in that case, so for shared workers we have to use the former (A).
For dedicated workers though that's way more complexity than we want to
require of authors -- why do they have to listen for a port when there
will always be exactly one? So it makes sense to use (B) for dedicated workers.
Now, we've at this point made the two different already, so as to simplify
the dedicated worker, so we could (and the spec does) make the dedicated
worker even simpler while we're at it.
One way to do that is to bury the ports into the Worker and global scope
objects. If we do one side, though, we have to do the other, because it
would be really weird to have a message channel that half-acts like a
two-port channel and half doesn't. For example, if we bury it, we
shouldn't expose .close(), since it's better for the worker to be closed
using the actual .close() or .terminate() API, but if we only "bury" one
side, then one end could close the pipe and not the other, and we'd have
to make sure we expose .onclose on the buried end, and so forth.
So, we end up with what the spec has now.
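The symmetry argument can be sketched as follows. This is a toy model under
stated assumptions, not how the spec defines anything: createToyPair is a
hypothetical synchronous stand-in for a pair of entangled ports, and bury
is a made-up helper that exposes postMessage/onmessage while keeping the
port, and therefore its close(), out of reach on both ends.

```javascript
// Hypothetical synchronous stand-in for two entangled ports.
function createToyPair() {
  const a = { onmessage: null, closed: false };
  const b = { onmessage: null, closed: false };
  const wire = (from, to) => {
    from.postMessage = (data) => {
      // Deliver only while both ends are open and the peer is listening.
      if (!from.closed && !to.closed && to.onmessage) to.onmessage({ data });
    };
    from.close = () => {
      from.closed = true;
    };
  };
  wire(a, b);
  wire(b, a);
  return [a, b];
}

// "Burying" a port: the returned object can send and receive, but it has
// no .port property and no .close() of its own.
function bury(port) {
  const endpoint = {
    onmessage: null,
    postMessage: (data) => port.postMessage(data)
  };
  port.onmessage = (e) => {
    if (endpoint.onmessage) endpoint.onmessage(e);
  };
  return endpoint;
}

const [outerPort, innerPort] = createToyPair();
const workerObject = bury(outerPort); // what the creator would hold
const globalScope = bury(innerPort);  // what the worker script would see

const log = [];
globalScope.onmessage = (e) => globalScope.postMessage(e.data * 2);
workerObject.onmessage = (e) => log.push(e.data);
workerObject.postMessage(21);
console.log(log, "close" in workerObject, "close" in globalScope);
```

Because both ends are wrapped the same way, neither side can close the
pipe out from under the other. Burying only one end would leave a raw port,
with its own close(), on the other side.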
I think what we have now is better: making dedicated and shared workers
superficially the same (as the spec used to be, and as the people involved
in this thread argued was bad) is more confusing for authors.
At this point, if the only arguments for changing the API are "it's
confusing for authors", then I'd rather not change the API. We got to
where we are today by carefully considering what would be better for
authors. We could continue going back-and-forth and reverting earlier
decisions until the cows come home, but I see no benefit to doing so.
On Fri, 12 Sep 2008, Aaron Boodman wrote:
>
> * I still don't buy the utility of passing around MessagePorts, so I
> suggest we table that for v2. It can always be added back later.
Since they so drastically affect the API design, I think putting them off
is a mistake. We might end up constraining ourselves in unobvious ways.
If there are specific points that I have not explicitly replied to that
materially affect the arguments and my responses, please let me know. I
tried to take all the input above into account, but it was non-trivial,
since much of it was contradictory, with even individual people putting
forward opposite arguments.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'