[whatwg] Web Workers and MessagePort feedback

Tue Aug 5 15:33:05 PDT 2008

Thanks for the quick reply...

On Tue, Aug 5, 2008 at 2:52 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>> I know this is weird wrt GC when combined with MessagePorts, and I
>> don't have a proposed solution.
>
> I don't think we should say much regarding GC at all. All we should say is
> that GC should not affect the operation of the page. I.e. it is not allowed
> to GC a Worker that someone still has references to, or a Worker that has
> XHR loads in progress or timers pending.
>
> Very few other specs mention GC and I haven't noticed that ever being a
> problem. For example everyone agrees that it's a bug that gecko sometimes
> GCs the parent of a node, if you're not actively holding any references to
> anything in the parent chain.

The spec doesn't have to mention GC, but it does mention
'reachability' right now and what happens when an object is no longer
reachable. This has an impact on interop, so I think it should be
well-defined.

>> Here is how the previous two suggestions would look together:
>>
>> var worker = new Worker("foo.js");
>> worker.onload = function() { ... }
>> worker.onerror = function() { ... }
>> worker.onunload = function() { ... }  // called when the worker shuts down
>> worker.sendMessage("hello!");
>
> So I really like this API. However it makes it completely impossible to ever
> pass worker objects across threads. I.e. we could never allow:
>
> worker1.postMessage("...", worker2);
>
> This would be very strange if we had .onload, .onerror etc on the worker
> object itself since those properties wouldn't make much sense living in
> multiple "threads" at once.
>
> While I agree direct communication between sibling workers is an edgecase,
> it's something I would prefer to not make impossible for future versions of
> the spec.
>
> Though I just realized that we could cover that case using only
> MessagePorts. So we say that you can only communicate with your creator, and
> any children using direct .postMessage. If you want more complex
> communication patterns then set up MessagePorts.

Makes sense, and I like how this is something that could be layered in
a later version.

>> - The spec says that as soon as a worker is not reachable (determined
>> via GC) from any MessagePort, it is eligible for shutdown. Shutdown
>> would attempt to finish all queued messages, but not allow any new
>> ones.
>>
>> This concerns me because it means that workers will have different
>> behavior depending on GC timing. If a worker is not referenced from
>> any port, and it sends an XHR, that XHR may or may not be sent
>> depending on when GC runs. This is different than how XHR behaves
>> normally. Typically, XHR objects that have outstanding IO but no
>> referrers will not be GC'd until they complete or fail.
>>
>> Finally this does not allow use cases such as creating a worker to
>> synchronize a local database with the network without ever sending
>> notifications back to the parent.
>>
>> Maybe workers should stay alive as long as any of the following are true:
>>
>> - There is script running in them
>> - There are messages to them queued
>> - There is a messageport alive anywhere that could send messages to them
>> - There are "asynchronous operations" (xhr, timers, database
>> operations) inside them outstanding
>
> Agreed. Like I said above, I think the less we say about GC the better. GC
> effects should not be noticeable to the page.

OK, but right now the spec says something that contradicts this. That
text should either be changed or, if people think the right behavior
goes without saying, removed.
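
To make the proposal concrete, the four keep-alive conditions quoted
above could be written as a single predicate. This is only a sketch: the
state object, its field names, and shouldStayAlive are hypothetical and
do not come from any spec draft.

```javascript
// Hypothetical snapshot of a worker's state. None of these field names
// are defined by the spec; they only mirror the four conditions.
function shouldStayAlive(state) {
  return (
    state.scriptRunning ||          // script is running in the worker
    state.queuedMessages > 0 ||     // messages to the worker are queued
    state.livePortsToWorker > 0 ||  // a live MessagePort could still send to it
    state.pendingAsyncOps > 0       // XHR, timers, database operations outstanding
  );
}

// A worker syncing a local database with the network: nothing references
// it, but it still has an XHR in flight, so it must not be shut down.
const syncingWorker = {
  scriptRunning: false,
  queuedMessages: 0,
  livePortsToWorker: 0,
  pendingAsyncOps: 1,
};

// The same worker once its last request has completed.
const idleWorker = { ...syncingWorker, pendingAsyncOps: 0 };

console.log(shouldStayAlive(syncingWorker)); // true
console.log(shouldStayAlive(idleWorker));    // false
```

Nothing in the predicate depends on when GC happens to run, which is the
property being asked for.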

>> - Why is there an ownerWindow property on MessagePort? If I understand
>> correctly, this is just a synonym for the 'window' object of the
>> currently executing script context.  I think it should go away.
>
> If we put postMessage directly on the Worker object we don't need to mention
> MessagePorts in the Web workers spec at all. They can just be an orthogonal
> spec.

This feedback was referencing the separate MessageChannel section of
the web-apps spec. I glommed my feedback together because it was the
first time I'd looked at MessageChannels closely and they go together
for my purposes.

>> - The string URL property on the WindowWorker interface is less useful
>> than the parsed structure that window.location has. Can we use
>> something like this instead, except making it read-only?
>
> Why do we need it at all?
>
> If we do think it's useful, most of the uses that I've seen for the parsed
> URL structure have been to set the .hash in order to scroll around on a page
> or communicate between iframes of different origins (ugh!!). Neither of
> these applies here I'd say.

The protocol, host, hostname, port, pathname, and search properties
are all very useful. An application might want to compare the origin
of a message it receives with its own host and port, for example.

Exposing these as separate properties avoids common mistakes in
hand-rolled URL parsing.
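
As an illustration of the kind of mistake meant here, the sketch below
compares a naive string check with a comparison on parsed fields. It
uses the standard URL class as a stand-in for the parsed structure being
requested; sameHostAndPort, the example.com addresses, and the file
names are made up for the example.

```javascript
// Hypothetical helper: true when two absolute URLs share hostname and port.
function sameHostAndPort(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  return ua.hostname === ub.hostname && ua.port === ub.port;
}

// The parsed pieces of a worker's URL (illustrative address).
const workerUrl = new URL("http://example.com:8080/js/foo.js?v=2");
console.log(workerUrl.protocol, workerUrl.hostname, workerUrl.port);
console.log(workerUrl.pathname, workerUrl.search);

// A hand-rolled prefix check is fooled by a look-alike host ...
const sender = "http://example.com.evil.test/page";
const naiveMatch = sender.startsWith("http://example.com");

// ... while comparing the parsed fields is not.
const parsedMatch = sameHostAndPort(sender, "http://example.com/");

console.log(naiveMatch, parsedMatch); // true false
```

The parsed form also treats an explicit default port as equal to an
omitted one, so "http://example.com:80/a" and "http://example.com/b"
compare as the same host and port.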

>> - The "front-line" nomenclature was a bit weird to me. How about
>> "top-level"?
>
> I didn't try to grok this part yet. Is it just about establishing lifetime
> of the worker objects? If so, see my previous comments about GC.

Yes, that's mostly what it's about.

>> - Would it be too weird to have createWorker overloaded to take an
>> optional name parameter? This would make the behavior similar to
>> window.open(), which either opens a new window or reuses an existing
>> window with the same name.
>
> What would it be used for? window.open uses the name so you can target links
> at it which doesn't seem like it applies here either.

It's intended to be used by shared workers. Multiple pages are
intended to be able to open workers with the same name and URL, and
they get the same worker instance. This allows an application to
coordinate activities across multiple instances of itself.
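
A toy model of that lookup, roughly as I read it, is sketched below. The
registry, the connections counter, and this createWorker signature are
all hypothetical; a real implementation would live in the browser and
hand back a worker or port, not a plain object.

```javascript
// Toy in-memory registry standing in for the browser. Not a real API.
const registry = new Map();

// Hypothetical createWorker(url, name): pages that ask for the same name
// and URL get the same instance; an unnamed request always gets a new one.
function createWorker(url, name) {
  if (name === undefined) {
    return { url, name: null, connections: 1 };
  }
  const key = JSON.stringify([name, url]);
  let worker = registry.get(key);
  if (!worker) {
    worker = { url, name, connections: 0 };
    registry.set(key, worker);
  }
  worker.connections += 1;
  return worker;
}

// Two pages of the same application open the same named worker.
const fromPageA = createWorker("sync.js", "db-sync");
const fromPageB = createWorker("sync.js", "db-sync");
// A request without a name is never shared.
const unnamed = createWorker("sync.js");

console.log(fromPageA === fromPageB); // true
console.log(fromPageA.connections);   // 2
console.log(unnamed === fromPageA);   // false
```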

This may speak to my point about putting motivations and sample code
high up in the document to establish the goals.

- a