[whatwg] WebWorkers vs. Threads

Wed Aug 13 11:50:31 PDT 2008

Kristof Zelechovski wrote:
> A background task invoked by setTimeout has to be split to small chunks;
> _yielding_ occurs when each chunk ends (having called setTimeout to execute
> the next chunk).  It is very hard to code in this way; you have to maintain
> an explicit stack and create an exit/entry point at every chunk boundary.
> This technique is interesting as an academic exercise only, real-world
> developers will be right to stay away from it.
>   

I'm not sure I get your meaning. If this is how current browsers
implement setTimeout then how is it "academic"? Also, since nobody is
talking about deprecating setTimeout, I don't see how it's relevant.
Whatever happens, setTimeout remains an issue that real-world developers
can't stay away from.
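
For reference, the chunking technique described in the quoted text might
look roughly like the sketch below. It is only an illustration: the names
makeSumTask and runInBackground are hypothetical, and the "work" is a
trivial sum. The point is that loop state has to live in an explicit
object so that it survives between chunks, and each chunk ends by
scheduling the next one with setTimeout.

```javascript
// Hypothetical example: sum the integers 1..n in small chunks.
// State that would normally sit on the call stack is kept in an
// explicit object so it survives between chunks.
function makeSumTask(n, chunkSize) {
  var state = { next: 1, total: 0 };
  return {
    // Run one chunk; return true once the whole task has finished.
    step: function () {
      var end = Math.min(state.next + chunkSize - 1, n);
      for (var i = state.next; i <= end; i++) {
        state.total += i;
      }
      state.next = end + 1;
      return state.next > n;
    },
    result: function () { return state.total; }
  };
}

// Drive a task with setTimeout, yielding to the event loop after each chunk.
function runInBackground(task, onDone) {
  function tick() {
    if (task.step()) {
      onDone(task.result());
    } else {
      setTimeout(tick, 0); // exit point: schedule the next chunk
    }
  }
  setTimeout(tick, 0);
}
```

A caller would write something like
runInBackground(makeSumTask(1000, 100), function (total) { ... });
and any other timer or event handler may run between two chunks.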

> Guarding concurrent access to global variables is not enough if those
> variables hold references to objects because an object can end up in a
> logically inconsistent state if two threads try modifying its properties
> concurrently.  The objects would have to be lockable to avoid corrupting
> global state.
> Even if you limit yourself to scalar variables, there is nothing to prevent
> a script to define a compound state as a set of scalar variables, each one
> with its own name.  While it is not a good programming practice, old code
> does it a lot because it is (or was) more efficient to say 'gTransCount'
> than 'gTrans.count'.
> Chris

OK, I'm clear on that; these are good arguments for providing explicit
locking. I'm still not clear on how variable race conditions in multiple
interleaved setTimeout chunks would be any different with true threads,
but I'll take your word for it that automated locking is hard or
impossible to implement.

What I really don't understand is how the WebWorkers proposal solves 
this. As far as I can tell it does some hand-waving with MessagePorts to 
pretend it goes away but what happens when you absolutely DO need 
concurrent access to global variables - say for example the DOM - from 
multiple threads? How do you perform any sort of synchronisation?

Take the example given:
{ var la = g.i; g.i = la + 1 }

The WebWorkers implementation (scary! hide your children!!):

--- worker.js ---
var updateGlobalLa = function (e) {
   var localLa = someLongRunningFunction( e.data ); // e.data is the reply to "get la"
   workerGlobalScope.port.postMessage("set la = " + localLa);
}
workerGlobalScope.port.addEventListener("message", updateGlobalLa, false);
workerGlobalScope.port.postMessage("get la");

--- main.js ---
// global object or variable
var la = 0;

var handleMessage = function(e) {
   if (e.data.match(/^set la = /)) {
      la = parseInt(e.data.substr(9), 10); // skip the "set la = " prefix
   } else if (e.data.match(/^get la/)) {
      worker.postMessage(la.toString());
   }
}
var worker = new Worker("worker.js");
worker.addEventListener("message", handleMessage, false);

Unlike the one-line example above we increment the global value based on 
some long-running calculation on its original value (rather than just 
add 1). This shows a more realistic use case for threading. 
Unfortunately our potentially dangerous one-liner is now an equally 
dangerous 18-line monster spread over 2 files and we STILL haven't 
solved the issue of another worker or the main context updating 'la' 
between our original postMessage query and our response.
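
To make that last point concrete, here is a small single-threaded
simulation of the interleaving. It does not use real workers: makeMain
and lostUpdateDemo are hypothetical names, and the two "workers" are just
consecutive calls into the same string-message handler. It is a sketch of
the lost update, assuming both workers send "get la" before either sends
its "set la = ..." reply.

```javascript
// Simulated main context: owns the shared value and answers string
// messages in the same style as main.js.
function makeMain() {
  var la = 0;
  return {
    handle: function (msg) {
      if (msg.indexOf("set la = ") === 0) {
        la = parseInt(msg.substr(9), 10);
      } else if (msg === "get la") {
        return la.toString();
      }
    },
    value: function () { return la; }
  };
}

// Two simulated workers both read the value, then both write it back.
function lostUpdateDemo() {
  var main = makeMain();
  var a = parseInt(main.handle("get la"), 10); // worker A reads 0
  var b = parseInt(main.handle("get la"), 10); // worker B reads 0
  main.handle("set la = " + (a + 1));          // A writes 1
  main.handle("set la = " + (b + 1));          // B writes 1; A's update is lost
  return main.value();
}
```

Two increments were requested, but lostUpdateDemo() returns 1. Every
message is handled atomically, yet the read-modify-write sequence as a
whole is not.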

I should also point out that even this simple, naive and probably 
incorrect example still took me nearly 2 hours to write - largely due to 
the complexity of the WebWorkers spec and the lack of any decent 
examples. Honestly anyone who thinks this interface is supposed to make 
things easier is kidding themselves.

Regardless of the kind of Getters/Setters/Managers/Whatever paradigm you
use in your main thread, you can never escape the possibility that 2
workers might want exclusive access to an essential global object (e.g.
a DOM node or global setting). So far I have not found any real-world
programming language or hardware that can do this without some kind of
side-effect or programming construct (e.g. locks, mutexes, semaphores,
etc.). What WebWorkers is really doing is requiring the author to
write their own.
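
As a rough idea of what "writing their own" could involve, the sketch
below shows a lock manager that would live in the main context. This is
an assumption about one possible design, not anything from the WebWorkers
proposal: makeLock, the worker ids and the grant callbacks are all
hypothetical, and in a real page acquire and release would arrive as
messages from workers rather than as direct function calls.

```javascript
// Hypothetical lock manager for the main context. Workers would send
// "acquire" and "release" messages; here they are plain function calls.
function makeLock() {
  var holder = null; // id of the worker currently holding the lock
  var waiting = [];  // queued { id, grant } requests, first come first served
  return {
    acquire: function (id, grant) {
      if (holder === null) {
        holder = id;
        grant(); // in a real page: post a "granted" message to the worker
      } else {
        waiting.push({ id: id, grant: grant });
      }
    },
    release: function (id) {
      if (holder !== id) { return; } // ignore releases from non-holders
      var next = waiting.shift();
      holder = next ? next.id : null;
      if (next) { next.grant(); }
    },
    holder: function () { return holder; }
  };
}
```

Even this toy version has to deal with queueing, ownership checks and
hand-over, and it does nothing about a worker that never releases the
lock.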

In other words despite all the complexity and limitations of workers all 
that's actually achieved is:
a.) Synchronisation problems simply promoted to the message queue level.
b.) Decrease in performance due to horrible string-only messaging interface.
c.) Increase in browser and JavaScript bugs due to API complexity.
d.) Decrease in programmer interest in using threads (I certainly 
wouldn't use them in their current state).

I don't think I can stress enough how many important properties and 
functions of a web page are ONLY available as globals. DOM nodes, style 
properties, event handlers, window.status ... the list goes on. These 
can't be duplicated because they are properties of the page all workers 
are sharing. Without direct access to these the only useful thing a 
worker can do is "computation" or more precisely string parsing and 
maths. I've never seen a video encoder, physics engine, artificial
intelligence or gene modeller written in JavaScript and I don't really
think I ever will. Apart from JavaScript being slow, there is the obvious
point that anything that complex is:

a.) The realm of academics and science geeks using highly parallel 
specialist systems and languages, not web developers.
b.) Valuable enough to be commercial software - and therefore requiring 
protection against illicit copying (something JavaScript can't provide).

Shannon