[whatwg] WebWorkers vs. Threads
Shannon
shannon at arc.net.au
Wed Aug 13 01:14:14 PDT 2008
Jonas Sicking wrote:
> Shannon wrote:
>> I've been following the WebWorkers discussion for some time trying to
>> make sense of the problems it is trying to solve. I am starting to
>> come to the conclusion that it provides little not already provided by:
>>
>> setTimeout(mainThreadFunc,1)
>> setTimeout(workThreadFunc,2)
>> setTimeout(workThreadFunc,2)
>
> Web workers provide two things over the above:
>
> 1. It makes it easier for the developer to implement heavy complex
> algorithms while not hanging the browser.
I suppose the limitations of the current approaches depend largely on
which Javascript actions actually block a setTimeout or callback
"thread". I keep being told WebWorkers solves this problem, but I don't
know of any examples of code or functions that block the running of
other callbacks. As with Lua, I have always treated setTimeout as a
means of executing code in parallel with the main "thread", and I have
never had an issue with the callback or main loop not running or being
delayed.
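For what it's worth, the pattern I have in mind looks roughly like the
sketch below: a long computation sliced into short chunks so that other
callbacks get a chance to run between slices (the array contents and
chunk size are just made-up illustration values).

  // Sketch: sum a large array in small slices via setTimeout so that
  // other callbacks (and the UI) can run between slices.
  var data = [];
  for (var i = 0; i < 1000000; i++) data.push(i);

  var total = 0;
  var pos = 0;
  function sumChunk() {
    var end = Math.min(pos + 10000, data.length);
    for (; pos < end; pos++) total += data[pos];
    if (pos < data.length) {
      setTimeout(sumChunk, 2);      // reschedule the "work thread"
    } else {
      alert('sum = ' + total);      // finished
    }
  }
  setTimeout(sumChunk, 2);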
> What you describe above is also known as cooperative multithreading.
> I.e. each "thread" has to manually stop itself regularly and give
> control to the other threads, and eventually they must do the same and
> give control back.
Actually I was referring to the browser forcibly interleaving the
execution of callbacks so that they appear to run simultaneously. I was
under the impression that this is how they behave now. I don't see how
Javascript callbacks can be cooperative, since they have no yield
statement or equivalent.
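A simple experiment should settle the question. The sketch below
schedules a busy callback and a second timer: if callbacks really are
forcibly interleaved, the second timer should fire at roughly +100ms
even while the first is still spinning; if instead a callback must
return before anything else runs, it will fire only after the busy
loop finishes.

  // Sketch: does a busy callback delay other timers?
  var t0 = new Date().getTime();
  function busy() {
    var start = new Date().getTime();
    while (new Date().getTime() - start < 2000) {
      // spin for two seconds without returning to the browser
    }
  }
  setTimeout(busy, 0);
  setTimeout(function() {
    alert('second timer fired at +' +
          (new Date().getTime() - t0) + 'ms');
  }, 100);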
> I'm also unsure which mozilla developer has come out against the idea
> of web workers. I do know that we absolutely don't want the
> "traditional" threading APIs that include locks, mutexes,
> synchronization, shared memory etc. But that's not what the current
> spec has. It is a much much simpler "shared nothing" API which already
> has a basic implementation in recent nightlies.
He wasn't against WebWorkers; he was, as you say, against full
threading (with all the mutexes, locks, etc. exposed to the JS
author). I can't find the reference site, but it doesn't really matter
except from the point of view that many people (including myself)
aren't convinced a full pthread-like API is the way to go either. I
just don't see why locking can't be handled transparently by the
interpreter, given that the language only ever touches real memory
indirectly.
In other news...
Despite the feedback I've been given, I find the examples of potential
applications pretty unconvincing. Most involve creating workers to wait
on or manage events like downloads or DB access. However, Javascript
has evolved a fairly complex event system that already appears to
provide a reasonable simulation of parallelism (yes, it isn't _true_
parallel processing, but as with Lua's coroutines that isn't really
apparent to the end user). In practice this means long-running actions
like downloading, and presumably DB interaction, are already reasonably
"parallel" to the main execution thread and/or any setTimeout
"subprocesses". I would suggest it is even possible for future browsers
to shift some of these activities to a "true" thread without any need
for the author's explicit permission.
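To illustrate (the URL below is made up): an asynchronous download
already proceeds alongside the main execution, and nothing stops a
browser from servicing it on a real OS thread behind the scenes.

  // Sketch: an asynchronous download "parallel" to the main thread.
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/some/large/file', true);   // true = asynchronous
  xhr.onreadystatechange = function() {
    if (xhr.readyState == 4) {
      alert('downloaded ' + xhr.responseText.length + ' characters');
    }
  };
  xhr.send(null);
  // execution continues here immediately; the browser is free to do
  // the network I/O on another OS thread without asking us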
I would really prefer that WebWorkers were, at a minimum, a kind of
syntactic sugar for custom callbacks (i.e. setTimeout but with
arguments and a more appropriate name). Leaving aside the use of real
OS threads, it seems to me that WebWorkers is at best syntactic sugar
for existing operations, but with zero DOM access and serious IO
limitations. Also, unlike existing options and normal threading
conventions, a WebWorker is forced to download its code and import its
arguments as strings, rather than having its code passed in as a
function reference and its arguments passed by reference or true
value. I know all the reasons why these limits exist; I'm just saying
I think they render the whole proposal mostly useless; kind of like a
car that only runs on carrots.
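As I read the current draft, the round trip looks something like this
('crunch.js' and the message contents are made-up illustration
values); note that the code has to live at a URL and everything
crosses the boundary as text:

  // main page -- code comes from a URL, arguments cross as strings:
  var w = new Worker('crunch.js');
  w.onmessage = function(event) {
    alert('result: ' + event.data);   // event.data is a string
  };
  w.postMessage('42');                // serialised text, not a
                                      // reference or true value

  // crunch.js -- no DOM, no shared state, everything via messages:
  onmessage = function(event) {
    var n = parseInt(event.data, 10);
    postMessage(String(n * n));
  };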
I have come up with one valid use case of my own for the current
proposal: distributed computing like SETI or Folding at Home in
Javascript. This would allow you to contribute ALL of your multi-core
or SMP computer's resources to the project(s) just by visiting their
site.
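A purely hypothetical sketch of what such a page might do (the worker
script, work-unit format and core count are all invented; notably,
there is no API to even ask how many cores the machine has):

  // Hypothetical: a SETI-style page spawning one worker per core.
  var CORES = 4;                      // a guess; not queryable
  for (var i = 0; i < CORES; i++) {
    var w = new Worker('fold.js');    // made-up worker script
    w.onmessage = function(event) {
      // POST event.data (a finished work unit) back to the project
    };
    w.postMessage('work-unit-' + i);  // made-up work unit id
  }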
However, on further consideration, this use case has two major flaws:
1.) Being an interpreted language with no direct access to MMX, GPUs
or hardware RNGs makes Javascript a poor choice for intensive
mathematical applications like these. I would expect a plugin or
standalone version of these tools to show anywhere from a 10x to
10,000x improvement in performance, depending on the calculations
being performed and the hardware available. Yes, there are a few more
risks and a few more clicks, but I wonder whether just having access
to a few more threads will sway these groups to start a separate
web-only codebase (with all the extra maintenance involved).
2.) Computing power is a resource. It can be bought and sold. However,
it can also be stolen and used by malicious sites for key/password
cracking, CAPTCHA solving, DDoS attacks and other schemes. Downloading
a plugin or application for distributed computing is generally (on a
secure browser) an opt-in process. However, it is debatable whether
visiting a website counts as opting in, especially if the workers are
being spawned by ad banner iframes rather than the primary site. What
(other than political implications) stops hacked sites and banner
networks from selling processing time on my computer each time I visit
a sponsored site? True, this could be happening now using only one
thread, but since we are talking about unlocking more resources, the
issue of how they are allocated becomes more relevant. Could we end up
opening access to all system CPUs just to have sites abuse it and
browser vendors throttle it anyway due to end-user complaints?
Anyway, I ask why it is necessary to provide a crippled threading
library for non-existent applications when the potential for a better
model surely exists. Right now I would like to see coroutines with
full global and DOM access adopted, with an eye to one day running
them across multiple cores (using automated serialisation of
reads/writes on global variables). This should greatly simplify moving
legacy global/DOM-dependent code to our shiny new "multi-threaded"
environment.
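To be concrete, here is a purely hypothetical sketch of the style I
have in mind; nothing below exists in Javascript today, and 'spawn'
and 'yield' are invented names for illustration only:

  // Hypothetical coroutine: ordinary globals and DOM access, with
  // explicit yield points the interpreter could one day exploit to
  // run the body on another core (serialising global reads/writes).
  var counter = 0;                    // a plain global, not copied
  spawn(function() {                  // 'spawn' is invented
    for (var i = 0; i < 1000000; i++) {
      counter++;                      // direct global access
      if (i % 10000 == 0) yield();    // 'yield' is invented
    }
    document.title = 'count: ' + counter;   // direct DOM access
  });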
If we adopt WebWorkers in its current form, I think we'll just have to
deprecate it in a few years because it is clumsy and limited compared
to coroutines or future alternatives. Naturally this will mean that
all the "top 100" corporate sites that rushed to implement it will
hold up the actual deprecation for many years beyond that (aka "quirks
mode"). I have no doubt that true multi-core web applications will
happen, and I welcome that. I just don't want to see it implemented
like WebWorkers in anything like its current form.
Shannon