[whatwg] Number of workers (various threads)

Wed Dec 9 07:28:56 PST 2009

On Mon, 7 Dec 2009, David Bruant wrote:
> Ian Hickson:
> > David Bruant:
> > > In the delegation example, the number of workers chosen is an 
> > > arbitrary 10. But, in a single-core processor, having only one 
> > > worker will result in more or less the same running time, because at 
> > > the end, each worker runs on the only core.
> >
> > That depends on the algorithm. If the algorithm uses a lot of data, 
> > then a single hardware thread might be able to run two workers in the 
> > same time as it runs one, with one worker waiting for data while the 
> > other runs code, and with the workers trading back and forth.
> > 
> > Personally I would recommend basing the number of workers on the number 
> > of shards that the input data is split into, and then relying on the UA 
> > to avoid thrashing. I would expect UAs to notice when a script spawns a 
> > bazillion workers, and have the UA run them in a staggered fashion, so 
> > as to not starve the system resources. This is almost certainly needed 
> > anyway, to prevent pages from DOSing the user's system.
> 
> Wouldn't it be preferable to have an implementation-dependent "maximum 
> number of workers" and to raise a security exception when this number is 
> reached? Maximum per domain? Per document?

On Mon, 7 Dec 2009, Drew Wilson wrote:
>
> We discussed this previously ( 
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-July/020865.html) 
> - the consensus was that since the Worker APIs are inherently 
> asynchronous, user agents were free to impose limits on worker creation 
> (and queue up new creation requests when the limit has been hit), but 
> the spec should be silent on this issue.
> 
> FWIW, Chrome does this - we limit each tab to a maximum of 16 workers, 
> with an overall limit of 64 - the 17th worker creation request for a tab 
> is queued up until another worker exits to make room for it. Chrome 
> currently has the limitation that each worker runs in its own process - 
> when we address that limitation, we expect to adjust the limits 
> accordingly.

I agree with Drew that it is better not to have hard limits; I don't 
really see their benefit. Workers aren't always used for splitting a 
high-CPU task into many shards; sometimes they're used for splitting 
logically unrelated tasks into separate code units. Frankly, that case is 
likely to be the more common one, and there I don't see why you'd want to 
apply the same limit as if the system were under heavy load.
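
For illustration, the queueing behaviour Drew describes could look roughly 
like the sketch below. It is not Chrome's implementation: LimitedSpawner 
and Handle are hypothetical names, and the handle merely stands in for a 
real Worker so that the sketch does not depend on any browser API.

```typescript
// Hypothetical sketch: cap the number of live workers and queue further
// creation requests. `Handle` stands in for a real Worker object.
type Handle = { id: number };

class LimitedSpawner {
  private readonly limit: number;
  private live = 0;
  private nextId = 1;
  private pending: Array<(h: Handle) => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  // Ask for a worker. The callback runs at once if there is room,
  // otherwise it is queued until an earlier worker exits.
  request(onStart: (h: Handle) => void): void {
    if (this.live < this.limit) {
      this.start(onStart);
    } else {
      this.pending.push(onStart);
    }
  }

  // Report that one worker has exited, making room for a queued request.
  exited(): void {
    this.live--;
    const next = this.pending.shift();
    if (next !== undefined) this.start(next);
  }

  get liveCount(): number {
    return this.live;
  }

  get queuedCount(): number {
    return this.pending.length;
  }

  private start(onStart: (h: Handle) => void): void {
    this.live++;
    onStart({ id: this.nextId++ });
  }
}
```

With a limit of 16 (the per-tab figure quoted above), the 17th request() 
call is held in the queue until exited() has been called once.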

On Mon, 7 Dec 2009, David Bruant wrote:
> Ian Hickson wrote:
> > On Wed, 11 Nov 2009, David Bruant wrote:
> >>
> >> My concern is about the arbitrariness of the "10".
> >
> > I agree that it's suboptimal. However, I think realistically a good 
> > implementation of parallel work would need some sort of dynamic 
> > performance tuning, continuously slowly ramping up the number of workers 
> > while it increases throughput, and when throughput decreases, switching to 
> > reducing the number of workers until throughput increases again. That 
> > would probably be too complicated to show in an example in the spec.
> 
> As far as I know, if the running time is a decreasing function of the
> number of workers (which seems to be a not-so-trivial assumption), the
> best algorithm (on average, of course!) is to double the number of
> workers until hitting a barrier in the running time (find n such that
> the optimum lies in the interval [2^n, 2^(n+1)) ). Then, use a binary
> search (dichotomy) within this interval.
> 
> However, running this algorithm each time a UA visits your website can
> become costly.

Sure. You'd want to remember where you left off, and start from there, as 
an optimisation.
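
As a rough sketch only, the doubling-then-bisecting search, with a 
remembered starting point, might look like the following. The names 
findWorkerCount, Measure, start and maxWorkers are hypothetical, and the 
sketch assumes noise-free measurements whose running time falls to a single 
minimum and then rises again; real timings would need averaging.

```typescript
// Hypothetical: returns the measured running time for a given worker count.
type Measure = (workers: number) => number;

// Assumes the running time decreases to a single minimum and then increases.
function findWorkerCount(
  measure: Measure,
  start: number = 1,      // remembered value from a previous run, if any
  maxWorkers: number = 64 // arbitrary upper bound for the search
): number {
  // Cache results so that no worker count is measured twice.
  const seen = new Map<number, number>();
  const timeFor = (n: number): number => {
    let t = seen.get(n);
    if (t === undefined) {
      t = measure(n);
      seen.set(n, t);
    }
    return t;
  };

  // Phase 1: double the worker count while the running time keeps improving.
  let lower = 1;
  let current = Math.min(Math.max(1, Math.floor(start)), maxWorkers);
  while (current * 2 <= maxWorkers && timeFor(current * 2) < timeFor(current)) {
    lower = current;
    current *= 2;
  }

  // Phase 2: bisect between the last improving point and the barrier.
  // If the very first doubling failed, the optimum may lie below `start`,
  // so the lower bound stays at 1.
  let lo = lower;
  let hi = Math.min(current * 2, maxWorkers);
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (timeFor(mid) <= timeFor(mid + 1)) {
      hi = mid; // already flat or rising: optimum is at mid or below
    } else {
      lo = mid + 1; // still falling: optimum is above mid
    }
  }
  return lo;
}
```

Passing the previously found value as start is the "remember where you 
left off" part: when conditions have not changed much, the search needs 
only a few measurements around that value.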

> I think that the spec should say something about "finding the correct 
> number of threads for your problem" and encourage authors to store their 
> value in local storage, adding a date so that the algorithm is re-run if 
> the value is old (more than two months?)

I think this kind of low-level advice is best left to best-practices 
guides and tutorials. I haven't added it to the spec. I'm concerned that 
it's still early days for this API and so we shouldn't be making hard and 
fast recommendations that may turn out to be wrong.
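
For the kind of tutorial I have in mind, the caching idea could be sketched 
roughly as below. This is an illustration, not a recommendation: the key 
name, the 60-day cut-off (standing in for "two months") and the loadOrTune 
helper are all hypothetical, and the store is passed in as a minimal 
getItem/setItem interface, which a page's localStorage object would 
satisfy.

```typescript
// Minimal subset of the Web Storage interface used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const WORKER_COUNT_KEY = "workerCount";          // hypothetical key name
const MAX_AGE_MS = 60 * 24 * 60 * 60 * 1000;     // roughly two months

// Return the stored worker count if it is fresh enough; otherwise run the
// (possibly expensive) tuning function and store its result with a date.
function loadOrTune(
  store: KeyValueStore,
  tune: () => number,
  now: number = Date.now()
): number {
  const raw = store.getItem(WORKER_COUNT_KEY);
  if (raw !== null) {
    try {
      const saved = JSON.parse(raw) as { count?: unknown; at?: unknown };
      if (
        typeof saved.count === "number" &&
        typeof saved.at === "number" &&
        now - saved.at < MAX_AGE_MS
      ) {
        return saved.count;
      }
    } catch {
      // Corrupt or unexpected entry: fall through and re-tune.
    }
  }
  const count = tune();
  store.setItem(WORKER_COUNT_KEY, JSON.stringify({ count, at: now }));
  return count;
}
```

In a page, the call would be something like loadOrTune(localStorage, tune), 
where tune is whatever measurement routine the author has chosen.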

> >> My point is that this number may be available very easily. For 
> >> example, on my dual-core machine (Linux, Firefox 3.5), the number is 
> >> 2. Why withhold information that can be useful and reliable (more so 
> >> than measurement, at least!)?
> >
> > It's actually probably quite rarely 2. It depends on all kinds of 
> > factors, like the kind of algorithm, what other programs are running, 
> > etc.
> >
> > I still haven't added this feature, as I do not believe the arguments 
> > presented form a convincing case. However, if you are still interested 
> > in pursuing this feature, I encourage you to convince a browser vendor 
> > to implement it, as discussed here:
> >    
> > http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F
>
> I understand now that my proposal was not very realistic. However, I 
> think that something should be described in the spec to avoid arbitrary 
> choices of the number of workers.
> 
> I'm convinced that with canvas and workers, we will very soon see fancy 
> web video games with AI. Authors of those games will want to use as much 
> computing power as possible for their AI, so the "best number of 
> workers" will be a value that they are interested in.
> 
> So, either a "hint" could be provided by the spec through the Navigator 
> object, or the spec could provide the method described above 
> (improve my description or ask me to improve it if you want) to find the 
> "best" number (according to a particular problem).

Until we know what to recommend, it's probably too early to add something 
to the spec. Specs tend to have a little more weight than we might want to 
give to a recommendation right now. I would suggest writing tutorials and 
best practices guides and seeing what experiences we get from that.

On Mon, 7 Dec 2009, David Bruant wrote:
> >
> > Plus, since browsers don't have thread-safe DOM implementations, we 
> > actually can't expose the DOM in workers. Maybe one day. :-)
>
> I'm sorry for the misunderstanding. I shouldn't have said "the DOM API". 
> To be as accurate as I can be I want to provide the DOMImplementation 
> interface (http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-102161490) 
> to the workers. As I'm going to explain, the point is to be able to 
> create a document and then a documentFragment.

Since browsers don't have thread-safe DOM implementations, that's 
basically a non-starter. It doesn't matter that we aren't offering access 
to the same DOM in pages and workers; the actual innards of the DOM 
implementations aren't thread safe.

As soon as browsers are able to implement this, I'm sure it will be added 
to the spec.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'