[whatwg] [WebWorkers] Advocation to provide the DOM API to the workers

Mon Dec 7 16:38:03 PST 2009

> The reason WebWorkers don't have access to the DOM is concurrency. For
> example, to loop through a list of children I need to first read the
> number of children, then have a for loop which starts at 0 and ends
> at length-1. If you have two threads that can access the DOM
> concurrently, then one could change the number of children while the
> other was looping through the list, which would cause bugs in the
> program. The only way to fix this is to make the DOM a monitor or
> introduce semaphores, but then you would have to change the way the
> DOM is accessed in HTML5, breaking backwards compatibility, which is
> not a good idea.
>
> A better solution to your problem is to load fragments of the entire
> document using AJAX and then insert those fragments into the main
> document, when they are needed. You rarely need to see the entire
> document at once anyways.
>
> Marius Gundersen
>> One good way I have found would be to cut the whole page into several 
>> parts (on the server side, which is already done in the multi-page 
>> version) and to launch several workers. Each worker gets one part of the 
>> whole page in the background and could give it to the browsing context 
>> which will append the right part at the right place.
>>     
>
> As others have noted, the slowness turns out to not be parsing, but to be 
> a bunch of scripts that are doing various things such as adding the 
> sidebar annotations, setting up the <dfn> cross-references, and generating 
> the short table of contents.
>
> Plus, since browsers don't have thread-safe DOM implementations, we 
> actually can't expose the DOM in workers. Maybe one day. :-)
>   
> -- Ian Hickson
I'm sorry for the misunderstanding. I shouldn't have said "the DOM
API". To be as accurate as I can, what I want is to provide the
DOMImplementation interface
(http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-102161490) to the
workers. As I am going to explain, the point is to be able to create a
document and then a documentFragment.

I will explain my point through another use case. (Sorry for the
confusion with the HTML5 one-page version.)

Let us imagine that I want to build a single page from several non-HTML
sources of information. They can be in different formats (RSS, data
obtained from XML-RPC requests, any other kind of XML file, JSON...).
Let us suppose that each source is a JSON file with its own structure
(different properties, different nesting). Each source needs its own
processing. As I said in my first e-mail, there are three main
steps before my page can be displayed fully loaded. For each source of
content, we have to:
(1) get the content
(2) transform it into a DOM tree (as a documentFragment or a string that
is the representation of a HTML fragment, for example)
(3) append this to the main document at the right place. (which triggers
graphical rendering)
This last step is either an appendChild or an ".innerHTML =" assignment
and must be done in the main browsing context; there is no choice.

Let us imagine that I want one worker per source.
For the moment, WebWorkers can do step (1) independently (thanks to
XMLHttpRequest).

When each worker receives its JSON string, this string must be
transformed into an HTML DOM tree (2) (let us say a <table>, for example).
Because none of the DOM Core API is currently available to
WebWorkers, we have two ways to turn the JSON string received in
(1) into an HTML DOM tree:
(2.1) Send the JSON string (or the resulting object, whatever) to the
main document which will create a documentFragment, run through the JSON
object and append the <table>,<tbody>, <tr>s and <td>s and contents to
this fragment for all the sources.
(2.2) Each worker creates a string which looks like "<table
id=blabla><tbody><tr class="blibli"><td>1</td><td>2</td></tr><tr
class="blibli"><td>3</td><td>37</td></tr></tbody></table>" using "+="
while running through the JSON object. It then sends the string through
postMessage() and the main browsing context can do a
"rightPlace.innerHTML = e.data" (where e.data is the string).
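
To make (2.2) concrete, here is a minimal sketch of what the worker would
have to do. The names escapeHtml and buildTableHtml, and the assumption that
the JSON source is an array of rows of cell values, are mine and only serve
the illustration.

```javascript
// Hypothetical helper for approach (2.2): the markup is built as a string.
// "rows" is assumed to be an array of arrays of cell values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildTableHtml(id, rows) {
  var html = '<table id="' + escapeHtml(id) + '"><tbody>';
  for (var i = 0; i < rows.length; i++) {
    html += '<tr class="blibli">';
    for (var j = 0; j < rows[i].length; j++) {
      html += "<td>" + escapeHtml(rows[i][j]) + "</td>";
    }
    html += "</tr>"; // every opening tag needs its closing tag written by hand
  }
  html += "</tbody></table>";
  return html;
}

// In the worker, something like:
//   postMessage(buildTableHtml("blabla", JSON.parse(jsonText)));
```

Every closing tag and every escaping rule has to be remembered by hand,
which is where the errors come from.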

(2.1) We have the document/documentFragment/Element/Node abstraction,
but we lose all the parallelism, because the browsing context is
handling all the sources of information (creating a documentFragment
and doing all the appendings for each source).

(2.2) We have the parallelism, because each worker handles one source.
However, we lose the DOM abstraction. I hope that I have made the
string ridiculously long enough to convince you that it is not a good
solution. For complicated examples, in my experience, using += and
.innerHTML is always a source of errors, especially because of closing
tags. These problems don't occur when developing with the DOM abstraction.

My proposal is:
(2.3) Assuming that we have access to the DOMImplementation interface,
we can create an object implementing the document interface which is
DIFFERENT from the main document object, and I insist on this point. I am
NOT proposing to provide access to the main document (the one which
"created the workers").
Thanks to this document, we can create a different documentFragment in
each worker and do, in parallel, the documentFragment appendings
described in (2.1).
The receiving context could have the following code:
function onmessage_handler(e) {
  /* Some code to identify which worker it was and where its
  ** documentFragment should be inserted in the document.
  */
  var rightPlace = some_function(e); // An element in the main document.
  var df = e.data; // The documentFragment sent by the worker.
  rightPlace.appendChild(df);
}
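
The worker side of (2.3) could look like the sketch below. How a worker
would get hold of a DOMImplementation object is exactly what is being
proposed and does not exist today, so the sketch takes it as a parameter
("impl"). The function name buildTableFragment and the shape of "rows" (an
array of arrays of cell values) are assumptions made for the illustration.

```javascript
// Sketch of approach (2.3), worker side.
// "impl" stands for a DOMImplementation object (hypothetical in a worker).
var XHTML_NS = "http://www.w3.org/1999/xhtml";

function buildTableFragment(impl, rows) {
  // createDocument(namespaceURI, qualifiedName, doctype) is DOM Level 3 Core.
  // This document is NOT the main document; it lives in the worker only.
  var doc = impl.createDocument(null, null, null);
  var fragment = doc.createDocumentFragment();
  // createElementNS with the XHTML namespace, so that the elements are
  // HTML elements even though "doc" is a generic XML document.
  var table = doc.createElementNS(XHTML_NS, "table");
  var tbody = doc.createElementNS(XHTML_NS, "tbody");
  for (var i = 0; i < rows.length; i++) {
    var tr = doc.createElementNS(XHTML_NS, "tr");
    for (var j = 0; j < rows[i].length; j++) {
      var td = doc.createElementNS(XHTML_NS, "td");
      td.appendChild(doc.createTextNode(String(rows[i][j])));
      tr.appendChild(td);
    }
    tbody.appendChild(tr);
  }
  table.appendChild(tbody);
  fragment.appendChild(table);
  return fragment;
}

// Under the proposal, the worker would then do something like:
//   postMessage(buildTableFragment(impl, JSON.parse(jsonText)));
```

No tag is ever closed by hand and no escaping is needed, and the loop runs
in the worker instead of the main browsing context.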

postMessage-ing an element/document/documentFragment/Node can cause a
problem because of the references to a document that they contain (a
worker must NOT have access to the main document, and the main document
must NOT have access to a worker document either).
As far as I can see, there are only two potential problems when
postMessage-ing such objects "from one document to another":
* From the Node interface: ownerDocument.
For this, it can be decided that when postMessage is called on a node,
this node and its whole subtree are automatically adoptNode-ed by the
main document (window.document) of the receiving context.

* From the Node interface: parentNode.
When a node is postMessage-ed, if its parentNode is a reference to a
node in the worker context (a document or documentFragment), we can
automatically do an importNode() from the main document
(window.document) on it. By the way, documents and documentFragments
have no parent
(http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1060184317), so this
step is not even necessary for them.
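
The two rules above can be sketched as a single function that the receiving
side would apply to a posted node before handing it to the onmessage
handler. This is only an illustration of the idea, not a description of any
existing behaviour; the name transferNode is made up, while importNode and
adoptNode are the DOM Level 3 Core methods of the Document interface.

```javascript
// Sketch of the proposed transfer rule.
// "mainDocument" is the receiving window.document;
// "node" is the node that the worker passed to postMessage.
function transferNode(mainDocument, node) {
  if (node.parentNode) {
    // The node is still attached to a parent living in the worker's
    // document: take a deep copy, so that no reference to that parent
    // survives on the receiving side.
    return mainDocument.importNode(node, true);
  }
  // Parentless nodes (for example a documentFragment) can simply change
  // owner: adoptNode updates ownerDocument for the whole subtree.
  return mainDocument.adoptNode(node);
}
```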

This way, between the postMessage() context and the event.data context
(used in the onmessage handler), I have broken all the references (if
I have forgotten some, tell me, I can propose a solution for them too)
to a document living in a different, asynchronously running context. I
have described a safe means to send a
document/documentFragment/Element/Node from a worker to the main
browsing context. The other direction shouldn't be hard to get either.

I think that providing the DOMImplementation interface is a good way for
implementors to provide a light-weight DOM implementation (because the
DOM API needs of workers are not the same as those of documents as we know
them now). I may be wrong.

Note:
With a DOMImplementation available, the document response entity body of
XMLHttpRequest has no reason to be null in workers anymore.

Thanks for your time and your feedback,

David