[html5] JSON parsing in Web Worker

Igor Minar iiminar at gmail.com
Mon Jan 3 11:46:33 PST 2011


So I built a small app that demonstrates:
- parsing JSON payload in the main thread
- fetching JSON payload in the main thread, but asking a web worker to
parse it and return the object
- fetching JSON payload from webworker, parsing it and returning it to
the main thread
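
The second variant above (fetch on the main thread, parse in a worker)
can be sketched roughly as follows. This is a minimal illustration, not
the code from the repository linked below; the names installJsonParser
and parseInWorker are made up for this sketch, and the worker is passed
in as a parameter so the same functions work with a real Worker or with
a stand-in object.

```javascript
// Worker side (hypothetical helper): parse the text received from the main
// thread and post the resulting object back. In a real worker script this
// would be wired up with installJsonParser(self).
function installJsonParser(scope) {
  scope.onmessage = function (event) {
    scope.postMessage(JSON.parse(event.data));
  };
}

// Main-thread side (hypothetical helper): send the raw JSON text to a worker
// and hand the parsed object to `callback` when the reply arrives.
function parseInWorker(worker, text, callback) {
  worker.onmessage = function (event) {
    callback(event.data);
  };
  worker.postMessage(text);
}
```

In a browser, the worker script would call installJsonParser(self), and
the page would call parseInWorker(new Worker('json-worker.js'),
xhr.responseText, callback), where json-worker.js is a hypothetical file
name. The object handed to the callback is a copy produced by the
browser's message passing, which is where the extra cost discussed in
this thread comes from.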

I'm working with a 1MB payload, so it might take a while for the tests
to run due to the "large" download. Some browsers don't seem to adhere
to the caching headers, so they keep redownloading the payload over
and over. The tests run much faster when run locally.

I noticed that:
- parsing JSON with web workers does take considerably longer (2x or
more) than sync parsing
- each browser has different perf characteristics (Chrome takes longer
to do the message passing, Firefox is awfully slow at making an XHR
from a worker)
- even though parsing in a web worker takes longer overall, the main UI
thread is blocked for only about half as long as with sync parsing.
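
The difference between overall time and time spent blocking the calling
thread can be made concrete with a small timing helper. This is only a
sketch under assumptions: measureParse is a hypothetical name, the
parser is assumed to take (text, callback), and the clock is passed in
as `now` (for example Date.now). It records how long the initial call
held the calling thread and how long it took until the result arrived.
It does not capture any time the browser spends on the main thread
delivering the worker's reply, so it likely underestimates blocking in
the worker case.

```javascript
// Hypothetical helper: times a callback-style parser.
//   dispatchMs: how long the initial parse() call held the calling thread
//   totalMs:    how long it took until the parsed value was available
function measureParse(parse, text, now, done) {
  var start = now();
  var dispatchMs = null; // set once parse() has returned to the caller
  var pending = null;    // result that arrived before parse() returned

  function report(value, end) {
    done({ value: value, totalMs: end - start, dispatchMs: dispatchMs });
  }

  parse(text, function (value) {
    var end = now();
    if (dispatchMs === null) {
      // Synchronous parser: the callback fired inside parse() itself.
      pending = { value: value, end: end };
    } else {
      report(value, end);
    }
  });

  dispatchMs = now() - start;
  if (pending) {
    report(pending.value, pending.end);
  }
}
```

For a synchronous parser the two numbers come out the same; for a
worker-backed parser dispatchMs should be small while totalMs includes
the round trip.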

Site: http://igorminar.github.com/webworker-json-perf/index.html
Source: https://github.com/IgorMinar/webworker-json-perf

cheers,
Igor

On Wed, Dec 29, 2010 at 11:16 PM, Igor Minar <iiminar at gmail.com> wrote:
> I modified the code to deal with most/all of the issues below.
>
> I still see significant overhead in all browsers:
>
> chrome 9: +240%
> chrome 10: +180%
> safari 5: +125%
> firefox 4beta: +61%
>
> Here I'm comparing synchronous parsing to async parsing implemented
> via a preinitialized web worker (Test #1 vs Test #5 in my code [1]).
>
> I'm going to play with this a bit more tomorrow, but at the moment it
> seems that json parsing is one of the activities that should not be
> done in a webworker, except in cases when the parsed object doesn't
> need to be passed to the main thread.
>
> /i
>
>
> [1] https://github.com/IgorMinar/angular.js/blob/json-webworker/perf/jsonPerfSpec.js
>
>
> On Tue, Dec 28, 2010 at 8:22 PM, Igor Minar <iiminar at gmail.com> wrote:
>> Ricardo, Drew,
>>
>> My code is here:
>> https://github.com/IgorMinar/angular.js/blob/json-webworker/perf/jsonPerfSpec.js
>>
>> The harness is not perfect, but it should be good enough. The code is
>> primarily of exploratory quality, in addition to being a work in
>> progress :)
>>
>> The main issues are:
>> -  I should wait for a signal from the worker that it's ready, before
>> I send the first request to it. So if initializing worker takes a long
>> time, I might be partially including the startup time in the duration
>> - I should repeat the test 100 or 100s of times and calculate the
>> average, and possibly use bigger payloads or a slower computer, because
>> the results I'm seeing are in the 10s to 100s of ms range.
>> - I'm using JS Test Driver as my harness so the text output sometimes
>> looks weird or doesn't make sense. I look at the times printed on the
>> "[LOG] took:" lines
>>
>>
>> Out of all the tests, Test #1 and Test #5 are the most interesting.
>>
>> #1 tests synchronous parsing and #5 tests async parsing, when
>> webworker processes payload that originates in the worker context
>> (simulating xhr executed from within worker and worker parsing the
>> response returned before handing it over to the main thread).
>>
>>
>> Currently I'm getting results like these:
>>
>> Total 16 tests (Passed: 16; Fails: 0; Errors: 0) (11586.00 ms)
>>
>>  Safari 533.18.5 Mac OS: Run 8 tests (Passed: 8; Fails: 0; Errors 0)
>> (6447.00 ms)
>>    json.test that it Test #0: native json passed (5878.00 ms)
>>      [LOG] 58.74 ms per iteration
>>    Test #1: Synchronous Json parser.testParsing passed (42.00 ms)
>>      [LOG] took: 19
>>      [LOG] took: 27
>>    Test #2: WebWorker Json parser.test passed (85.00 ms)
>>      [LOG] took: 82
>>    Test #3: Preinitialized WebWorker Json parser.test passed (104.00 ms)
>>      [LOG] took: 101
>>    Test #4: WebWorker Json parser with inlined payload.test passed (110.00 ms)
>>      [LOG] took: 107
>>    Test #5: Preinitialized WebWorker Json parser with inlined
>> payload.test passed (95.00 ms)
>>      [LOG] took: 92
>>    Test #6: WebWorker Json parser with inlined payload without return
>> value.test passed (69.00 ms)
>>      [LOG] took: 66
>>    Test #7: Preinitialized WebWorker Json parser with inlined payload
>> without return value.test passed (64.00 ms)
>>      [LOG] took: 61
>>
>>  Chrome 9.0.572.0 Mac OS: Run 8 tests (Passed: 8; Fails: 0; Errors 0)
>> (11586.00 ms)
>>    json.test that it Test #0: native json passed (9260.00 ms)
>>      [LOG] 92.59 ms per iteration
>>    Test #1: Synchronous Json parser.testParsing passed (198.00 ms)
>>      [LOG] took: 187
>>      [LOG] took: 193
>>    Test #2: WebWorker Json parser.test passed (554.00 ms)
>>      [LOG] took: 551
>>    Test #3: Preinitialized WebWorker Json parser.test passed (297.00 ms)
>>      [LOG] took: 294
>>    Test #4: WebWorker Json parser with inlined payload.test passed (459.00 ms)
>>      [LOG] took: 457
>>    Test #5: Preinitialized WebWorker Json parser with inlined
>> payload.test passed (344.00 ms)
>>      [LOG] took: 341
>>    Test #6: WebWorker Json parser with inlined payload without return
>> value.test passed (232.00 ms)
>>      [LOG] took: 230
>>    Test #7: Preinitialized WebWorker Json parser with inlined payload
>> without return value.test passed (242.00 ms)
>>      [LOG] took: 240
>>
>> cheers,
>> Igor
>>
>>
>>
>> On Tue, Dec 28, 2010 at 5:06 PM, Drew Wilson <atwilson at chromium.org> wrote:
>>> Forgive what's probably a very naive suggestion, but I'm assuming you're
>>> measuring just the parse + messaging time, and not the thread startup time
>>> in your 3x measurement below (i.e. you're doing the measurements on an
>>> already-running worker)?
>>> -atw
>>>
>>> On Tue, Dec 28, 2010 at 11:21 AM, Igor Minar <iiminar at gmail.com> wrote:
>>>>
>>>> Drew,
>>>>
>>>> I tested Safari 5.0.2 (6533.18.5) and while it's one of the faster
>>>> browsers out there, my tests show that parsing a 650kb json string takes
>>>> 3x longer when I use webworker than when I parse it in the main
>>>> thread.
>>>>
>>>> Parsing alone takes an equivalent amount of time; it's the async
>>>> messaging and mainly the transfer of data from the worker that adds
>>>> 2x overhead.
>>>>
>>>> I use JSON.parse to do the parsing, and while this method is snappy,
>>>> with payloads bigger than 500kb, I can make the UI freeze just long
>>>> enough to make it noticeable.
>>>>
>>>> I think what I really want is for JSON.parse to be implemented as
>>>> async and executed in its own thread. I would then just pass in a
>>>> callback that would handle the parsed object when it's ready. Web
>>>> workers get pretty close to allowing me to do something similar, but
>>>> the messaging overhead is killing all the benefits I'm getting from
>>>> the async parsing in worker thread.
>>>>
>>>> /i
>>>>
>>>>
>>>>
>>>> On Tue, Dec 28, 2010 at 10:51 AM, Drew Wilson <atwilson at chromium.org>
>>>> wrote:
>>>> > Hi Igor,
>>>> > Objects passed via message ports (including the intrinsic port for
>>>> > dedicated
>>>> > workers) are cloned. I can't speak for other implementations, but in
>>>> > WebKit
>>>> > I believe cloned objects aren't JSON encoded/decoded, but instead there
>>>> > is
>>>> > another native mechanism for cloning these objects that will likely be
>>>> > faster than JSON encoding.
>>>> > That said, I'm not sure that "parsing large JSON files" is the best
>>>> > WebWorker use case, depending on how you're doing the parsing and how
>>>> > large
>>>> > the files are.
>>>> > -atw
>>>> >
>>>> > On Tue, Dec 28, 2010 at 10:35 AM, Igor Minar <iiminar at gmail.com> wrote:
>>>> >>
>>>> >> Hello,
>>>> >>
>>>> >> I'm exploring the possibilities of using web workers for parsing large
>>>> >> JSON files outside of the main UI thread.
>>>> >>
>>>> >> I found several references that this could be one of the use cases for
>>>> >> web workers (e.g. oreilly's intro to web workers [1]). However, the
>>>> >> more I read about webworkers, the less attractive they are for this
>>>> >> purpose, mainly because of how data is passed from worker to the main
>>>> >> thread.
>>>> >>
>>>> >> Please correct me if I'm wrong, but my understanding is that any data
>>>> >> that is returned in the message from the worker, is copied rather than
>>>> >> shared and it seems that this is often implemented by serializing the
>>>> >> data into a json string and then deserializing it in the main script.
>>>> >> Is this right? Because if it is, then what's the point of parsing the
>>>> >> json string in worker thread, just to serialize it and then parse it
>>>> >> again in the main thread.
>>>> >>
>>>> >> I'd love to be wrong about this because the concept of workers looks
>>>> >> like a perfect match for my use case (parsing large json payloads
>>>> >> quickly without affecting the UI), but my trivial microbenchmarks show
>>>> >> that the overhead of passing the data to, as well as from the
>>>> >> webworker is just too big to use it for this purpose.
>>>> >>
>>>> >> thanks,
>>>> >> Igor
>>>> >>
>>>> >>
>>>> >> [1]
>>>> >> http://answers.oreilly.com/topic/1358-introducing-the-web-workers-api/
>>>> >> _______________________________________________
>>>> >> Help mailing list
>>>> >> Help at lists.whatwg.org
>>>> >> http://lists.whatwg.org/listinfo.cgi/help-whatwg.org
>>>> >
>>>> >
>>>
>>>
>>
>


