[whatwg] <canvas> and high-density displays

Mon Sep 24 16:40:28 PDT 2012

On Thu, 1 Dec 2011, John Knottenbelt wrote:
>
> How should the data url returned by toDataURL be sized in the case of a 
> high device dpi resolution system?

This is now specced, at least in theory. Please let me know if you still 
think this is underdefined.

> The test 
> http://philip.html5.org/tests/canvas/suite/tests/toDataURL.png.primarycolours.html 
> makes a drawing with canvas, saves this drawing to a data url, loads the 
> data url into an Image element and then draws that back to the canvas, 
> and then performs some assertions that the image is as expected.
> 
> I've been trying this out in the DumpRenderTree test runner of WebKit, 
> where toDataURL returns an image derived from the canvas' backing store 
> image data. If I set the device dpi scale to 2.0 to imitate a high DPI 
> display, the test will fail because the image returned by toDataURL is 
> now four times as big as the test expects it to be.

Yeah, this should be changed to use toDataURLHD.
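
To make the "four times as big" figure concrete, here is a minimal sketch 
of the arithmetic. It assumes the backing store is simply the canvas's 
width and height multiplied by the device scale factor. The function name 
backingStoreSize and the 100x50 canvas are made up for illustration; they 
are not taken from the spec or from WebKit.

```javascript
// Hypothetical helper: size of a canvas backing store for a given
// device scale factor (assumed to scale both dimensions equally).
function backingStoreSize(cssWidth, cssHeight, scale) {
  const width = Math.round(cssWidth * scale);
  const height = Math.round(cssHeight * scale);
  return { width, height, pixels: width * height };
}

const normal = backingStoreSize(100, 50, 1.0);
const hiDpi = backingStoreSize(100, 50, 2.0);

console.log(hiDpi.width, hiDpi.height);     // 200 100
console.log(hiDpi.pixels / normal.pixels);  // 4
```

Doubling the scale doubles each dimension, so an image derived from the 
backing store has four times the pixels the test expects.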

> Is this correct behaviour, or is the test correct and I simply have a 
> bug in WebKit?

You had a bug in WebKit according to the old spec, but since none of the 
browsers correctly implemented the resolution stuff, I've changed the way 
the spec does it and now it's the test that's got the bug. :-)

On Mon, 16 Apr 2012, Darin Fisher wrote:
> 
> Aren't we missing an opportunity here?  By giving web developers this 
> easy migration path, you're also giving up the opportunity to encourage 
> them to use a better API.  Asynchronous APIs are harder to use, and 
> that's why we need to encourage their adoption.  If you just give people 
> a synchronous version that accomplishes the same thing, then they will 
> just use that, even if doing so causes their app to perform poorly.
> 
> See synchronous XMLHttpRequest.  I'm sure every browser vendor wishes 
> that didn't exist.  Note how we recently withdrew support for 
> synchronous ArrayBuffer access on XHR?  We did this precisely to 
> discourage use of synchronous mode XHR. Doing so actually broke some 
> existing web pages.  The pain was deemed worth it.
> 
> GPU readback of a HD buffer is going to suck.  Any use of this new API 
> is going to suck.

On Mon, 16 Apr 2012, Oliver Hunt wrote:
> 
> Any use of imagedata i've seen assumes that they can avoid intermediate 
> states in the canvas ever being visible, if you make reading and writing 
> the data asynchronous you break that invariant and suddenly make things 
> much harder for the user.
> 
> The reason we don't want IO synchronous is because IO can take a 
> potentially unbound amount of time, if you're on a platform that makes a 
> memcpy take similarly unbound time, i recommend that you work around it.
> 
> Anyway, the sensible approach to imagedata + hardware backed canvas is 
> to revert to a software backed canvas, as once someone has used 
> imagedata once, they're likely to do it again (and again, and again) so 
> it is probably a win to just do everything in software at that point.  
> Presumably you could throw in heuristics to determine whether or not 
> it's worth going back to the GPU at some point, but many of the common 
> image data use cases will have awful perf if you try to keep them on the 
> GPU 100% of the time.

On Mon, 16 Apr 2012, Darin Fisher wrote:
> 
> Of course, GPU readbacks do not compare to network IO.  However, if the 
> goal is to achieve smooth animations, then it is important that the main 
> thread not hitch for multiple animation frames.  GPU readbacks are 
> irregular in duration and can sometimes be quite expensive if the GPU 
> pipeline is heavily burdened.
> 
> I don't think it is OK if at application startup (or animation startup) 
> there is a big UI glitch as the system determines that it should not 
> GPU-back a canvas.  We have the opportunity now to design an API that 
> does not have that bug.
> 
> Why don't you want to take advantage of this opportunity?

On Mon, 16 Apr 2012, Oliver Hunt wrote:
> 
> We can already do imagedata based access on a gpu backed canvas in 
> webkit without ill effects simply by pulling the canvas off GPU memory.  
> I don't understand why adding a runloop cycle to any read seems like 
> something that would introduce a much more noticeable delay than a 
> memcopy.  I also don't understand what makes reading from the GPU so 
> expensive that adding a runloop cycle is necessary for good perf, but 
> it's unnecessary for a write.  This feels like an argument along the 
> lines of "we hate synchronous APIs, but they make sense for graphics.  
> Let's try and make at least part of this asynchronous to satisfy that 
> particular desire."
> 
> Moving data to and from the GPU may be expensive, but i doubt it holds a 
> candle to the cost of waiting for a full runloop cycle, unless you're 
> doing something really inefficient in your backing store management.  
> The fact is that the ImageData is a pixel manipulation API, and any such 
> API is not conducive to good performance on the GPU.

On Mon, 16 Apr 2012, Glenn Maynard wrote:
> 
> The use case is deferred rendering.  Canvas drawing calls don't need to 
> complete synchronously (before the drawing call returns); they can be 
> queued, so API calls return immediately and the actual draws can happen 
> in a thread or on the GPU.  This is exactly like OpenGL's pipelining 
> model (and might well be implemented using it, on some platforms).
> 
> The problem is that if you have a bunch of that work pipelined, and you 
> perform a synchronous readback, you have to flush the queue.  In OpenGL 
> terms, you have to call glFinish().  That might take long enough to 
> cause a visible UI hitch.  By making the readback asynchronous, you can 
> defer the actual operation until the operations before it have been 
> completed, so you avoid any such blocking in the UI thread.
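
The queueing argument above can be sketched with a toy model. Nothing 
below is a real canvas or OpenGL API: DrawQueue and its methods are 
hypothetical names, and the "GPU" is just an array of closures. It only 
shows why a synchronous readback forces a flush while an asynchronous one 
can wait its turn behind the earlier draws.

```javascript
// Simulated deferred-rendering queue (illustrative only).
class DrawQueue {
  constructor() {
    this.pending = [];   // queued operations, not yet executed
    this.executed = [];  // names of draws, in the order they actually ran
  }

  // Drawing calls return immediately; the work is only queued.
  draw(name) {
    this.pending.push(() => this.executed.push(name));
  }

  // Runs everything queued so far (the analogue of glFinish()).
  flush() {
    const ran = this.pending.length;
    for (const op of this.pending) op();
    this.pending = [];
    return ran;
  }

  // Synchronous readback: the caller waits for the whole queue to drain.
  readPixelsSync() {
    return { flushedBeforeReturn: this.flush() };
  }

  // Asynchronous readback: queued behind the earlier draws; the callback
  // runs whenever the queue is eventually drained.
  readPixelsAsync(callback) {
    this.pending.push(() => callback({ drawsCompleted: this.executed.length }));
  }
}

const syncQueue = new DrawQueue();
syncQueue.draw('fish 1');
syncQueue.draw('fish 2');
const syncResult = syncQueue.readPixelsSync();
console.log(syncResult.flushedBeforeReturn); // 2

const asyncQueue = new DrawQueue();
asyncQueue.draw('fish 1');
asyncQueue.draw('fish 2');
let asyncResult = null;
asyncQueue.readPixelsAsync((r) => { asyncResult = r; });
console.log(asyncResult);                // null
asyncQueue.flush();                      // later, off the caller's critical path
console.log(asyncResult.drawsCompleted); // 2
```

In the synchronous case the caller pays for both queued draws before 
readPixelsSync() returns; in the asynchronous case the call returns at 
once and the result arrives after the queue drains.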

On Mon, 16 Apr 2012, Oliver Hunt wrote:
>
> Could someone construct a demonstration of where the read back of the 
> imagedata takes longer than a runloop cycle?
> 
> You're asking for significant additional complexity for content authors, 
> with a regression in general case performance, it would be good to see 
> if it's possible to create an example, even if it's not something any 
> sensible author would do, where there is a performance improvement.
> 
> Remember, the application is only marginally better when it's not 
> painting due to waiting for a runloop cycle than it is when blocked 
> waiting on a graphics flush.
> 
> Also, if the argument is wrt deferred rendering rather than GPU 
> copyback, can we drop GPU related arguments from this thread?

On Mon, 16 Apr 2012, Darin Fisher wrote:
>
> Here's an example.
> 
> Take http://ie.microsoft.com/testdrive/Performance/FishIETank/, and apply
> the following diff [...]
> 
> Running on a Mac Pro, with Chrome 19 (WebKit @r111385), with 1000 fish, 
> I get 60 FPS.  Setting read_back to true (using dev tools), drops it 
> down to 30 FPS.
> 
> Using about:tracing (a tool built into Chrome), I can see that the read 
> pixels call is taking ~15 milliseconds to complete.  The implied GL 
> flush takes ~11 milliseconds.
> 
> The page was sized to 1400 x 1000 pixels.

On Mon, 16 Apr 2012, Oliver Hunt wrote:
> 
> How does that compare to going through the runloop -- how long does it 
> take to get from that point to a timeout being called if you do var 
> start = new Date; setTimeout(function() {console.log(new Date - 
> start);}, 0); ?
> 
> This also ignores the possibility that in requesting the data, i 
> probably also want to do some processing on the data, so for the sake of 
> simplicity how long does it take to subsequently iterate through every 
> pixel and set it to 0?
> 
> Remember the goal of making this asynchronous is to improve performance, 
> so the 11ms of drawing does have to occur at some point, you're just 
> hoping that by making things asynchronous you can mask that.  But I 
> doubt you would see an actual improvement in wall clock performance.
> 
> I also realised something else that I had not previously considered -- 
> if you're doing bitblit based sprite movement the complexity goes way up 
> if this is asynchronous.

On Mon, 16 Apr 2012, Darin Fisher wrote:
> 
> [run loop latency]
> The answer is ~0 milliseconds.  I know this because without the 
> getImageData call, the frame rate is 60 FPS.  The page calls the draw() 
> function from an interval timer that has a period of 16.7 milliseconds. 
> The trace indicates that nearly all of that budget is used up prior to 
> the getImageData() call that I inserted.
> 
> [iterating through the bitmap]
> That adds about 44 milliseconds.  I would hope that developers would 
> either perform this work in chunks or pass ImageData.data off to a web 
> worker for processing.
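
The drop from 60 to 30 FPS reported earlier in this exchange is consistent 
with simple frame-budget arithmetic. The sketch below is a rough model 
only: it assumes each frame's work is rounded up to a whole number of 
16.7 ms display intervals, and it uses 16 ms as a stand-in for "nearly all 
of that budget". That figure and the name estimateFps are assumptions, not 
measurements from the thread.

```javascript
// Rough model: a frame that overruns its interval slips to the next one.
// This is an assumed simplification, not how any particular browser
// schedules timers or presents frames.
function estimateFps(workMs, displayHz) {
  const intervalMs = 1000 / displayHz;
  const intervalsNeeded = Math.max(1, Math.ceil(workMs / intervalMs));
  return displayHz / intervalsNeeded;
}

const drawMs = 16;      // assumed: "nearly all" of the 16.7 ms budget
const readbackMs = 15;  // the ~15 ms read pixels call from the trace

console.log(estimateFps(drawMs, 60));               // 60
console.log(estimateFps(drawMs + readbackMs, 60));  // 30
```

Under this model, any per-frame cost that pushes the total past one 
interval halves the frame rate, which is why a 15 ms readback matters even 
though it is small next to network I/O.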

On Tue, 17 Apr 2012, Darin Fisher wrote:
> 
> In Chrome at least, getImageData() doesn't actually block to fetch 
> pixels. The thread is only blocked when the first dereference of the 
> pixel buffer occurs.  I believe this is done so that a getImageData() 
> followed by putImageData() call will not need to block the calling 
> thread.
> 
> The above suggests that making getImageData() asynchronous would not 
> actually provide any benefit for cases where the page does not 
> dereference the pixel buffer.  Another use case where this comes up is 
> passing the ImageData to a web worker.  If the web worker is the first 
> to dereference the ImageData, then only the web worker thread should 
> block.
> 
> I think this becomes an argument for keeping getImageData() as is.  It 
> assumes that ImageData is just a handle, and we could find another way 
> to discourage dereferencing the pixel buffer on the UI thread.

Based on the above, I haven't changed anything in the spec.
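
For readers unfamiliar with the "ImageData is just a handle" behaviour 
described above, here is a minimal simulation of the idea. LazyImageData 
and readbackFn are hypothetical names; this is not Chrome's implementation, 
only a sketch of deferring the blocking step until the pixel buffer is 
first dereferenced.

```javascript
// Simulated lazy ImageData-like handle (illustrative only).
class LazyImageData {
  constructor(width, height, readbackFn) {
    this.width = width;
    this.height = height;
    this._readbackFn = readbackFn; // stands in for the blocking readback
    this._pixels = null;
  }

  // The first access to .data performs the readback; later accesses
  // reuse the cached buffer.
  get data() {
    if (this._pixels === null) {
      this._pixels = this._readbackFn(this.width, this.height);
    }
    return this._pixels;
  }
}

let readbacks = 0;
const handle = new LazyImageData(4, 4, (w, h) => {
  readbacks += 1;
  return new Uint8ClampedArray(w * h * 4); // RGBA, 4 bytes per pixel
});

console.log(readbacks);           // 0: creating the handle did not block
console.log(handle.data.length);  // 64: first dereference triggers the readback
console.log(readbacks);           // 1
```

Whichever thread touches .data first is the one that pays for the 
readback, which is the property the worker hand-off relies on.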

On Tue, 17 Apr 2012, Glenn Maynard wrote:
> 
> This isn't good enough.  It gives no way for developers to ensure that 
> they don't access the image data until doing so won't cause a 
> synchronous flush.

On Thu, 19 Apr 2012, Maciej Stachowiak wrote:
> 
> You could also address this by adding a way to be notified when the 
> contents of an ImageData are available without blocking. That would work 
> with both vanilla getImageData and the proposed getImageDataHD. It would 
> also give the author the alternative of just blocking (e.g. if they know 
> the buffer is small) or of sending the data off to a worker for 
> processing.

On Fri, 20 Apr 2012, Glenn Maynard wrote:
> 
> This would result in people writing poor code, based on incorrect 
> assumptions.  It doesn't matter how big the buffer is; all that matters 
> is how long the drawing calls before the getImageData take.  For 
> example, if multiple canvases are being drawn to (eg. on other pages 
> running in the same thread), they may share a single drawing queue.
> 
> Any time you retrieve image data synchronously, and it happens to 
> require a draw flush, you freeze the UI for all pages sharing that 
> thread.  Why is that okay for people to do?  We should know better by 
> now than to expose APIs that encourage people to block the UI thread, 
> after spending so much time trying to fix that mistake in early APIs.
> 
> (This should expose a synchronous API in workers if and when Canvas 
> makes it there, of course, just like all other APIs.)

On Sun, 22 Apr 2012, Maciej Stachowiak wrote:
> 
> All JavaScript that runs on the main thread has the potential to "freeze 
> the UI for all pages sharing that thread". One can imagine models that 
> avoid this by design - for example, running all JavaScript on one or 
> more threads separate from the UI thread. But from where we are today, 
> it's not practical to apply such a solution. It's also not practical to 
> make every API asynchronous - it's just too hard to code that way.
> 
> In light of this, we need some sort of rule for what types of APIs 
> should only be offered in asynchronous form on the main thread. Among 
> the major browser vendors, there seems to be a consensus that this 
> should at least include APIs that do any network or disk I/O. Network 
> and disk are slow enough and unpredictable enough that an author could 
> never correctly judge that it's safe to do synchronous I/O.
> 
> Some feel that a call that reads from the GPU may also be in this 
> category of "intrinsically too slow/unpredictable". However, we are 
> talking about operations with a much lower upper bound on their 
> execution time. We're also talking about an operation that has existed 
> in its synchronous form (getImageData) for several years, and we don't 
> have evidence of the types of severe problems that, for instance, 
> synchronous XHR has been known to cause. Indeed, the amount of trouble 
> caused is low enough that no one has yet proposed or implemented an 
> async version of this API.
> 
> If adding an async version has not been an emergency so far, then I 
> don't think it is critical enough to block adding scaled backing store 
> support. Nor am I convinced that we need to deprecate or phase out the 
> synchronous version. Perhaps future evidence will change the picture, 
> but that's how it looks to me so far.

On Mon, 23 Apr 2012, Darin Fisher wrote:
> 
> The point is not about whether the jank introduced by GPU readbacks is 
> emergency level.  The point is that it can be costly, and it can 
> interfere greatly with having an interactive main thread.  If you assume 
> a goal of 60 FPS, then smallish jank can be killer.  It is common for 
> new GL programmers to call glGetError too often for example, and that 
> can kill the performance of the app.  Of course this is nowhere near as 
> bad as synchronous XHR.  It doesn't have to be at that level to be a 
> problem.  I think it is fair to focus on 60 FPS as a goal in other 
> words.
> 
> That said, I've come around to being OK with getImageDataHD.  As I wrote 
> recently, this is because it is possible to implement that in a 
> non-blocking fashion.  It can just queue up a readback.  It only becomes 
> necessary to block the calling thread when a pixel is dereferenced.  
> This affords developers with an opportunity to instead pass the 
> ImageData off to a web worker before dereferencing.  Hence, the main 
> thread should not jank up.  This of course requires developers to be 
> very smart about what they are doing, and for browsers to be smart too.
> 
> I'm still sad that getImageData{HD} makes it easy for bad code in one 
> web page to screw over other web pages.  The argument that this is easy 
> to do anyways with long running script is a cop out.  We should guide 
> developers to do the right thing in this cooperatively multi-tasking 
> system.

On Mon, 23 Apr 2012, Glenn Maynard wrote:
> 
> It's not reasonable to expect people to fire up a worker and transfer 
> the buffer to the worker to prevent the blocking from happening in the 
> main thread.  That's a particularly hackish workaround, not a 
> replacement for an async API.

It seems pretty reasonable to me, to be honest.

We could add an event that fires on ImageData (or even ArrayBuffer) that 
fires when the data is available. If we add it to ArrayBuffer it's 
something that could be used in other contexts, too.

Is this something that people think we should do? If so, should we add it 
to TypedArray generically?
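
For discussion, the event idea might look roughly like this from the 
author's side. Everything here is hypothetical: neither onready nor 
completeReadback exists on ImageData, ArrayBuffer, or TypedArray, and the 
readback is simulated by calling completeReadback by hand.

```javascript
// Hypothetical shape of an ImageData-like object that announces when its
// pixels can be read without blocking (illustrative only).
class PendingImageData {
  constructor(width, height) {
    this.width = width;
    this.height = height;
    this.ready = false;
    this.onready = null;  // assumed event-handler-style property
    this._pixels = null;
  }

  // Called by the (simulated) browser once the readback has finished.
  completeReadback(pixels) {
    this._pixels = pixels;
    this.ready = true;
    if (typeof this.onready === 'function') this.onready(this);
  }

  // In this sketch an early access throws; a real implementation would
  // presumably block instead.
  get data() {
    if (!this.ready) throw new Error('would block: data not available yet');
    return this._pixels;
  }
}

const pending = new PendingImageData(2, 2);
let seenLength = -1;
pending.onready = (img) => { seenLength = img.data.length; };

// Later, the simulated readback finishes:
pending.completeReadback(new Uint8ClampedArray(2 * 2 * 4));
console.log(seenLength); // 16
```

An author who knows the buffer is cheap to fetch could still read .data 
directly and accept the block; the handler is only an opt-in way to avoid 
it.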

On Tue, 17 Apr 2012, Oliver Hunt wrote:
> 
> A long time ago I and Dmitry tried to get canvas to be available on a 
> worker thread, and then through some bizarre set of events that desire 
> morphed into the image scaling API, which was then discarded due to 
> being too weird.
> 
> It does occur to me though that it could be interesting to allow a 
> canvas context to be transferred to a worker.  Think about this for a 
> moment:  It would allow arbitrarily expensive rendering to occur in the 
> worker, and then you just need to have some flush style API that would 
> allow the worker to indicate that the content of the canvas was ready to 
> render -- essentially this would be a join() on the UI thread, but the 
> rendering would never block the UI.
> 
> Alas when I think about it, i think it may require double buffering the 
> canvas, but it could provide a substantial performance boost, with 
> minimal developer-side complexity.

I haven't yet added canvas to workers, but it is something that is on the 
list of things to add. I have been waiting for implementations of workers 
(including shared workers and message ports) and canvas (including the new 
path APIs) to mature before adding the combination.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'