[whatwg] Proposal: HTMLCanvasElement.printCallback API

Wed Sep 26 10:22:11 PDT 2012

Hello WHATWG members,

This email is about proposing a new attribute "printCallback" on the HTMLCanvasElement (in the following called "Canvas"). This new API allows to:

* define the content of a canvas element during the printing progress
* send the canvas' content without rasterization to the printer

The basic API was implemented in [1] and is available in Firefox Nightly 18.

# Motivation And Use-Case

The motivation for designing and implementing the API was to add proper printing support for the PDF.JS project. The PDF.JS project is an implementation of a PDF viewer using only web technologies. Without this API it is not possible to:

* render only the pages needed for printing. A webpage is printed with the content visible at the moment the print action is started. For PDF.JS this means that all pages are required to be rendered before printing. Rendering all the pages takes quite some time for complex and huge documents (> 100 pages). But the user might only want to print the first page. That means, the user waits for unnecessary computation to finish.
* print the content of a canvas element without rasterization artifacts on the printout. One could increase the size of the canvas such that the rasterization doesn't becomes visible, but this is not possible due to the large usage of memory going with this. Using a different way to render the pages than using canvas (e.g. SVG) is not possible, due to memory and performance issues. 

Although not directly relevant to PDF.JS - it's also not possible to

* define the content of a printed page that looks exactly the same cross all user agents. There are small variations, that cause breaks and styles to look slightly different between user agents. Using CSS it's possible to make one canvas element take up one physical page and then precisely layout content on the canvas.

(I will later describe briefly how the API was used to solve these issues.)

# Actual API

The actual API looks like this:

The "printCallback" attribute on a Canvas takes a callback. This callback is invoked when the canvas with a printCallback is printed. A "printState" object is passed as argument to the callback. The printState object has a "context" property, which points to a CanvasRenderingContext2D object. Against this context all known operations of the CanvasRenderingContext2D are executable. However, the CanvasRenderingContext2D doesn't rasterize the operations but instead forward them directly to the printer - instead of drawing to a pixel surface it's more like drawing to a vector surface. The result of the operations show up on the canvas when printed, but are not visible on the screen. The "printState.done()" function  must be called once all drawing operations for the canvas are done and the printing should progress. This was added to allow the printCallback to perform asynchronous tasks.

A simple example of the API looks like this:

```
var canvas = document.getElementById('canvas');
var ctx = canvas.getContext('2d');
ctx.fillText('Hi there.', 50, 50);

canvas.printCallback = function(printState) {
  var printCtx = printState.context;
  printCtx.fillText('I\'m only visible when printed.', 50, 50);
  printState.done();
};
```

You can try this example out in [2] using Firefox Nighlty and see the results as a PDF in [3]. Notice that you can select the text in the PDF linked in [3]. (Note: in the linked example [2], the callback is called "mozPrintCallback" as the API is currently prefixed in Gecko. The canvas output is rasterized on Windows and Linux due to a bug in Gecko at the moment.)

Some more details on the behavior of the API:

* the printCallback function is only invoked on the canvases that will be visible in the print output.
* there is only one printCallback called at the time. After the "printState.done()" function is called, the next printCallback function  gets invoked assuming there is another canvas that gets printed and has a printCallback specified.
* the order the printCallbacks of the canvases are called follows the output order of the canvases in the printout.
* the resolution on the printContext is the same as when drawing to the canvas on the page. E.g. if a canvas has the attribute "width" set to "100" and by using CSS the canvas takes 10 cm in width on the printout, then 1 unit on the context corresponds to 0.1 cm.
* the putImageData and getImageData functions on the CanvasRenderingContext2D use the same pixel resolution (width/height) as the canvas on the page (this results in the data of the getImageData function to be rasterized).
* the "canvas" property on the printContext points to the canvas on the page and not the canvas element that is printed. Otherwise it's possible to change the layout of the printing while printing. As the canvas on the page might not be available anymore (e.g. the canvas was removed and garbage collected from the document before the printCallback gets invoked), the "canvas" property might be "undefined" or "null".
* the window.onafterprint event is called either
  1. after all printCallbacks are done and the page is ready for printing
  2. the printing got aborted using the print dialog.
* the printContext holds no content when passed to the callback and takes the default values (for transformation matrix, styles etc.) of a CanvasRenderingContext2D
* the printContext is a CanvasRenderingContext2D but instead of using a pixel map to store the drawing operations result, the operations are forwarded to the printer without rasterization.
* the API does not change the layout of the canvas element on the page.

# Open Discussion:

* There is no way to abort in case something goes wrong. E.g. printing to the canvas might require a successful network request, but the request failed. 
* The printState.done() function gets eventually never called and therefore printing the document might never finish.

# How The PrintCallback-API Solved The Problems For PDF.JS

* using CSS all content except a single div is hidden during printing
* when the beforePrint event is fired it is checked if the webpage is setup for printing or not. If it is not, the webpage is setup and the print action is canceled. Otherwise, the printing is not prevented and happens as regular
* the "setup webpage for printing" consists of the following steps
1. for each page of the PDF document, a canvas element is created and insert to the div that is visible during printing
2. using CSS, the canvas inside the print-visible div take up an entire page in the later printout
3. for each canvas the `mozPrintCallback` is set. If the callback function is called, the pdf page corresponding to the canvas is loaded and drawn on the canvas. Once finished, the `printState.done()` function is called
* after the user set the print settings in the print dialog, the webpage is printed
* for the pages that are required for the printout, the mozPrintCallback is called on the canvas for these pages
* after the printing finished (detected by listening to the afterPrint event), the created canvases are removed again to save memory. 

Best regards and looking for feedback on this,

Julian

---

[1]: Mozilla Bug 745025 - Implement CanvasElement.mozPrintCallback: https://bugzilla.mozilla.org/show_bug.cgi?id=745025
[2]: http://jsfiddle.net/FPNMM/2/embedded/js%2Cresult/
[3]: http://n.ethz.ch/~jviereck/drop/mozPrintCallback_output.pdf