[whatwg] <video> and acceleration

Philip Jägenstedt philipj at opera.com
Mon Mar 30 02:26:59 PDT 2009


On Sat, 28 Mar 2009 05:57:35 +0100, Benjamin M. Schwartz  
<bmschwar at fas.harvard.edu> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dear What,
>
> Short: <video> won't work on slow devices.  Help!
>
> Long:
> The <video> tag has great potential to be useful on low-powered computers
> and computing devices, where current internet video streaming solutions
> (such as Adobe's Flash) are too computationally expensive.  My personal
> experience is with OLPC XO-1*, on which Flash (and Gnash) are terribly
> slow for any purpose, but Theora+Vorbis playback is quite smooth at
> reasonable resolutions and bitrates.
>
> The <video> standard allows arbitrary manipulations of the video stream
> within the HTML renderer.  To permit this, the initial implementations
> (such as the one in Firefox 3.5) will perform all video decoding
> operations on the CPU, including the tremendously expensive YUV->RGB
> conversion and scaling.  This is viable only for moderate resolutions and
> extremely fast processors.
>
> Recognizing this, the Firefox developers expect that the decoding process
> will eventually be accelerated.  However, an accelerated implementation  
> of
> the <video> spec inevitably requires a 3D GPU, in order to permit
> transparent video, blended overlays, and arbitrary rotations.
>
> Pure software playback of video looks like a slideshow on the XO, or any
> device with similar CPU power, achieving 1 or 2 fps.  However, these
> devices typically have a 2D graphics chip that provides "video overlay"
> acceleration: 1-bit alpha, YUV->RGB, and simple scaling, all in
> special-purpose hardware.**  Using the overlay (via XVideo on Linux)
> allows smooth, full-speed playback.
>
> THE QUESTION:
> What is the recommended way to handle the <video> tag on such hardware?
>
> There are two obvious solutions:
> 0. Implement the spec, and just let it be really slow.
> 1. Attempt to approximate the correct behavior, given the limitations of
> the hardware.  Make the video appear where it's supposed to appear, and
> use the 1-bit alpha (dithered?) to blend static items over it.  Ignore
> transparency of the video.  Ignore rotations, etc.
> 2. Ignore the HTML context.  Show the video "in manners more suitable to
> the user (e.g. full-screen or in an independent resizable window)".
>
> Which is preferable?  Is it worth specifying a preferred behavior?
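To make concrete why the YUV->RGB step Benjamin mentions is so costly in software: every output pixel needs several multiplies, adds, and clamps, which adds up fast at video resolutions and frame rates. A minimal sketch using full-range BT.601-style coefficients (illustrative only; real decoders use fixed-point SIMD or, as discussed here, hand it to the graphics chip):

```python
def clamp(x):
    """Clamp a value to the 0-255 byte range."""
    return max(0, min(255, int(round(x))))

def yuv_to_rgb(y, u, v):
    """Convert one full-range YUV sample to RGB.

    Coefficients are the common BT.601-style values; this per-pixel
    arithmetic is what a software path must run for every pixel of
    every frame, which is why an overlay that does it in hardware
    makes such a difference on slow CPUs.
    """
    r = clamp(y + 1.402 * (v - 128))
    g = clamp(y - 0.344136 * (u - 128) - 0.714136 * (v - 128))
    b = clamp(y + 1.772 * (u - 128))
    return r, g, b
```

At 640x480 and 25 fps that is roughly 7.7 million of these conversions per second, before scaling and compositing are even considered.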

In the typical case a correctly positioned hardware overlay could be used,  
but there will always be a need for a software fallback when rotation,  
filters, etc. are used. As Robert O'Callahan said, a user agent would need  
to detect when it is safe to use hardware acceleration and use it only  
then.
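Such a check might look roughly like the sketch below. The property names are hypothetical, not from any spec or engine; the point is just that the UA inspects the computed presentation of the element and falls back to software whenever the overlay hardware cannot reproduce it:

```python
def overlay_safe(style):
    """Return True if a simple hardware overlay could reproduce the
    element's presentation (a sketch; a real UA would inspect far
    more state than this, and these keys are illustrative only)."""
    return (
        style.get("rotation", 0) == 0          # overlays cannot rotate
        and style.get("opacity", 1.0) == 1.0   # no per-pixel video alpha
        and not style.get("filters")           # no filter effects
        and not style.get("clip_path")         # only rectangular regions
    )
```

When the check fails, the UA decodes and composites on the CPU; when it passes, it hands the YUV frames to the overlay and only tracks the video's position.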

If there is something that could be changed in the spec to make things a  
bit easier for user agents, it might be an overlay attribute like the one  
SVG Tiny 1.2 has:  
http://www.w3.org/TR/SVGTiny12/multimedia.html#compositingBehaviorAttribute

I'm not convinced such an attribute would help; I'm just pointing it out  
here.

-- 
Philip Jägenstedt
Opera Software
