[whatwg] Video and Audio Tracks API

Lachlan Hunt lachlan.hunt at lachy.id.au
Tue Mar 22 07:20:56 PDT 2011

This is regarding the audioTracks and videoTracks APIs recently added 
to HTMLMediaElement.

The design of these APIs seems a little strange: each track is handled 
by passing an index to methods on the TrackList interfaces, rather than 
by treating audioTracks and videoTracks as collections of individual 
audio/video track objects.  This is inconsistent with the design of the 
TextTrack interface, and seems sub-optimal.

The use of ExclusiveTrackList for videoTracks also seems rather 
limiting.  What about cases where the second video track is a 
sign-language track, or some other video overlay?  This is a use case 
that you seem to be trying to address with the mediaGroup feature, even 
though the example given actually includes all tracks in the same file. 
The example from the spec is:

<video src="movie.vid#track=Video&track=English" autoplay controls mediagroup=movie></video>
<video src="movie.vid#track=sign" autoplay mediagroup=movie></video>

Normally, sign language tracks I've seen broadcast on TV programs 
display the sign language interpreter in a small box in the bottom corner.

Other use cases include PiP features, such as director commentary or 
storyboards as available on some Blu-ray discs and DVDs [1].  So in 
cases where both tracks are included in the same file, having the 
ability to selectively enable multiple video tracks would seem easier 
than synchronising separate video files.

There are also the use cases for controlling the volume of individual 
tracks that are not addressed by the current spec design.

I believe the design would work better like this:


interface HTMLMediaElement : HTMLElement {
   readonly attribute AudioTrack[] audioTracks;
   readonly attribute VideoTrack[] videoTracks;
};

interface MediaTrack {
   readonly attribute DOMString label;
   readonly attribute DOMString language;

            attribute boolean enabled;
};

interface AudioTrack : MediaTrack {
            attribute double volume;
            attribute boolean muted;
   // Other potential future APIs include bass, treble, channels, etc.
};

interface VideoTrack : MediaTrack {
   // ...
};
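
To make the intent concrete, here is a rough TypeScript sketch of how 
the interfaces above might be used.  It is a plain in-memory model, not 
a browser API: the Proposed* class names, the MediaElementLike type, the 
enableVideoByLabel helper and the track labels are all made up for 
illustration (the Proposed prefix is only there to avoid clashing with 
any built-in type names).

```typescript
// Illustrative in-memory model of the proposed interfaces; not a browser API.
class ProposedMediaTrack {
  enabled = false;
  constructor(readonly label: string, readonly language: string) {}
}

class ProposedAudioTrack extends ProposedMediaTrack {
  volume = 1.0;
  muted = false;
}

class ProposedVideoTrack extends ProposedMediaTrack {}

// Hypothetical stand-in for the two attributes proposed on HTMLMediaElement.
interface MediaElementLike {
  readonly audioTracks: ProposedAudioTrack[];
  readonly videoTracks: ProposedVideoTrack[];
}

// Enable every video track with the given label and leave the others alone,
// so a main track and an overlay can be active at the same time.
// Returns the number of tracks that matched.
function enableVideoByLabel(media: MediaElementLike, label: string): number {
  let count = 0;
  for (const track of media.videoTracks) {
    if (track.label === label) {
      track.enabled = true;
      count++;
    }
  }
  return count;
}

// Hypothetical tracks, loosely named after the fragments in the spec example.
const english = new ProposedAudioTrack("English", "en");
const commentary = new ProposedAudioTrack("Commentary", "en");
const media: MediaElementLike = {
  audioTracks: [english, commentary],
  videoTracks: [
    new ProposedVideoTrack("Video", ""),
    new ProposedVideoTrack("sign", "sgn"),
  ],
};

// Main video plus sign-language overlay, both from the same file.
enableVideoByLabel(media, "Video");
enableVideoByLabel(media, "sign");

// Per-track volume: play the commentary at half volume over the main audio.
english.enabled = true;
commentary.enabled = true;
commentary.volume = 0.5;
```

Nothing here says how a user agent would composite the enabled video 
tracks; the sketch only shows that enabling several tracks and setting 
per-track volume become simple property assignments.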


This proposal replaces TrackList.getName(index) with 
MediaTrack[index].label, and .getLanguage(index) with .language, which 
is more consistent with the design of the TextTrack interface.  The 
isEnabled(), enable() and disable() functions have also been replaced 
with a single mutable boolean .enabled property.
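
The mapping between the two styles can be sketched as a small adapter 
in TypeScript.  This is only an illustration: IndexedTrackList is a 
hypothetical shape built from the method names mentioned above (its 
length member is an assumption), and TrackView, asTrackObjects and 
makeDemoList are invented names.

```typescript
// Hypothetical shape of an index-based track list, using the method names
// mentioned in this message; the `length` member is an assumption.
interface IndexedTrackList {
  readonly length: number;
  getName(index: number): string;
  getLanguage(index: number): string;
  isEnabled(index: number): boolean;
  enable(index: number): void;
  disable(index: number): void;
}

// Object-style view of one track, mirroring the proposed MediaTrack members.
interface TrackView {
  readonly label: string;
  readonly language: string;
  enabled: boolean;
}

// Wrap each index of the list in an object whose properties forward to
// the corresponding index-based calls.
function asTrackObjects(list: IndexedTrackList): TrackView[] {
  const views: TrackView[] = [];
  for (let i = 0; i < list.length; i++) {
    views.push({
      get label(): string { return list.getName(i); },
      get language(): string { return list.getLanguage(i); },
      get enabled(): boolean { return list.isEnabled(i); },
      set enabled(on: boolean) {
        if (on) list.enable(i); else list.disable(i);
      },
    });
  }
  return views;
}

// Tiny in-memory list, for demonstration only.
function makeDemoList(): IndexedTrackList {
  const names = ["English", "Commentary"];
  const langs = ["en", "en"];
  const on = [true, false];
  return {
    length: names.length,
    getName: (i) => names[i] ?? "",
    getLanguage: (i) => langs[i] ?? "",
    isEnabled: (i) => on[i] ?? false,
    enable: (i) => { on[i] = true; },
    disable: (i) => { on[i] = false; },
  };
}

const demoList = makeDemoList();
const tracks = asTrackObjects(demoList);

// Property style: one assignment instead of choosing enable() or disable().
for (const track of tracks) {
  if (track.label === "Commentary") track.enabled = true;
}
```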

[1] http://en.wikipedia.org/wiki/Picture-in-picture

Lachlan Hunt - Opera Software
