[whatwg] How to handle multitrack media resources in HTML

Mon Apr 11 08:19:23 PDT 2011

On Apr 8, 2011, at 8:54 AM, Ian Hickson wrote:

>> *) Discoverability is indeed an issue, but this can be fixed by defining 
>> a common track API for signalling and enabling/disabling tracks:
>> 
>> {{{
>> interface Track {
>>  readonly attribute DOMString kind;
>>  readonly attribute DOMString label;
>>  readonly attribute DOMString language;
>> 
>>  const unsigned short OFF = 0;
>>  const unsigned short HIDDEN = 1;
>>  const unsigned short SHOWING = 2;
>>  attribute unsigned short mode;
>> };
>> 
>> interface HTMLMediaElement : HTMLElement {
>>  [...]
>>  readonly attribute Track[] tracks;
>> };
>> }}}
> 
> There's a big difference between text tracks, audio tracks, and video 
> tracks. While it makes sense, for instance, to have text tracks enabled 
> but not showing, it makes no sense to do that with audio tracks. 

Audio and video tracks require more data, so it is less attractive to allow them to be enabled but not showing. If data weren't an issue, it would be great if this were possible; it would allow instant switching between multiple audio dubs or camera angles.

In terms of the data model, I don't believe there are major differences between audio, text, and video tracks. They all exist at the same level: one down from the main presentation layer. Toggling versus layering can be an option for all three kinds of tracks.
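
To make the toggling-versus-layering distinction concrete, here is a minimal sketch in TypeScript. It mirrors the Track interface quoted above; the `TrackKind` union, the `toggle` and `layer` helpers, and the `keepLoaded` flag are hypothetical names chosen for illustration, not part of any proposal or shipped API.

```typescript
// Hypothetical sketch modeled on the quoted Track interface; not a real API.
enum TrackMode { OFF = 0, HIDDEN = 1, SHOWING = 2 }

// Assumption: kind is restricted to these three values for illustration.
type TrackKind = "audio" | "video" | "text";

interface Track {
  readonly kind: TrackKind;
  readonly label: string;
  readonly language: string;
  mode: TrackMode;
}

// Toggling: show exactly one track of the given kind.
// With keepLoaded = true the other tracks of that kind stay HIDDEN
// (enabled but not showing), which would allow instant switching at
// the cost of fetching more data; otherwise they are turned OFF.
function toggle(tracks: Track[], kind: TrackKind, label: string, keepLoaded = false): void {
  const inactive = keepLoaded ? TrackMode.HIDDEN : TrackMode.OFF;
  for (const t of tracks) {
    if (t.kind !== kind) continue; // leave other kinds untouched
    t.mode = t.label === label ? TrackMode.SHOWING : inactive;
  }
}

// Layering: show several tracks of the same kind at once,
// turning off the tracks of that kind that are not listed.
function layer(tracks: Track[], kind: TrackKind, labels: string[]): void {
  for (const t of tracks) {
    if (t.kind !== kind) continue;
    t.mode = labels.indexOf(t.label) !== -1 ? TrackMode.SHOWING : TrackMode.OFF;
  }
}
```

The same two helpers work unchanged for audio dubs, camera angles, and subtitle tracks, which is the point: only the cost of keeping a track HIDDEN differs between kinds, not the model.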

For example, multiple video tracks can be mixed together in one media element's display. Think of PiP, perspective side by side (Stevenote style), or a 3D grid (group chat, like Skype). Perhaps this should be supported instead of relying on multiple video elements, manual positioning, and APIs to knit things together. One would lose some flexibility but gain a simpler API (it's still one "video") and ease of implementation for HTML developers.
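
As a rough illustration of what mixing inside a single element's display might involve, the sketch below computes where each showing video track would be drawn. The `Rect` type and the `gridLayout` and `pipLayout` functions are hypothetical, as is the default inset scale; a flat grid stands in for the 3D variant.

```typescript
// Hypothetical layout helpers; all names and defaults are assumptions.
interface Rect { x: number; y: number; width: number; height: number; }

// Grid: split the display into equal cells, one per showing video track,
// filling rows left to right.
function gridLayout(count: number, width: number, height: number): Rect[] {
  if (count <= 0) return [];
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  const cellW = width / cols;
  const cellH = height / rows;
  const rects: Rect[] = [];
  for (let i = 0; i < count; i++) {
    rects.push({
      x: (i % cols) * cellW,
      y: Math.floor(i / cols) * cellH,
      width: cellW,
      height: cellH,
    });
  }
  return rects;
}

// Picture-in-picture: the first track fills the display and the second
// is drawn as a scaled inset in the bottom-right corner.
function pipLayout(width: number, height: number, scale = 0.25): Rect[] {
  const w = width * scale;
  const h = height * scale;
  return [
    { x: 0, y: 0, width, height },
    { x: width - w, y: height - h, width: w, height: h },
  ];
}
```

With built-in layouts like these, the page author would only pick a layout and set track modes, rather than positioning several video elements by hand.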

- Jeroen