[whatwg] How to handle multitrack media resources in HTML

Jeroen Wijering jeroen at longtailvideo.com
Mon Apr 18 01:30:40 PDT 2011

Hey Ian, all,

Sorry for the slow response.

>>>> There's a big difference between text tracks, audio tracks, and video 
>>>> tracks. While it makes sense, for instance, to have text tracks 
>>>> enabled but not showing, it makes no sense to do that with audio 
>>>> tracks.
>>> Audio and video tracks require more data, hence it's less preferred to 
>>> allow them being enabled but not showing. If data wasn't an issue, it 
>>> would be great if this were possible; it'd allow instant switching 
>>> between multiple audio dubs, or camera angles.
>> I think we mean different things by "active" here.
>> The "hidden" state for a text track is one where the UA isn't rendering 
>> the track but the UA is still firing all the events and so forth. I don't 
>> understand what the parallel would be for a video or audio track.

The parallel would be fetching / decoding the tracks but not sending them to the display (video) or speakers (audio). I agree that, implementation-wise, this is much less useful than having an "active but hidden" state for text tracks. However, some people might want to manipulate hidden tracks with the audio data API, much like hidden text tracks can be manipulated with JavaScript.
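
To make the parallel concrete, here is a rough sketch of how the three text track states could map onto audio and video tracks. Everything in it (the TrackModel type, the mode names, the two helper functions) is an assumption for illustration only, not an existing or proposed API:

```typescript
// Hypothetical model, not a browser API: the same three states for
// text, audio and video tracks.
type TrackKind = "text" | "audio" | "video";
type TrackMode = "disabled" | "hidden" | "showing";

interface TrackModel {
  kind: TrackKind;
  mode: TrackMode;
}

// A track is fetched and decoded (so script can get at its data)
// unless it is disabled.
function isProcessed(track: TrackModel): boolean {
  return track.mode !== "disabled";
}

// Only "showing" tracks reach the display or the speakers.
function isRendered(track: TrackModel): boolean {
  return track.mode === "showing";
}

// An alternate audio dub that is decoded but not mixed into the output.
const dub: TrackModel = { kind: "audio", mode: "hidden" };
```

In this model, a "hidden" audio dub costs bandwidth and decoding time but stays silent, which is what would allow instant switching or script-side processing.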

>> Text tracks are discontinuous units of potentially overlapping textual 
>> data with position information and other metadata that can be styled with 
>> CSS and can be mutated from script.
>> Audio and video tracks are continuous streams of immutable media data.
> Video and audio tracks do not necessarily produce continuous output - it is perfectly legal to have "gaps" in either, eg. segments that do not render. Both audio and video tracks can have metadata that affect their rendering: an audio track has a volume metadata that attenuates its contribution to the overall mix-down, and a video track has matrix that controls its rendering. The only thing preventing us from styling a video track with CSS is the lack of definition.

Yes, and the same (lack of definition) goes for JavaScript manipulation. It'd be great if we had the tools for manipulating video and audio tracks (extract/insert frames, move audio snippets around). It would make A/V editing, or more creative uses, really easy in HTML5.
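
As a rough idea of what "moving audio snippets around" could look like from script, here is a minimal sketch that assumes the decoded samples of a track were available as a plain Float32Array. The moveSnippet helper is hypothetical and no such track API exists today:

```typescript
// Hypothetical helper: cuts `length` samples starting at `from` and
// re-inserts them at index `to`, where `to` is counted in the track
// after the snippet has been cut out. Returns a new array.
function moveSnippet(
  samples: Float32Array,
  from: number,
  length: number,
  to: number
): Float32Array {
  if (from < 0 || length < 0 || from + length > samples.length) {
    throw new RangeError("snippet out of bounds");
  }
  const snippet = samples.slice(from, from + length);

  // The track with the snippet removed.
  const rest = new Float32Array(samples.length - length);
  rest.set(samples.subarray(0, from), 0);
  rest.set(samples.subarray(from + length), from);

  if (to < 0 || to > rest.length) {
    throw new RangeError("target out of bounds");
  }

  // Reassemble: head of the rest, the snippet, then the tail of the rest.
  const out = new Float32Array(samples.length);
  out.set(rest.subarray(0, to), 0);
  out.set(snippet, to);
  out.set(rest.subarray(to), to + length);
  return out;
}
```

For example, moving two samples from index 1 to index 3 of a six-sample track keeps the total length unchanged and only reorders the data.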

Kind regards,

