[whatwg] How to handle multitrack media resources in HTML

Mark Watson watsonm at netflix.com
Sun Apr 10 12:36:22 PDT 2011

On Apr 7, 2011, at 11:54 PM, Ian Hickson wrote:

> On Thu, 10 Feb 2011, Silvia Pfeiffer wrote:
>> One particular issue that hasn't had much discussion here yet is the
>> issue of how to deal with multitrack media resources or media resources
>> that have associated synchronized audio and video resources. I'm
>> concretely referring to such things as audio descriptions, sign language
>> video, and dubbed audio tracks.
>> We require an API that can expose such extra tracks to the user and to
>> JavaScript. This should be independent of whether the tracks are
>> actually inside the media resource or are given as separate resources,
> I think there's a big difference between multiple tracks inside one
> resource and multiple tracks spread amongst multiple resources: in the
> former case, one would need a single set of network state APIs (load
> algorithm, ready state, network state, dimensions, buffering state, etc),
> whereas in the second case we'd need N sets of these APIs, one for each
> media resource.
> Given that the current mechanism for exposing the load state of a media
> resource is a media element (<video>, <audio>), I think it makes sense to
> reuse these elements for loading each media resource even in a multitrack
> scenario. Thus I do not necessarily agree that exposing extra tracks
> should be done in a way that as independent of whether the tracks are
> in-band or out-of-band.

In the case of in-band tracks, it may still be that they are retrieved independently over the network. This could happen in two ways:
- some file formats contain headers which enable precise navigation of the file, for example using HTTP byte ranges, so that the tracks can be retrieved independently. mp4 files are one example. I don't know that anyone does this, though.
- in the case of adaptive streaming based on a manifest, the different tracks may be in different files, even though they appear as in-band tracks from an HTML perspective.
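To make the first case concrete, here is a minimal sketch of what byte-range retrieval of a single in-band track might look like. The track byte offsets are hypothetical; in practice they would be read from the mp4 'moov' header (the chunk-offset and sample-size tables of each 'trak'), and the `movie.mp4` URL is invented for illustration:

```javascript
// Build an HTTP Range header for an inclusive byte span.
function rangeHeader(start, end) {
  return "bytes=" + start + "-" + end;
}

// Hypothetical offsets for one chunk of an audio track's samples,
// as they might be recovered from the mp4 file's index tables:
const audioChunk = { offset: 48000, size: 16384 };
const header = rangeHeader(audioChunk.offset,
                           audioChunk.offset + audioChunk.size - 1);
// header is "bytes=48000-64383"

// A client could then fetch just that track data from the one file:
// fetch("movie.mp4", { headers: { Range: header } })
//   .then(r => r.arrayBuffer());
```

The point is only that a single multitrack file does not force a single download: with an index, each track's bytes can be requested separately.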

In these cases it *might* make sense to expose separate buffer and network states for the different in-band tracks in just the same way as for out-of-band tracks. In fact, the distinction between in-band and out-of-band tracks is mainly one of discovery: out-of-band tracks the author is assumed to know about by some means of their own, whereas in-band tracks can be discovered by loading the metadata part of a single initial resource.
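As a sketch of that idea, the per-track state could mirror the states a media element already exposes. The `makeTrackState` helper and the track list below are hypothetical, not any proposed API; only the numeric constants follow HTMLMediaElement (NETWORK_IDLE = 1, NETWORK_LOADING = 2, HAVE_NOTHING = 0, HAVE_METADATA = 1):

```javascript
// Constants mirroring HTMLMediaElement's networkState/readyState values.
const NETWORK_IDLE = 1, NETWORK_LOADING = 2;
const HAVE_NOTHING = 0, HAVE_METADATA = 1;

// Hypothetical per-track state object, one per in-band track.
function makeTrackState(kind) {
  return { kind, networkState: NETWORK_IDLE, readyState: HAVE_NOTHING };
}

// After loading the metadata part of the single initial resource, the
// in-band tracks are discovered and each gets its own state:
const tracks = ["video", "audio", "audio-description"].map(makeTrackState);
tracks.forEach(t => { t.readyState = HAVE_METADATA; });

// Each track could then buffer independently -- here only the main
// video and audio tracks are being loaded, not the description track:
tracks[0].networkState = NETWORK_LOADING;
tracks[1].networkState = NETWORK_LOADING;
```

Whether this is worth the extra API surface is exactly the question at issue; the sketch just shows that per-track states need not differ between the in-band and out-of-band cases.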
