[whatwg] Video, Closed Captions, and Audio Description Tracks

Dave Singer singer at apple.com
Tue Oct 9 12:28:20 PDT 2007

At 9:22  +0300 9/10/07, Henri Sivonen wrote:
>On Oct 8, 2007, at 22:12, Dave Singer wrote:
>>At 12:22  +0300 8/10/07, Henri Sivonen wrote:
>>>Could someone who knows more about the production of audio 
>>>descriptions, please, comment if audio description can in practice 
>>>be implemented as a supplementary sound track that plays 
>>>concurrently with the main sound track (in that case Speex would 
>>>be appropriate) or whether the main sound must be manually mixed 
>>>differently when description is present?
>>Sometimes;  but sometimes, for example:
>>* background music needs to be reduced
>>* other audio material needs to be 'moved' to make room for audio description
>In that case, an entire alternative soundtrack encoded using a 
>general-purpose codec would be called for. Is it reasonable to 
>expect content providers to take the bandwidth hit? Or should we 
>expect content providers to provide an entire alternative video file?

If the delivery is streaming, or in some other way where the 
selection of tracks can be done prior to transport, then there isn't 
a bandwidth hit at all, of course.  Then the "ask this resource to 
present itself in the captioned fashion" is a reasonable way to do 

Alternatively, as you say, one might prefer a whole separate file: 
"select this file if captions are desired".

Our proposal covers both cases, as both have valid uses.

>>>When the problem is framed this way, the language of the text track 
>>>doesn't need to be specified at all. In case #1 it is "same as 
>>>audio". In case #2 it is "same as context site". This makes the 
>>>text track selection mechanism super-simple.
>>Yes, it can often fall through to the "what content did you select 
>>based on language" and then the question of either selecting or 
>>styling content for accessibility can follow the language.
>I don't understand that comment. My point was that the two most 
>obvious cases don't require a language preference-based selection 
>mechanism at all.

I am trying clumsily to agree with you. Content selection based on 
language, and then choice of any assistive needs (e.g. captions) can 
be orthogonal.
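
As a rough illustration of that orthogonality (a minimal sketch only; the `Source` type, its fields, the file names, and both function names are hypothetical and not part of any proposal or spec), language selection can run first and the assistive choice can be applied to whatever was selected:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Source:
    """Hypothetical description of one candidate media resource."""
    url: str
    language: str       # e.g. "en", "fi"
    has_captions: bool  # whether the resource can present itself captioned


def select_by_language(sources: Sequence[Source],
                       preferred_languages: Sequence[str]) -> Optional[Source]:
    """Step 1: pick content purely on language; captions play no part here."""
    for lang in preferred_languages:
        for src in sources:
            if src.language == lang:
                return src
    # No language match: fall back to the first listed source, if any.
    return sources[0] if sources else None


def apply_assistive_needs(source: Source, want_captions: bool) -> bool:
    """Step 2: decide, independently of language, whether to ask the
    chosen resource to turn captions on."""
    return want_captions and source.has_captions


sources = [
    Source("movie-fi.mp4", "fi", has_captions=True),
    Source("movie-en.mp4", "en", has_captions=False),
]
chosen = select_by_language(sources, ["en", "fi"])
captions_on = apply_assistive_needs(chosen, want_captions=True)
```

Neither step needs to know how the other made its decision, which is all that "orthogonal" is meant to convey here.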

>>>Personally, I'd be fine with a format with these features:
>>>  * Metadata flag that tells if the text track is captioning for 
>>>the deaf or translation subtitles.
>>I don't think we can or should 'climb inside' the content formats, 
>>merely have a standard way to ask them to do things (e.g. turn on 
>I agree. However, in order for the HTML 5 spec to be able to 
>reasonably and pragmatically tell browsers to ask the video 
>subsystem to perform tasks like "turn on captions", we need to check 
>that the obviously foreseeable format families (Ogg in the case of 
>Mozilla and, apparently, Opera and MPEG-4 in the case of Apple) are 
>able to cater for such tasks. Moreover, browsers and content 
>providers need to have a shared understanding of how to do this 

Sure, agreed.  As this matures, we (Apple) will be looking at what it 
takes for the movie file format, and I'll raise the same questions 
about MP4 and 3GP.

David Singer
