[whatwg] Video, Closed Captions, and Audio Description Tracks

Dave Singer singer at apple.com
Mon Oct 8 12:12:33 PDT 2007

At 12:22  +0300 8/10/07, Henri Sivonen wrote:
>Is 3GPP Timed Text a.k.a. MPEG-4 Part 17 unencumbered? (IANAL, this 
>isn't an endorsement of the format--just a question.)

I am not authoritative, but I have not seen any disclosures myself.

>>an alternate audio track (e.g. speex as suggested by you for 
>>accessibility to blind people),
>My understanding is that at least conceptually an audio description 
>track is *supplementary* to the normal sound track. Could someone 
>who knows more about the production of audio descriptions, please, 
>comment if audio description can in practice be implemented as a 
>supplementary sound track that plays concurrently with the main 
>sound track (in that case Speex would be appropriate) or whether the 
>main sound must be manually mixed differently when description is 

Sometimes it can; but sometimes the main mix has to change, for example:
* background music needs to be reduced
* other audio material needs to be 'moved' to make room for audio description

>>and several caption tracks (for different languages),
>I think it needs emphasizing that captioning (for the deaf) and 
>translation subtitling (for people who can hear but who can't follow 
>the language) are distinctly different in terms of the metadata 
>flagging needs and the playback defaults. Moreover, although 
>translations for multiple languages are nice to have, they 
>complicate UI and metadata considerably and packaging multiple 
>translations in one file is outside the scope of HTML5 as far as the 
>current Design Principles draft (from the W3C side) goes.
>I think we should first focus on two kinds of qualitatively 
>different timed text (differing in metadata and playback defaults):
>  1) Captions for the deaf:
>   * Written in the same language as the speech content of the video is spoken.
>   * May have speaker identification text.
>   * May indicate other relevant sounds textually.
>   * Don't indicate text that can be seen in the video frame.
>   * Not rendered by default.
>   * Enabled by a browser-wide "I am deaf or my device doesn't do 
>sound out" pref.
>  2) Subtitles for the people who can't follow foreign-language speech:
>   * Written in the language of the site that embeds video when 
>there's speech in another language.
>   * Don't identify the speaker.
>   * Don't identify sounds.
>   * Translate relevant text visible in the video frame.
>   * Rendered by default.
>   * As a bonus suppressible via the context menu or something on a 
>case-by-case basis.
>When the problem is framed this way, the language of the text track 
>doesn't need to be specified at all. In case #1 it is "same as 
>audio". In case #2 it is "same as context site". This makes the text 
>track selection mechanism super-simple.

Yes, it can often fall through to the "what content did you select 
based on language" question, and then the question of either selecting 
or styling content for accessibility can follow the language.
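
To make the quoted selection rule concrete, here is a minimal sketch of 
it in Python. It is only an illustration of the two cases as described 
above; the names (TrackKind, TextTrack, effective_language, 
rendered_by_default, captions_pref, suppressed) are hypothetical and do 
not come from any spec or implementation.

```python
from dataclasses import dataclass
from enum import Enum


class TrackKind(Enum):
    CAPTIONS = "captions"    # case #1: captions for the deaf
    SUBTITLES = "subtitles"  # case #2: translation subtitles


@dataclass
class TextTrack:
    kind: TrackKind  # the only metadata the rule needs; no language field


def effective_language(track: TextTrack, audio_lang: str, site_lang: str) -> str:
    """Infer the text language from the track kind instead of storing it."""
    if track.kind is TrackKind.CAPTIONS:
        return audio_lang  # "same as audio"
    return site_lang       # "same as context site"


def rendered_by_default(track: TextTrack,
                        captions_pref: bool = False,
                        suppressed: bool = False) -> bool:
    """Apply the playback defaults from the two cases.

    captions_pref: assumed browser-wide "I am deaf or my device doesn't
                   do sound out" preference.
    suppressed:    assumed per-video "hide subtitles" choice from the UI.
    """
    if track.kind is TrackKind.CAPTIONS:
        return captions_pref   # off unless the pref is set
    return not suppressed      # on unless the user turned it off
```

For example, with Finnish audio embedded on an English site, 
effective_language() gives "fi" for a captions track and "en" for a 
subtitles track, with no language metadata stored on either.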

>Personally, I'd be fine with a format with these features:
>  * Metadata flag that tells if the text track is captioning for the 
>deaf or translation subtitles.

I don't think we can or should 'climb inside' the content formats, 
merely have a standard way to ask them to do things (e.g. turn on 

>  * Sequence of plain-text Unicode strings (incl. forced line breaks 
>and bidi marks) with the following data:
>    - Time code when the string appears.
>    - Time code when the string disappears.
>    - Flag for positioning the string at the top of the frame instead 
>of bottom.
>  * A way to do italics (or other emphasis for scripts for which 
>italics is not applicable), but I think this feature isn't essential.
>  * A guideline for estimating the amount of text appropriate to be 
>shown at one time and a matching rendering guideline for UAs. (This 
>guideline should result in an amount of text that agrees with 
>current TV best practices.)

This should all be out of scope, IMHO;  this is about the design of a 
captioning system, which I don't think we should try to do.
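
Purely to show how much design the quoted feature list already implies, 
here is a minimal sketch of it as a data model in Python. It is not a 
proposal and not any existing format; every name (Cue, TimedTextTrack, 
for_deaf, at_top, active_at) is hypothetical, and times are assumed to 
be seconds from the start of the video.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cue:
    start: float          # time code when the string appears (seconds, assumed)
    end: float            # time code when the string disappears
    text: str             # plain Unicode; may contain "\n" and bidi marks
    at_top: bool = False  # position at the top of the frame instead of bottom


@dataclass
class TimedTextTrack:
    for_deaf: bool        # True: captions for the deaf; False: translation subtitles
    cues: List[Cue] = field(default_factory=list)

    def active_at(self, t: float) -> List[Cue]:
        """Return the cues that should be on screen at time t."""
        return [c for c in self.cues if c.start <= t < c.end]


track = TimedTextTrack(
    for_deaf=True,
    cues=[
        Cue(1.0, 3.5, "[door slams]"),
        Cue(3.0, 6.0, "ANNA: Who's there?", at_top=True),
    ],
)
visible = track.active_at(3.2)  # both cues overlap at t = 3.2
```

Even this small model leaves open questions such as overlapping cues, 
emphasis, and how much text fits on screen at once.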

>It would be up to the UA to render the text at the bottom of the 
>video frame in white sans-serif with black outline.

Or wherever it's supposed to go.

>I think it would be inappropriate to put hyperlinks in captioning 
>for the deaf because it would venture outside the space of 
>accessibility and effectively hide some links for the non-deaf 

Yes, generally true!

David Singer
