[whatwg] Video, Closed Captions, and Audio Description Tracks
Dave Singer
singer at apple.com
Mon Oct 8 12:12:33 PDT 2007
At 12:22 +0300 8/10/07, Henri Sivonen wrote:
>
>Is 3GPP Timed Text aka. MPEG-4 part 17 unencumbered? (IANAL, this
>isn't an endorsement of the format--just a question.)
I am not authoritative, but I have not seen any disclosures myself.
>>an alternate audio track (e.g. speex as suggested by you for
>>accessibility to blind people),
>
>My understanding is that at least conceptually an audio description
>track is *supplementary* to the normal sound track. Could someone
>who knows more about the production of audio descriptions, please,
>comment if audio description can in practice be implemented as a
>supplementary sound track that plays concurrently with the main
>sound track (in that case Speex would be appropriate) or whether the
>main sound must be manually mixed differently when description is
>present?
Sometimes it can; but sometimes the main mix has to change, for example:
* background music needs to be reduced
* other audio material needs to be 'moved' to make room for audio description
>
>>and several caption tracks (for different languages),
>
>I think it needs emphasizing that captioning (for the deaf) and
>translation subtitling (for people who can hear but who can't follow
>the language) are distinctly different in terms of the metadata
>flagging needs and the playback defaults. Moreover, although
>translations for multiple languages are nice to have, they
>complicate UI and metadata considerably and packaging multiple
>translations in one file is outside the scope of HTML5 as far as the
>current Design Principles draft (from the W3C side) goes.
>
>I think we should first focus on two kinds of qualitatively
>different timed text (differing in metadata and playback defaults):
> 1) Captions for the deaf:
> * Written in the same language as the speech in the video.
> * May have speaker identification text.
> * May indicate other relevant sounds textually.
> * Don't indicate text that can be seen in the video frame.
> * Not rendered by default.
> * Enabled by a browser-wide "I am deaf or my device doesn't do
>sound out" pref.
> 2) Subtitles for the people who can't follow foreign-language speech:
> * Written in the language of the site that embeds video when
>there's speech in another language.
> * Don't identify the speaker.
> * Don't identify sounds.
> * Translate relevant text visible in the video frame.
> * Rendered by default.
> * As a bonus suppressible via the context menu or something on a
>case-by-case basis.
>
>When the problem is framed this way, the language of the text track
>doesn't need to be specified at all. In case #1 it is "same as
>audio". In case #2 it is "same as context site". This makes the text
>track selection mechanism super-simple.
Yes, it can often fall through to "what content did you select
based on language", and then the question of selecting or styling
content for accessibility can follow the language.
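To make the two-case selection rule quoted above concrete, here is a rough
sketch in Python. It is only an illustration under stated assumptions, not
anything a format or UA defines: the names TextTrack, pick_text_track and
captions_pref are hypothetical, at most one track of each kind is assumed,
and letting the captions pref win over subtitles is my assumption (the
quoted text does not say which takes priority).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextTrack:
    # Hypothetical flag: "captions" (for the deaf) or "subtitles" (translation).
    kind: str


def pick_text_track(
    tracks: List[TextTrack],
    audio_lang: str,
    site_lang: str,
    captions_pref: bool,
) -> Optional[TextTrack]:
    """Return the track to render by default, or None for no text."""
    # Assumption: at most one track of each kind is present.
    by_kind = {t.kind: t for t in tracks}
    # Case 1: captions are off unless the browser-wide pref is set.
    # Assumption: the pref takes priority over translation subtitles.
    if captions_pref and "captions" in by_kind:
        return by_kind["captions"]
    # Case 2: subtitles are on by default when the speech is not in the
    # language of the embedding site.
    if audio_lang != site_lang and "subtitles" in by_kind:
        return by_kind["subtitles"]
    return None
```

Note that no per-track language tag is consulted; only the audio language
and the site language are, which is the point of the framing.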
>
>Personally, I'd be fine with a format with these features:
> * Metadata flag that tells if the text track is captioning for the
>deaf or translation subtitles.
I don't think we can or should 'climb inside' the content formats,
merely have a standard way to ask them to do things (e.g. turn on
captions).
> * Sequence of plain-text Unicode strings (incl. forced line breaks
>and bidi marks) with the following data:
> - Time code when the string appears.
> - Time code when the string disappears.
> - Flag for positioning the string at the top of the frame instead
>of bottom.
> * A way to do italics (or other emphasis for scripts for which
>italics is not applicable), but I think this feature isn't essential.
> * A guideline for estimating the amount of text appropriate to be
>shown at one time and a matching rendering guideline for UAs. (This
>guideline should result in an amount of text that agrees with
>current TV best practices.)
This should all be out of scope, IMHO; this is about the design of a
captioning system, which I don't think we should try to do.
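For readers trying to picture the data the quoted list describes, it is
small: a string, two time codes, and a top/bottom flag. A minimal sketch in
Python, assuming times in seconds and a half-open [start, end) display
interval; the names Cue and active_cues are hypothetical and this is not
the layout of any existing caption format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float          # time code when the string appears (seconds, assumed)
    end: float            # time code when the string disappears
    text: str             # plain Unicode text, may contain forced line breaks
    at_top: bool = False  # position at the top of the frame instead of bottom


def active_cues(cues: List[Cue], t: float) -> List[Cue]:
    """Return the cues that should be visible at playback time t."""
    # Assumption: a cue is shown from start (inclusive) to end (exclusive).
    return [c for c in cues if c.start <= t < c.end]
```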
>
>It would be up to the UA to render the text at the bottom of the
>video frame in white sans-serif with black outline.
Or wherever it's supposed to go.
>
>I think it would be inappropriate to put hyperlinks in captioning
>for the deaf because it would venture outside the space of
>accessibility and effectively hide some links for the non-deaf
>audience.
Yes, generally true!
--
David Singer
Apple/QuickTime