[whatwg] Video, Closed Captions, and Audio Description Tracks

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Sun Oct 7 17:14:05 PDT 2007

Hi Chris,

This is a very good discussion to have, and I am curious to hear
people's opinions.

CMML has been developed with an aim to provide "html"-type timed text
annotations for audio/video - in particular hyperlinks and annotations
to temporal sections of videos. This makes it more generic than
captions and, in another respect, less so, since captions have
formatting and are displayed in a particular way.
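
To make the idea of annotations on temporal sections concrete, here is
a minimal sketch in Python. The class and field names are hypothetical
and do not follow the actual CMML schema; the sketch only shows a
hyperlink and a description attached to a time range, and a lookup of
the section that covers a given playback time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    """One annotated temporal section (hypothetical model, not the CMML schema)."""
    start: float          # seconds from the start of the media
    end: float            # seconds; the clip covers [start, end)
    href: Optional[str]   # hyperlink attached to this section, if any
    desc: str             # free-text annotation for this section


def active_clip(clips: List[Clip], t: float) -> Optional[Clip]:
    """Return the first clip covering time t, or None if no clip covers it."""
    for clip in clips:
        if clip.start <= t < clip.end:
            return clip
    return None


# Example data; the URL is a placeholder.
clips = [
    Clip(0.0, 12.5, "http://example.com/intro", "Introduction"),
    Clip(12.5, 40.0, None, "Interview"),
]
```

A player could call active_clip(clips, t) on each time update to decide
which link and description to show. Nothing in this model says how the
text should be rendered.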

One option is to extend CMML to provide the caption functionality
itself. This would not be difficult; in fact, the current "desc" tag
is already being used for such functionality in xine. That is,
however, suboptimal since it mixes aims. A better way would be to
invent a "caption" tag for CMML which would have some formatting
functionality (colours, alignment etc. - the things that the EBU
subtitling standard http://www.limeboy.com/support.php?kbID=12 is

Another option would be to disregard CMML completely and invent a new
timed text logical bitstream for Ogg which would just have the
subtitles. This could use any existing timed text format and would just
require a bitstream mapping for Ogg, which should not be hard to do at

Now for Ogg Skeleton: it will indeed have a part to play in this,
though not directly in specifying the timed text annotations. Ogg
Skeleton is a track that describes what is inside the Ogg file. So,
assuming we have a multitrack video file with a video track, an audio
track, an alternate audio track (e.g. Speex, as you suggested, for
accessibility to blind people), a CMML track (for hyperlinking into
and out of the video), and several caption tracks (for different
languages), then Ogg Skeleton would state exactly that these tracks
exist without the need for a program to decode the Ogg file
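
As a rough illustration of what such a description would let a player
do, here is a sketch in Python. The TrackInfo fields, the role labels
and the "timed-text" codec name are assumptions made for this example
and are not taken from the Skeleton format; the point is only that a
player can pick tracks from a listing without decoding them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackInfo:
    """Summary of one logical bitstream (hypothetical fields, not the Skeleton format)."""
    serial: int                     # serial number of the logical bitstream
    role: str                       # assumed labels, e.g. "video", "captions"
    codec: str                      # codec or text format carried by the track
    language: Optional[str] = None  # language tag, where one applies


def find_tracks(tracks: List[TrackInfo], role: str,
                language: Optional[str] = None) -> List[TrackInfo]:
    """Return tracks with the given role, optionally filtered by language."""
    return [
        t for t in tracks
        if t.role == role and (language is None or t.language == language)
    ]


# The multitrack file described above, as a player might see it after
# reading only the descriptive track.
tracks = [
    TrackInfo(1, "video", "theora"),
    TrackInfo(2, "audio", "vorbis", "en"),
    TrackInfo(3, "audio-description", "speex", "en"),
    TrackInfo(4, "annotations", "cmml", "en"),
    TrackInfo(5, "captions", "timed-text", "en"),  # placeholder codec name
    TrackInfo(6, "captions", "timed-text", "de"),
]

german_captions = find_tracks(tracks, "captions", "de")
```

Here german_captions holds only the track with serial 6, and calling
find_tracks(tracks, "captions") without a language returns both caption
tracks.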

I think we need to understand exactly what we expect from the caption
tracks before we can suggest an optimal solution. If, for example, we
want caption tracks with hyperlinks on a temporal basis and some
machine-readable metadata around them, then an extension of CMML
would make the most sense.


On 10/8/07, Chris Double <chris.double at double.co.nz> wrote:
> The video element description states that Theora, Vorbis and the Ogg
> container should be supported. How should closed captions and audio
> description tracks for accessibility be supported using video and
> these formats?
> I was pointed to a page outlining some previous discussion on the issue:
> http://wiki.whatwg.org/wiki/Video_accessibility
> Is there a way of identifying which track is the closed caption track,
> which is the alternate audio track, etc? How are other implementors of
> the video element handling this issue?
> Is CMML for the closed captions viable? Or a speex track for the
> alternate audio? Or using Ogg Skeleton in some way to get information
> about the other tracks?
> Chris
> --
> http://www.bluishcoder.co.nz
