[whatwg] Video, Closed Captions, and Audio Description Tracks

Tue Nov 13 04:04:34 PST 2007

Sorry to be getting back to this thread this late, but I am trying to
catch up on email.

I'd like to contribute some thoughts on Ogg, CMML and Captions and
will cite selectively from emails in this thread.

On Oct 9, 2007 5:22 PM, Henri Sivonen <hsivonen at iki.fi> wrote:
> On Oct 8, 2007, at 22:12, Dave Singer wrote:
> > I don't think we can or should 'climb inside' the content formats,
> > merely have a standard way to ask them to do things (e.g. turn on
> > captions).
>
> I agree. However, in order for the HTML 5 spec to be able to
> reasonably and pragmatically tell browsers to ask the video subsystem
> to perform tasks like "turn on captions", we need to check that the
> obviously foreseeable format families (Ogg in the case of Mozilla
> and, apparently, Opera and MPEG-4 in the case of Apple) are able to
> cater for such tasks. Moreover, browsers and content providers need
> to have a shared understanding of how to do this concretely.
>
> > This should all be out of scope, IMHO;  this is about the design of
> > a captioning system, which I don't think we should try to do.
>
> I think the captioning format should be specified by the video format
> family. However, in this case it has become apparent that there
> currently isn't One True Way of doing captioning in the Ogg family.
> In principle, this is a problem that the specifiers of the Ogg family
> should solve. In practice, though, this thread arises directly from
> an issue hit by the Mozilla implementation effort. Since the WHATWG
> is about interoperable implementations, it becomes a WHATWG problem
> to make sure that browsers that implement Ogg for <video> and content
> providers have the same understanding of what the One True Way of
> doing captioning in Ogg is if the HTML 5 spec tosses the captioning
> problem to the video format (which, I agree, is the right place to
> toss it to). Hopefully, the HTML 5 spec text can be a one-sentence
> informative reference to a spec by another group. But which spec?

Ogg indeed currently has no preferred means of specifying captions.
Usually they are supplied in a separate srt, ssa or similar file, and
the player takes care of displaying the captions at the right time.
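
To make the separate-file approach concrete, here is a minimal Python
sketch of what a player has to do with an srt file: parse the cues and
pick the one that is active at the current playback time. It is only an
illustration under my own assumptions. The names parse_srt and
active_caption are made up for this example, and it handles only the
basic srt layout (index line, timing line, text lines), not styling or
positioning.

```python
import re

# Matches an srt timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(text):
    """Return a list of (start, end, caption) tuples, times in seconds."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the caption text.
                caption = "\n".join(lines[i + 1:]).strip()
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), caption))
                break
    return cues


def active_caption(cues, t):
    """Return the caption to display at playback time t, or None."""
    for start, end, caption in cues:
        if start <= t < end:
            return caption
    return None


# Hypothetical sample data, not taken from any real video.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,500 --> 00:00:07,000
Second caption,
on two lines.
"""

cues = parse_srt(SAMPLE)
```

A player would call active_caption(cues, t) on every time update and
render the returned text over the video, or clear the overlay when the
result is None.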

I just had a look at the W3C DFXP format
(http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/). It looks rather
similar to CMML, lacks the hyperlinking functionality, but has
stylesheet and formatting support in it and more
subtitle/karaoke-specific functionality. I believe it would be
straightforward to define a media mapping for DFXP into Ogg should we
decide that DFXP is the way forward. Similarly, it would be rather
simple to define a media mapping for any of the anime subtitle formats
mentioned above.

Somewhat orthogonal to the discussion about subtitles is the use of
CMML. Yes, it is possible to use CMML in its current specification as
a superset of a srt-type subtitle format. However, the "description"
element would then need to be interpreted as the caption, which is
somewhat of a misuse. I actually see captions and CMML as orthogonal
concepts - CMML provides hyperlinks and machine-readable textual
annotations in a timed manner, while captions provide formatted text
for users to read.

=================

On Oct 10, 2007 3:03 AM, Maik Merten <maikmerten at gmx.net> wrote:
> Benjamin Hawkes-Lewis wrote:
> Actually I wonder if it wouldn't make sense to have an attribute for
> media elements specifying a URI for a file containing Timed Text. These
> externally stored (not embedded in a media file) captions would be
> codec-agnostic and could be used to reuse the very same set of captions
> for e.g. differently encoded media (Ogg, MPEG,
> Generic-Codec-Of-The-Season, ...).

In the above described cases of DFXP, srt, ssa or CMML, each one of
these is a text document that can potentially live independently of
the video file on a server ("externally stored"). In fact, apart from
CMML, none of them has a defined mapping into Ogg as yet.

> As a side note I like the idea of captions which are more than just the
> usual stream text. Imagine a newsreel with timed "Would you like to know
> more?" links. Given that HTML5 is usually viewed in browsers that
> implement at least a non-empty subset of HTML I imagine it should be
> possible for the browser to layer something div-equivalent over the
> media elements supporting captioning and pipe the HTML captions into it
> (with caution, imagine a caption itself recursively embedding a video).

That is exactly what CMML provides to Ogg: timed textual annotations,
hyperlinks out of a video, and hyperlinks into a video (URI addressable
offsets and sections in the file).

I am wondering whether it might be a good idea to incorporate parts of
the DFXP specification into CMML to make it better suited to captions,
and thus not have to deal with multiple timed text formats. I haven't
thought this through yet.

====

On Oct 10, 2007 3:42 AM, Anne van Kesteren <annevk at opera.com> wrote:
> On Tue, 09 Oct 2007 18:03:41 +0200, Maik Merten <maikmerten at gmx.net> wrote:
> >> http://www.w3.org/TR/2006/CR-ttaf1-dfxp-20061116/
> >
> > Actually I wonder if it wouldn't make sense to have an attribute for
> > media elements specifying a URI for a file containing Timed Text. These
> > externally stored (not embedded in a media file) captions would be
> > codec-agnostic and could be used to reuse the very same set of captions
> > for e.g. differently encoded media (Ogg, MPEG,
> > Generic-Codec-Of-The-Season, ...).
>
> This would be problematic when downloading the video for offline use or
> further distribution. This is also different from how this currently works
> for DVDs, iPod, and the like as far as I can tell. It also makes authoring
> more complicated in the cases where someone hands a video to you as you'd
> have to separate the closed caption stream from it first and point to it
> as a separate resource.

Think it through: when you currently download a video from BitTorrent,
you download the subtitle file with it, mostly even inside a zip file
for simplicity. Downloading a separate caption file is similar to
how you currently have to download the images separately for a Web
page. It's no big deal really as long as there is a connection that
can be automatically identified (e.g. through a link from one file to
the other, or through a zip file, or through a description file).
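
One simple form of such an automatically identifiable connection is a
shared base name, which is what many desktop players already rely on.
The Python sketch below illustrates the idea only. The function name
find_caption_files and the list of extensions are my own assumptions
for this example, not part of any specification.

```python
from pathlib import Path

# Caption formats mentioned in this thread. The extensions chosen here
# are an assumption made for illustration.
CAPTION_EXTENSIONS = (".srt", ".ssa", ".cmml", ".dfxp")


def find_caption_files(video_path):
    """Return caption files that sit next to the video and share its
    base name, e.g. talk.srt for talk.ogg."""
    video = Path(video_path)
    found = []
    for ext in CAPTION_EXTENSIONS:
        candidate = video.with_suffix(ext)
        if candidate.is_file():
            found.append(candidate)
    return found
```

The same lookup works on the contents of an unpacked zip file. An
explicit link inside one of the files would be more robust than a
naming convention, but needs support from the formats themselves.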

Actually for the authoring, I completely disagree. Authoring a
captioning file inside a text editor is much simpler than needing a
special application to author the captions directly inside a video
file.

In any case: I don't think it's a matter of one or the other. I
believe firmly that it should be both, no matter what caption format
and video format is being used.

=====

On Oct 10, 2007 3:46 AM, Henri Sivonen <hsivonen at iki.fi> wrote:
> On Oct 9, 2007, at 19:24, Dave Singer wrote:
> > How the Ogg community designs intrinsic caption support is up to
> > them, isn't it?
>
> In theory ideally yes.
>
> However, when HTML 5 says "User agents should support Ogg Theora
> video and Ogg Vorbis audio, as well as the Ogg container format." and
> "User agents should provide controls to enable or disable the display
> of closed captions associated with the video stream, though such
> features should, again, not interfere with the page's normal
> rendering." it becomes a WHATWG issue to elicit a way to satisfy both
> "should" requirements at the same time if implementors don't
> otherwise have sufficient guidance on how to implement closed
> captioning support for Ogg interoperably.

Yes and no. Even if WHATWG decides that you should use Ogg with DFXP
inside it for captioning - as long as the Ogg community does not
provide a media mapping (i.e. a prescription of how to do the embedding
into the Ogg container), there is no standard means for doing so.
Thus, if there is a need for such a mapping, the Ogg community would
indeed need to create such a specification, unless there is no need
for encapsulating the caption files directly inside the Ogg container.
I believe, however, that such a specification is necessary to enable
ubiquitous usability and uptake.

Regards,
Silvia.

---
Dr Silvia Pfeiffer
Annodex Association
Xiph Foundation Member