[whatwg] Captions, Subtitles and the Video Element

Henri Sivonen
Thu Feb 19 23:03:08 PST 2009

On Feb 20, 2009, at 00:37, Greg Millam wrote:

>  The current state of accessibility and captions in HTML5 has been
> relegated to http://wiki.whatwg.org/wiki/Video_accessibility - a wiki
> page with use cases, requirements, existing solutions, and an empty
> "Proposed Solutions" category.

Since then, the active work has moved to the Mozilla wiki and to Xiph:

Silvia Pfeiffer has been working on this as a Mozilla Foundation  

>  * <video> . . . </video> is not necessarily a standalone tag. If the
> author desires, they can add more elements to define tracks. Whether
> this should be <caption type="format" src="..." media="caption"> or
> <source type="timedtext/format" src="..."> can vary. (I prefer
> <caption> as it's more explicit).

FWIW, you can't use the element name <caption> for legacy reasons: it  
is already the table caption element, and HTML parsers treat it  
specially. You can't use the element name <text> either, since that  
would introduce new name collisions with SVG 1.1.

>  * Support for (at minimum) "Subrip" format. Subrip I choose here for
> the same reason we picked it for YouTube: It's readable,
> understandable, and simple. You can create one with your favorite
> editor. Subrip has no style associated with individual captions, so
> can be subject to CSS caption rules for "SPAN.caption"

I agree it makes sense to start with something simple. The markupless  
flavor of SRT would be such a format. However, supporting the  
formatting tags in later flavors of SRT is a can of worms: You'd  
quickly end up introducing a third HTML/XML-like parser into the  
browser. Further, the formatted flavors of SRT have become victims of  
the same problem that the RSS <title> became a victim of. Let's not go  
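
To make the "markupless flavor" concrete, here is a minimal sketch of a reader that treats every cue body as plain text and escapes it before display, so no tag parser is involved. The names parse_plain_srt, Cue and cue_html are hypothetical, and the timing syntax handled is only the common "HH:MM:SS,mmm --> HH:MM:SS,mmm" form; this is an illustration under those assumptions, not a proposed processing model.

```python
import html
import re
from dataclasses import dataclass

# Timing line of a SubRip cue, e.g. "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str     # plain text; never interpreted as markup


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_plain_srt(data: str) -> list[Cue]:
    """Parse SRT text into cues, keeping cue bodies as uninterpreted text."""
    cues = []
    normalized = data.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # The timing line is usually preceded by a counter line; skip to it.
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                g = match.groups()
                body = "\n".join(lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), body))
                break
    return cues


def cue_html(cue: Cue) -> str:
    """Escape the cue body so any tag-like text is displayed literally."""
    return html.escape(cue.text).replace("\n", "<br>")
```

With this approach a cue body such as "Hello <b>world</b>" is shown with the angle brackets visible rather than in bold, which is the point: styling would come from CSS applied to the caption container, not from anything inside the SRT file.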

For formatted captions, I think it makes sense to overlay a browsing  
context onto the video and make HTML/CSS-based captions render into  
that browsing context on the main thread (tolerating some timing  
jitter relative to the video track).
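
As a rough sketch of what "render on the main thread, tolerating jitter" could mean: on each main-thread tick the overlay asks which cues cover the video's current time and only touches its content when that set changes. Ticks may arrive late or at uneven intervals; a cue simply appears or disappears a little late. Everything here (HtmlCue, Overlay, tick) is a hypothetical stand-in for the overlaid browsing context, not an API from any proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HtmlCue:
    start: float  # seconds
    end: float    # seconds
    markup: str   # HTML fragment to show in the overlay


def active_cues(cues, now):
    """Return the cues whose [start, end) interval contains `now`."""
    return [c for c in cues if c.start <= now < c.end]


class Overlay:
    """Stand-in for the browsing context overlaid on the video."""

    def __init__(self, cues):
        self.cues = sorted(cues, key=lambda c: c.start)
        self.shown = []   # cues currently rendered
        self.updates = 0  # how many times the overlay content was replaced

    def tick(self, now):
        """Called from the main thread with the video's current time."""
        wanted = active_cues(self.cues, now)
        if wanted != self.shown:
            # Only re-render when the set of visible cues actually changes.
            self.shown = wanted
            self.updates += 1
        return self.shown
```

The design choice is that the overlay never tries to schedule itself against the video clock; it just samples the current time whenever the main thread gets around to it.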

http://wiki.xiph.org/index.php/Timed_Divs_HTML is a proposal in this  
direction, but it currently lacks a concrete processing model.

>  * Support for other formats (608, 708, .ass, dfxp, etc) up to the
> user agent. (But preferred!)

DFXP reinvents a lot of stuff that browsers already implement in their  
CSS formatter. From a browser code reuse point of view, it makes more  
sense to use HTML+CSS.

Henri Sivonen
hsivonen at iki.fi

