[whatwg] Captions, Subtitles and the Video Element

Greg Millam millam at google.com
Thu Feb 19 14:37:43 PST 2009

Hi guys -

  I'm one of the main engineers responsible for captioning support on
YouTube, and I've joined the Chrome team at Google to help drive video
captions and subtitling forward, both by implementing support for them
in Chrome and by pushing for caption support in HTML5.

  What follows is based on a search through the mailing list and a
reading of the HTML 5 spec, particularly where the video tag is
concerned. If there are any factual errors, or I'm way off, just point
them out. All of this is as far as I can discover.

  The current state of accessibility and captions in HTML5 has been
relegated to http://wiki.whatwg.org/wiki/Video_accessibility - a wiki
page with use cases, requirements, existing solutions, and an empty
"Proposed Solutions" category. I aim to fix that. My main goal here is
to prevent captioning from missing out on HTML5 and being dropped
"because we never got around to it". (a la HDMI)

Here is my proposal:

Use cases:
  * Accessibility.
  * Ability to reach audiences in other languages.

  * Allow movie formats to include captioning support.
  * Make it simple for an author to create and publish transcripts,
without requiring them to embed it into the movie.
  * Make it simple for caption or subtitle tracks to be accessible.
  * Allow full javascript control: List, add, delete, and create caption tracks.
  * Provide a required format to act as a baseline across all browsers.

The current state of the <video> element includes support for defining
a source video file, local or remote. There is no method to define a
caption source or track.

Proposed Solution:

HTML5 / Page Author:
  * Each video will have a list of zero or more Timed Text tracks.
  * A track has three variables for selection: Type, Language, Name.
These can be null, except for name.
  * Type is a string, and may be (but is not limited to): "Caption"
"Transcript" "Translation" "Subtitles", etc. Others can be defined by
the user (e.g: "Commentary" "User Comments").
  * Language is a language code (en, es, pt_BR, etc)
  * Name is a freeform text identifier. By default, "default" or
"caption". If a video file has multiple tracks, they are added as
"caption1" "caption2", etc.
  * <video> . . . </video> is not necessarily a standalone tag. If the
author desires, they can add more elements to define tracks. Whether
this should be <caption type="format" src="..." media="caption"> or
<source type="timedtext/format" src="..."> can vary. (I prefer
<caption> as it's more explicit).
  * <caption src="foo.srt" type="caption" language="en" name="default"
/> adds a new caption. <caption> is standalone.
  * All timed text tracks encoded in the video file are added to the
list, as an implicit caption element.
  * Caption tags, when displayed, count as <span
class="caption">...</span> unless they have style associated with them
(uncommon), so they can be tweaked via CSS, whether by the author or
overridden by the user agent.
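
To illustrate how a track might be picked by Type, Language and Name,
here is a minimal JavaScript sketch. It is only a model of the idea, not
part of the proposal: the `tracks` array, the `selectTracks` helper and
the rule that a null criterion matches anything are all assumptions made
for this example.

```javascript
// Hypothetical model of one video's timed text track list.
// Type and language may be null; name is always set.
const tracks = [
  { type: "caption", language: "en", name: "default" },
  { type: "subtitles", language: "es", name: "caption1" },
  { type: null, language: null, name: "caption2" },
];

// Returns the tracks that match every criterion that was supplied.
// A missing or null criterion is treated as "match anything" (assumption).
function selectTracks(trackList, criteria = {}) {
  return trackList.filter((track) =>
    ["type", "language", "name"].every(
      (key) => criteria[key] == null || track[key] === criteria[key]
    )
  );
}

const spanish = selectTracks(tracks, { language: "es" });
console.log(spanish.map((track) => track.name));
```

Filtering on plain objects keeps the sketch independent of any browser
API; a user agent could apply the same matching to its own track list.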

User Agent:
  * Implements support for <caption> tag.
  interface MediaCaptionElement : HTMLElement {
             attribute DOMString src;
             attribute DOMString format; // default: "auto".
             attribute DOMString type;
             attribute DOMString language;
             attribute DOMString name;
             attribute boolean enabled;
  };
  * Media elements now have a list of Captions associated with them.

  * Support for (at minimum) the SubRip format. I choose SubRip here for
the same reason we picked it for YouTube: it's readable,
understandable, and simple. You can create one with your favorite
editor. SubRip has no style associated with individual captions, so
they can be subject to CSS caption rules for "SPAN.caption".
  * Support for other formats (608, 708, .ass, dfxp, etc) is up to the
user agent (but preferred!).
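
To show how little is needed to read the baseline format, here is a
minimal sketch of a SubRip parser in JavaScript. It assumes the common
layout of a sequence number, a timing line of the form
"HH:MM:SS,mmm --> HH:MM:SS,mmm" and one or more text lines, with blocks
separated by blank lines. The function names and the shape of the
returned cue objects are made up for this example, and extras such as
position coordinates on the timing line are not handled.

```javascript
// Converts a SubRip timestamp such as "00:01:02,500" to seconds.
function parseTimestamp(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SubRip timestamp: ${stamp}`);
  const [, h, m, s, ms] = match.map(Number);
  return h * 3600 + m * 60 + s + ms / 1000;
}

// Splits SubRip text into cues of the form { start, end, text }.
function parseSubRip(source) {
  // Normalize line endings, then split on blank lines between cues.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  const cues = [];
  for (const block of blocks) {
    const lines = block.split("\n");
    // The timing line normally follows the sequence number.
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // not a cue block, skip it
    const [start, end] = lines[timingIndex].split("-->").map(parseTimestamp);
    cues.push({ start, end, text: lines.slice(timingIndex + 1).join("\n") });
  }
  return cues;
}

const sample = [
  "1",
  "00:00:01,000 --> 00:00:04,000",
  "Hello, world.",
  "",
  "2",
  "00:00:05,500 --> 00:00:07,250",
  "Second caption,",
  "on two lines.",
].join("\n");

const cues = parseSubRip(sample);
console.log(cues);
```

Each cue carries plain text only, which fits the point above that
styling can be left to CSS.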

  * Media or Video elements now have additional features exposed via javascript.
  * getCaptionList(): returns an array of caption elements.
  * addCaption({src:'',name:'',language:'',type:''}) - Adds a Caption element.
  * enableCaption(captionElement) - Enables a CaptionElement for
display. If captionElement is null, enable the first track in the
list.
  * disableCaption(captionElement) - Disables a CaptionElement for
display. If captionElement is null, disable all tracks in the list.
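
The intended behaviour of these four methods can be sketched with a
plain JavaScript class. This is a stand-in for a media element, not a
real browser API: the `CaptionedMedia` class is hypothetical, tracks are
modelled as plain objects, and it is an assumption that new tracks start
disabled and that enableCaption(null) picks the first track in the list.
The file name "foo.es.srt" is likewise invented for the example.

```javascript
// Hypothetical stand-in for a media element with a caption list.
class CaptionedMedia {
  constructor() {
    this.captions = [];
  }

  // Returns a copy of the list so callers cannot reorder the original.
  getCaptionList() {
    return [...this.captions];
  }

  // Adds a caption track. Type and language may be null; the name
  // falls back to "default". New tracks start disabled (assumption).
  addCaption({ src, name = "default", language = null, type = null }) {
    const caption = { src, name, language, type, enabled: false };
    this.captions.push(caption);
    return caption;
  }

  // Enables the given track, or the first track when passed null.
  enableCaption(caption = null) {
    const target = caption ?? this.captions[0];
    if (target) target.enabled = true;
  }

  // Disables the given track, or every track when passed null.
  disableCaption(caption = null) {
    const targets = caption ? [caption] : this.captions;
    for (const target of targets) target.enabled = false;
  }
}

const media = new CaptionedMedia();
const english = media.addCaption({
  src: "foo.srt", name: "default", language: "en", type: "caption",
});
const spanishTrack = media.addCaption({
  src: "foo.es.srt", name: "caption1", language: "es", type: "subtitles",
});

media.enableCaption(null);          // enables the first track
media.enableCaption(spanishTrack);  // enables a specific track
console.log(media.getCaptionList().map((c) => `${c.name}: ${c.enabled}`));
```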

User Agent UI: (Only relevant if User Agent adds its own controls for media):
  * Must be able to enable caption Elements.
  * Preferably by a button on the UI with either "CC" or a double
underscore (preferred).

User Agent Context Menu:
  * Must have a captions entry, with a list of tracks to enable/disable.


Well - That's a start, and that's what I'd like to see and implement
over the next several months. Input and discussion would be much
appreciated! If anybody here has worked on it, I'd also like to talk
to you.


- Greg Millam


Every time you give up a dream, a chicken stays on its side of the road.
