[whatwg] On the subtitle format for HTML5

Sun May 23 03:30:59 PDT 2010

2010/5/23 Carlos Andrés Solís <csolisr at gmail.com>:
> Hello, I've been writing lately in the WHATWG and WebM mail-lists and would
> like to hear your opinion on the following idea.
>
> Imagine a hypothetical website that delivers videos with subtitles that can
> be chosen by the user. And also imagine there is the possibility of
> downloading a file with the video, along with either the chosen subtitle
> tracks, or all of them at once. The problems on multiple tracks I have
> already discussed in another thread; this one deals mainly with subtitle
> formats. There is still an issue on which format should be used for
> subtitling in HTML5. As you might know, there are basic subtitle formats
> that are formed by timed plain text and little else (like SRT or the
> proposed WebSRT), and there are full-blown subtitle formats that allow for
> extreme formatting and typesetting (like Advanced SubStation Alpha). The
> basic subtitles have the advantage of being easily editable by hand, but
> sacrifice capabilities that advanced formats offer at the cost of a
> harder-to-understand syntax. It would be a shame to drop advanced subtitles
> from the HTML5 specs, but it would be bothersome if everybody is forced to
> use a complex-to-write format. So a middle ground could be handy: allowing
> WebSRT for the simple tasks, and using another format for advanced
> typesetting. For example, ASSA allows modifying the text font, size,
> border, shadow, scale, rotation, position, and some other properties; it
> also allows movement of text, text animation, karaoke, and even some
> vector graphics. But all of that could be achieved with HTML5 programming
> on top of the WebSRT format (or whichever gets chosen). This, of course,
> raises two problems.
> * The first one is that there would be no tools to edit HTML5 subtitles
> specifically. That would force us to define a subset, which would have to
> be standardized, plus an editor able to create such subtitles without
> having to learn how to create a full-blown website.
> * The second one is that media players that wanted to use such subtitles
> would be forced to ship an HTML5 decoder. Most media players are NOT web
> browsers, though, nor are they based on one. The only exceptions I remember are
> media players built on top of XUL, like Songbird or Nightingale. But players
> like WMP, WinAmp, VLC, Xine family, GStreamer family or MPlayer family would
> be left out, since they have no need (and no time) to embed a web browser
> in a program that has never needed one.
>
> Any ideas or suggestions?
> - Carlos Solís
>

I've thought about this problem for a long time. On top of the
dimensions that you describe we have requirements to support not just
subtitles and captions, but also textual audio descriptions, chapter
markers, lyrics, karaoke and possibly other text-based media
alternatives/additions of similar form.

I believe fundamentally there is a need for three levels of
subtitling/captioning/text support:

1. The very basic text cues with in/out times.
   This works for basic subtitles, basic captions, lyrics, textual
   audio descriptions, and chapter markers at least.

2. Text cues with improved styling, positioning, timing, formatting,
   and some simple effects (e.g. rotation).
   This works for advanced subtitles, advanced captions, karaoke, and
   probably 80% of other use cases.

3. A full capabilities format that can support images, hyperlinks,
   vector graphics, animations and scripting.
   Seeing as this is like the full capability of HTML5, this should
   satisfy all needs.

The scale from basic to full capabilities is, of course, rather
continuous, and existing formats fall only roughly into these three
groups.
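
To make level 1 concrete, here is a minimal sketch in Python of what
"text cues with in/out times" amounts to for a player. It assumes the
common SRT layout (a counter line, a timing line of the form
HH:MM:SS,mmm --> HH:MM:SS,mmm, then the cue text, with blank lines
between cues). The names Cue, parse_srt and active_cues are made up for
this illustration and do not come from any existing player or spec.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # in time, seconds
    end: float    # out time, seconds
    text: str     # plain text, may span several lines


# Timing line as commonly seen in SRT files; "." is also accepted as the
# millisecond separator (an assumption, to be lenient).
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(data: str) -> List[Cue]:
    """Parse SRT-like text into a list of cues, skipping malformed blocks."""
    cues = []
    normalized = data.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def active_cues(cues: List[Cue], t: float) -> List[Cue]:
    """Return the cues that should be on screen at playback time t."""
    return [c for c in cues if c.start <= t < c.end]
```

A player would call active_cues() with the current playback position on
every time update and draw whatever comes back. Levels 2 and 3 differ
mainly in what the text field is allowed to contain and how much of it
the player has to interpret.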

ASSA would indeed satisfy most of the full capabilities level (level 3),
which of course includes levels 2 and 1.
WebSRT would satisfy level 2 and thereby level 1.

You raise a concern about introducing a new format and the lack of
support by existing authoring software. I've had that concern in the
past, too. However, seeing the overwhelming success that Google had
in introducing a new media format (WebM), I can see that, given sufficient
support by companies and sufficient engineering force put behind it,
introducing a new subtitling format should be fairly easy. I am not
sure that WebSRT is the right format (seeing as it doesn't scale to
the full capabilities). But I no longer think that introducing
a new format is a problem.

Also, looking at the capabilities of software that uses/creates ASSA
today, I see a tendency for the complex capabilities not to be widely
supported anyway. I would even consider introducing a more HTML-like
format so that Web browsers can make use of their existing
capabilities for playing back complex features (animations, SVG, etc.).
Existing authoring software for ASSA would then only need to export a
file format that is more HTML-like.
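
As a rough illustration of how small such an export step could be for
the simple cases, here is a sketch in Python that turns the ASSA
bold/italic/underline override toggles (\b1, \i1, \u1 and their 0
counterparts) and the \N line break into HTML-like markup. The output
markup is purely hypothetical, since no such format has been defined;
the function name ass_to_htmlish is made up, and every other override
tag (positioning, animation, drawing, numeric weights such as \b700)
is simply dropped here.

```python
import re
from html import escape

_OVERRIDE_BLOCK = re.compile(r"\{([^}]*)\}")
# Only the 0/1 toggles for bold, italic and underline are handled.
_TOGGLE = re.compile(r"\\([biu])([01])(?!\d)")


def ass_to_htmlish(text: str) -> str:
    """Convert simple ASSA inline styling to HTML-like tags (a sketch)."""
    state = {"b": False, "i": False, "u": False}
    out = []
    pos = 0

    def emit(run: str) -> None:
        # Wrap each text run in the tags active at that point, so the
        # result is always properly nested even if the ASSA toggles
        # were switched off in a different order than they were set.
        if not run:
            return
        body = escape(run).replace("\\N", "<br>")
        active = [t for t in ("b", "i", "u") if state[t]]
        opening = "".join("<%s>" % t for t in active)
        closing = "".join("</%s>" % t for t in reversed(active))
        out.append(opening + body + closing)

    for match in _OVERRIDE_BLOCK.finditer(text):
        emit(text[pos:match.start()])
        for tag, flag in _TOGGLE.findall(match.group(1)):
            state[tag] = flag == "1"
        pos = match.end()
    emit(text[pos:])
    return "".join(out)
```

For example, ass_to_htmlish(r"{\i1}Hello{\i0} world") gives
"<i>Hello</i> world". The harder part of an exporter would be the tags
this sketch ignores, which is exactly where the browser's existing
capabilities would come in.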

As for your second concern: non-Web media players would have an issue
with a new complex format that would require HTML features to be
implemented. I guess that is a concern. They would have the choice, of
course, to not support a new format at all. Or if they wanted to
support it, they would need to parse the new format and display the
features. However, that is not much different from parsing ASSA and
displaying those features - if the player already supports ASSA, it
could just reuse that code to interpret the Web format. If not,
introducing a Web engine would allow them to provide those features -
for ASSA and the new format alike.

I guess I'm starting to talk myself into wanting a more HTML-like
format than WebSRT that scales to provide full features by using
existing Web technology. I'd be curious to hear what others think...

Cheers,
Silvia.