[whatwg] Thoughts on video accessibility

Wed Jul 15 22:58:30 PDT 2009

Hi Ian,

Great to see the new efforts to move the subtitle/caption/karaoke
issues forward!

I actually have a contract with Mozilla starting this month to help
solve this, so I am more than grateful that you have proposed some
ideas in this space.

On Thu, Jul 16, 2009 at 9:38 AM, Ian Hickson<ian at hixie.ch> wrote:
> On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:
>> > 1. Timed text in the resource itself (or linked from the resource
>> > itself), rendered as part of the video automatically by the user
>> > agent.
>>
>> For case 1, the practical implications are that browser vendors will
>> have to develop support for a large variety of text codecs, each one
>> providing different functionalities.
>
> I would hope that as with a video codec, we can standardise on a single
> subtitle format, ideally some simple media-independent combination of SRT
> and LRC [1]. It's difficult to solve this problem without a standard
> codec, though.

I have myself thought about creating a new format to address the needs
for time-aligned text in audio/video.

However, the problem with creating a new format is that you start from
scratch, and formats that are already widespread are not supported.

I can see that your proposed format is trying to be backwards
compatible with SRT, so at least it would work for the large number of
existing srt file collections. I am still skeptical, in particular
because there are no authoring systems for this format around.
But I would be curious what others think about your proposed SRT-LRC-mix.

>> In fact, the easiest solution would be if that particular format was
>> really only HTML.
>
> IMHO that would be absurd. HTML means scripting, embedded videos, an
> unbelivably complex rendering system, complex parsing, etc; plus, what's
> more, it doesn't even support timing yet, so we'd have to add all the
> timing and karaoke features on top of it. Requiring that video players
> embed a timed HTML renderer just to render subtitles is like saying that
> we should ship Microsoft Word with every DVD player, to handle the user
> input when the user wants to type in a new chapter number to jump to.

I agree, it cannot be a format that contains all the complexity of
HTML. It would only support the relevant subset of HTML, plus the
addition of timing, and in that case it is indeed a new format. I
have therefore changed my mind since I sent that email in Dec 2008 and
am hoping we can do it with existing formats.

In particular, I have taken an in-depth look at the latest
specification from the Timed Text working group, which has put years of
experiments and decades of experience into developing DFXP. You can
see my review of DFXP here:
http://blog.gingertech.net/2009/06/28/a-review-of-the-w3c-timed-text-authoring-format/
I think it is too flexible in many ways, but also too
restrictive in others. However, it is a well-formulated format that is
also getting market traction. In addition, it is possible to define
profiles to add missing functionality.

If we want a quick and dirty hack, srt itself is probably the best
solution. If we want a well thought-out solution, DFXP is probably a
better idea.
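
To illustrate why srt counts as the quick option, here is a rough
Python sketch of a minimal srt reader. It assumes the common layout
(a counter line, a "start --> end" timing line, then the text, with
blank lines between cues). The names parse_srt and Cue are made up
for this example, and real-world files will need more tolerance than
this.

```python
import re
from dataclasses import dataclass

# Timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(data: str) -> list[Cue]:
    """Parse srt text into cues (hypothetical helper, minimal error handling)."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # The timing is normally on line 2, after the counter; also accept line 1.
        for i, line in enumerate(lines[:2]):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                cues.append(
                    Cue(_seconds(*g[:4]), _seconds(*g[4:]), "\n".join(lines[i + 1:]))
                )
                break
    return cues
```

Note that nothing in the file says which language or character set
the text is in, so the reader has to be handed already-decoded text.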

I am currently experimenting with these and will be able to share
something soon for further discussion.

>> > 3. Timed text stored in a separate file, which is then parsed by the
>> > user agent and rendered as part of the video automatically by the
>> > browser.
>> >
>> Maybe we should consider solving this differently. Either we could
>> encapsulate into the video container upon download. Or we could create a
>> zip-file or tarball upon download. I'd just find it a big mistake to
>> ignore the majority use case in the standard, which is why I proposed
>> the <text> elements inside the <video> tag.
>
> If browser vendors are willing to merge subtitles and video files when
> saving them, that would be great. Is this easy to do?

My suggestion was really about doing this server-side, which we
implemented years ago in the Annodex project for Ogg
Theora/Vorbis.

However, it is also possible to do this in the browser: in the case of
Ogg, the browser just needs to have a multiplexing library installed
as well as a means to encode the subtitle file (which I like to call a
"text codec"). Since it's text, it's nowhere near as complex as
encoding audio or video and just consists of light-weight packaging
code. So, yes, it is entirely possible for browsers to create a
binary video file that encapsulates subtitles which were previously
only accessible as separate text files referenced by their own URL.

The only issue I see is the baseline codec issue: every browser that
wants to support multiple media formats has to implement this
multiplexing and text encoding separately for every media
encapsulation format, which is annoying and increases complexity. It
is, however, generally a small amount of complexity compared to that
of having to support multiple codecs.

>> Here is my example again:
>> <video src="http://example.com/video.ogv" controls>
>>  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
>>  <text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
>>  <text category="SUB" lang="jp" type="application/smil" src="japanese.smil"></text>
>>  <text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
>> </video>
>
> Here's a counterproposal:
>
>   <video src="http://example.com/video.ogv"
>          subtitles="http://example.com/caption.srt" controls>
>   </video>

Subtitle files are created so that users can choose to display the
text in the language they speak. With a simple addition like what you
are proposing, I don't think such a choice is possible. Or do you have
a proposal on how to choose the appropriate language file?

Also, the attributes on the proposed text element of course serve a purpose:
* the "category" attribute is meant to provide a default for styling
the text track,
* the "lang" attribute is meant to provide a means to build a menu
to choose the appropriate subtitle file from,
* the "type" attribute is meant to identify both the mime type of the
format and the character set used in the file.
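
As a rough sketch of the menu idea, the following Python collects the
proposed <text> children and picks a track for a list of preferred
languages. The element and attribute names come from my example
above; TextTrackCollector and choose_track are invented for this
illustration, and the matching is a plain string comparison, not
proper language-tag negotiation.

```python
from html.parser import HTMLParser


class TextTrackCollector(HTMLParser):
    """Collect the attributes of every proposed <text> element."""

    def __init__(self):
        super().__init__()
        self.tracks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "text":
            self.tracks.append(dict(attrs))


def choose_track(tracks, preferred_langs, category="SUB"):
    """Return the first track matching the user's language order, or None."""
    for lang in preferred_langs:
        for track in tracks:
            if track.get("category") == category and track.get("lang") == lang:
                return track
    return None
```

A user agent could list every collected "lang" value in a menu and
call choose_track with the user's language preferences to preselect
an entry. The single subtitles attribute carries no such information.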

The character set question is actually a really difficult problem to
get right, because srt files are created in an appropriate character
set for the language, but there is no means to store in an srt file
what character set was used in its creation. That's a really bad
situation to be in for the Web server, which can then only take an
educated guess. By giving the HTML author the ability to specify
the charset of the srt file with the link, this can be solved.
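
A small Python sketch of how that could work. It assumes the charset
is written as a MIME parameter inside the type attribute, e.g.
type="text/x-srt; charset=ISO-8859-1". That exact syntax is an
assumption, as are the UTF-8 fallback and the helper names
split_type and decode_subtitles.

```python
from email.message import Message


def split_type(type_attr, default_charset="utf-8"):
    """Split a type attribute into (mime type, charset).

    Assumes the charset is given as a MIME parameter; falls back to
    default_charset (an assumption) when none is present.
    """
    msg = Message()
    msg["Content-Type"] = type_attr
    return msg.get_content_type(), msg.get_content_charset(default_charset)


def decode_subtitles(raw, type_attr):
    """Decode raw subtitle bytes using the author-declared charset."""
    _, charset = split_type(type_attr)
    return raw.decode(charset)
```

With the charset declared by the author, neither the server nor the
browser has to guess how the bytes of the srt file map to characters.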

BTW: my latest experiments with subtitles use a few more
attributes. I am not ready to publish them yet, but should be within a
week or so, and will be glad to have a further discussion then.

> I think this would be fine, on the long term. I don't think the existing
> implementations of <video> are at a point yet where it makes sense to
> define this yet, though.

I think we have to start discussing it and doing experiments. I think
<video> is getting stable enough to move forward. I'm expecting a
period of discussion and experimentation with time-aligned text both
in-band and out-of-band, so it's good to get started on this sooner
rather than later.

> It would be interesting to hear back from the browser vendors about how
> easily the subtitles could be kept with the video in a way that survives
> reuse in other contexts.

Incidentally, I'd be interested in such information about H.264. I
wonder how easy it will be, for example with QuickTime or mp4, to
encapsulate srt on-the-fly inside a browser.

Regards,
Silvia.