[whatwg] Thoughts on video accessibility

Thu Jul 16 01:28:34 PDT 2009

On Thu, 16 Jul 2009 07:58:30 +0200, Silvia Pfeiffer  
<silviapfeiffer1 at gmail.com> wrote:

> Hi Ian,
>
> Great to see the new efforts to move the subtitle/caption/karaoke
> issues forward!
>
> I actually have a contract with Mozilla starting this month to help
> solve this, so I am more than grateful that you have proposed some
> ideas in this space.
>
> On Thu, Jul 16, 2009 at 9:38 AM, Ian Hickson<ian at hixie.ch> wrote:
>> On Sat, 27 Dec 2008, Silvia Pfeiffer wrote:
>>> > 1. Timed text in the resource itself (or linked from the resource
>>> > itself), rendered as part of the video automatically by the user
>>> > agent.
>>>
>>> For case 1, the practical implications are that browser vendors will
>>> have to develop support for a large variety of text codecs, each one
>>> providing different functionalities.
>>
>> I would hope that as with a video codec, we can standardise on a single
>> subtitle format, ideally some simple media-independent combination of
>> SRT and LRC [1]. It's difficult to solve this problem without a standard
>> codec, though.
>
> I have myself thought about creating a new format to address the needs
> for time-aligned text in audio/video.
>
> However, the problem with creating a new format is that you start from
> scratch and formats that are already widespread are not supported.
>
> I can see that your proposed format is trying to be backwards
> compatible with SRT, so at least it would work for the large number of
> existing srt file collections. I am still skeptical, in particular
> because there are no authoring systems for this format around.
> But I would be curious what others think about your proposed SRT-LRC-mix.

There are already more formats than you could possibly want on the scale  
between SRT (dumb text) and complex XML formats like DFXP or USF (used in  
Matroska). In my layman's opinion both extremes make sense, but I'm rather
skeptical of anything in between.
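
To illustrate how little machinery the "dumb text" end of that scale needs, here is a minimal sketch of an SRT reader in Python. It assumes the common layout (an optional counter line, a "start --> end" timing line, then text up to a blank line) and input that is already decoded to a string. The names Cue and parse_srt are made up for this example and do not come from any existing library.

```python
import re
from dataclasses import dataclass

# SRT timestamp such as "00:01:02,500"; some files use "." instead of ",".
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp to seconds, ignoring anything after it."""
    match = _TIME.match(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(data: str) -> list[Cue]:
    """Parse decoded SRT text into a list of cues."""
    cues = []
    normalized = data.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # The first line is usually a counter, so look for the timing line.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # not a cue block, skip it
        start, _, end = lines[timing].partition("-->")
        text = "\n".join(lines[timing + 1:])
        cues.append(Cue(_seconds(start), _seconds(end), text))
    return cues
```

Nothing here handles styling, positioning or karaoke timing, which is exactly where the in-between formats start to diverge.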

>>> In fact, the easiest solution would be if that particular format was
>>> really only HTML.
>>
>> IMHO that would be absurd. HTML means scripting, embedded videos, an
>> unbelievably complex rendering system, complex parsing, etc; plus, what's
>> more, it doesn't even support timing yet, so we'd have to add all the
>> timing and karaoke features on top of it. Requiring that video players
>> embed a timed HTML renderer just to render subtitles is like saying that
>> we should ship Microsoft Word with every DVD player, to handle the user
>> input when the user wants to type in a new chapter number to jump to.
>
> I agree, it cannot be a format that contains all the complexity of
> HTML. It would only support a subpart of HTML that is relevant, plus
> the addition of timing - and in this case is indeed a new format. I
> have therefore changed my mind since I sent that email in Dec 08 and
> am hoping we can do it with existing formats.

I think that eventually we will want timing/synchronization in HTML for  
synchronizing multiple video or audio tracks. As far as I can tell no  
browser wants to implement the addCueRange API (removing this should be  
the topic of a separate mail), so we really need to re-think this part and  
I think that timed text plays an important part here.

> In particular, I have taken an in-depth look at the latest
> specification from the Timed Text working group that have put years of
> experiments and decades of experience into developing DFXP. You can
> see my review of DFXP here:
> http://blog.gingertech.net/2009/06/28/a-review-of-the-w3c-timed-text-authoring-format/
> . I think it is both too flexible in a lot of ways, but also too
> restrictive in others. However, it is a well formulated format that is
> also getting market traction. In addition, it is possible to formulate
> profiles to add missing functionality.
>
> If we want a quick and dirty hack, srt itself is probably the best
> solution. If we want a well thought-out solution, DFXP is probably a
> better idea.
>
> I am currently experimenting with these and will be able to share
> something soon for further discussion.
>
>
>>> > 3. Timed text stored in a separate file, which is then parsed by the
>>> > user agent and rendered as part of the video automatically by the
>>> > browser.
>>> >
>>> Maybe we should consider solving this differently. Either we could
>>> encapsulate into the video container upon download. Or we could create a
>>> zip-file or tarball upon download. I'd just find it a big mistake to
>>> ignore the majority use case in the standard, which is why I proposed
>>> the <text> elements inside the <video> tag.
>>
>> If browser vendors are willing to merge subtitles and video files when
>> saving them, that would be great. Is this easy to do?
>
> My suggestion was really about doing this server-side, which we have
> already implemented years ago in the Annodex project for Ogg
> Theora/Vorbis.
>
> However, it is also possible to do this in the browser: in the case of
> Ogg, the browser just needs to have a multiplexing library installed
> as well as a means to encode the subtitle file (which I like to call a
> "text codec"). Since it's text, it's nowhere near as complex as
> encoding audio or video and just consists of light-weight packaging
> code. So, yes, it is totally possible to have the browsers create a
> binary video file that has the subtitles encapsulated that were
> previously only accessible as referenced text files behind a separate
> URL.
>
> The only issue I see is the baseline codec issue: every browser that
> wants to support multiple media formats has to implement this
> multiplexing and text encoding for every media encapsulation format
> differently, which is annoying and increases complexity. It's however
> generally a small amount of complexity compared to the complexity
> created by having to support multiple codecs.

I disagree: remuxing files would be much more of an implementation burden  
than supporting multiple codecs, at least if a format-agnostic media  
framework is used (be that internal or external to the browser). Remuxing  
would require you to support/create parts of the media framework that you  
otherwise aren't using, i.e. parsers, muxers, file writers and plugging of  
these together (which unlike decoding isn't automatic in any framework  
I've seen).

Anything is doable of course, but I think this is really something that is  
best done server-side using specialized tools.

>>> Here is my example again:
>>> <video src="http://example.com/video.ogv" controls>
>>>  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
>>>  <text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
>>>  <text category="SUB" lang="jp" type="application/smil" src="japanese.smil"></text>
>>>  <text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
>>> </video>
>>
>> Here's a counterproposal:
>>
>>   <video src="http://example.com/video.ogv"
>>          subtitles="http://example.com/caption.srt" controls>
>>   </video>
>
> Subtitle files are created to enable users to choose the text in the
> language that they speak to be displayed. With a simple addition like
> what you are proposing, I don't think such a choice is possible. Or do
> you have a proposal on how to choose the adequate language file?
>
> Also, the attributes on the proposed text element of course serve a purpose:
> * the "category" attribute is meant to provide a default for styling
> the text track,
> * the "lang" attribute is meant to provide a means to build a menu
> to choose the adequate subtitle file from,
> * the "type" attribute is meant to both identify the mime type of the
> format and the character set used in the file.
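
As a rough sketch of how the attributes listed above could drive track selection, the Python below picks a track from the user's ordered language preferences. The TextTrack class and pick_track function are hypothetical names for this example, and matching on the primary language subtag only is an assumption made here, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TextTrack:
    category: str  # "CC" or "SUB", as in the quoted markup
    lang: str      # value of the lang attribute
    type: str      # MIME type of the subtitle file
    src: str       # URL of the subtitle file


def pick_track(
    tracks: Sequence[TextTrack],
    preferred_langs: Sequence[str],
    category: Optional[str] = None,
) -> Optional[TextTrack]:
    """Return the first track matching the user's language preferences.

    Languages are compared on the primary subtag only, so a preference
    of "de-AT" matches a track marked lang="de".
    """
    candidates = [t for t in tracks if category is None or t.category == category]
    for wanted in preferred_langs:
        primary = wanted.lower().split("-")[0]
        for track in candidates:
            if track.lang.lower().split("-")[0] == primary:
                return track
    return None  # nothing suitable; the caller decides what to do


tracks = [
    TextTrack("CC", "en", "text/x-srt", "caption.srt"),
    TextTrack("SUB", "de", "application/ttaf+xml", "german.dfxp"),
    TextTrack("SUB", "fr", "text/x-srt", "translation_webservice/fr/caption.srt"),
]
chosen = pick_track(tracks, ["sv", "de-AT", "en"])
```

With the preferences above, the Swedish entry finds no match and the German track is chosen. The same list could equally be used to populate a menu instead of choosing automatically.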
>
> The character set question is actually a really difficult problem to
> get right, because srt files are created in an appropriate character
> set for the language, but there is no means to store in a srt file
> what character set was used in its creation. That's a really bad
> situation to be in for the Web server, who can then only take an
> educated guess. By giving the ability to the HTML author to specify
> the charset of the srt file with the link, this can be solved.
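
To make the charset problem concrete, here is a hedged Python sketch of what a consumer could do with an author-declared charset: try it first, then fall back to guesses. The function name decode_subtitles and the choice of fallbacks (UTF-8, then windows-1252) are assumptions for illustration only; nothing in the proposal defines this behaviour.

```python
from typing import Optional, Sequence, Tuple


def decode_subtitles(
    raw: bytes,
    declared: Optional[str] = None,
    fallbacks: Sequence[str] = ("utf-8-sig", "cp1252"),
) -> Tuple[str, str]:
    """Decode subtitle bytes, returning (text, encoding actually used).

    The declared charset (for example from markup or an HTTP header) is
    tried first. "utf-8-sig" is plain UTF-8 that also strips a leading
    byte order mark.
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding name: move on to the next guess.
            continue
    # Last resort: keep the readable parts and mark the rest as U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8"
```

Note the limit of guessing: a single-byte encoding will often decode bytes that were written in a different one without raising any error, producing wrong characters silently. That is why an explicit declaration is worth more than any fallback chain.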
>
> BTW: my latest experiments with subtitles have even a few more
> attributes. I am not ready to publish that yet, but should be within a
> week or so and will be glad to have a further discussion then.
>
>
>> I think this would be fine, on the long term. I don't think the existing
>> implementations of <video> are at a point yet where it makes sense to
>> define this yet, though.
>
> I think we have to start discussing it and doing experiments. I think
> <video> is getting stable enough to move forward. I'm expecting a
> period of discussion and experimentation with time-aligned text both
> in-band and out-of-band, so it's good to get started on this sooner
> rather than later.
>
>
>> It would be interesting to hear back from the browser vendors about how
>> easily the subtitles could be kept with the video in a way that survives
>> reuse in other contexts.

I think that in the case of external subtitles the browser could simply
save them alongside the video. In my experience media players have much
more robust support for external subtitles (like SRT) than for internal
subtitles, so this is my preferred option (plus it's easier).
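
A sketch of what "alongside" could mean in practice, in Python: derive the subtitle file name from the saved video's name, so that players which look for a subtitle file with the same base name can find it. The sidecar_path helper is hypothetical, and the optional ".de.srt" style language infix is a convention assumed here that not every player recognises.

```python
from pathlib import Path
from typing import Optional


def sidecar_path(video_path: str, lang: Optional[str] = None,
                 extension: str = ".srt") -> Path:
    """Return the path for a subtitle file saved next to the video.

    "video.ogv" becomes "video.srt", or "video.de.srt" when a language
    code is given.
    """
    video = Path(video_path)
    suffix = f".{lang}{extension}" if lang else extension
    # Replace only the final extension of the video file name.
    return video.with_name(video.stem + suffix)
```

For example, sidecar_path("downloads/video.ogv") gives downloads/video.srt in the same directory as the video.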

> Incidentally, I'd be interested in such information about H.264. I
> wonder how easy it will be for example with QuickTime or mp4 to
> encapsulate srt on-the-fly inside a browser.
>
> Regards,
> Silvia.

-- 
Philip Jägenstedt
Core Developer
Opera Software