[whatwg] Thoughts on video accessibility
ian at hixie.ch
Sat Dec 27 01:16:02 PST 2008
I have carefully read all the feedback in this thread concerning
associating text with video, for various purposes such as captions,
subtitles, and annotations.
Taking a step back, as far as I can tell there are two axes: where the
timed text comes from, and how it is rendered.
Where it comes from, it seems, boils down to three options:
- embedded in or referenced from the media resource itself
- as a separate file parsed by the user agent
- as a separate file parsed by the web page
Where the timed text is rendered boils down to two options:
- rendered automatically by the user agent
- rendered by the web page overlaying content on the video
For the purposes of this discussion I am ignoring burned-in captions,
since they're basically equivalent to a different video, much like videos
with overlaid sign language interpreters (or VH1's Pop-Up Video
annotations!).
These 5 options (3 sources crossed with 2 rendering modes) give us 6
cases:
1. Timed text in the resource itself (or linked from the resource itself),
rendered as part of the video automatically by the user agent.
This is the optimal situation from an accessibility and usability point of
view, because it works when the video is shown full-screen, it works when
the video is saved separate from the Web page, it works easily when other
pages link to the same video file, it requires minimal work from the page
author, and so forth.
This is what I think we should be encouraging.
It would probably make sense to expose the timed text track selection to
the Web page through the API, maybe even expose the text itself somehow,
but these are features that can and should probably wait until <video> has
been more reliably implemented.
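To make the idea concrete, here is a rough sketch of what such an API
might look like from the page's side. No such API exists yet, so every
name here (`TimedTextTrack`, `TimedTrackList`, `mode`, `select`, and so
on) is invented purely for illustration; the point is only the shape:
the user agent discovers the tracks in the media resource, and the page
can inspect them and toggle which one the user agent renders.

```javascript
// Hypothetical sketch only -- no such API existed at the time of writing.
// Models how a user agent might expose embedded timed text tracks to the
// page; all names here are invented for illustration.
class TimedTextTrack {
  constructor(label, language) {
    this.label = label;       // human-readable name, e.g. "English captions"
    this.language = language; // language tag, e.g. "en"
    this.mode = 'disabled';   // 'disabled' | 'showing'; the UA renders 'showing' tracks
    this.cues = [];           // would be populated by the UA from the media resource
  }
}

class TimedTrackList {
  constructor(tracks) { this.tracks = tracks; }

  // Let the page enable one track (by language) and disable the rest.
  select(language) {
    for (const t of this.tracks) {
      t.mode = (t.language === language) ? 'showing' : 'disabled';
    }
  }

  // The tracks the user agent is currently rendering.
  showing() { return this.tracks.filter(t => t.mode === 'showing'); }
}

// Usage: a page inspecting the tracks the UA found in the video file.
const list = new TimedTrackList([
  new TimedTextTrack('English captions', 'en'),
  new TimedTextTrack('Sous-titres francais', 'fr'),
]);
list.select('fr');
console.log(list.showing().map(t => t.label)); // [ 'Sous-titres francais' ]
```

Note that even this minimal surface lets the page build its own track
menu while leaving the actual rendering to the user agent, which is the
division of labour case 1 argues for.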
2. Timed text in the resource itself (or linked from the resource itself),
exposed to the Web page with no native rendering.
This allows pages to implement experimental subtitling mechanisms while
still allowing the timed text tracks to survive re-use of the video file,
but it seems to introduce a high cost (all pages have to implement
subtitling themselves) with very little gain, and with several
disadvantages -- different sites will have inconsistent subtitling, bugs
will be prevalent in the subtitling and accessibility will thus suffer,
and in all likelihood even videos that have subtitles will end up not
having them shown, as small sites won't bother to implement anything
but the most basic controls.
3. Timed text stored in a separate file, which is then parsed by the user
agent and rendered as part of the video automatically by the browser.
This would make authoring subtitles somewhat easier, but would typically
lose the benefits of subtitles surviving when the video file is extracted.
It would also involve a distinct increase in implementation and language
complexity. We would also have to pick a timed text format, or add yet
another format war to the <video>/<audio> codec debacle, which I think
would be a really big mistake right now. Given the immature state of timed
text formats (it seems there are new formats announced every month), it's
probably premature to pick one -- we should let the market pick one first.
4. Timed text stored in a separate file, which is then parsed by the user
agent and exposed to the Web page with no native rendering.
This combines the disadvantages of the previous two options, without
really introducing any groundbreaking advantages.
5. Timed text stored in a separate file, which is then fetched and parsed
by the Web page, which then passes it to the browser for rendering.
This is just an excessive level of complexity for a feature that could
just be supported exclusively by the user agent. In particular, it doesn't
actually provide for much space for experimentation -- whatever API we
provide to expose the subtitles would limit what the rendering would be
like regardless of what the pages want to try.
This option side-steps the issue of picking a format, though.
6. Timed text stored in a separate file, which is then fetched and parsed
by the Web page, and which is then rendered by the Web page.
We can't stop this from being available, and there's not much we can do to
help with this case beyond what we do now. The disadvantages are that it
doesn't work when the video is shown full-screen, when the video is saved
separate from the Web page, when other pages link to the same video file
without using their own implementation of the feature, and it requires
substantial implementation work from the page. The _advantages_, and they
are significant, are that pages can easily create subtitles separate from
the video, they can easily provide features such as automated
translations, and they can easily implement features that would otherwise
seem overly ambitious, e.g. hyperlinked annotations with ad tracking.
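For what it's worth, the page-side machinery case 6 requires isn't
exotic; a page can already do the whole pipeline today. The sketch below
parses an SRT-style subtitle string (as a page might fetch with
XMLHttpRequest) and picks the cue to overlay at a given playback time.
The helper names (`parseSrt`, `activeCue`) are mine, not from any spec:

```javascript
// Sketch of the page-side approach in case 6: the page fetches a subtitle
// file itself, parses it, and decides which cue to overlay at the video's
// current playback time. Helper names are illustrative, not standardized.

// "HH:MM:SS,mmm" -> seconds
function parseTime(s) {
  const m = s.match(/(\d+):(\d+):(\d+),(\d+)/);
  return (+m[1]) * 3600 + (+m[2]) * 60 + (+m[3]) + (+m[4]) / 1000;
}

// Split an SRT-style string into { start, end, text } cues.
function parseSrt(text) {
  return text.trim().split(/\n\s*\n/).map(block => {
    const lines = block.trim().split('\n');
    const [start, end] = lines[1].split(' --> ').map(parseTime);
    return { start, end, text: lines.slice(2).join('\n') };
  });
}

// The cue to overlay at a given time, or null if none is active.
function activeCue(cues, time) {
  return cues.find(c => time >= c.start && time < c.end) || null;
}

const srt = `1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,500 --> 00:00:08,000
Second caption.`;

const cues = parseSrt(srt);
console.log(activeCue(cues, 2.0).text); // "Hello, world."
console.log(activeCue(cues, 4.5));      // null (gap between cues)
```

The rendering half is then just absolutely-positioned text over the
video element, updated from a timeupdate handler -- which is exactly why
it breaks in full-screen mode and when the video is saved on its own.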
Based on this analysis it seems to me that cases 1 and 6 are important to
support, but that cases 2 to 5 aren't as compelling -- they either have
disadvantages that aren't outweighed by their advantages, or they are
simply less powerful than other options.
Cases 1 and 6 right now don't require changes to the spec. I think we
should eventually provide the APIs mentioned above under case 1 since they
would help bridge the gap between the two types of timed text solutions,
but as noted above I think we should wait until implementations are more
mature before extending the API further.
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'