[whatwg] Introduction of media accessibility features

Sam Dutton Sam.Dutton at bbc.co.uk
Fri Apr 23 09:42:24 PDT 2010

Some thoughts about the Media Multitrack and TextAssociations specs --
and also about http://wiki.whatwg.org/wiki/Timed_tracks...

The specs are great news in terms of accessibility and should open up
possibilities for search and temporal addressing. There may be cases
where it makes sense to encode metadata with a media resource, but the
ability to use timed data in textual format, synchronised but separate
from the media it relates to, has to be a good thing.

The specs and the Timed Tracks wiki also made me think about the
increasing *granularity* of media and digitised (meta)data. 

For example: film archives used to have no more than a one-sentence
description for an entire can of film, illegibly written in a book or on
index cards or (more recently) typed as a few lines in a Word document. Now,
digitised footage will often be provided alongside timed, digitised
metadata: high quality, structured, detailed, shot-level, frame accurate
data about content, location, personnel, dialogue, rights, ratings and
more. Accelerating digitisation is at the heart of this
'granularisation', obviously, but a variety of technologies contribute:
linked data and semantic markup, temporal URLs, image recognition (show
me frames in this video with a car in them), M3U / HTTP streaming, and
so on -- even the new iPhone seekToTime method.

So, in addition to what's on offer in the specs, I'm wondering if it
might be possible to have time-aligned *data*, with custom roles.  

For example, imagine a video with a synchronised 'chapter' carousel
below it (like the R&DTV demo at
http://www.bbc.co.uk/blogs/rad/2009/08/html5.html). The video element
would have a track with 'chapter' as its role attribute, and the
location of the chapter data file as its src. The data file would
consist of an array of 'chapter' objects, each representing some timed
data. Every object in the track source would require a start and/or end
value, and a content value with arbitrary properties:

    [
        {
            start: 10.00,
            end: 20.00,
            content: {
                title: "Chapter 2",
                description: "Some blah relating to chapter 2",
                image: "/images/chapter2.png"
            }
        },
        {
            start: 20.00,
            end: 30.00,
            content: {
                title: "Chapter 3",
                description: "Chapter 3 blah",
                image: "/images/chapter3.png"
            }
        }
    ]

In this example, selecting the chapter track for the video would cause
the video element to emit segment entry/exit events -- a bit like the
Cue Ranges idea. Each event would correspond to an object in the chapter
data source.

I'm not sure of the best way to implement the Event object for a 'data
track', but maybe it would include:
- a type property, as for other Event objects, which would evaluate to
'chapter' in this case
- a content property evaluating to the content object defined in the
data source
- a property indicating entry or exit (this seems a bit woolly...)
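
A rough sketch of how the entry/exit events and the three properties listed
above might fit together. Everything here is an assumption for illustration:
the function name segmentEvents, the property name 'phase' for the entry/exit
flag, and the rule that a segment is active when start <= time < end (with
both start and end present). None of it comes from the specs.

```javascript
// Hypothetical helper, not part of any spec.
// Returns the entry/exit events crossed when playback moves
// from prevTime to currTime (both in seconds).
function segmentEvents(role, cues, prevTime, currTime) {
    // Assumed rule: a cue is active at time t when start <= t < end.
    const isActive = (cue, t) => t >= cue.start && t < cue.end;
    const events = [];
    for (const cue of cues) {
        const was = isActive(cue, prevTime);
        const now = isActive(cue, currTime);
        if (was === now) continue; // active state unchanged for this cue
        events.push({
            type: role,                    // e.g. 'chapter'
            phase: now ? "entry" : "exit", // assumed name for the entry/exit flag
            content: cue.content           // the content object from the data source
        });
    }
    // Report exits before entries, so a listener can clear the old
    // segment before drawing the new one.
    events.sort((a, b) => (a.phase === b.phase ? 0 : a.phase === "exit" ? -1 : 1));
    return events;
}

const chapters = [
    { start: 10, end: 20, content: { title: "Chapter 2" } },
    { start: 20, end: 30, content: { title: "Chapter 3" } }
];

// Playback crosses the 20 second boundary: Chapter 2 exits, Chapter 3 enters.
console.log(segmentEvents("chapter", chapters, 19.5, 20.5));
```

In this sketch a seek that jumps over a whole segment produces no events for
it, which may or may not be what a native implementation would want.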

To make use of data tracks, content authors would need to build layouts
with elements that could listen for events and display content
appropriately -- and data tracks could also refer to content areas
provided by the browser, e.g. for captions. Conversely, multiple content
providers could provide different data tracks for the same media.

This approach would also make it possible to publish multiple data
tracks, separately searchable and displayable. For example, a footage
archive could provide a track each for sound effects, dialogue, and
location. (This example makes me think -- maybe it should be possible to
select multiple tracks?)
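
To make 'separately searchable' a little more concrete, here is a minimal
sketch of searching a chosen subset of tracks. The track names, the 'text'
property and the searchTracks function are all made up for illustration; the
data is invented too.

```javascript
// Hypothetical data: one array of timed objects per track.
const tracks = {
    dialogue: [
        { start: 0, end: 4, content: { text: "Mind the gap" } },
        { start: 4, end: 9, content: { text: "Next stop, White City" } }
    ],
    location: [
        { start: 0, end: 9, content: { text: "White City station, London" } }
    ]
};

// Case-insensitive substring search over the selected tracks only.
// Returns the track name and time range of every matching segment.
function searchTracks(tracks, selected, query) {
    const needle = query.toLowerCase();
    const hits = [];
    for (const name of selected) {
        for (const cue of tracks[name] || []) {
            if (cue.content.text.toLowerCase().includes(needle)) {
                hits.push({ track: name, start: cue.start, end: cue.end });
            }
        }
    }
    return hits;
}

console.log(searchTracks(tracks, ["dialogue", "location"], "white city"));
```

Each hit carries a start and end time, so a result could link straight to the
matching segment of the media.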

I can imagine various other scenarios:
- a journalist builds a slideshow of images synchronised with audio
- an educational publisher builds multiple annotation tracks for video
clips, providing different sets of content for different school years
- a news provider builds an archive search function, enabling users to
navigate via search to individual segments of footage and view
synchronised shot descriptions and metadata
- a broadcaster publishes multiple tracks of content for a sporting
event, including technical detail, follow-a-competitor, and a comedy
- an architect videos a building site, adding timed annotations like
YouTube Annotations.

Of course, most of what I've described can already be achieved
reasonably well with a bit of JavaScript hacking, but all these examples
belong to such a common class of use case that I think it might be
better to have some kind of native implementation, rather than a variety
of JavaScript alternatives reliant on the timeupdate event. 
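
For comparison, the kind of JavaScript hack I have in mind looks roughly like
the sketch below. The names watchCues, onEnter and onExit are hypothetical;
the only real API assumed is that the media object has a currentTime property
and fires timeupdate events, as video and audio elements do.

```javascript
// Hypothetical helper. `media` is anything with a currentTime property
// and an addEventListener method, such as a video element.
function watchCues(media, cues, onEnter, onExit) {
    const active = new Set(); // indices of cues currently in range
    media.addEventListener("timeupdate", () => {
        const t = media.currentTime;
        cues.forEach((cue, i) => {
            // Assumed rule: a cue is in range when start <= t < end.
            const inRange = t >= cue.start && t < cue.end;
            if (inRange && !active.has(i)) {
                active.add(i);
                onEnter(cue);
            } else if (!inRange && active.has(i)) {
                active.delete(i);
                onExit(cue);
            }
        });
    });
}

// In a page this might be wired up as follows, where showChapter and
// hideChapter are functions the author would have to write:
//   watchCues(document.querySelector("video"), chapters, showChapter, hideChapter);
```

Because the callbacks only run when the browser chooses to fire timeupdate,
they can lag behind the real segment boundary, and every author ends up
rewriting the same bookkeeping.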

Sam Dutton
