[whatwg] Subtitles, captions, and other tracks augmenting video or audio

Tue Apr 20 06:07:16 PDT 2010

On Tue, Apr 20, 2010 at 8:33 AM, Ian Hickson <ian at hixie.ch> wrote:
> On Fri, 16 Apr 2010, Ian Hickson wrote:
>>
>> I'm starting to look at the feedback sent over the past few years for
>> augmenting audio and video with additional timed tracks such as
>> subtitles, captions, audio descriptions, karaoke, slides, lyrics, ads,
>> etc. One thing that would be really helpful is if we could get together
>> a representative sample of typical uses of these features, as well as
>> examples of some of the more extreme uses.
>>
>> If anyone has any examples, please add them here:
>>
>>    http://wiki.whatwg.org/wiki/Timed_tracks
>
> I've started filling in the above and writing observations at the foot of
> the page based on the examples there. This is going to heavily influence
> how I evaluate proposals, so now would be a really good time to check out
> these examples and add any more if there are important features that I've
> missed.

Hi Ian,

I spent some time today filling in that page, and when I came back to
it just now it seems you have moved most of the use cases elsewhere,
namely to
http://wiki.whatwg.org/wiki/Use_cases_for_API-level_access_to_timed_tracks

IIUC the idea is to focus on the core problem at hand, which right now
is captions and subtitles. That is fair enough, but I think you might
want to reconsider this for lyrics and chapter markers. I'm OK with
moving the others to a later stage.

First, the lyrics. I think they are essentially the same as captions
and should go back into the first document, particularly since we are
talking about captions and subtitles for both the <video> and the
<audio> element. The lyrics examples show well how lyrics are being
displayed as time-aligned text with audio resources. Most of these
examples are widgets used on the Web, so I think they are totally
relevant.

Lyrics (LRC) files typically look like this:

[ti:Can't Buy Me Love]
[ar:Beatles, The]
[au:Lennon & McCartney]
[al:Beatles 1 - 27 #1 Singles]
[by:Wooden Ghost]
[re:A2 Media Player V2.2 lrc format]
[ve:V2.20]
[00:00.45]Can't <00:00.75>buy <00:00.95>me <00:01.40>love,
<00:02.60>love<00:03.30>, <00:03.95>love, <00:05.30>love<00:05.60>
[00:05.70]<00:05.90>Can't <00:06.20>buy <00:06.40>me <00:06.70>love,
<00:08.00>love<00:08.90>

There is some metadata at the start, followed by timed lines that may
carry explicit sub-timing for individual words, karaoke-style. This is
not very different from SRT and in fact should fit with your Karaoke
use case.
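
To make that structure concrete, here is a minimal Python sketch of
how such a file could be read. It is only an illustration under some
assumptions: each timed line sits on a single physical line (the
wrapped lines in the example above would need joining first), line
stamps have the form [mm:ss.xx], and word stamps have the form
<mm:ss.xx>. The names parse_lrc and to_seconds are hypothetical and
not part of any existing library.

```python
import re

# Assumed shapes: [key:value] metadata, [mm:ss.xx] line stamps, <mm:ss.xx> word stamps.
TAG_RE = re.compile(r"^\[([a-z]+):(.*)\]$")
LINE_RE = re.compile(r"^\[(\d+):(\d+(?:\.\d+)?)\](.*)$")
WORD_RE = re.compile(r"<(\d+):(\d+(?:\.\d+)?)>")


def to_seconds(minutes, seconds):
    """Convert the mm and ss.xx parts of a stamp to seconds."""
    return int(minutes) * 60 + float(seconds)


def parse_lrc(text):
    """Return (metadata dict, list of cues) for LRC-style text."""
    metadata = {}
    cues = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        timed = LINE_RE.match(line)
        if timed:
            start = to_seconds(timed.group(1), timed.group(2))
            # re.split with capturing groups yields:
            # [text, mm, ss, text, mm, ss, text, ...]
            parts = WORD_RE.split(timed.group(3))
            segments = []
            if parts[0].strip():
                # Text before the first word stamp starts at the line stamp.
                segments.append((start, parts[0].strip()))
            for i in range(1, len(parts), 3):
                word = parts[i + 2].strip()
                if word:  # a stamp with no text only marks an end time
                    segments.append((to_seconds(parts[i], parts[i + 1]), word))
            cues.append({"start": start, "segments": segments})
            continue
        tag = TAG_RE.match(line)
        if tag:
            metadata[tag.group(1)] = tag.group(2)
    return metadata, cues
```

Each cue ends up as a start time plus a list of (time, text) pairs,
which is much the same shape an SRT cue with karaoke sub-timing would
need.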

I'm also confused about the removal of the chapter tracks. These are
also time-aligned text files and again look very similar to SRT. Here
is an extract of a QTtext chapter track example:

{QTtext} {size:16} {font:Lucida Grande} {width:320} {height:42}
{language:0} {textColor:65535,65535,65535} {backColor:0,0,0}
{doNotAutoScale:off} {timeScale:100} {timeStamps:absolute}
{justify:center}
[00:00:09.30]
Chocolate Rain
[00:00:12.00]
Some stay dry and others feel the pain
[00:00:16.00]
Chocolate Rain
[00:00:18.00]
A baby born will die before the sin
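
A similarly small sketch can read the extract above. It rests on
assumptions of mine rather than on the QTtext documentation: header
descriptors are {key:value} pairs (or a bare {key}), each
[hh:mm:ss.ff] stamp sits on its own line, the ff part counts units of
timeScale, and the text lines that follow a stamp belong to it. The
name parse_qttext and the fallback time scale of 1000 are
hypothetical.

```python
import re

# Assumed shapes: {key} or {key:value} descriptors, [hh:mm:ss.ff] stamps.
HEADER_RE = re.compile(r"\{([^{}:]+)(?::([^{}]*))?\}")
STAMP_RE = re.compile(r"^\[(\d+):(\d+):(\d+)\.(\d+)\]$")


def parse_qttext(text, default_time_scale=1000):
    """Return (header dict, list of cues) for QTtext-style text.

    default_time_scale is an assumed fallback used only when the
    header carries no {timeScale:...} descriptor.
    """
    header = {}
    cues = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("{"):
            # A descriptor without a value (e.g. {QTtext}) maps to "".
            for key, value in HEADER_RE.findall(line):
                header[key] = value
            continue
        stamp = STAMP_RE.match(line)
        if stamp:
            hours, minutes, seconds, frac = (int(g) for g in stamp.groups())
            scale = int(header.get("timeScale", default_time_scale))
            start = hours * 3600 + minutes * 60 + seconds + frac / scale
            cues.append({"start": start, "text": ""})
        elif cues:
            # Text lines attach to the most recent stamp.
            cue = cues[-1]
            cue["text"] = (cue["text"] + "\n" + line) if cue["text"] else line
    return header, cues
```

Each chapter marker again reduces to a start time plus text, the same
shape as a caption cue.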

So, while I can understand that you currently want to focus on just
solving captions and subtitles, I think it is important to keep other
time-aligned text applications that can be solved in exactly the same
way as part of the design, and so to keep an open mind about general
time-aligned text use cases.

Cheers,
Silvia.