[whatwg] Timed tracks: feedback compendium

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Fri Sep 10 16:27:48 PDT 2010

On Fri, Sep 10, 2010 at 11:00 PM, Philip Jägenstedt <philipj at opera.com> wrote:

> On Thu, 09 Sep 2010 15:08:43 +0200, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>  On Wed, Sep 8, 2010 at 9:19 AM, Ian Hickson <ian at hixie.ch> wrote:
>>> On Fri, 23 Jul 2010, Philip Jägenstedt wrote:
>>> > If we must have both kind=subtitles and kind=captions, then I'd suggest
>>> > making the default subtitles, as that is without a doubt the most common
>>> > kind of timed text. Making captions the default only means that most
>>> > timed text will be mislabeled as being appropriate for the HoH when it
>>> > is not.
>>> Ok, I've changed the default. However, I'm not fighting this battle if it
>>> comes up again, and will just change it back if people don't defend having
>>> this as the default. (And then change it back again if the browsers pick
>>> "subtitles" in their implementations after all, of course.)
>>> Note that captions aren't just for users that are hard-of-hearing. Most of
>>> the time when I use timed tracks, I want captions, because the reason I
>>> have them enabled is that I have the sound muted.
>> Hmm, you both have good points. Maybe we should choose something as the
>> default that is not visible on screen, such as "descriptions"? That would
>> avoid the issue and make it explicit for people who provide captions or
>> subtitles that they have to make a choice.
> If we want people to make an explicit choice, we should make kind a
> required attribute and make browsers ignore <track>s without it. (I think
> subtitles is a good default though.)

I think you misunderstood - my explanation probably wasn't very good. I'm
looking at it from the authoring POV.

What I meant was: if I author a text track that is supposed to be visible on
screen as the video plays back, and we choose either @kind=subtitles or
@kind=captions as the default, then I don't really have to think about what
I authored, as it will be displayed on screen either way. This invites people
not to distinguish between subtitles and captions, which is a bad thing,
because a deaf user may then get tracks with the wrong label and the wrong
expectations. If, however, we choose as the default something that is not
visible on screen, e.g. @kind=descriptions or @kind=metadata, then the
author who wants their text track to be visible on screen has to give it a
label, i.e. make an explicit choice between @kind=subtitles and
@kind=captions. I believe this will lead to more correctly labeled content. I
am therefore strongly against defaulting to either subtitles or captions.
We could instead make @kind a required attribute, as you are saying.
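
To make the three options concrete, here is a minimal sketch in Python. The
function names and the "policy" switch are hypothetical and only model the
choices discussed in this thread; this is not how any browser implements
<track>.

```python
from typing import Optional

# Kinds that are rendered on screen, as named in this thread.
VISIBLE_KINDS = {"subtitles", "captions"}


def resolve_kind(kind_attr: Optional[str], policy: str) -> Optional[str]:
    """Return the effective kind of a track, or None if the track is ignored.

    policy is a hypothetical switch between the options under discussion:
      "default-subtitles" - a missing kind means subtitles
      "default-metadata"  - a missing kind means a non-visible kind
      "required"          - a track without a kind is ignored
    """
    if kind_attr is not None:
        return kind_attr
    if policy == "default-subtitles":
        return "subtitles"
    if policy == "default-metadata":
        return "metadata"
    if policy == "required":
        return None
    raise ValueError(f"unknown policy: {policy}")


def shown_on_screen(kind_attr: Optional[str], policy: str) -> bool:
    """True if a track with this kind attribute would be displayed."""
    return resolve_kind(kind_attr, policy) in VISIBLE_KINDS
```

With "default-subtitles", an unlabeled track is displayed, so the author
never notices the missing label. With "default-metadata" or "required", the
unlabeled track does not show up and the author has to pick a kind.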

>  > - Use existing technologies where appropriate.
>>> > [...]
>>> > > - Try as much as possible to have things Just Work.
>>> >
>>> > I think by specifying a standalone cue text parser WebSRT fails on these
>>> > counts compared to reusing the HTML fragment parsing algorithm for
>>> > parsing cue text.
>>> HTML parsing is a disaster zone that we should avoid at all costs, IMHO. I
>>> certainly don't think it would make any sense to propagate that format
>>> into anywhere where we don't absolutely have to propagate it.
>> A WebSRT authoring application does not have to create all markup that a
>> HTML fragment parser supports. It would only use what it sees necessary for
>> the use cases that it targets.
>> Browsers are WebSRT players that will consume the HTML fragments created by
>> such authoring applications.
>> In addition, browsers will also be able to consume richer HTML fragments
>> that were created as time-aligned overlays for video with more fancy
>> styling by Web developers. Something like
>> http://people.mozilla.com/~prouget/demos/vp8/ (you need Firefox for it).
>> Where it says "This movie will eat your planet", you could have fancy timed
>> text.
>> Just as much as there is a need for basic captions and subtitles, there is
>> also a need for fancy time-aligned HTML fragments. It would be very strange
>> if, in order to get that working, people would need to use the "metadata"
>> part of the WebSRT spec.
> Is it likely that HTML in cues alone will be enough to do all the fancy
> things that people might want?

I would really want to try and live without script in cues, which means: no
canvas. But since we have svg, that should be fine. CSS would be required,

> If they need scripts anyway, I'm very happy
> to force them to use metadata, as it also makes the browser implementation
> simpler (that's my opinion of the suggestions made so far, anyway).

I suppose we can start by having people go that way for now and, later, when
we see that becoming the norm, do something about it in a "v2".

>  On Sun, 25 Jul 2010, Silvia Pfeiffer wrote:
>>> >
>>> > I think if we have a mixed set of .srt files out there, some of which
>>> > are old-style srt files (with line numbers, without WebSRT markup) and
>>> > some are WebSRT files with all the bells and whistles and with
>>> > additional external CSS files, we create such a mess for that existing
>>> > ecosystem that we won't find much love.
>>> I'm not sure our goal is to find love here, but in general I would agree
>>> that it would be better to have one format than two. I don't see why we
>>> wouldn't just have one format here though. The idea of WebSRT is to be
>>> sufficiently backwards-compatible that that is possible.
>> With "finding love" I referred to your expressed goals:
>>  - Keep implementation costs for standalone players low.
>>  - Use existing technologies where appropriate.
>>  - Try as much as possible to have things Just Work.
>> With WebSRT, we will have one label for two different types of files: the
>> old-style SRT files and the new WebSRT files. Just putting a single label on
>> them doesn't mean it is one format, in particular when most old files will
>> not be conformant to the new label and
> Apart from the encoding, what else about old SRT files wouldn't
> be conformant?

<font> and <u>

> Does it matter that they aren't conformant if they work
> anyway?

The ones in the wrong charset won't work, at least not without us
introducing specific handling for it - which is incidentally specific
handling that non-Web applications won't get, so they are still left out in
the rain. Think of a new standalone application that was developed just for
WebSRT and only deals with UTF-8. It will not deal well with those legacy

>  many new files will not play in the software created for the old spec.
> As long as we don't add a header, the files will play in most existing
> software. Apart from parsers that assume that SRT is plain text (and thus
> would be unsuitable for much existing SRT content), what kind of breakage
> have you found with WebSRT-specific syntax in existing software?

I think we need to add a header - and possibly other things in the future.
Will we forever have the SRT restrictions hold back the introduction of new
features into WebSRT?

>  None is allowed today, but it would be relatively straight-forward to
>>> introduce metadata before the cues (or even in between the cues). For
>>> example, we could add defaults:
>>>  *
>>>  L:-1 T:50% A:middle
>>>  00:00:20,000 --> 00:00:24,400
>>>  Altocumulus clouds occur between six thousand
>>>  00:00:24,600 --> 00:00:27,800
>>>  and twenty thousand feet above ground level.
>>> We could add metadata (here using a different syntax that is similarly
>>> backwards-compatible with what the spec parser does today):
>>>  @charset --> win-1252
>>>  @language --> en-US
>>>  00:00:20,000 --> 00:00:24,400
>>>  Altocumulus clouds occur between six thousand
>>>  00:00:24,600 --> 00:00:27,800
>>>  and twenty thousand feet above ground level.
>> When I read the following:
>> "A WebSRT file body consists of an optional U+FEFF BYTE ORDER MARK (BOM)
>> character, followed by zero or more WebSRT line terminators
>> <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-line-terminator>,
>> followed by zero or more WebSRT cues
>> <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-cue>
>> separated from each other by two or more WebSRT line terminators,
>> followed by zero or more WebSRT line terminators."
>> then that doesn't imply for me that we can add anything in front of the
>> WebSRT cues without breaking the spec, or that we can define cues that are
>> not time ranges around the "-->" sign.
> The parsing algorithm simply skips over things it doesn't recognize,
> that's why adding basically any new syntax in between cues wouldn't break
> existing WebSRT parsers.

Legacy SRT parsers are not required to do so and even if they are actually
implemented to deal with this situation, it's a dangerous assumption. We may
as well write into the syntax description of WebSRT that any line that
doesn't match the syntax description has to be ignored, which then has the
effect that every single file in the world is a valid WebSRT file.
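
Here is a small sketch of what "skip what you don't recognize" amounts to.
It is not the parsing algorithm from the spec; the regular expression, the
function name and the blank-line block splitting are simplifications I am
assuming for illustration. The sample reuses the metadata example quoted
above, with blank lines between blocks.

```python
import re
from typing import List, Tuple

# Simplified timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})"
)


def parse_lenient(text: str) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return (cues, skipped_blocks); never rejects malformed input."""
    cues: List[Tuple[str, str, str]] = []
    skipped: List[str] = []
    text = text.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", text):
        if not block.strip():
            continue
        lines = block.split("\n")
        # Legacy SRT: an optional numeric counter precedes the timing line.
        if len(lines) > 1 and lines[0].strip().isdigit():
            lines = lines[1:]
        match = TIMING.match(lines[0].strip())
        if match:
            cues.append((match.group(1), match.group(2), "\n".join(lines[1:])))
        else:
            # Unrecognized block: silently dropped, not reported as an error.
            skipped.append(block)
    return cues, skipped


sample = """@charset --> win-1252

@language --> en-US

00:00:20,000 --> 00:00:24,400
Altocumulus clouds occur between six thousand

00:00:24,600 --> 00:00:27,800
and twenty thousand feet above ground level."""

cues, skipped = parse_lenient(sample)
```

The two "@" lines end up in the skipped list and the two cues are kept. The
same function also accepts any arbitrary text and simply returns no cues,
which is my point: with this rule, nothing is ever an invalid file.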

> Allowing anything as part of the syntax is a bit
> dangerous though, as most unrecognized stuff between cues are likely
> broken cues. Validators should warn about it, not treat it as a comment.

I wasn't aware that the standardised parsing algorithm for WebSRT has the
effect of tolerating "broken cues". This effectively means that a parser
will be required to parse every file it is given from beginning to end and
discard all non-conformant lines - even if that file is a 100 GB movie
file. In this case, I would really recommend that we put a magic identifier
at the beginning of WebSRT files so we can be sure that the file was
intended to be a WebSRT file. Let's have the string "WebSRT" at the
beginning of the files.
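
A sketch of the check I have in mind, assuming the identifier is the ASCII
string "WebSRT" optionally preceded by a UTF-8 BOM. Neither the function
name nor the exact byte layout is in the spec; both are assumptions. The
point is that only the first few bytes are read, however large the file is.

```python
import io
from typing import BinaryIO

MAGIC = b"WebSRT"            # proposed identifier
UTF8_BOM = b"\xef\xbb\xbf"   # U+FEFF encoded as UTF-8


def has_websrt_magic(stream: BinaryIO) -> bool:
    """Check only the first few bytes of the stream for the identifier."""
    head = stream.read(len(UTF8_BOM) + len(MAGIC))
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return head.startswith(MAGIC)


# A file that starts with the identifier is accepted...
ok = has_websrt_magic(io.BytesIO(b"WebSRT\n\n00:00:20,000 --> 00:00:24,400\nHi"))
# ...and a legacy SRT file or unrelated binary data is rejected right away.
legacy_srt = has_websrt_magic(io.BytesIO(b"1\n00:00:20,000 --> 00:00:24,400\nHi"))
```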

> Not being convinced we need anything more than simple key-value headers in
> a header, I still looked at the options for comments:
> Making any line with a --> in it be a comment would hide a lot of broken
> cues from validators, so I think we shouldn't do this.
> ; appears at the beginning of lines in 15/10000 files and most don't look
> like they're intended as comments.
> # appears at the beginning of lines in 244/10000 files and most don't look
> like they're intended as comments.
> /* only appears in 3/10000 files, so CSS-style comments might work, but
> does add some complexity
> // appears at the beginning of lines in 5/10000 files and most look like
> that *are* intended as comments or are garbage, so it should work.
> (data from OpenSubtitles sample)

This only convinces me more that we should break with SRT history.
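
For anyone who wants to repeat that kind of count on their own collection of
files, a rough sketch follows. I don't know how the numbers above were
actually produced; the function name is made up and the toy inputs are
invented for illustration, not taken from the OpenSubtitles sample.

```python
from typing import Dict, Iterable

# Candidate comment prefixes from the list above.
PREFIXES = (";", "#", "/*", "//")


def count_files_with_prefix(files: Iterable[str]) -> Dict[str, int]:
    """For each prefix, count the files with at least one line starting with it."""
    counts = {prefix: 0 for prefix in PREFIXES}
    for text in files:
        lines = [line.lstrip() for line in text.splitlines()]
        for prefix in PREFIXES:
            if any(line.startswith(prefix) for line in lines):
                counts[prefix] += 1
    return counts


# Toy inputs invented for illustration only.
toy_files = [
    "1\n00:00:01,000 --> 00:00:02,000\n# 1 hit of the year",
    "// translated by someone\n\n1\n00:00:01,000 --> 00:00:02,000\nHello",
    "1\n00:00:01,000 --> 00:00:02,000\nPlain cue text",
]
counts = count_files_with_prefix(toy_files)
```

The first toy file shows why "#" is risky: the line is ordinary cue text
that merely happens to start with the character.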
