[whatwg] Timed tracks: feedback compendium

Fri Sep 10 06:00:11 PDT 2010

On Thu, 09 Sep 2010 15:08:43 +0200, Silvia Pfeiffer
<silviapfeiffer1 at gmail.com> wrote:

> On Wed, Sep 8, 2010 at 9:19 AM, Ian Hickson <ian at hixie.ch> wrote:
>
>>
>> On Fri, 23 Jul 2010, Philip Jägenstedt wrote:
>>
>> > If we must have both kind=subtitles and kind=captions, then I'd suggest
>> > making the default subtitles, as that is without a doubt the most common
>> > kind of timed text. Making captions the default only means that most
>> > timed text will be mislabeled as being appropriate for the HoH when it
>> > is not.
>>
>> Ok, I've changed the default. However, I'm not fighting this battle if it
>> comes up again, and will just change it back if people don't defend having
>> this as the default. (And then change it back again if the browsers pick
>> "subtitles" in their implementations after all, of course.)
>>
>> Note that captions aren't just for users that are hard-of-hearing. Most of
>> the time when I use timed tracks, I want captions, because the reason I
>> have them enabled is that I have the sound muted.
>>
>
> Hmm, you both have good points. Maybe we should choose something as the
> default that is not visible on screen, such as "descriptions"? That would
> avoid the issue and make it explicit for people who provide captions or
> subtitles that they have to make a choice.

If we want people to make an explicit choice, we should make kind a
required attribute and make browsers ignore <track>s without it. (I think
subtitles is a good default though.)

>> > - Use existing technologies where appropriate.
>> > [...]
>> > > - Try as much as possible to have things Just Work.
>> >
>> > I think by specifying a standalone cue text parser WebSRT fails on these
>> > counts compared to reusing the HTML fragment parsing algorithm for
>> > parsing cue text.
>>
>> HTML parsing is a disaster zone that we should avoid at all costs, IMHO. I
>> certainly don't think it would make any sense to propagate that format
>> into anywhere where we don't absolutely have to propagate it.
>>
>
> A WebSRT authoring application does not have to create all markup that an
> HTML fragment parser supports. It would only use what it sees as necessary
> for the use cases that it targets.
>
> Browsers are WebSRT players that will consume the HTML fragments created by
> such authoring applications.
> In addition, browsers will also be able to consume richer HTML fragments
> that were created as time-aligned overlays for video with fancier
> styling by Web developers. Something like
> http://people.mozilla.com/~prouget/demos/vp8/ (you need Firefox for it).
> Where it says "This movie will eat your planet", you could have fancy timed
> text.
>
> Just as much as there is a need for basic captions and subtitles, there is
> also a need for fancy time-aligned HTML fragments. It would be very strange
> if, in order to get that working, people would need to use the "metadata"
> part of the WebSRT spec.

Is it likely that HTML in cues alone will be enough to do all the fancy
things that people might want? If they need scripts anyway, I'm very happy
to force them to use metadata, as it also makes the browser implementation
simpler (that's my opinion of the suggestions made so far, anyway).

> On Sun, 25 Jul 2010, Silvia Pfeiffer wrote:
>> >
>> > I think if we have a mixed set of .srt files out there, some of which
>> > are old-style srt files (with line numbers, without WebSRT markup) and
>> > some are WebSRT files with all the bells and whistles and with
>> > additional external CSS files, we create such a mess for that existing
>> > ecosystem that we won't find much love.
>>
>> I'm not sure our goal is to find love here, but in general I would agree
>> that it would be better to have one format than two. I don't see why we
>> wouldn't just have one format here though. The idea of WebSRT is to be
>> sufficiently backwards-compatible that that is possible.
>>
>
> With "finding love" I referred to your expressed goals:
>  - Keep implementation costs for standalone players low.
>  - Use existing technologies where appropriate.
>  - Try as much as possible to have things Just Work.
>
> With WebSRT, we will have one label for two different types of files: the
> old-style SRT files and the new WebSRT files. Just putting a single label on
> them doesn't mean it is one format, in particular when most old files will
> not be conformant to the new label and

Apart from the encoding, what else about old SRT files wouldn't
be conformant? Does it matter that they aren't conformant if they work
anyway?

> many new files will not play in the software created for the old spec.

As long as we don't add a header, the files will play in most existing
software. Apart from parsers that assume that SRT is plain text (and thus
would be unsuitable for much existing SRT content), what kind of breakage
have you found with WebSRT-specific syntax in existing software?

>> None is allowed today, but it would be relatively straight-forward to
>> introduce metadata before the cues (or even in between the cues). For
>> example, we could add defaults:
>>
>>   *
>>   DEFAULTS
>>   L:-1 T:50% A:middle
>>
>>   00:00:20,000 --> 00:00:24,400
>>   Altocumulus clouds occur between six thousand
>>
>>   00:00:24,600 --> 00:00:27,800
>>   and twenty thousand feet above ground level.
>>
>> We could add metadata (here using a different syntax that is similarly
>> backwards-compatible with what the spec parser does today):
>
>
>>   @charset --> win-1252
>>   @language --> en-US
>>
>>   00:00:20,000 --> 00:00:24,400
>>   Altocumulus clouds occur between six thousand
>>
>>   00:00:24,600 --> 00:00:27,800
>>   and twenty thousand feet above ground level.
>>
>>
>
> When I read the following:
> "A WebSRT file body consists of an optional U+FEFF BYTE ORDER MARK (BOM)
> character, followed by zero or more WebSRT line
> terminators<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-line-terminator>,
> followed by zero or more WebSRT
> cues<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-cue>
> separated
> from each other by two or more WebSRT line
> terminators<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-line-terminator>,
> followed by zero or more WebSRT line
> terminators<http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-line-terminator>
> ."
> then that doesn't imply for me that we can add anything in front of the
> WebSRT cues without breaking the spec, or that we can define cues that  
> are
> not time ranges around the "-->" sign.

The parsing algorithm simply skips over things it doesn't recognize, which
is why adding basically any new syntax in between cues wouldn't break
existing WebSRT parsers. Allowing anything as part of the syntax is a bit
dangerous though, as most unrecognized stuff between cues is likely to be
broken cues. Validators should warn about it, not treat it as a comment.
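
To make the skipping behaviour concrete, here is a minimal Python sketch.
It is not the parsing algorithm from the spec, only a simplified
illustration of the idea: blocks without a recognizable timing line are
set aside rather than treated as fatal, and a validator could warn about
each of them. The names parse_cues, to_ms and TIMING are made up for this
example, and cue settings after the timing are ignored.

```python
import re

# Timing line as in the examples quoted above, e.g.
# "00:00:20,000 --> 00:00:24,400". Anything after the end time is ignored.
TIMING = re.compile(
    r"^(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})\s+-->\s+"
    r"(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert timestamp components (strings) to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_cues(text):
    """Return (cues, skipped); unrecognized blocks are skipped, not fatal."""
    cues, skipped = [], []
    # Drop a leading BOM and normalize line terminators.
    text = text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    for block in re.split(r"\n{2,}", text):
        lines = [line for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        # An optional identifier line (such as a legacy SRT cue number)
        # may precede the timing line, so look at the first two lines.
        for i, line in enumerate(lines[:2]):
            match = TIMING.match(line.strip())
            if match:
                g = match.groups()
                payload = "\n".join(lines[i + 1:])
                cues.append((to_ms(*g[:4]), to_ms(*g[4:]), payload))
                break
        else:
            # Not a cue: a validator would warn here instead of
            # silently treating the block as a comment.
            skipped.append(block.strip())
    return cues, skipped
```

Fed the "@charset" example quoted above, this sketch yields the two cues
and reports the "@charset"/"@language" block as skipped.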

While I'm not convinced we need anything more than simple key-value pairs
in a header, I still looked at the options for comments:

Making any line with a --> in it be a comment would hide a lot of broken
cues from validators, so I think we shouldn't do this.

; appears at the beginning of lines in 15/10000 files and most don't look
like they're intended as comments.

# appears at the beginning of lines in 244/10000 files and most don't look
like they're intended as comments.

/* only appears in 3/10000 files, so CSS-style comments might work, but
they do add some complexity.

// appears at the beginning of lines in 5/10000 files and most look like
they *are* intended as comments or are garbage, so it should work.

(data from OpenSubtitles sample)
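
For anyone wanting to repeat this kind of count on their own sample, a
rough Python sketch follows. It is not the script used for the numbers
above; count_files_with_prefix and PREFIXES are names invented for the
example, and it assumes the files have already been decoded to text. For
simplicity it checks line starts for every prefix, which may differ from
how the /* figure was obtained.

```python
from collections import Counter

# Candidate comment markers discussed above.
PREFIXES = (";", "#", "/*", "//")


def count_files_with_prefix(files, prefixes=PREFIXES):
    """Count, per prefix, how many files have a line starting with it.

    `files` is an iterable of already-decoded file contents (strings).
    Each file is counted at most once per prefix.
    """
    counts = Counter()
    for text in files:
        lines = text.splitlines()
        for prefix in prefixes:
            if any(line.startswith(prefix) for line in lines):
                counts[prefix] += 1
    return counts
```

Dividing each count by the number of files gives figures of the same shape
as the ones listed above; deciding whether a hit is actually intended as a
comment still needs a manual look.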

>> > * there is no possibility to add file-wide metadata to WebSRT; things
>> > about authoring and usage rights as well as information about the media
>> > resource that the file relates to should be kept within the file. Almost
>> > all subtitle and caption formats have the possibility for such metadata
>>
>> This is something we could add if there is a clear use case, but I'm not
>> sure that there is. Why does SRT not have it today?
>>
>
> Because SRT is a quick hack and the simplest format possible that fulfills
> not even bare needs. :-)
> But seriously: most formats have metadata and I would rather go with those
> experiences than with SRT in this respect.

I agree with Silvia. I often see various credits in the cues themselves.
Some are there for ego purposes, but I expect at least some of them would
end up in a metadata field if it existed. It's hard to get solid numbers,
but after some grepping and manual filtering it seems like around 5% of
files have some form of credits matching 'subtitle', 'translat' or
'caption' case-insensitively. I guess that many non-English subtitles
have the credits in another language, so the true percentage should be
higher.
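
The grepping step can be approximated with a short Python sketch like the
one below. It covers only the automatic part (the manual filtering is not
reproduced), and credit_fraction and CREDIT are hypothetical names used
for illustration.

```python
import re

# The three stems mentioned above, matched case-insensitively.
CREDIT = re.compile(r"subtitle|translat|caption", re.IGNORECASE)


def credit_fraction(files):
    """Return the fraction of files containing at least one credit-like match.

    `files` is a sequence of already-decoded file contents (strings).
    """
    if not files:
        return 0.0
    hits = sum(1 for text in files if CREDIT.search(text))
    return hits / len(files)
```

A match only says that one of the stems occurs somewhere in the file, so
ordinary dialogue that happens to contain one of these words is counted
too, which is why manual filtering was needed.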

>> > * there is no magic identifier for a WebSRT resource, i.e. what the
>> > <wmml> element is for WMML. This makes it almost impossible to create a
>> > program to tell what file type this is, in particular since we have made
>> > the line numbers optional. We could use "-->" as an indicator, but it's
>> > not a good signature.
>>
>> Yeah, that's a problem. I considered adding "WEBSRT" at the start of every
>> file but we couldn't use it reliably since WebSRT parsers presumably want
>> to support SRT using the same parser, and that has no signature.
>>
>
> I continue to doubt that you can support WebSRT without changing your SRT
> parser. Thus, you might as well make such a change and make it easy for SRT
> parsers to identify that it's a WebSRT file to parse and not legacy SRT.

Do you think that we can't make the WebSRT parser work well enough for
existing SRT content, or why would one want to use two different parsers?

>> > So, let's stop pretending there is compatibility and just call WebSRT a
>> > new format.
>>
>> Compatibility has nothing to do with conformance. It has to do with what
>> user agents do. As far as I can tell, WebSRT is backwards-compatible with
>> legacy SRT user agents, and legacy SRT files are compatible with WebSRT
>> user agents as described by the spec.
>>
>
> Legacy SRT files contain many different character sets, which makes them
> non-conformant to WebSRT. I would not think that new WebSRT implementations
> like what the Web browsers will need to implement should make exceptions
> from the spec to support non-conformant files and become compatible with
> legacy SRT files. That to me again confirms that these are two different
> formats. Yes, they can be supported by the same piece of code, but that
> doesn't make them the same format.

That's not really an exception; conformance (syntax) and parsing are just
different things. (It must be so, since making parsing more forgiving than
the syntax is what allows for future extensions of version-less formats on
the web.)

> On Wed, 18 Aug 2010, Silvia Pfeiffer wrote:
>> >
>> > It actually boils down to the question: do we want the simple SRT format
>> > to survive as its own format and be something that people can rely upon
>> > as not having "weird stuff" in it - or do we not. I believe that it's
>> > important that it survives.
>>
>> Does that format still exist? Is it materially different than WebSRT?
>>
>
> What do you mean? All existing SRT files adhere to the simple form of SRT.

Lots of SRT files are broken in different ways. What is it they adhere to?

> None of them adhere to the WebSRT specification.

Perhaps many don't adhere to the WebSRT syntax (mostly due to not being
UTF-8), but if they parse and render in the same way as they do in most
existing SRT players, does it matter?
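
As an illustration of keeping conformance and parsing separate, here is a
small Python sketch. It is only one possible approach and not something
the spec describes: the function name decode_subtitles and the choice of
windows-1252 as the fallback encoding are assumptions made for the
example.

```python
def decode_subtitles(data, fallback="windows-1252"):
    """Return (text, conforming) for raw subtitle bytes.

    Input that is not valid UTF-8 is flagged as non-conforming but is
    still decoded with the fallback encoding, so it can be parsed and
    rendered anyway.
    """
    try:
        return data.decode("utf-8"), True
    except UnicodeDecodeError:
        # Assumed fallback; undecodable bytes become U+FFFD instead of
        # raising, so decoding always produces some text.
        return data.decode(fallback, errors="replace"), False
```

A validator would report the second value, while a player would simply
use the text.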

-- 
Philip Jägenstedt
Core Developer
Opera Software