[whatwg] Timed tracks: feedback compendium

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Fri Oct 22 02:21:44 PDT 2010

On Fri, Oct 22, 2010 at 7:19 PM, Philip Jägenstedt <philipj at opera.com> wrote:
> On Tue, 19 Oct 2010 22:35:50 +0200, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>> On Tue, Sep 14, 2010 at 7:49 PM, Philip Jägenstedt <philipj at opera.com>
>> wrote:
>>> On Tue, 14 Sep 2010 10:30:03 +0200, Simon Pieters <simonp at opera.com>
>>> wrote:
>>>> On Tue, 14 Sep 2010 10:11:16 +0200, Philip Jägenstedt
>>>> <philipj at opera.com>
>>>> wrote:
>>>>> The point of a header is that browsers can identify WebSRT files and
>>>>> not
>>>>> keep parsing through a 100GB movie file,
>>>> I don't think we should break SRT compat for this. I don't think this is
>>>> a
>>>> problem at all. We already have this situation elsewhere, e.g. what if
>>>> you
>>>> do <link rel=stylesheet href=movie.webm>?
>>>> If it really turns out to be a problem you could just apply the hardware
>>>> limitations clause and abort parsing if you haven't found any cues after
>>>> parsing X bytes or whatever.
>>>> In any case, the spec currently requires text/srt (or other supported
>>>> subtitle format MIME type) for <track>, so a movie file would be
>>>> rejected
>>>> based on the MIME type per spec (see step 4 in
>>>> #sourcing-out-of-band-timed-tracks).
>>> Well, I was hoping to sidestep the issue of MIME types and file
>>> extensions
>>> by always ignoring them. Last I checked Apache doesn't have a default
>>> mapping for .srt, so everyone using <track> would have to add it
>>> themselves.
>>> About metadata, I noticed that there's a voice called <credit>...
>> I think that's only for the credits at the start or end of a movie.
>> Anyway: I'm trying to summarize the changes to WebSRT that were
>> discussed so far. I think we have the following:
>> * add a header to identify the kind of WebSRT file & the language
>> * add a means to add metadata as name-value pairs
>> e.g.
>> WebSRT
>> language: en-US
>> author: Frank
>> date: 2010-09-20
>> kind: subtitle
>> copyright: WGBH, 2010
>> license: CC-BY-SA, http://creativecommons.org/licenses/by-sa/3.0/
> What should happen when the language in <track srclang> doesn't match the
> language in the file itself?

Since the attributes in <track> are a hint, probably what is available
in the file should overrule what is in the <track> attributes. It is
the same for the @charset attribute, which is overruled to utf-8 for

You could on the other hand argue that the Web developer wanted to
override a falsely specified language, but I believe it is much more
likely that the author of the file knows the correct language rather
than the Web page author, since the first is closer to authoring the

> Also, why is kind needed in the file?

For those occasions where no Web page with <track> is available, i.e.
offline applications such as vlc.
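
To make the precedence idea concrete, here is a minimal sketch in Python.
It assumes the header layout from the example quoted above (a "WebSRT"
line followed by "name: value" lines) and assumes that a blank line ends
the header; neither is settled. The function names parse_header and
effective_language are made up for illustration.

```python
def parse_header(text):
    """Parse the proposed header: a 'WebSRT' line, then 'name: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "WebSRT":
        return None  # no magic header, so there is no in-file metadata
    metadata = {}
    for line in lines[1:]:
        if not line.strip():
            break  # assumption: a blank line ends the header
        # Split on the first colon only, so values may contain URLs.
        name, sep, value = line.partition(":")
        if sep:
            metadata[name.strip().lower()] = value.strip()
    return metadata


def effective_language(track_srclang, metadata):
    """Prefer the language declared in the file; fall back to the <track> hint."""
    if metadata and metadata.get("language"):
        return metadata["language"]
    return track_srclang
```

With this, a file declaring "language: en-US" would win over a
<track srclang=de>, and an offline player could read "kind" from the
same dictionary without any <track> element being present.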

>> * add a means to add comments
>> e.g.
>> // Lines starting with // are comments
> So far the web has two comment syntaxes: <!-- SGML style --> and /* CSS style
> */, so if we need comments I think we should pick one of these.

I'm not fussed. I thought your analysis pointed to //, which is also
nicer because a comment simply runs to the end of the line and needs no
closing delimiter. It is also familiar from C++ and other programming
languages. But I don't really mind - we just need a decision and the
reasons for it.
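
As a rough illustration of why a line-based syntax is simple to handle,
here is a sketch in Python. It assumes that a comment is any line that
begins with "//", which is only the option floated in this thread and not
a decided syntax; strip_comments is a hypothetical helper name.

```python
def strip_comments(lines):
    """Drop whole lines that start with '//'; leave everything else untouched."""
    return [line for line in lines if not line.startswith("//")]
```

Note that under this assumption a cue text line that itself starts with
"//" would also be dropped, which is one of the things a decision would
need to address.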

>> And some changes on <track>:
>> * make @kind a required attribute
> Why was this?

Earlier in this thread we were discussing the differences between
subtitles and captions. One way to make sure people pick the right
@kind, rather than letting the default always apply and possibly
mislabel the track, is to make @kind a required attribute. I
support that idea.

>> * add @type for mime type identification as we allow more than just
>> WebSRT as external formats, e.g. TTML
> Having more than one format seems to complicate rendering. The WebSRT
> rendering rules try to avoid overlap between cues from different tracks,
> but I don't see how that could work between different formats, unless all
> formats have basically the same model. It certainly wouldn't work with a
> fixed-layout format like TTML. In other words, can't this wait until some
> implementor has shown concrete interest in implementing more than one
> format?

That interest has already been expressed multiple times in the W3C
HTML5 accessibility task force and here, too, I think. Hixie confirmed
in an email [1] that the element is indeed meant to be usable with
other formats. That doesn't mean that browsers need to implement more
than the baseline. I would personally support having WebSRT as the

[1] http://lists.w3.org/Archives/Public/public-html-a11y/2010Aug/0020.html

> Anyway, I agree that at least a magic header like "WebSRT" is needed because
> of the horrors of legacy SRT parsing. Breaking SRT compat means that we can
> go back to requiring UTF-8 as the encoding. However, UTF-8 does complicate
> the magic header a bit due to the possibility of a BOM [1]. While it would
> be nice to forbid the use of a BOM, I expect we'd then see lots of
> frustration from authors whose editors automatically insert it...
> [1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

I'm happy to enforce UTF-8 on WebSRT. The @charset can work for other
formats. I didn't know about the BOM problem - but having read about it,
I would think it makes sense to forbid it. What tools do and how they
deal with erroneous files is a different matter.
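
For what it is worth, here is a minimal sketch in Python of how a tool
could check for the magic header while noticing a UTF-8 BOM. It assumes
the magic header is a first line consisting of "WebSRT", as in the example
earlier in this thread; sniff_websrt is a hypothetical name. Whether a
BOM makes the file invalid or is merely skipped is left to the caller.

```python
import codecs

MAGIC = b"WebSRT"  # assumption: the proposed magic header line


def sniff_websrt(data):
    """Return (is_websrt, had_bom) for the leading bytes of a file."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]  # skip the three BOM bytes
    # Take the first line, tolerating both LF and CRLF line endings.
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    return first_line.strip() == MAGIC, had_bom
```

A strict validator could reject a file when had_bom is true, while a
lenient player could simply carry on parsing.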

