[whatwg] Timed tracks: feedback compendium

Fri Oct 22 02:21:44 PDT 2010

On Fri, Oct 22, 2010 at 7:19 PM, Philip Jägenstedt <philipj at opera.com> wrote:
> On Tue, 19 Oct 2010 22:35:50 +0200, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>
>> On Tue, Sep 14, 2010 at 7:49 PM, Philip Jägenstedt <philipj at opera.com>
>> wrote:
>>>
>>> On Tue, 14 Sep 2010 10:30:03 +0200, Simon Pieters <simonp at opera.com>
>>> wrote:
>>>
>>>> On Tue, 14 Sep 2010 10:11:16 +0200, Philip Jägenstedt
>>>> <philipj at opera.com>
>>>> wrote:
>>>>
>>>>> The point of a header is that browsers can identify WebSRT files and
>>>>> not
>>>>> keep parsing through a 100GB movie file,
>>>>
>>>> I don't think we should break SRT compat for this. I don't think this is
>>>> a
>>>> problem at all. We already have this situation elsewhere, e.g. what if
>>>> you
>>>> do <link rel=stylesheet href=movie.webm>?
>>>>
>>>> If it really turns out to be a problem you could just apply the hardware
>>>> limitations clause and abort parsing if you haven't found any cues after
>>>> parsing X bytes or whatever.
>>>>
>>>> In any case, the spec currently requires text/srt (or other supported
>>>> subtitle format MIME type) for <track>, so a movie file would be
>>>> rejected
>>>> based on the MIME type per spec (see step 4 in
>>>> #sourcing-out-of-band-timed-tracks).
>>>>
>>>
>>> Well, I was hoping to sidestep the issue of MIME types and file
>>> extensions
>>> by always ignoring them. Last I checked Apache doesn't have a default
>>> mapping for .srt, so everyone using <track> would have to add it
>>> themselves.
>>>
>>> About metadata, I noticed that there's a voice called <credit>...
>>
>> I think that's only for the credits at the start or end of a movie.
>>
>>
>>
>> Anyway: I'm trying to summarize the changes to WebSRT that have been
>> discussed so far. I think we have the following:
>>
>> * add a header to identify the kind of websrt file & the language
>> * add a means to add metadata as name-value pairs
>>
>> e.g.
>> WebSRT
>> language: en-US
>> author: Frank
>> date: 2010-09-20
>> kind: subtitle
>> copyright: WGBH, 2010
>> license: CC-BY-SA, http://creativecommons.org/licenses/by-sa/3.0/
>
> What should happen when the language in <track srclang> doesn't match the
> language in the file itself?

Since the attributes in <track> are only hints, what is declared in
the file itself should probably override what is in the <track>
attributes. It is the same for the @charset attribute, which is
overridden to UTF-8 for WebSRT, IIRC.

You could on the other hand argue that the Web developer wanted to
override a falsely specified language, but I believe it is much more
likely that the author of the file knows the correct language than
the Web page author does, since the former is closer to the content
of the file.
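
To make the precedence I am suggesting concrete, here is a minimal
Python sketch. It assumes the header layout from the example quoted
above (a "WebSRT" line followed by "name: value" lines, ended by a
blank line), none of which is specified yet. The function names
parse_header and effective_language are made up for illustration.

```python
def parse_header(text):
    """Parse the proposed (not yet specified) WebSRT header block.

    Returns a dict of lower-cased names to values, or None if the first
    line is not the "WebSRT" magic string.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "WebSRT":
        return None
    metadata = {}
    for line in lines[1:]:
        if not line.strip():
            break  # assumption: a blank line ends the header block
        # Split on the first colon only, so values may contain colons (URLs).
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that are not name: value pairs
            metadata[name.strip().lower()] = value.strip()
    return metadata


def effective_language(srclang, metadata):
    """Prefer the language declared in the file over the <track srclang> hint."""
    if metadata and metadata.get("language"):
        return metadata["language"]
    return srclang
```

With this, a file declaring "language: en-US" that is referenced from
<track srclang=de> would be treated as en-US, and a headerless file
would fall back to the srclang hint.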

> Also, why is kind needed in the file?

For those occasions where no Web page with <track> is available, i.e.
offline applications such as vlc.

>
>> * add a means to add comments
>>
>> e.g.
>> // Lines starting with // are comments
>
> So far the web has two comment syntaxes: <!-- SGML style --> and /* CSS style
> */, so if we need comments I think we should pick one of these.

I'm not fussed. I thought your analysis pointed to //, which is also
nicer because it comments out the whole line without needing an end
marker. It is also familiar from C++ and other programming languages.
But I don't really mind - we just need a decision and reasons for why.
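
For what it's worth, the // variant is trivial to handle line by line.
A minimal Python sketch, assuming a comment is any line whose first two
characters are // (whether leading whitespace is allowed, and whether
comments may appear inside a cue, is undecided). strip_comments is a
hypothetical name.

```python
def strip_comments(text):
    """Drop every line that starts with "//"; keep all other lines unchanged."""
    kept = [line for line in text.splitlines() if not line.startswith("//")]
    return "\n".join(kept)
```

Note that under this assumption a cue text line that happens to begin
with // would also be dropped, which is one of the things a decision
would have to settle.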

>> And some changes on <track>:
>> * make @kind a required attribute
>
> Why was this?

Earlier in this thread we were discussing the differences between
subtitles and captions. One way to make sure people pick the right
@kind, rather than silently falling back to a default that may be the
wrong type, is to make @kind a required attribute. I support that
idea.

>> * add @type for mime type identification as we allow more than just
>> WebSRT as external formats, e.g. TTML
>
> Having more than one format seems to complicate rendering. The WebSRT
> rendering rules try to avoid overlap between cues from different tracks,
> but I don't see how that could work between different formats, unless all
> formats have basically the same model. It certainly wouldn't work with a
> fixed-layout format like TTML. In other words, can't this wait until some
> implementor has shown concrete interest in implementing more than one
> format?

That interest has already been expressed multiple times in the W3C
HTML5 accessibility task force and here, too, I think. Hixie confirmed
in an email [1] that the element is indeed meant to be usable with
other formats. That doesn't mean that browsers need to implement more
than the baseline. I would personally support having WebSRT as the
baseline.

[1] http://lists.w3.org/Archives/Public/public-html-a11y/2010Aug/0020.html

> Anyway, I agree that at least a magic header like "WebSRT" is needed because
> of the horrors of legacy SRT parsing. Breaking SRT compat means that we can
> go back to requiring UTF-8 as the encoding. However, UTF-8 does complicate
> the magic header a bit due to the possibility of a BOM [1]. While it would
> be nice to forbid the use of a BOM, I expect we'd then see lots of
> frustration from authors whose editors automatically insert it...
>
> [1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

I'm happy to enforce UTF-8 on WebSRT. The @charset can work for other
formats. I didn't know about the BOM problem - but having read about
it, I would think it makes sense to forbid it. What tools do and how
they deal with erroneous files is a different matter.
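
Either way, the check itself is small. Here is a rough Python sketch of
sniffing the magic header with and without tolerating a UTF-8 BOM. The
"WebSRT" magic string is only the proposal from this thread, and
sniff_websrt and its allow_bom flag are names I made up to show both
options side by side.

```python
import codecs

MAGIC = b"WebSRT"  # proposed magic string; not yet in any spec


def sniff_websrt(data, allow_bom=True):
    """Return True if the byte string starts with the magic header line.

    With allow_bom=True a single leading UTF-8 BOM (EF BB BF) is skipped
    first; with allow_bom=False a file that starts with a BOM is rejected.
    """
    if data.startswith(codecs.BOM_UTF8):
        if not allow_bom:
            return False
        data = data[len(codecs.BOM_UTF8):]
    # Only the first line is inspected, so a large non-WebSRT file is
    # rejected without reading any further.
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    return first_line.strip() == MAGIC
```

Forbidding the BOM corresponds to allow_bom=False. A lenient tool could
still call it with allow_bom=True and report a warning.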

Cheers,
Silvia.