[whatwg] Timed tracks: feedback compendium

Philip Jägenstedt philipj at opera.com
Fri Oct 22 01:19:41 PDT 2010


On Tue, 19 Oct 2010 22:35:50 +0200, Silvia Pfeiffer  
<silviapfeiffer1 at gmail.com> wrote:

> On Tue, Sep 14, 2010 at 7:49 PM, Philip Jägenstedt <philipj at opera.com>  
> wrote:
>> On Tue, 14 Sep 2010 10:30:03 +0200, Simon Pieters <simonp at opera.com>  
>> wrote:
>>
>>> On Tue, 14 Sep 2010 10:11:16 +0200, Philip Jägenstedt  
>>> <philipj at opera.com>
>>> wrote:
>>>
>>>> The point of a header is that browsers can identify WebSRT files and  
>>>> not
>>>> keep parsing through a 100GB movie file,
>>>
>>> I don't think we should break SRT compat for this. I don't think this  
>>> is a
>>> problem at all. We already have this situation elsewhere, e.g. what if  
>>> you
>>> do <link rel=stylesheet href=movie.webm>?
>>>
>>> If it really turns out to be a problem you could just apply the  
>>> hardware
>>> limitations clause and abort parsing if you haven't found any cues  
>>> after
>>> parsing X bytes or whatever.
>>>
>>> In any case, the spec currently requires text/srt (or other supported
>>> subtitle format MIME type) for <track>, so a movie file would be  
>>> rejected
>>> based on the MIME type per spec (see step 4 in
>>> #sourcing-out-of-band-timed-tracks).
>>>
>>
>> Well, I was hoping to sidestep the issue of MIME types and file  
>> extensions
>> by always ignoring them. Last I checked Apache doesn't have a default
>> mapping for .srt, so everyone using <track> would have to add it  
>> themselves.
>>
>> About metadata, I noticed that there's a voice called <credit>...
>
> I think that's only for the credits at the start or end of a movie.
>
>
>
> Anyway: I'm trying to summarize the changes that were discussed this
> far to WebSRT. I think we have the following:
>
> * add a header to identify the kind of websrt file & the language
> * add a means to add metadata as name-value pairs
>
> e.g.
> WebSRT
> language: en-US
> author: Frank
> date: 2010-09-20
> kind: subtitle
> copyright: WGBH, 2010
> license: CC-BY-SA, http://creativecommons.org/licenses/by-sa/3.0/

What should happen when the language in <track srclang> doesn't match the  
language in the file itself? Also, why is kind needed in the file?
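
To make the mismatch question concrete, here is a rough sketch of how a
parser might read the proposed header and notice a conflict with
<track srclang>. Nothing here is specified anywhere: the "name: value"
syntax, the blank line ending the header, and the function names
parse_websrt_header and language_conflict are all assumptions based on the
example quoted above.

```python
def parse_websrt_header(text):
    """Parse a hypothetical WebSRT header: a magic line, then name: value lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "WebSRT":
        return None  # not identified as WebSRT under this assumed syntax
    metadata = {}
    for line in lines[1:]:
        if not line.strip():
            break  # assumption: a blank line ends the header block
        # Split on the first colon only, so values may contain URLs.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that have no separator
        metadata[name.strip().lower()] = value.strip()
    return metadata


def language_conflict(srclang, metadata):
    """Return True when both languages are present and differ (case-insensitive)."""
    in_file = metadata.get("language")
    return bool(srclang) and bool(in_file) and srclang.lower() != in_file.lower()
```

The sketch only detects the conflict. Which value should win, or whether
the track should be rejected, is exactly the open question.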

> * add a means to add comments
>
> e.g.
> // Lines starting with // are comments

So far the web has two comment syntaxes: <!-- SGML style --> and /* CSS  
style */, so if we need comments I think we should pick one of these.

> And some changes on <track>:
> * make @kind a required attribute

Why was this?

> * add @type for mime type identification as we allow more than just
> WebSRT as external formats, e.g. TTML

Having more than one format seems to complicate rendering. The WebSRT  
rendering rules try to avoid overlap between cues from different tracks,  
but I don't see how that could work between different formats, unless all  
formats have basically the same model. It certainly wouldn't work with a  
fixed-layout format like TTML. In other words, can't this wait until some  
implementor has shown concrete interest in implementing more than one  
format?

Anyway, I agree that at least a magic header like "WebSRT" is needed  
because of the horrors of legacy SRT parsing. Breaking SRT compat means  
that we can go back to requiring UTF-8 as the encoding. However, UTF-8  
does complicate the magic header a bit due to the possibility of a BOM  
[1]. While it would be nice to forbid the use of a BOM, I expect we'd then  
see lots of frustration from authors whose editors automatically insert  
it...

[1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
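
As a minimal sketch of what tolerating a BOM would mean for sniffing,
assuming the magic string is the literal bytes "WebSRT" at the very start
of the file and must be followed by whitespace or end of input (both are
my assumptions, and looks_like_websrt is a made-up name):

```python
import codecs

MAGIC = b"WebSRT"  # assumed magic string, not defined by any spec


def looks_like_websrt(first_bytes):
    """Check the first few bytes of a resource for the assumed magic header."""
    data = first_bytes
    # Skip an optional UTF-8 BOM (EF BB BF) before looking for the magic.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    if not data.startswith(MAGIC):
        return False
    # Require whitespace or end of input after the magic, so that
    # something like "WebSRTX" is not accepted.
    following = data[len(MAGIC):len(MAGIC) + 1]
    return following in (b"", b"\r", b"\n", b" ", b"\t")
```

With this approach a browser only needs the first ten bytes to decide, so
a mislabelled movie file is rejected immediately instead of being parsed
to the end.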

-- 
Philip Jägenstedt
Core Developer
Opera Software


