[whatwg] Timed tracks: feedback compendium

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Wed Oct 27 04:54:59 PDT 2010


On Wed, Oct 27, 2010 at 8:53 PM, Philip Jägenstedt <philipj at opera.com> wrote:
> On Fri, 22 Oct 2010 13:49:00 +0200, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>
>> On Fri, Oct 22, 2010 at 10:18 PM, Simon Pieters <simonp at opera.com> wrote:
>>>
>>> On Fri, 22 Oct 2010 13:09:24 +0200, Philip Jägenstedt <philipj at opera.com>
>>> wrote:
>>>
>>>>> Using <!-- --> is a bad idea since the WebSRT syntax already uses -->.
>>>>> I don't see the need for multiline comments.
>>>>
>>>> Right. If we must have comments I think I'd prefer /* ... */ since both
>>>> CSS and JavaScript have it, and I can't see that single-line comments
>>>> will be easier from a parser perspective.
>>>
>>> Line comments seem better from a compat perspective (you wouldn't get
>>> commented-out stuff appearing as cues in legacy parsers).
>>
>> Philip's research earlier from this thread was as follows:
>>
>> ; appears at the beginning of lines in 15/10000 files and most don't look
>> like they're intended as comments.
>>
>> # appears at the beginning of lines in 244/10000 files and most don't look
>> like they're intended as comments.
>>
>> /* only appears in 3/10000 files, so CSS-style comments might work, but
>> they do add some complexity.
>>
>> // appears at the beginning of lines in 5/10000 files and most look like
>> they *are* intended as comments or are garbage, so it should work.
>>
>> (data from OpenSubtitles sample)
>>
>> which seems to support the choice of //.
>
> Note that this was assuming that WebSRT should be an extension of SRT. If
> that's not true, we can choose more freely.
>
>> I do wonder what the lines that start with ; or # contained though.
>
> ; look mostly like typos, sometimes where " was intended.
>
> # seems to have been mostly used as some kind of emphasis, with # sentences
> like this #
>
> Note that lots of the files are in languages and encodings unknown to me,
> so my guesses shouldn't be taken too seriously. It's obvious that if WebSRT
> is an extension of SRT (which I no longer think is a good idea), then *some*
> content will break.


I recently came across the mpsub format, see
http://www.mplayerhq.hu/DOCS/tech/mpsub.sub . It has name-value pairs
at the start for file-wide metadata and uses # for comments. (It also
has a weird time stamp format, which I would ignore.) The name-value
pairs make sense to me, and we could adopt # for comments by analogy
with scripting languages, where # is often the comment sign. OTOH, we
could use // and /* */ by analogy with C/C++, which would cover both
single-line and multi-line comments and thus be more flexible.
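
To make the options a bit more concrete, here is a rough Python sketch,
not a proposal for the actual parser. The function names and the marker
list are made up for illustration. The first function mimics the kind of
per-file count Philip ran (I am assuming "beginning of lines" means the
very first character). The second shows how a parser could drop
//-style line comments before looking for cues.

```python
from typing import Dict, Iterable

# Candidate comment markers from this thread (illustrative list only).
MARKERS = (";", "#", "/*", "//")


def files_with_leading_marker(files: Iterable[str]) -> Dict[str, int]:
    """For each marker, count the files with at least one line starting with it."""
    counts = {marker: 0 for marker in MARKERS}
    for text in files:
        lines = text.splitlines()
        for marker in MARKERS:
            if any(line.startswith(marker) for line in lines):
                counts[marker] += 1
    return counts


def strip_line_comments(text: str, marker: str = "//") -> str:
    """Drop every line that starts with the marker; keep all other lines."""
    kept = [line for line in text.splitlines() if not line.startswith(marker)]
    return "\n".join(kept)
```

A legacy SRT parser that does not know about the marker would still show
such a line as cue text or ignore it, depending on where it sits, which
is the compat question raised above. Handling /* */ would need state
carried across lines, so it is the more complex of the two.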

Silvia.


