[whatwg] Timed tracks: feedback compendium
Philip Jägenstedt
philipj at opera.com
Fri Oct 22 04:09:24 PDT 2010
On Fri, 22 Oct 2010 11:45:24 +0200, Simon Pieters <simonp at opera.com> wrote:
> On Fri, 22 Oct 2010 11:21:44 +0200, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>
>> Since the attributes in <track> are a hint, probably what is available
>> in the file should overrule what is in the <track> attributes. It is
>> the same for the @charset attribute, which is overruled to utf-8 for
>> WebSRT IIRC.
>
> No, charset="" overrules the encoding for WebSRT per spec.
We should just remove charset="" from the spec.
>>>> * add a means to add comments
>>>>
>>>> e.g.
>>>> // Lines starting with // are comments
>>>
>>> So far the web has two comment syntaxes: <!-- SGML style --> and
>>> /* CSS style */, so if we need comments I think we should pick one
>>> of these.
>
> Actually there are three more in javascript:
>
> // line comment
> <!-- line comment
> --> line comment
>
> http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments
>
>
>> I'm not fussed. I thought your analysis pointed to //, which is also
>> nicer because it takes the full line into account without a need for
>> end tags. Also, it is common from C++ and other programming languages.
>> But I don't really mind - we just need a decision and reasons for why.
>
> Using <!-- --> is a bad idea since the WebSRT syntax already uses -->. I
> don't see the need for multiline comments.
Right. If we must have comments I think I'd prefer /* ... */ since both
CSS and JavaScript have it, and I can't see that single-line comments will
be easier from a parser perspective.
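To illustrate why neither style looks much harder to parse, here is a minimal sketch in Python. It assumes comments are stripped in a pre-pass before cue parsing. Neither syntax is in the WebSRT spec, and the helper names are hypothetical.

```python
import re

# Hypothetical pre-pass helpers: neither comment syntax is part of WebSRT.
# /* ... */ may span several lines, so "." must also match newlines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# A line comment is a whole line that starts with "//".
LINE_COMMENT = re.compile(r"^//[^\n]*\n?", re.MULTILINE)


def strip_block_comments(text: str) -> str:
    """Remove every /* ... */ span, including multi-line ones."""
    return BLOCK_COMMENT.sub("", text)


def strip_line_comments(text: str) -> str:
    """Remove every line that begins with //, along with its newline."""
    return LINE_COMMENT.sub("", text)
```

Both come down to a single pattern, so under these assumptions parser complexity alone doesn't favour either syntax.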
>>> Anyway, I agree that at least a magic header like "WebSRT" is needed
>>> because of the horrors of legacy SRT parsing.
>
> I don't see why we can't just consume the legacy and support it in
> WebSRT. Part of the point with WebSRT is to support the legacy. If we
> don't want to support the legacy, then the format can be made a lot
> cleaner.
Did you read
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-October/028799.html>
and look at <http://ale5000.altervista.org/subtitles.htm>?
Do you think it's a good idea to make WebSRT an extension of ale5000-SRT?
My opinion is that it's not a very good idea, which of course means we can
simplify some aspects of the format. For example, we don't need to allow
both , and . as the millisecond separator, and the time parsing in general
can be made more sane.
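As a rough sketch of what saner time parsing could look like, the Python below accepts exactly one timestamp shape and rejects everything else. The choice of "." over "," as the separator, the fixed field widths and the function name are my assumptions for illustration, not anything the spec says.

```python
import re
from typing import Optional

# Assumed strict form: hh:mm:ss.mmm, with "." as the only separator.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
TIMESTAMP = re.compile(r"([0-9]{2,}):([0-5][0-9]):([0-5][0-9])\.([0-9]{3})")


def parse_timestamp(value: str) -> Optional[int]:
    """Return the time in milliseconds, or None if the input is malformed."""
    match = TIMESTAMP.fullmatch(value)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
```

With this, "00:01:02.500" parses to 62500, while legacy forms such as "00:01:02,500" or "0:1:2.5" are rejected rather than guessed at.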
>>> Breaking SRT compat means that we can
>>> go back to requiring UTF-8 as the encoding. However, UTF-8 does
>>> complicate the magic header a bit due to the possibility of a BOM [1].
>>> While it would be nice to forbid the use of a BOM, I expect we'd then
>>> see lots of frustration from authors whose editors automatically
>>> insert it...
>>>
>>> [1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
>>
>> I'm happy to enforce UTF-8 on WebSRT. The @charset can work for other
>> formats. I didn't know about the BOM problem - but having read it, I
>> would think it makes sense to forbid it. What tools do and how they
>> deal with erroneous files is a different matter.
>
> Forbidding it would be the frustration. Consider editing a WebSRT file
> in Notepad, and then suddenly it doesn't work anymore. Instead we should
> allow the BOM. (WebSRT already allows the BOM.)
This means that it's trickier to use "WebSRT" as the magic bytes, but I
agree it's probably the better trade-off.
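To show what the extra trickiness amounts to, here is a minimal sketch in Python of sniffing the header while tolerating a BOM. It assumes the magic string is the ASCII bytes "WebSRT" at the very start of the file, optionally preceded by a UTF-8 BOM. The function name is hypothetical.

```python
import codecs

# Assumed magic bytes; the exact header has not been decided.
MAGIC = b"WebSRT"


def has_magic_header(data: bytes) -> bool:
    """Check for the magic header, skipping one leading UTF-8 BOM if present."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith(MAGIC)
```

The cost is that a sniffer has to look at up to three extra bytes, which seems acceptable compared with files breaking after a round trip through Notepad.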
--
Philip Jägenstedt
Core Developer
Opera Software