[whatwg] Fwd: Discussing WebSRT and alternatives/improvements

Mon Aug 23 01:55:33 PDT 2010

On Sat, 21 Aug 2010 01:32:49 +0200, Silvia Pfeiffer  
<silviapfeiffer1 at gmail.com> wrote:

> On Fri, Aug 20, 2010 at 10:53 PM, Philip Jägenstedt  
> <philipj at opera.com>wrote:
>
>> On Wed, 18 Aug 2010 00:42:04 +0200, Silvia Pfeiffer <
>> silviapfeiffer1 at gmail.com> wrote:
>>
>>  On Thu, Aug 12, 2010 at 6:09 PM, Philip Jägenstedt <philipj at opera.com
>>> >wrote:
>>>
>>> Yeah, so the only conforming solution is probably to use CSS3
>>> transition-delay property. That may not be the most elegant solution,  
>>> but
>>> it
>>> works.
>>>
>>
>> So, it seems clear that in order to use an HTML parser we have to  
>> sacrifice some features or make them more verbose.
>
>
> That sounds like there are multiple problems, when in fact we are only
> talking about the single use case of timestamps.

I was referring also to the voices markup which is made much more verbose.

> All other requirements are
> met by the existing innerHTML parser. Is it really necessary to throw out
> all the advantages of re-using innerHTML just to avoid some extra markup  
> for this single use case?

No, this isn't a critical use case in itself. I'm not fundamentally  
opposed to using an HTML parser, I just don't see any great benefits, but  
some complications.

>> The whole of the WebSRT parser isn't very big or complicated, so I don't
>> think implementation cost is a strong argument for reusing the HTML  
>> parser,
>> especially since at least the timing syntax needs a separate parser.
>
>
>
> It's not just about implementation cost - it's also the problem of
> maintaining another spec that can grow to have eventually all the  
> features
> that HTML5 has and more. Do you really eventually want to re-spec and
> re-implement a whole innerHTML parser plus the extra <t> element when we
> start putting <svg> and <canvas> and all sorts of other more complex HTML
> features into captions? Just because the <t> element is making trouble  
> now?
> Is this really the time to re-invent HTML?

I don't expect that SVG, <canvas>, images, etc will ever natively be made 
part of captions. Rather, I would hope that the metadata state together 
with scripts is used. If we think that e.g. images in captions are an 
important use case, then WebSRT is not a good solution.

If we allow arbitrary HTML and expect browsers to handle it well, it adds  
some complexity. For example, any videos and images in the cue would have  
to be fully loaded and ready to be decoded by the time the cue is to be  
shown, which I really don't want to implement the logic for. Simply having  
an iframe-like container where the document is replaced for each cue  
wouldn't be enough, rather one would have to create one document per cue  
during parsing and wait for all of those to finish loading before  
beginning playback. I'm not sure, but I'm guessing that amounts to  
significant memory overhead.

As an aside, I personally see it as a good thing that *doesn't* 
work in WebSRT, whereas it would using an HTML parser.

>> It's a bit more than just annoying to users. If there are automated
>>> processes involved that print that stuff on tape for example, you can  
>>> burn
>>> through a lot of material and money before realising that your input  
>>> files
>>> are "broken" and if you cannot get software support for the new files
>>> implemented, you may need to implement costly manual checking of the
>>> files.
>>>
>>
>> SRT as it is today can and does contain broken timestamps, missing
>> linebreaks and at least <i>, <b>, <u> and <font ...> markup, some of  
>> which
>> is broken. If anyone is able to rely on their input as being  
>> well-formed
>> enough as to be put through automatic but costly processes, they'd have  
>> to
>> have very good control of where their input comes from. I can't see how
>> WebSRT would change that.
>
>
> I would indeed expect a fairly trusted relationship with the supplier.  
> But
> assuming your supplier changes from SRT to WebSRT support in their  
> captions.
> If they have two different file extensions, you will notice immediately  
> and
> there is a trigger to actually start implementing WebSRT support. If they
> are the same file extension, that will cause the trouble I explained. If  
> at
> least there was a version identifier in existing SRT, then we wouldn't  
> have
> that trouble at all. But we've had this discussion.
>
>
>
>>  The core "problem" is that WebSRT is far too compatible with existing  
>> SRT
>>>> usage. Regardless of the file extension and MIME type used, it's quite
>>>> improbable that anyone will have different parsers for the same  
>>>> format.
>>>> Once
>>>> media players have been forced to handle the extra markup in WebSRT  
>>>> (e.g.
>>>> by
>>>> ignoring it, as many already do) the two formats will be the same, and
>>>> using
>>>> WebSRT markup in .srt files will just work, so that's what people will
>>>> do.
>>>> We may avoid being seen as arrogant format-hijackers, but the end  
>>>> result
>>>> is
>>>> two extensions and two different MIME types that mean exactly the same
>>>> thing.
>>>>
>>>
>>>
>>> It actually boils down to the question: do we want the simple SRT  
>>> format
>>> to
>>> survive as its own format and be something that people can rely upon as
>>> not
>>> having "weird stuff" in it - or do we not. I believe that it's  
>>> important
>>> that it survives. WebSRT can have absolutely anything in it, including
>>> code
>>> and binary data, even if that stuff would not be interpreted in a  
>>> browser,
>>> but handed on to the JavaScript API for a JavaScript routine to do
>>> something
>>> with it. It is a great extensible platform. But the advantage of SRT is
>>> that
>>> it is simple and reliably simple. We completely remove this option by
>>> stealing the format.
>>>
>>
>> I've collected some statistics on existing SRT content that I intend to
>> publish soonish. For now, I'll just note that >50% contain some form of
>> markup. Adding to this various ways in which the files could be broken,  
>> it
>> seems to me that SRT as deployed is neither really simple nor reliable.
>> Private use of SRT is of course simple and reliable, but that will be  
>> true
>> in the future too.
>>
>
> Honestly, using the existing small mess around SRT as an excuse to turn  
> it
> into a huge mess doesn't seem a good argument to me.

I'm just saying that SRT isn't a plain text format today and anyone who's  
able to assume it is can only do so because they control the input.

Deployed SRT uses <i>, <b>, <u> and <font>. WebSRT adds <ruby>, <rt> and 
<1>...<infinity>, extensions which are very much in line with the existing 
format and already "work" in many players (in the sense that they are 
ignored, not rendered). I wouldn't call that a huge mess.

>> Aside: WebSRT can't contain binary data, only UTF-8 encoded text.
>
>
> It sure can. Just base-64 encode it. I'm not saying it's a good thing,  
> but
> if somebody really has an urge...

Sure, this would be a metadata track. Sites have no reason to offer  
download links to it, and if anyone gets hold of such a file it would  
quickly be evident that it's useless.
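
To make the base-64 aside concrete, here is a minimal sketch in Python of
how binary data could ride along as plain UTF-8 cue text. The helper names
are hypothetical and nothing here comes from the WebSRT draft; it only
shows that the file itself stays text while a script decodes the payload.

```python
import base64


def binary_to_cue_text(payload: bytes) -> str:
    """Encode arbitrary bytes as ASCII text that is safe to store in a cue.

    Hypothetical helper: the name and the idea of one payload per cue
    are assumptions made for illustration.
    """
    return base64.b64encode(payload).decode("ascii")


def cue_text_to_binary(cue_text: str) -> bytes:
    """Reverse of binary_to_cue_text, as a script reading the cue might do."""
    return base64.b64decode(cue_text)


payload = bytes([0x00, 0xFF, 0x10, 0x80])   # not valid UTF-8 on its own
cue_text = binary_to_cue_text(payload)      # plain ASCII, hence valid UTF-8
restored = cue_text_to_binary(cue_text)     # identical to the original bytes
```

A player that renders such a cue as text would only show a meaningless
string, which is why this belongs in a metadata track.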

>>   Since browser vendors get all the benefits and none of the problems it
>>>>
>>>>> would be a mistake to only listen to us, of course. It might be
>>>>>> worthwhile
>>>>>> contacting developers of applications like VLC, Totem or MPlayer and
>>>>>> ask
>>>>>> precisely how annoyed they would be if suddenly one day they had to
>>>>>> tweak
>>>>>> their SRT parser because of WebSRT.
>>>>>>
>>>>>>
>>>>>
>>>>> Some of them have already spoken:
>>>>> http://forum.doom9.org/showthread.php?p=1396576 "Extending SRT is a
>>>>> very
>>>>> bad
>>>>> idea" etc etc. Also, I've had feedback from other subtitle  
>>>>> professionals
>>>>> that are also against extending SRT, the main reasons being to break
>>>>> existing working software environments.
>>>>>
>>>>>
>>>> The only way to really avoid messing with the ecosystem is to invent a
>>>> completely new format. The choice is between something that won't  
>>>> work at
>>>> all in non-browsers and something that will mostly work.
>>>>
>>>
>>>
>>>
>>> If you look at it realistically, we *are* inventing a completely new
>>> format.
>>> WebSRT only on the surface has some resemblance with SRT. When you dig
>>> deeper, it is a completely different format with different aims and
>>> applications. Yes, it covers all the SRT aims and applications, but it
>>> does
>>> so much more! Only some of it will work in non-browsers, others will
>>> utterly
>>> fail and will completely disrupt an already working ecosystem. I think  
>>> it
>>> may even have a really bad effect if we introduce WebSRT as SRT in that
>>> authoring software will refrain from implementing support for the  
>>> richer
>>> features in order not to disrupt the existing software ecosystem. In  
>>> the
>>> end
>>> we might end up with a lot of unsupported features in WebSRT and no real
>>> progress. I much prefer having progress with a transition period with
>>> conscious decisions to support the extra features.
>>>
>>
>> As long as WebSRT is similar enough to SRT that software developers can  
>> use
>> the same parser for both, they will effectively become the same format.
>
>
> There is a difference between "being the same format" and "superseding".  
> I
> believe strongly that WebSRT will supersede SRT. But if we make it the  
> "same
> format", we simply inherit the mess that already exists. All those broken
> SRT files will continue to be broken WebSRT files. Just by taking over  
> the
> format, we will not magically do away with the existing mess.

Right, if the syntax is largely compatible with SRT then all the mess of  
SRT has to be dealt with. Not dealing with it would be quite  
irresponsible, in which case we should make the syntax fundamentally  
incompatible with SRT and forget about it.

Adding a magic number and header to WebSRT might be a good thing to do for  
various reasons, but it's not enough to make the format incompatible with  
all SRT parsers.
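
A rough sketch of why a header alone is not enough, assuming a
hypothetical signature line "WEBSRT" (the draft under discussion defines
no such header) and a forgiving parser that simply scans for timing lines.
Both function names are made up for illustration.

```python
import re

# Hypothetical signature line; an assumption, not part of any spec.
MAGIC = "WEBSRT"

# Loose timing-line pattern of the kind a lenient SRT parser might use.
TIMING = re.compile(
    r"\d+:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d+:\d{2}:\d{2}[,.]\d{3}"
)


def has_magic(text: str) -> bool:
    """True if the first line (after an optional BOM) is the assumed signature."""
    lines = text.lstrip("\ufeff").splitlines()
    return bool(lines) and lines[0].strip() == MAGIC


def lenient_cue_count(text: str) -> int:
    """Count cues the way a forgiving parser might: look for timing lines only."""
    return sum(1 for line in text.splitlines() if TIMING.search(line))


sample = (
    "WEBSRT\n"
    "\n"
    "1\n"
    "00:00:01,000 --> 00:00:02,500\n"
    "Hello\n"
    "\n"
    "2\n"
    "00:00:03,000 --> 00:00:04,000\n"
    "World\n"
)
```

Here has_magic(sample) is true, yet lenient_cue_count(sample) still finds
both cues: a parser of this kind never notices the extra first line.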

>> If we define WebSRT in a way that can handle >99% of existing content  
>> and
>> degrade gracefully (enough) when using new features in old software, it
>> seems reasonable to do. If lots of software developers cry foul, then
>> perhaps we should reconsider. It seems to me, though, that actually
>> researching and defining a good algorithm for parsing SRT would be of  
>> use to
>> others than just browsers.
>>
>
> How is that different from moving away from SRT? If everyone has to  
> change
> their parsing of SRT to accommodate a new spec, then that is a new  
> format.

Not everyone has to change their parsers immediately, many will continue  
to work. However, if someone wants to support SRT in a compatible way,  
it's very helpful to have a spec, assuming that WebSRT is actually  
compatible enough with existing SRT content.
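
As an illustration of the kind of rule such a spec would have to pin down,
here is a sketch in Python of a tolerant timing-line parser. It is not the
WebSRT algorithm; the accepted variations (either "," or "." before the
fraction, short fractions padded on the right, lines with end before start
rejected) are assumptions chosen for the example, and the names are
hypothetical.

```python
import re
from typing import Optional, Tuple

# Tolerant pattern: 1-2 digit minutes/seconds, "," or "." before the
# fraction, 1-3 fraction digits. All of this leniency is an assumption.
TIMING_LINE = re.compile(
    r"^\s*(\d+):(\d{1,2}):(\d{1,2})[,.](\d{1,3})"
    r"\s*-->\s*"
    r"(\d+):(\d{1,2}):(\d{1,2})[,.](\d{1,3})"
)


def _to_ms(hours: str, minutes: str, seconds: str, fraction: str) -> int:
    """Convert timestamp fields to milliseconds; '5' is read as 500 ms."""
    whole = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return whole * 1000 + int(fraction.ljust(3, "0"))


def parse_timing_line(line: str) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms), or None if the line is not usable."""
    match = TIMING_LINE.match(line)
    if match is None:
        return None
    fields = match.groups()
    start = _to_ms(*fields[:4])
    end = _to_ms(*fields[4:])
    if end < start:
        return None  # treat reversed timestamps as broken input
    return start, end
```

Every one of these choices is something two independent parsers could
easily disagree on today, which is the argument for writing them down.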

This is quite similar to HTML4 vs HTML5. There are lots of mostly  
compatible HTML parsers, but HTML5 defines a single parsing algorithm, and  
slow convergence towards that is a good thing.

If the SRT ecosystem is so fragile that it cannot tolerate any extension  
whatsoever, then we should stay far away from it. It just seems that's not  
the case.

-- 
Philip Jägenstedt
Core Developer
Opera Software