[whatwg] Fwd: Discussing WebSRT and alternatives/improvements

Wed Aug 11 08:26:33 PDT 2010

On Wed, 11 Aug 2010 15:38:32 +0200, Silvia Pfeiffer  
<silviapfeiffer1 at gmail.com> wrote:

> On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt
> <philipj at opera.com> wrote:
>
>> On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer
>> <silviapfeiffer1 at gmail.com> wrote:
>>
>>> On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt
>>> <philipj at opera.com> wrote:
>>>
>>> I have checked the parse spec, and
>>> http://www.whatwg.org/specs/web-apps/current-work/#tag-open-state
>>> indeed implies that a tag starting with a number is a parse error. Both
>>> the timestamps and the voice markers thus seem to be problems when going
>>> with an innerHTML parser. Is there a way to resolve this? I mean: I'd
>>> quite happily drop the voice markers for a <span @class>, but I am not
>>> sure what to do about the timestamps. We could do what I did in WMML and
>>> introduce a <t> element with the timestamp as an @at attribute, but that
>>> is again more verbose. We could also introduce an @at attribute in
>>> <span>, which would then at least end up in the DOM and can be dealt
>>> with specially.
>>>
>>
>> What should numerical voices be replaced with? Personally I'd much rather
>> write <philip> and <silvia> to mark up a conversation between us two, as I
>> think it'd be quite hard to keep track of the numbers if editing subtitles
>> with many different speakers. However, going with that and using an HTML
>> parser is quite a hack. Names like <mark> and <li> may already have
>> special parsing rules or default CSS.
>>
>
> In HTML it is <span class="philip">...</span> and <span
> class="silvia">...</span>. I don't see anything wrong with that. And it's
> only marginally longer than <philip>...</philip> and <silvia>...</silvia>.
>
>> Going with HTML in the cues, we either have to drop voices and inner
>> timestamps or invent new markup, as HTML can't express either. I don't
>> think either of those is a really good solution, so right now I'm not
>> convinced that reusing the innerHTML parser is a good way forward.
>
>
> I don't see a need for the voices - they already have markup in HTML, see
> above. But I do wonder about the timestamps. I'd much rather keep the
> innerHTML parser if we can, but I don't know enough about how the
> timestamps could be introduced in a non-breaking manner. Maybe with a
> data- attribute? Maybe <span data-t="00:00:02.100">...</span>?

data- attributes are reserved for use by scripts on the same page, but we  
*could* of course introduce new elements or attributes for this purpose.  
However, adding features to HTML only for use in WebSRT seems a bit odd.
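
To make the problem concrete: since a tag name can't start with a digit,
an inner timestamp would have to be pulled out by a separate pass before
(or instead of) handing the cue text to an HTML parser. A rough sketch,
assuming inner timestamps are written as <hh:mm:ss.mmm>; the pattern and
the function name are made up for illustration and not taken from any spec:

```python
import re

# Assumed inner timestamp syntax, e.g. <00:00:02.100> (illustrative only).
INNER_TS = re.compile(r"<(\d{2}):(\d{2}):(\d{2})\.(\d{3})>")

def split_cue_text(text):
    """Split cue text into ("text", str) and ("time", milliseconds) pieces."""
    pieces = []
    pos = 0
    for m in INNER_TS.finditer(text):
        if m.start() > pos:
            # Plain text (or other markup) before this timestamp.
            pieces.append(("text", text[pos:m.start()]))
        h, mi, s, ms = (int(g) for g in m.groups())
        pieces.append(("time", ((h * 60 + mi) * 60 + s) * 1000 + ms))
        pos = m.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces
```

The "text" pieces could then still go through whatever parser handles the
remaining markup, such as <i>.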

>>>> That would make text/srt and text/websrt synonymous, which is kind of
>>>> pointless.
>>>>
>>>
>>>
>>> No, it's only pointless if you are a browser vendor. For everyone else
>>> it is a huge advantage to be able to choose between a guaranteed simple
>>> format and a complex format with all the bells and whistles.
>>>
>>>
>>>
>>>> The advantage of taking text/srt is that all existing software to
>>>> create SRT can be used to create WebSRT
>>>>
>>>
>>>
>>> That's not strictly true. If they load a WebSRT file that was created by
>>> some other software for further editing and that WebSRT file uses
>>> advanced WebSRT functionality, the authoring software will break.
>>>
>>
>> Right, especially settings appended after the timestamps are quite likely
>> to be stripped when saving the file.
>
>
> Or may even break the software if it's badly implemented, or may end up
> inside the cue text - just like the other control instructions which will
> end up as plain text inside the cue. You won't believe how many people have
> pointed out to me that my SRT test parser exposed <i> tag markup in the
> cue text rather than interpreting it, when I was experimenting with
> applying SRT cues in an HTML div without touching the cue text content.
> Extraneous markup is really annoying.

Indeed, but given the option of seeing no subtitles at all and seeing some  
markup from time to time, which do you prefer? For a long time I was using  
a media player that didn't handle "HTML" in SRT and wasn't very amused at  
seeing <i> and similar, but it was sure better than no subtitles at all. I  
doubt it will take long for popular software to start ignoring things  
trailing the timestamp and things in square brackets, which is all you  
need for basic "compatibility". Some of the tested software already does  
so.
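
As a sketch of how little that takes for the timing line: a lenient parser
only needs to stop reading after the end timestamp. The regex below and the
example settings ("A:start L:90%") are illustrative assumptions, not copied
from any existing player or from the spec:

```python
import re

# Start and end timestamps, accepting "," or "." before the milliseconds.
# Anything after the end timestamp is matched by (.*) and then ignored.
TIMING = re.compile(
    r"^\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})(.*)$"
)

def to_ms(h, m, s, ms):
    """Convert string timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_timing_line(line):
    """Return (start_ms, end_ms), or None if this is not a timing line."""
    m = TIMING.match(line)
    if m is None:
        return None
    g = m.groups()
    return to_ms(*g[0:4]), to_ms(*g[4:8])
```

With this, "00:00:01,000 --> 00:00:04,500 A:start L:90%" parses the same as
the plain SRT line without the trailing part.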

>>>> and servers that already send text/srt don't need to be updated. In
>>>> either case I think we should support only one mime type.
>>>>
>>>
>>>
>>> What's the harm in supporting two mime types but using the same parser
>>> to parse them?
>>>
>>
>> Most content will most likely be plain old SRT without voices, <ruby> or
>> similar. People will create them using existing software with the .srt
>> extension and serve them using the text/srt MIME type. When they later
>> decide to add some <ruby> or similar, it will just work without changing
>> the extension or MIME type. The net result is that text/srt and
>> text/websrt mean exactly the same thing, making it a wasted effort.
>
>
> From a Web browser perspective, yes. But not from a caption authoring
> perspective. At first, I would author an SRT file. Later, I want to add
> some fancy stuff. So, I load it into the application again. Then I add the
> fancy stuff. It tells me that I cannot save it as SRT, but have to save it
> as WebSRT, so I don't lose the information. Good! Now, the pipeline that I
> have set up for transcoding SRT files and burning them onto video, and
> which cannot yet deal with WebSRT, will not accept the WebSRT file. Good
> again! Makes me extend my pipeline or go to the provider and upgrade my
> software, so I get the full feature support and the correct rendering.
> Excellent.

I think that as long as WebSRT is mostly compatible with SRT, people
will keep using SRT tools, with the occasional mishap and disaster. I
won't deny that it breaks expectations of what SRT is, but the alternative
is to make WebSRT fundamentally incompatible so that not even media
frameworks that rely on sniffing would treat it as SRT. However, unless
<track> is a complete failure, other applications will eventually want to
support the format that browsers support, so inventing something
completely new has a high cost too.

Since browser vendors get all the benefits and none of the problems, it
would be a mistake to only listen to us, of course. It might be worthwhile
contacting developers of applications like VLC, Totem or MPlayer and asking
precisely how annoyed they would be if suddenly one day they had to tweak
their SRT parser because of WebSRT.

>>> Do you find MPlayer's behavior annoying because by rescaling already
>>> rendered text, the text loses resolution and becomes less readable? This
>>> is definitely not the behaviour I am after.
>>>
>>
>> Scaling with the video is annoying with small videos, as the text ends up
>> being huge in fullscreen. I assume we're going to do scaling as well as we
>> can, so that's not an argument in either direction.
>>
>> I'll have to withdraw any opinion for now, I don't know how to best deal
>> with this.
>
>
>
> Yes, I can imagine that on small video it's bad to scale the text down
> with the video, since it becomes unreadable. I thought that a solution
> would be to define the screen size for which the text was written and then
> scale the text with the video. But maybe there is a function that needs to
> be applied where there is a minimum font size below which one cannot go
> and a maximum font size above which it's bad, too. It seems that scaling
> text at the same rate as video is not appropriate. I wonder if there is an
> optimal function that people have found to be best? Worth doing some
> experiments I guess.

A bit of a guess, but perhaps some combination of CSS3's calc(), min() and  
max() could help. If there's some particularly suitable formula, perhaps  
it could even be made part of the default stylesheet.
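
For the sake of experimenting, the simplest such function is a clamp: scale
with the video height, but never go below a minimum or above a maximum size.
A sketch in Python rather than CSS; the 5% fraction and the 12px/36px bounds
are arbitrary placeholders, not proposed values:

```python
def subtitle_font_px(video_height_px, fraction=0.05, min_px=12.0, max_px=36.0):
    """Scale the font with the video height, clamped to [min_px, max_px].

    All three parameters are made-up defaults for illustration.
    """
    return max(min_px, min(max_px, video_height_px * fraction))
```

With these placeholders a 100px-high video gets 12px text rather than 5px,
and a 1080px-high fullscreen video gets 36px rather than 54px.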

-- 
Philip Jägenstedt
Core Developer
Opera Software