On Wed, Aug 25, 2010 at 7:20 PM, Philip Jägenstedt <span dir="ltr"><<a href="mailto:philipj@opera.com">philipj@opera.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div><div></div><div class="h5">On Wed, 25 Aug 2010 09:16:56 +0200, Silvia Pfeiffer <<a href="mailto:silviapfeiffer1@gmail.com" target="_blank">silviapfeiffer1@gmail.com</a>> wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Tue, Aug 24, 2010 at 8:49 PM, Philip Jägenstedt <<a href="mailto:philipj@opera.com" target="_blank">philipj@opera.com</a>> wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer <<br>
<a href="mailto:silviapfeiffer1@gmail.com" target="_blank">silviapfeiffer1@gmail.com</a>> wrote:<br>
<br>
On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt <<a href="mailto:philipj@opera.com" target="_blank">philipj@opera.com</a><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
>wrote:<br>
<br>
Aside: WebSRT can't contain binary data, only UTF-8 encoded text.<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
</blockquote>
<br>
It sure can. Just base64-encode it. I'm not saying it's a good thing,<br>
but if somebody really has an urge...<br>
<br>
<br>
</blockquote>
Sure, this would be a metadata track. Sites have no reason to offer<br>
download links to it, and if anyone gets hold of such a file it would<br>
quickly be evident that it's useless.<br>
<br>
</blockquote>
<br>
<br>
After a user has seen the crap on screen. I'm just saying: it's a legal<br>
WebSRT file and really not compatible with any existing infrastructure for<br>
SRT.<br>
<br>
</blockquote>
<br>
A fair point. The alternatives I can see are (1) using an incompatible<br>
format so that the user sees nothing or (2) adding a header that indicates<br>
that the track is metadata.<br>
<br>
In order to tell the user to stop wasting their time with this file, I<br>
think (1) is clearly worse. (2) is absolutely an option, but it will only<br>
make a difference to software that understands this header and if the header<br>
is optional it will likely often be omitted. A dialog saying "this is a<br>
metadata track, you can't watch it" is slightly friendlier than a screen<br>
full of crap, but they are both pretty effective at getting the message<br>
across.<br>
</blockquote>
<br>
<br>
<br>
Yeah, I'm totally for adding a hint as to what format is in the cue. Then, a<br>
WebSRT file can be identified as to what it contains.<br>
</blockquote>
<br></div></div>
OK, but note that a browser would ignore this and trust what <track kind> says. I wouldn't want the kind to change after the external track is loaded; it would make the UI confusing if a captions track disappeared from the menu as soon as it was loaded because it internally claims to be metadata.</blockquote>
<div><br>Yes, I have no problem with that. Though I believe we have overloaded @kind with too much meaning, as I mentioned earlier [1]. I think it would make more sense to pull the different dimensions into different attributes:<br>
- @type or @format for the format of the cues<br>- @kind for their semantic meaning (subtitle, caption, karaoke, etc.); one track could even satisfy several needs, so this would be a list of kinds<br>- and finally the visual rendering problem, which could possibly be solved by providing a link to a div or p where the data should be rendered instead of in the default location. Right now, audio and metadata tracks get no rendering at all and I see that as a problem.<br>
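<p>A minimal Python sketch of what separating these dimensions might look like once a page's track attributes have been parsed. The attribute names <code>format</code> and <code>render</code>, the space-separated list syntax for kind, and the <code>text/websrt</code> default are all hypothetical assumptions for illustration, not part of any spec.</p>

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TrackDescription:
    """One hypothetical attribute per dimension instead of overloading @kind."""
    cue_format: str               # what @type/@format would carry
    kinds: FrozenSet[str]         # list-valued @kind: one track may serve several needs
    render_target: Optional[str]  # id of a div or p to render into; None = default rendering


def parse_track(attrs: dict) -> TrackDescription:
    """Build a TrackDescription from a dict of (hypothetical) track attributes."""
    return TrackDescription(
        cue_format=attrs.get("format", "text/websrt"),   # assumed default format
        kinds=frozenset(attrs.get("kind", "").split()),  # space-separated, like HTML's class
        render_target=attrs.get("render"),
    )
```

<p>For example, <code>parse_track({"kind": "captions karaoke", "render": "lyrics"})</code> describes a single track that is both captions and karaoke and is rendered into a hypothetical element with id "lyrics".</p>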
<br><br>[1] <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027356.html">http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027356.html</a><br><br> <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im"><div></div><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The question, then, is if parsers that handle the mentioned markup also<br>
ignore <1>, <ruby> and <rt>. I haven't tested it, but I assume that some<br>
will ignore it and some won't. How many percent of the media player market<br>
would have to handle this correctly for these extensions to be OK, in your<br>
opinion?<br>
</blockquote>
<br>
<br>
If a single one breaks, it would be bad IMO because the expectations of the<br>
users of that software will be broken even if it may just be a small<br>
percentage of users and we have no influence on the upgrade path of that<br>
software - in particular if it is proprietary.<br>
</blockquote>
<br></div>
Neither a new file extension, a new MIME type, nor a header is enough to stop some implementations from treating the file as SRT and breaking. The only remaining option, AFAICT, is making the format fundamentally incompatible with SRT. Is it worth it?</blockquote>
<div><br>If it has a different file extension, a different MIME type and even a different header, I don't think any existing software will open it as SRT. Why would it assume that a random file is an SRT file? It would need to be an application that accepts absolutely anything you give it as SRT, and such software has more fundamental problems.<br>
<br>
</div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
At this point, what is your recommendation? The following ideas have been<br>
on the table:<br>
<br>
* Change the file extension to something other than .srt.<br>
<br>
I don't have an opinion, browsers ignore the file extension anyway.<br>
<br>
</blockquote>
<br>
Yes, I think we should definitely have a new file extension.<br>
</blockquote>
<br></div>
I'll leave this to others to decide, but since browsers have no concept of file extensions, just using .srt will work. If the format is SRT-like it's likely at least some files will use .srt in practice.</blockquote>
<div><br>In practice, all SRT files use the .srt extension; it is typically how applications identify these formats. Just because *nix mostly ignores file extensions when identifying file types doesn't mean that applications do. Again, I believe strongly that re-using the same file extension is the single biggest pain we could inflict on the community.<br>
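<p>To illustrate the concern, here is a hypothetical Python sketch of extension-based dispatch of the kind many desktop applications use: the parser is chosen from the file name alone, so a WebSRT file saved as .srt goes straight to the SRT parser. The handler names and the .wsrt mapping are assumptions.</p>

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical handler table; ".wsrt" is the new extension proposed in this thread.
HANDLERS = {
    ".srt": "srt_parser",
    ".wsrt": "websrt_parser",
}


def pick_handler(filename: str) -> Optional[str]:
    """Choose a parser from the file extension only; the content is never inspected."""
    suffix = PurePath(filename).suffix.lower()
    return HANDLERS.get(suffix)
```

<p>With this table, <code>pick_handler("episode1.srt")</code> returns the SRT handler whether the file holds classic SRT or WebSRT, which is exactly the case a distinct extension avoids.</p>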
<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* Change the MIME type to something other than text/srt.<br>
<br>
I doubt it makes any difference, as most software that deals with SRT today<br>
have no concept of MIME types. No matter what I'd want exactly 1 MIME type<br>
or alternatively make browsers ignore the MIME type completely.<br>
<br>
</blockquote>
<br>
You're right that existing SRT software probably doesn't deal much with an<br>
SRT MIME type. Right now text/x-srt or text/srt is sometimes used for SRT<br>
files, but text/plain is also common, and more likely from a Web<br>
server. Since this is the space where Web browsers play, I am not overly<br>
fussed, though I think logically text/websrt makes more sense with a .wsrt<br>
extension. SRT files could then also be served as text/websrt to let them<br>
take part in the WebSRT infrastructure, provided they continue to be<br>
valid WebSRT files.<br>
</blockquote>
<br></div>
Is there anything you expect would break if WebSRT files were served as text/srt?</blockquote><div><br>I'm asking because I don't know how strict Web browsers are about MIME types. I would think a Web browser should accept both WebSRT and SRT files served as text/plain, as well as WebSRT files served as text/websrt and SRT files served as text/srt. Would something break if they were even served as text/html? I would expect it to make a difference when these are loaded directly as a resource for display (e.g. when you go directly to <a href="http://example.com/mycaptions.wsrt">http://example.com/mycaptions.wsrt</a>), but not when they are used through a <track> element, where WebSRT is the baseline format and thus expected.<br>
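<p>The policy described above could be sketched roughly as follows. This is a hypothetical Python sketch, not actual browser behaviour: it assumes that a resource loaded through a track element is always parsed as WebSRT regardless of its MIME type, and that text/websrt is the new type proposed in this thread.</p>

```python
from typing import Optional


def accepted_format(mime_type: str, via_track_element: bool) -> Optional[str]:
    """Return the format to parse a resource as, or None to refuse it (hypothetical policy)."""
    if via_track_element:
        # WebSRT is the baseline format for track elements, so the MIME type
        # is not used to reject the resource.
        return "websrt"
    # Direct navigation: the MIME type decides. Drop parameters such as
    # "; charset=utf-8" and compare case-insensitively.
    essence = mime_type.split(";", 1)[0].strip().lower()
    if essence in ("text/websrt", "text/plain"):
        # text/plain may carry either format; this assumes SRT files remain
        # valid WebSRT, so both are parsed as WebSRT.
        return "websrt"
    if essence in ("text/srt", "text/x-srt"):
        return "srt"
    return None
```

<p>Under this sketch a file served as text/html is refused when opened directly but still used when referenced from a track element.</p>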
</div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* Add a header to WebSRT to make it uniquely identifiable.<br>
<br>
The header would have to be mandatory and browsers would have to reject<br>
files that don't have it. Such files would be compatible with some existing<br>
software and break some, depending on how they sniff. We could also put<br>
metadata in such a header.<br>
<br>
</blockquote>
<br>
Yes, I think we need to introduce a header. Maybe we can hide all the<br>
structure in what SRT recognizes as comments (i.e. start the lines with ";").<br>
But I believe we need some hints like the @profile to identify the type of<br>
the cues and the <link> to link to a style sheet, and we need metadata like<br>
the <meta> element of HTML headers.<br>
</blockquote>
<br></div>
I had no idea that the semicolon was used for comments in SRT; is this usage widespread? Does it work in most players?</blockquote><div><br>I thought it was, but maybe it was only introduced for WebSRT. It is not tested in Hixie's SRT research [2]. Could you take a quick look through your SRT file collection to see if there are any? I'm probably wrong about this, seeing as it's not mentioned on the Wikipedia page for SRT [3].<br>
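<p>A quick way to check a collection might be something like this Python sketch, which lists .srt files containing a line that starts with a semicolon. It only finds candidates: a subtitle text line can legitimately begin with ";", so hits still need a manual look.</p>

```python
from pathlib import Path
from typing import Iterator, Tuple


def files_with_semicolon_lines(root: str) -> Iterator[Tuple[Path, int]]:
    """Yield (path, line_number) for the first ';'-prefixed line in each .srt file under root."""
    # Match the extension case-insensitively so ".SRT" files are included too.
    candidates = sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() == ".srt"
    )
    for path in candidates:
        # "utf-8-sig" drops a leading BOM; errors="replace" keeps going on files in
        # legacy 8-bit encodings (";" is ASCII, so detection still works).
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if line.startswith(";"):
                yield path, number
                break
```

<p>Calling it as <code>for path, line in files_with_semicolon_lines("subtitles/"): print(path, line)</code> prints one line per candidate file; the directory name is a placeholder.</p>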
<br>[2] <a href="http://wiki.whatwg.org/wiki/SRT_research">http://wiki.whatwg.org/wiki/SRT_research</a><br>[3] <a href="http://en.wikipedia.org/wiki/SubRip">http://en.wikipedia.org/wiki/SubRip</a><br><br><br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* Make something deliberately incompatible with SRT.<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
It doesn't make a big difference to browsers implementing the format. We'd<br>
be replacing something that mostly works in existing players with something<br>
that never works.<br>
<br>
</blockquote>
<br>
That was the idea of WMML and I took that path because I thought it would be<br>
advantageous for other Web applications, such as those built on libxml2, expat,<br>
PHP's SimpleXML, pyexpat for Python, Nokogiri for Ruby, etc. But I really<br>
like the idea of WebSRT to allow arbitrary metadata in the cues without<br>
having to put it into CDATA sections.<br>
<br>
I don't mind creating a format that is still somewhat compatible with SRT.<br>
We don't have to force incompatibility - but we should also not have it<br>
restrict us. In either case, it is a new format.<br>
</blockquote>
<br></div>
I'm not trying to be annoying, but this seems to clash with your preference to not break any existing software. Anything that resembles SRT *will* be treated as SRT in some existing players.</blockquote><div><br>No, I think that's a misconception. I think most players check the file extension and maybe the MIME type before opening a file as SRT. A quick test in VLC on my Mac shows that when I go to "Subtitle -> Open File" I am not allowed to open anything that doesn't have an extension VLC accepts; such files are filtered out. Thus, what the actual file looks like really doesn't matter; what matters is what it presents itself as through the file extension, the MIME type, or some magic identifier at the beginning of the file. Which of these is used depends on your OS and your application.<br>
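<p>The identification order described here could be sketched as below. This is hypothetical: the <code>WEBSRT</code> signature is a placeholder, since no magic identifier had been agreed on, and real players differ in which of these signals they consult and in what order.</p>

```python
from pathlib import PurePath
from typing import Optional

MAGIC = b"WEBSRT"           # hypothetical signature; no header had been agreed on
UTF8_BOM = b"\xef\xbb\xbf"

EXTENSIONS = {".srt": "srt", ".wsrt": "websrt"}
MIME_TYPES = {"text/websrt": "websrt", "text/srt": "srt", "text/x-srt": "srt"}


def identify(filename: str, head: bytes, mime_type: Optional[str] = None) -> Optional[str]:
    """Classify a subtitle file by what it claims to be, never by its cue syntax."""
    # 1. A magic identifier at the very start of the file (after an optional BOM).
    body = head[len(UTF8_BOM):] if head.startswith(UTF8_BOM) else head
    if body.startswith(MAGIC):
        return "websrt"
    # 2. The MIME type, when the file arrived over HTTP (parameters are dropped).
    if mime_type:
        essence = mime_type.split(";", 1)[0].strip().lower()
        if essence in MIME_TYPES:
            return MIME_TYPES[essence]
    # 3. The file extension, which is what a player's open-file dialog filters on.
    return EXTENSIONS.get(PurePath(filename).suffix.lower())
```

<p>A file that advertises none of these signals is not treated as SRT at all, however SRT-like its cues look.</p>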
<br><br>Cheers,<br>Silvia.<br><br></div></div>