[whatwg] SRT research: separating cues
simonp at opera.com
Tue Oct 25 00:18:32 PDT 2011
On Mon, 24 Oct 2011 22:50:43 +0200, Silvia Pfeiffer
<silviapfeiffer1 at gmail.com> wrote:
> So, in your opinion, should there be a change to the WebVTT spec that
> separates cues differently?
> Is there a recommendation you have from your analysis?
My recommendation is http://www.w3.org/Bugs/Public/show_bug.cgi?id=14550
> On Mon, Oct 24, 2011 at 6:26 PM, Simon Pieters <simonp at opera.com> wrote:
>> I wanted to research how common it is to fail to separate cues in SRT,
>> and for what reason.
>> SRT parsers usually interpret a timings line as a new cue, while WebVTT
>> wants a blank line (two newlines) for a new cue.
>> I took the 65k SRT files we've got, replaced comma with dot and
>> prepended "WEBVTT\n\n", then ran them in Opera's <track> impl, looking
>> for '-->' in cue data.
>> There were 840 files with --> in cue data. This is 1.3% of the files.
>> Looking at the cue data, there were 11,118 lines that contained -->.
>> Looking at the line before each of those, there were 8830 lines of only
>> whitespace.
>> In the cue data, if I look at valid-looking timing lines
>> (/^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)/) and
>> the line before that, or the line before *that* if it looks like an SRT
>> ID (/^\d+\s*$/), then I see 7030 lines of only whitespace and 3761 lines
>> of something else.
>> Failing to separate cues results in an unpleasant experience for the
>> user, since basically the screen is filled with several "cues" including
>> their IDs and timing lines.
>> Some files had most or all of their cues parsed as a single cue with the
>> WebVTT parser, e.g. because all lines ended with one or more spaces.
>> Looking at such a file in a text editor, it's not immediately obvious
>> that there's an error, because the spaces are not visible. Moreover, the
>> file is not non-conforming, so a validator wouldn't help either.
>> So what about the cases that aren't whitespace? It seems to be mostly
>> missing the newline completely. Some omitted the ID also. One file had
>> a "|" between all cues.
>> Simon Pieters
>> Opera Software
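
The procedure described in the quoted message can be approximated with a
short Python sketch. This is an illustration only, not Opera's <track>
implementation: the function names (srt_to_vtt, split_blocks,
classify_swallowed_timings) are hypothetical, block splitting is a
simplification of the WebVTT parser that treats only completely empty lines
as separators, and input is assumed to use "\n" line endings. The two regular
expressions are the ones quoted above.

```python
import re

# Timing-line and SRT-ID patterns as quoted in the message.
TIMING = re.compile(r"^\d\d:\d\d:\d\d\.\d\d\d\s*-->\s*\d\d:\d\d:\d\d\.\d\d\d(\s|$)")
SRT_ID = re.compile(r"^\d+\s*$")


def srt_to_vtt(srt_text):
    """Naive conversion as described: comma -> dot, prepend the header.

    Note that this also rewrites commas inside cue text.
    """
    return "WEBVTT\n\n" + srt_text.replace(",", ".")


def split_blocks(vtt_text):
    """Split on truly empty lines only (assumption: simplified parser).

    A line containing only spaces does NOT end a block, which is why
    trailing whitespace can merge many SRT cues into one cue.
    """
    blocks, current = [], []
    for line in vtt_text.split("\n"):
        if line == "":
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def classify_swallowed_timings(vtt_text):
    """Count timing lines found inside cue data, by what precedes them."""
    counts = {"whitespace": 0, "other": 0}
    for block in split_blocks(vtt_text)[1:]:  # skip the "WEBVTT" header block
        # The first line containing "-->" is the cue's own timing line.
        first = next((i for i, l in enumerate(block) if "-->" in l), None)
        if first is None:
            continue
        for i in range(first + 1, len(block)):
            if not TIMING.match(block[i]):
                continue
            j = i - 1
            # Step over an SRT ID to reach the line before it.
            if SRT_ID.match(block[j]) and j - 1 > first:
                j -= 1
            kind = "whitespace" if block[j].strip() == "" else "other"
            counts[kind] += 1
    return counts
```

Run over a converted file, the "whitespace" count corresponds to cues that
were merged because the separator line contained spaces, and "other" to
cues with no separator line at all.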