[whatwg] <track> / WebVTT issues
giles at mozilla.com
Wed Sep 21 11:12:40 PDT 2011
On 21/09/11 02:15 AM, Philip Jägenstedt wrote:
> Implementors of <track> / WebVTT from several browser vendors (Opera,
> Mozilla, Google, Apple) met at the Open Video Conference recently. There
> was a session on video accessibility, a bunch of new bugs were filed,
> and there was much rejoicing.
> There were a few issues that weren't concrete enough to file bugs on,
> but which I think are still worthwhile discussing further:
> == Comments ==
> If you look at the source of the spec, you'll find comments as a v2
> feature request:
> COMMENT -->
> this is a comment, bla bla
I don't like the format either. I do think it's very important that we
have some mechanism for multi-line, file-level metadata, embedded CSS,
etc., so the files can live on their own.
The syntax section also suggests all metadata has to be on the signature
line, while the parser will actually skip everything between the
signature and the first double line terminator.
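To illustrate the skipping behaviour, here is a minimal sketch rather than the spec's parser algorithm. The function name split_header and the "Kind:"/"Language:" lines in the usage note are hypothetical, and it assumes no BOM and that a blank line is what "double line terminator" amounts to once terminators are normalised:

```python
def split_header(text):
    """Split a WebVTT file into (skipped header lines, remaining lines).

    Sketch only: assumes no byte order mark and treats the first blank
    line after the signature as the end of the header block.
    """
    # Treat CRLF, CR and LF alike.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    try:
        # The first empty line after the signature ends the header.
        blank = lines.index("", 1)
    except ValueError:
        # No blank line at all: everything after the signature is header.
        blank = len(lines)
    return lines[1:blank], lines[blank + 1:]
```

Given "WEBVTT\nKind: captions\nLanguage: en\n\n00:01.000 --> 00:02.000\nHello\n", the first return value holds the two made-up metadata lines that a parser would silently skip, and the second starts at the first cue.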
For in-caption comments, <! comment> is a good idea. Semantically it's a
bit weird not to mention it in the spec, since everything else has an end
tag, but the parser will ignore it, which is what we want.
> The parser is fairly strict in some regards:
> * must use exactly 2 digits for minutes and seconds
> * minutes and seconds must be <60
I'm not normally one for restrictions, but the parser also says the
(optional) hours field must have "two or more" digits, with no maximum.
If we all agree on an implementation limit, it could be helpful to
specify one. Storing milliseconds in an unsigned 32-bit type gives a
little over 1000 hours of timestamps. Single-precision float runs out of
useful precision after about 50 hours. I'd suggest a two or three digit
limit on hours to avoid requiring a 64-bit type. If we don't care about
that, then 10 digits is a reasonable limit to avoid running out of
precision.
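To make the arithmetic concrete, here is a rough sketch of a strict timestamp check with an hours-digit cap. MAX_HOUR_DIGITS and parse_timestamp are names I made up for illustration, the three-digit cap is only the suggestion above and not anything in the spec, and the pattern assumes exactly three digits of milliseconds:

```python
import re

# Assumed timestamp shape: [hh+:]mm:ss.ttt, with the hours field optional
# and, when present, two or more digits.
_TIMESTAMP = re.compile(r"(?:([0-9]{2,}):)?([0-9]{2}):([0-9]{2})\.([0-9]{3})")

MAX_HOUR_DIGITS = 3  # hypothetical implementation limit, not in the spec


def parse_timestamp(text):
    """Return the timestamp in milliseconds, or None if it is rejected."""
    m = _TIMESTAMP.fullmatch(text)
    if m is None:
        return None
    hours, minutes, seconds, millis = m.groups()
    if hours is not None and len(hours) > MAX_HOUR_DIGITS:
        return None
    # Minutes and seconds must be below 60.
    if int(minutes) >= 60 or int(seconds) >= 60:
        return None
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


MS_PER_HOUR = 60 * 60 * 1000
# Whole hours that fit when milliseconds are stored in an unsigned 32-bit type.
U32_HOURS = (2**32 - 1) // MS_PER_HOUR
# Largest timestamp the three-digit cap allows, in milliseconds.
MAX_3_DIGIT_MS = parse_timestamp("999:59:59.999")
```

U32_HOURS comes out at 1193, and MAX_3_DIGIT_MS is 3599999999, which stays below 2**32. A signed 32-bit type would only cover about 596 hours, so a three-digit cap relies on the storage being unsigned.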
> A small percentage of cues (or cue text) will be dropped because of
> these constraints and this is not very likely to be noticed unless the
> entire video+captions are watched.
This is a very good point.
> 02:00.000 --> next
> Last Chapter
> Cues would be created with endTime = Infinity, and be modified to the
> startTime of the following cue (in source order) if there is a following
> cue. This would IMO be quite neat, but is the use case strong enough?
This would also nicely solve the latency issue with generating live
captions. With both use cases together, I'd be in favour of this, but we
have other issues to address before live VTT streams work in the <track>
element. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=14104
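For what it's worth, the "next" behaviour Philip describes could be sketched roughly as below. This is only an illustration: resolve_next_end_times is a made-up name, and representing a "--> next" cue as a (start, None) tuple is my assumption, not anything in the spec.

```python
import math


def resolve_next_end_times(cues):
    """Resolve '--> next' end times for cues given in source order.

    cues: list of (start, end) tuples in seconds, where end is None for
    a cue written with '--> next' (an assumed representation).
    """
    resolved = []
    for i, (start, end) in enumerate(cues):
        if end is None:
            if i + 1 < len(cues):
                # Use the start time of the following cue in source order.
                end = cues[i + 1][0]
            else:
                # No following cue yet: the cue stays open-ended.
                end = math.inf
        resolved.append((start, end))
    return resolved
```

For a chapter track with cues starting at 0, 60 and 120 seconds, all written with "--> next", this gives end times of 60, 120 and Infinity. In the live case the last cue would stay at Infinity until the next one arrives and the list is resolved again.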