[whatwg] <track> / WebVTT issues
Philip Jägenstedt
philipj at opera.com
Wed Sep 21 02:15:25 PDT 2011
Implementors of <track> / WebVTT from several browser vendors (Opera,
Mozilla, Google, Apple) met at the Open Video Conference recently. There
was a session on video accessibility [1], a bunch of new bugs were filed
[2], and there was much rejoicing.
There were a few issues that weren't concrete enough to file bugs on, but
which I think are still worth discussing further:
== Comments ==
If you look at the source of the spec, you'll find comments as a v2
feature request:
  COMMENT -->
  this is a comment, bla bla
I do not think this would be very useful. As a one-line comment at the top
of the file (for authorship, etc.) it is rather verbose and ugly, while for
commenting out cues you would have to comment out each cue individually.
It also doesn't work inside cues, where something like <! comment > is
what would be backwards compatible with the current parser. If comments
are left for v2, the above is what it'll be, because of compatibility
constraints. If anyone is less than impressed with that, now would be the
time to suggest an alternative and have it spec'd.
== Scrolling captions ==
The WebVTT layout algorithm tries to not move cues around once they've
been displayed and to never obscure other cues. This means that for cues
that overlap in time, the rendering will often be out of order, with the
earliest cue at the bottom. This is quite contrary to the (mainly US?)
style of (live) scrolling captions, where cues are always in order and
scroll to bring new captions into view. (I am not suggesting any specific
change.)
== Scaling up and down ==
Scaling the font size with the video will not be optimal for either small
screens (text will be too small) or very large screens (text will be too
big). Do we change the default rendering in some way, or do we let users
override the font size? If users can override it, do we care that this may
break the author's intended layout?
== Strict vs forgiving parsing ==
The parser is fairly strict in some regards:
* double id line discards entire cue
(http://www.w3.org/Bugs/Public/show_bug.cgi?id=13943)
* must use exactly 2 digits for minutes and seconds
* minutes and seconds must be <60
* must use "." as the decimal separator
* must use exactly 3 decimal digits
* stray "<" consumes the rest of the cue text
A small percentage of cues (or cue text) will be dropped because of these
constraints, and this is unlikely to be noticed unless the entire video is
watched with captions. Possible remedies:
* make the parser more forgiving where it does not conflict with
extensibility
* make browsers complain a lot in the error console
* point and laugh at those who failed to use a (non-existent) validator
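To make the timestamp constraints above concrete, here is a rough Python
sketch of a strict parser next to a more forgiving one. This is not the
spec's algorithm. The function names, the handling of an optional hours
part, and the particular leniencies (one-digit minutes/seconds, "," as
separator, 1 to 3 decimal digits) are assumptions made only for
illustration.

```python
import re

# Strict: exactly 2 digits for minutes and seconds, "." as the separator,
# exactly 3 decimal digits. The optional hours part (2+ digits) is an
# assumption, it is not discussed in this mail.
_STRICT = re.compile(r"(?:([0-9]{2,}):)?([0-9]{2}):([0-9]{2})\.([0-9]{3})")

# Lenient (hypothetical): 1-2 digits for minutes and seconds, "." or ",",
# and 1-3 decimal digits.
_LENIENT = re.compile(
    r"(?:([0-9]+):)?([0-9]{1,2}):([0-9]{1,2})[.,]([0-9]{1,3})"
)


def _to_seconds(match):
    """Convert a regex match to seconds, or None if a field is out of range."""
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    seconds = int(match.group(3))
    # Pad "5" to "500" so that a short fraction is read as a decimal fraction.
    millis = int(match.group(4).ljust(3, "0"))
    if minutes >= 60 or seconds >= 60:
        return None
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_timestamp_strict(text):
    """Return the time in seconds, or None if the timestamp is rejected."""
    match = _STRICT.fullmatch(text)
    return _to_seconds(match) if match else None


def parse_timestamp_lenient(text):
    """Like the strict version, but tolerating some common authoring slips."""
    match = _LENIENT.fullmatch(text)
    return _to_seconds(match) if match else None
```

With this sketch, "1:00.000" and "00:01,000" are rejected by the strict
function but accepted by the lenient one, while "00:60.000" is rejected by
both. A browser could also log the strict failures to the error console
instead of dropping them silently.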
== Chapter end time ==
In most systems chapters are really chapter markers, a point in time. A
chapter implicitly ends when the next begins. For nested chapters this
isn't so, as the end time is used to determine nesting. Do we expect that
UIs for chapter navigation make the end time visible in some fashion (e.g.
highlighting the chapter on the timeline) or that when a chapter is
chosen, it will pause at the end time?
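As a rough illustration of how the end time determines nesting, the
following Python sketch assigns a depth to each chapter. It is a
simplification and not the spec's chapter-list algorithm: it assumes the
chapters are sorted by start time and that a nested chapter lies entirely
within its parent. The function name and the tuple layout are made up for
this example.

```python
def nesting_depths(chapters):
    """Return (depth, title) pairs for chapters given as (start, end, title).

    Assumes the input is sorted by start time, with an enclosing chapter
    listed before the chapters nested inside it.
    """
    open_ends = []  # end times of the chapters that are still "open"
    result = []
    for start, end, title in chapters:
        # Close every chapter that has ended by the time this one starts.
        while open_ends and start >= open_ends[-1]:
            open_ends.pop()
        result.append((len(open_ends), title))
        open_ends.append(end)
    return result


chapters = [
    (0, 100, "Part A"),
    (0, 50, "A.1"),
    (50, 100, "A.2"),
    (100, 200, "Part B"),
]
depths = nesting_depths(chapters)
```

Here "A.1" and "A.2" end up at depth 1 inside "Part A", and "Part B" is
back at depth 0. Without end times there would be no way to tell that
"Part B" is not a third sub-chapter.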
== --> next ==
A suggestion that was brought up when discussing chapters. When one simply
wants the chapter to end when the next starts, it's a bit of a hassle to
always include the end time. Some additional complexity in the parser
could allow for this:
  00:00.000 --> next
  Chapter 1

  01:00.000 --> next
  Intermezzo

  02:00.000 --> next
  Last Chapter
Cues would be created with endTime = Infinity, and be modified to the
startTime of the following cue (in source order) if there is a following
cue. This would IMO be quite neat, but is the use case strong enough?
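The fix-up step described above could look roughly like this in Python.
It is only a sketch: cues are modelled as plain dicts with "start" and
"end" keys, and the function name is invented for the example.

```python
import math


def resolve_next_end_times(cues):
    """Replace an infinite end time with the start time of the following cue.

    `cues` is a list of dicts in source order. A cue parsed from
    "--> next" is assumed to have end == math.inf. The last cue keeps
    its infinite end time because there is no following cue.
    """
    resolved = [dict(cue) for cue in cues]  # leave the input untouched
    for index in range(len(resolved) - 1):
        if resolved[index]["end"] == math.inf:
            resolved[index]["end"] = resolved[index + 1]["start"]
    return resolved


chapters = [
    {"start": 0.0, "end": math.inf, "text": "Chapter 1"},
    {"start": 60.0, "end": math.inf, "text": "Intermezzo"},
    {"start": 120.0, "end": math.inf, "text": "Last Chapter"},
]
resolved = resolve_next_end_times(chapters)
```

For the three chapters above this gives end times of 60, 120 and Infinity.
A real parser would more likely patch the previous cue as each new cue is
parsed, rather than making a second pass over the list.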
[1] http://openvideoconference.org/standards-for-video-accessibility/
[2] http://wiki.whatwg.org/wiki/WebVTT
--
Philip Jägenstedt
Core Developer
Opera Software