[whatwg] WebVTT feedback (was Re: Video feedback)

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Thu Jun 9 08:19:36 PDT 2011

Hi all,

While we're on the topic of providing feedback on WebVTT, I want to
add some things that have cropped up while trying to implement the
spec line by line.


1. Text Track cue size

In the parsing section for cues, step 27, the default cue size is set
to 100. This means that every cue that has no explicit size setting
("S:") will occupy the full width of the video viewport (or the full
height for vertical rendering), even if the displayed text is short,
such as "[music]". I believe that is not the best default for
rendering subtitles and captions, because the cue background box, with
its dark grey background rgba(0,0,0,0.8), obstructs more of the
video's pixels than is necessary.

Instead, it would make a lot more sense to have the background box
cover only the screen real estate that the text needs, i.e. put the
background color only on the Text boxes themselves. This is how
YouTube does it.

Alternatively, we could have the background box cover just the
bounding box of all the Text boxes inside it. Each caption cue would
then still be displayed as a single rectangle, but one only as wide as
the longest line of text.
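
To make the three options concrete, here is a minimal sketch (not spec
text) that computes how wide the painted background is behind each
line of a cue. The function name, the mode names and the example line
widths are all hypothetical; widths are percentages of the video
viewport's width.

```python
def background_widths(line_widths, mode, cue_size=100.0):
    """Width of the painted background behind each text line.

    line_widths: rendered width of each line, in % of viewport width
                 (hypothetical inputs; a real renderer would measure them).
    mode: "full"     -> box spans the whole cue size (the current default),
          "per_line" -> background only behind each Text box,
          "bounding" -> one rectangle as wide as the longest line.
    """
    if mode == "full":
        return [cue_size for _ in line_widths]
    if mode == "per_line":
        return list(line_widths)
    if mode == "bounding":
        widest = max(line_widths)
        return [widest for _ in line_widths]
    raise ValueError(f"unknown mode: {mode}")


# Two-line cue with made-up widths of 12% and 30% of the viewport.
lines = [12.0, 30.0]
for mode in ("full", "per_line", "bounding"):
    print(mode, background_widths(lines, mode))
```

With the default size of 100, the "full" mode paints the whole width
behind both lines, while the other two modes never paint more than the
widest line needs.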

2. Text Track default cue line position

In the parsing section for cues, step 25, the default line position
for cues is 'auto' and the default snap-to-lines flag is true. For
cues that have no explicit line position setting ("L:"), this means
that the cue ends up with a y-position of 0 (see Section 2 with the
WebVTT cue text rendering rules, step 10, substep 9, first case). In
substep 10, the y-position is in turn used to set the "top" property
to y-position vh, which is 0 percent of the video's height. top:0
means that the cue is placed by default at the top of the video
viewport.

Instead, it would make a lot more sense to have it rendered by default
at the bottom of the video viewport, since that is where captions and
subtitles have traditionally been rendered by default.

Thus, I would suggest that an auto line position be mapped to a
y-position of 100 in Section 2, step 10, substep 9, first case.
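
As a rough illustration of the mapping discussed in this item, the
sketch below models only the two cases mentioned in this mail: an
'auto' line position and an explicit percentage. The function names
and the auto_maps_to parameter are hypothetical and are not taken from
the spec; the parameter simply switches between the current behaviour
(0) and the suggested one (100).

```python
def y_position(line_position, auto_maps_to=0):
    """Map a cue line position to a y-position (percent of video height).

    line_position: the string "auto" or a percentage given as a number.
    auto_maps_to:  hypothetical knob; 0 mirrors the behaviour described
                   above, 100 is the change suggested in this item.
    """
    if line_position == "auto":
        return auto_maps_to
    return line_position


def css_top(y):
    """Value assigned to the CSS "top" property in substep 10."""
    return f"{y}vh"


print(css_top(y_position("auto")))                    # current default
print(css_top(y_position("auto", auto_maps_to=100)))  # suggested default
```

The sketch covers only the value handed to "top"; how the box is
anchored relative to that value is a separate question, raised in
item 3.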

3. Calculation of Text Track cue line position

Assuming we've set "L:100%" on a cue, then according to Section 2,
step 10, substep 9, second case we arrive at a y-position of 100,
which sets "top" to 100% of the video's height. This means that the
cue will disappear beyond the bottom of the video viewport. Is that
intended?

Also, shouldn't the caption text box be vertically centered on the L
position, i.e. have the middle of its height placed there, rather than
its top edge?
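
The effect can be checked with a little arithmetic. The sketch below
is an illustration under assumptions, not spec behaviour: it treats
the viewport as the range 0 to 100 (percent of the video's height),
takes a made-up box height, and reports how much of the box stays
visible when either its top edge or its vertical center is placed at
the L position. All names are hypothetical.

```python
def visible_fraction(line_percent, box_height, anchor="top"):
    """Fraction of a cue box that lies inside the viewport (0..100).

    line_percent: the L position, in percent of the video's height.
    box_height:   height of the cue box in the same units (assumed value).
    anchor:       "top" places the box's top edge at L (as described
                  above); "center" places the middle of the box at L.
    """
    if anchor == "top":
        top = line_percent
    elif anchor == "center":
        top = line_percent - box_height / 2
    else:
        raise ValueError(f"unknown anchor: {anchor}")
    bottom = top + box_height
    # Length of the overlap between [top, bottom] and [0, 100].
    visible = max(0.0, min(bottom, 100.0) - max(top, 0.0))
    return visible / box_height


print(visible_fraction(100, 10, anchor="top"))     # 0.0: fully off-screen
print(visible_fraction(100, 10, anchor="center"))  # 0.5: half visible
print(visible_fraction(50, 10, anchor="top"))      # 1.0: fully visible
```

Under these assumptions a centered box at L:100% is still clipped by
half, so centering on its own would not keep the whole cue on screen.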

4. Calculation of Text Track cue text position

As with the vertical line positioning, I wonder whether there is a
problem with the horizontal "T:" text positioning. When we specify
T:25% on an A:middle cue box, the box is moved half its size to the
left of the T position, i.e. it ends up at -12.5% of the video
viewport's width. Is that intended? Should there be a way to limit how
far a box can be moved off the video viewport? Should it continue to
be visible when moved off the video viewport?
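
The arithmetic behind this item can be sketched as follows. The mail
does not state the box size; the value 75 used below is an assumption,
picked only because it reproduces the -12.5% figure for T:25%. The
clamping helper is one hypothetical way to keep a box inside the
viewport, not something the spec defines.

```python
def middle_aligned_left_edge(text_position, size):
    """Left edge of an A:middle cue box, in % of viewport width.

    The box is shifted half its size to the left of the T position.
    """
    return text_position - size / 2


def clamp_to_viewport(left, size):
    """Hypothetical limit: keep a box of the given size within 0..100."""
    return min(max(left, 0.0), max(0.0, 100.0 - size))


size = 75.0  # assumed box size, not given in the mail
left = middle_aligned_left_edge(25.0, size)
print(left)                           # -12.5
print(clamp_to_viewport(left, size))  # 0.0
```

Whether clamping, shrinking the box or letting it overflow is the
right behaviour is exactly the open question above; the helper only
shows what a limit could look like.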

(and thanks to Ronny for helping to surface some of this)
