[whatwg] Video feedback

Thu Jun 2 19:58:29 PDT 2011

On Thu, Jun 2, 2011 at 7:28 PM, Ian Hickson <ian at hixie.ch> wrote:
> We can add comments pretty easily (e.g. we could say that "<!" starts a
> comment and ">" ends it -- that's already being ignored by the current
> parser), if people really need them. But are comments really that useful?
> Did SRT have problem due to not supporting inline comments? (Or did it
> support inline comments?)

I've only worked with SSA subtitles (fansubbing), where {text in
braces} effectively worked as a comment.  We used them a lot to
communicate between editors on a phrase-by-phrase basis.

But for that use case, using hidden spans makes more sense, since you
can toggle them on and off to view them inline, etc.

Given that, I'd be fine with a comment format that doesn't allow
mid-cue comments, if it makes the format simpler.

>> The text on the left is a transcription, the top is a transliteration,
>> and the bottom is a translation.
>
> Aren't these three separate text tracks?

They're all in the same track, in practice, since media players don't
play multiple subtitle tracks.

It's true that having them in separate tracks would be better, so they
can be disabled individually.  This is probably rare enough that it
should just be sorted out with scripts, at least to start.

> It's not clear to me that we need language information to apply proper
> font selection and word wrapping, since CSS doesn't do it.

But it doesn't have to, since HTML does this with @lang.

> Mixing one CJK language with one non-CJK language seems fine. That should
> always work, assuming you specify good fonts in the CSS.

The font is ultimately in the user's control.  I tell Firefox to
always use Tahoma for Western text and MS Gothic for Japanese text,
ignoring the often ugly site-specified fonts.  The only control sites
have over my fonts is the language they say the text is (or which the
whole page is detected as).  The same principle seems to apply for
captions.

(That's not to say that it's important enough to add yet and I'm fine
with punting on this, at least for now.  I just don't think specifying
fonts is the right solution.)

The most straightforward solution would seem to be having @lang be a
CSS property; I don't know the rationale for this being done by HTML
instead.

> I don't understand why we can't have good typography for CJK and non-CJK
> together. Surely there are fonts that get both right?

I've never seen a Japanese font that didn't look terrible for English
text.  Also, I don't want my font selection to be severely limited due
to the need to use a single font for both languages, instead of using
the right font for the right text.

>> One example of how this can be tricky: at 0:17, a caption on the bottom
>> wraps and takes two lines, which then pushes the line at 0:19 upward
>> (that part's simple enough).  If instead the top part had appeared
>> first, the renderer would need to figure out in advance to push it
>> upwards, to make space for the two-line caption underneath it.
>> Otherwise, the captions would be forced to switch places.
>
> Right, without lookahead I don't know how you'd solve it. With lookahead
> things get pretty dicey pretty quickly.

The problem is that, at least here, the whole scene is nearly
incomprehensible if the top/bottom arrangement isn't maintained.
Lacking anything better, I suspect authors would use similar brittle
hacks with WebVTT.

Anyway, I don't have a simple solution either.

>> I think that, no matter what you do, people will insert line breaks in
>> cues.  I'd follow the HTML model here: convert newlines to spaces and
>> have a separate, explicit line break like <br> if needed, so people
>> don't manually line-break unless they actually mean to.
>
> The line-breaks-are-line-breaks feature is one of the features that
> originally made SRT seem like a good idea. It still seems like the neatest
> way of having a line break.

But does this matter?  Line breaks within a cue are relatively
uncommon in my experience (perhaps it's different for other
languages), compared to how many people will insert line breaks in a
text editor simply to break lines while authoring.  If you do this
while testing on a large monitor, it's likely to look reasonable when
rendered; the brokenness won't show up until it's played in a smaller
window.  Anyone using a non-programmer's text editor that doesn't
handle long lines cleanly is likely to do this.

Wrapping lines manually in SRTs also appears to be common (even
standard) practice, perhaps due to inadequate line wrapping in SRT
renderers.  Making line breaks explicit should help keep people from
translating this habit to WebVTT.
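
As a rough illustration of the HTML-style model described above, here is
a minimal Python sketch. It assumes a hypothetical explicit "<br>" marker
(the one suggested earlier in this thread, not anything WebVTT defines)
and treats every literal newline in the cue as an ordinary space. The
function name is made up for the example.

```python
import re

# Hypothetical explicit line-break marker, taken from the "<br>" suggestion
# in this thread; it is an assumption, not published WebVTT syntax.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)

# Only ordinary spaces, tabs and newlines are collapsed, so a literal
# non-breaking space (U+00A0) in the cue survives untouched.
COLLAPSIBLE = re.compile(r"[ \t\r\n]+")


def normalize_cue_text(raw):
    """Return the list of lines the author explicitly asked for."""
    lines = []
    # Split on explicit breaks first, so only <br> starts a new line.
    for part in BR.split(raw):
        # Newlines typed while authoring become plain spaces.
        lines.append(COLLAPSIBLE.sub(" ", part).strip(" "))
    return lines


cue = "People will insert line breaks\nin a text editor<br>unless they mean to."
print(normalize_cue_text(cue))
```

With this model, a cue that was hand-wrapped in the editor still reflows
to the rendering width, and only the "<br>" forces a break.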

>> Related to line breaking, should there be an &nbsp; escape?  Inserting
>> nbsp literally into files is somewhat annoying for authoring, since
>> they're indistinguishable from regular spaces.
>
> How common would &nbsp; be?

I guess the main cases I've used nbsp for don't apply so much to
captions, e.g. "©&nbsp;2011" (likely to come at the start of a caption,
so not likely to be wrapped anyway).

>> We might also consider leaning on users a bit to tell us what they want.
>> For example, I think people are pretty used to hitting play and then
>> pause to buffer until the end of the video. What if we just used our
>> bandwidth heuristics while in the play state, and buffered blindly when
>> a pause occurs less than X seconds into a video? I won't argue that this
>> is a wonderful solution (or a habit we should encourage), but I figured
>> I'd throw a random idea out there…
>
> That seems like pretty ugly UI. :-)

Changing buffering modes based on *when* the user pauses is an ugly
UI.  Pausing to let a video buffer when it's underrunning (regardless
of when it's paused) is something easy to understand and that people
are used to, though.  I don't know if this is relevant to the spec or
just an implementation issue.

>> I think that pausing shouldn't affect read-ahead buffering behavior.
>> I'd suggest another preload value, preload=buffer, sitting between
>> "metadata" and "auto".  In addition to everything loaded by "metadata",
>> it also fills the read-ahead buffer (whether the video is playing or
>> not).
>>
>> - If a page wants prebuffering only (not full preloading), it sets
>> preload=buffer.  This can be done even when the video is paused, so when
>> the user presses play, the video starts instantly without pausing for a
>> server round-trip like preload=metadata.
>
> So this would be to buffer enough to play through assuming the network
> remains at the current bandwidth, but no more?

I suppose that wouldn't work too well: if the video is small then you
may as well preload the whole thing, and if it's large then long-term
bandwidth estimates aren't going to be very accurate.  (I'm dubious of
any behavior based on bandwidth estimations.)
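
To put numbers on why "buffer enough to play through" is fragile, here is
a small Python sketch. It assumes a constant-bitrate video and a constant
network bandwidth, which are exactly the assumptions doubted above; the
function name and the figures are invented for the example.

```python
def prebuffer_seconds(size_bytes, duration_s, bandwidth_bps):
    """Seconds to wait before starting playback so it never underruns.

    Assumes constant video bitrate and constant bandwidth (both are
    simplifications made for this sketch).
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    download_s = size_bytes * 8 / bandwidth_bps
    # If the download finishes before playback would, no wait is needed.
    return max(0.0, download_s - duration_s)


# A 10-minute, 90 MB video on an estimated 1 Mbit/s link:
print(prebuffer_seconds(90e6, 600, 1.0e6))
# The same video if the real bandwidth is 20% lower than estimated:
print(prebuffer_seconds(90e6, 600, 0.8e6))
# A small clip (5 MB, 60 s) downloads faster than it plays:
print(prebuffer_seconds(5e6, 60, 1.0e6))
```

Under these assumptions the first case needs a 120-second head start and
the second needs 300 seconds, so a 20% error in the estimate changes the
required buffer by a factor of 2.5.  The small clip needs none, which is
the "may as well preload the whole thing" case.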

-- 
Glenn Maynard