[whatwg] Video feedback

Fri Jun 3 05:20:09 PDT 2011

On Fri, 03 Jun 2011 01:28:45 +0200, Ian Hickson <ian at hixie.ch> wrote:

>> > On Fri, 22 Oct 2010, Simon Pieters wrote:

Actually it was me, but that's OK :)

>> > > There was also some discussion about metadata. Language is sometimes
>> > > necessary for the font engine to pick the right glyph.
>> >
>> > Could you elaborate on this? My assumption was that we'd just use CSS,
>> > which doesn't rely on language for this.
>>
>> It's not in any spec that I'm aware of, but some browsers (including
>> Opera) pick different glyphs depending on the language of the text,
>> which really helps when rendering CJK when you have several CJK fonts on
>> the system. Browsers will already know the language from <track
>> srclang>, so this would be for external players.
>
> How is this problem solved in SRT players today?

Not at all, it seems. Both VLC and Totem allow setting the character  
encoding and font used for subtitles in the (global) preferences menu, so  
presumably you would change that if the default doesn't work. Font  
switching seems to mainly be an issue when your system has other default  
fonts than the text you're reading, and it appears that is rare enough  
that very little software does anything about it, browsers perhaps being  
an exception.

> On Mon, 3 Jan 2011, Philip Jägenstedt wrote:
>>
>> > > * The "bad cue" handling is stricter than it should be. After
>> > > collecting an id, the next line must be a timestamp line. Otherwise,
>> > > we skip everything until a blank line, so in the following the
>> > > parser would jump to "bad cue" on line "2" and skip the whole cue.
>> > >
>> > > 1
>> > > 2
>> > > 00:00:00.000 --> 00:00:01.000
>> > > Bla
>> > >
>> > > This doesn't match what most existing SRT parsers do, as they simply
>> > > look for timing lines and ignore everything else. If we really need
>> > > to collect the id instead of ignoring it like everyone else, this
>> > > should be more robust, so that a valid timing line always begins a
>> > > new cue. Personally, I'd prefer if it is simply ignored and that we
>> > > use some form of in-cue markup for styling hooks.
>> >
>> > The IDs are useful for referencing cues from script, so I haven't
>> > removed them. I've also left the parsing as is for when neither the
>> > first nor second line is a timing line, since that gives us a lot of
>> > headroom for future extensions (we can do anything so long as the
>> > second line doesn't start with a timestamp and "-->" and another
>> > timestamp).
>>
>> In the case of feeding future extensions to current parsers, it's way
>> better fallback behavior to simply ignore the unrecognized second line
>> than to discard the entire cue. The current behavior seems unnecessarily
>> strict and makes the parser more complicated than it needs to be. My
>> preference is just ignore anything preceding the timing line, but even
>> if we must have IDs it can still be made simpler and more robust than
>> what is currently spec'ed.
>
> If we just ignore content until we hit a line that happens to look like a
> timing line, then we are much more constrained in what we can do in the
> future. For example, we couldn't introduce a "comment block" syntax, since
> any comment containing a timing line wouldn't be ignored. On the other
> hand if we keep the syntax as it is now, we can introduce a comment block
> just by having its first line include a "-->" but not have it match the
> timestamp syntax, e.g. by having it be "--> COMMENT" or some such.

One of us must be confused. Do you mean something like this?

1
--> COMMENT
00:00.000 --> 00:01.000
Cue text

Adding this syntax would break the *current* parser, as it would fail in  
step 39 (Collect WebVTT cue timings and settings) and then skip the rest  
of the cue. If we want any room for extensions along these lines, then  
multiple lines preceding the timing line must be handled gracefully.

> Looking at the parser more closely, I don't really see how doing anything
> more complex than skipping the block entirely would be simpler than what
> we have now, anyway.

I suggest:

  * Step 31: Try to "collect WebVTT cue timings and settings" instead of  
checking for the substring "-->". If it succeeds, jump to what is now step  
40. If it fails, continue at what is now step 32. (This allows adding any  
syntax as long as it doesn't exactly match a timing line, including "-->  
COMMENT". As a bonus, one can fail faster when trying to parse an entire  
timing line rather than doing a substring search for "-->".)

  * Step 32: Only set the id line if it's not already set. (Assuming we  
want the first line to be the id line in future extensions.)

  * Step 39: Jump to the new step 31.

In case not every detail is correct, the idea is to first try to match a  
timing line and to take the first line that is not a timing line (if any)  
as the id, leaving everything in between open for future syntax changes,  
even if they use "-->".
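
A rough sketch of that idea in Python, to make the intended behavior
concrete. This is not the spec's algorithm: TIMING_LINE is a simplified
stand-in for "collect WebVTT cue timings and settings", and the function
name parse_cue_block is made up for illustration.

```python
import re

# Simplified timing-line pattern (assumption): optional hours, then
# mm:ss.ttt, on both sides of "-->", optionally followed by settings.
TIMESTAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING_LINE = re.compile(rf"^\s*({TIMESTAMP})\s+-->\s+({TIMESTAMP})(?:\s+.*)?$")


def parse_cue_block(lines):
    """Parse one blank-line-delimited block.

    Returns (id, start, end, text_lines), or None if the block contains
    no timing line at all.
    """
    cue_id = None
    for index, line in enumerate(lines):
        match = TIMING_LINE.match(line)
        if match:
            # A valid timing line always begins the cue payload.
            start, end = match.groups()
            return cue_id, start, end, lines[index + 1:]
        if cue_id is None:
            # The first line that is not a timing line is taken as the id.
            cue_id = line
        # Any further non-timing lines are ignored, which leaves room for
        # future syntax such as "--> COMMENT".
    return None
```

With this, both the double-id block and the "--> COMMENT" block quoted
above yield a cue with id "1" instead of being dropped.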

I think it's fairly important that we handle this. Double id lines are an
easy mistake to make when copying things around. Silently dropping those
cues would be worse than what many existing (line-based, id-ignoring) SRT  
parsers do.

> On Sat, 22 Jan 2011, Philip Jägenstedt wrote:

>> I'm inclined to say that we should normalize all whitespace during
>> parsing and not have explicit line breaks at all. If people really want
>> two lines, they should use two cues. In practice, I don't know how well
>> that would fare, though. What other solutions are there?
>
> I think we definitely need line breaks, e.g. for cases like:
>
>   -- Do you want to go to the zoo?
>   -- Yes!
>   -- Then put your shoes on!
>
> ...which is quite common style in some locales.

Right, normalizing all whitespace would be overkill.

> However, I agree that we should encourage people to let browsers wrap the
> lines. Not sure how to encourage that more.

> On Mon, 14 Feb 2011, Philip Jägenstedt wrote:
>> >
>> > [line wrapping]
>>
>> There's still plenty of room for improvements in line wrapping, though.
>> It seems to me that the main reason that people line wrap captions
>> manually is to avoid getting two lines of very different length, as that
>> looks quite unbalanced. There's no way to make that happen with CSS, and
>> AFAIK it's not done by the WebVTT rendering spec either.
>
> WebVTT just defers to CSS for this. I agree that it would be nice for CSS
> to allow UAs to do more clever things here and (more importantly) for UAs
> to actually do more clever things here.

To expand a bit more on the problem and suggested solution, consider the  
example cue "This sentence is spoken by a single speaker and is presented  
as a single cue."

If simple line-wrapping (how browsers currently render text) is used it  
might be:

"This sentence is spoken by a single speaker and is presented as a
single cue."

Subtitles tend to be line-wrapped to have more balanced line width, and at  
least I would certainly much prefer this line wrapping:

"This sentence is spoken by a single speaker
and is presented as a single cue."

Apart from being easier to read, this is also much more suitable for  
left/right-alignment in cases where that is used to associate the cue with  
a speaker on screen. With WebVTT, one would have to manually line-break  
the text to get this result. Apart from wasting the time of the captioner,  
it will also break if a slightly larger font is used -- you might get this  
rendering instead:

"This sentence is spoken by a single
speaker
and is presented as a single cue."

In other cases you might get 4 lines where 3 would have been enough. This
is not a theoretical issue; I see it fairly often with SRT subtitles
rendered at a different size than they were tested with.

My suggested solution is to first lay out the text using all of the
available width. Then, decrease the width as much as possible without  
increasing the number of line breaks. The algorithm should also prefer to  
make the first line the longest, as this is IMO more aesthetically  
pleasing.
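
A minimal sketch of this in Python, measuring width in characters rather
than pixels and using the standard textwrap module as a stand-in for the
browser's greedy line breaking. The function name balanced_wrap is made
up, and the way the "first line longest" preference is applied (pick the
narrowest width where it holds, if there is one) is just one possible
reading of the idea.

```python
import textwrap


def balanced_wrap(text, max_width):
    """Use as few lines as max_width allows, then narrow the box."""
    widest = textwrap.wrap(text, width=max_width)
    narrowest = widest              # narrowest layout with the same line count
    narrowest_first_longest = None  # same, but with the first line the longest
    for width in range(max_width, 0, -1):
        candidate = textwrap.wrap(text, width=width)
        if len(candidate) > len(widest):
            break  # narrowing further would add a line break
        narrowest = candidate
        if candidate and len(candidate[0]) == max(map(len, candidate)):
            narrowest_first_longest = candidate
    # Fall back to the plain narrowest layout if no width puts the
    # longest line first.
    return narrowest_first_longest or narrowest


cue = ("This sentence is spoken by a single speaker "
       "and is presented as a single cue.")
for line in balanced_wrap(cue, 66):
    print(line)
```

For the example cue and a box 66 characters wide, this breaks after
"speaker", which is the balanced rendering shown above.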

I would like to see this specified and would gladly implement it in Opera,  
but in which spec does it belong? It seems fairly subtitling-specific to  
me, so if it could be in the WebVTT rendering rules to begin with (as  
opposed to CSS with vendor prefixes) that would be at least short-term  
awesome. It's only if this is the default line-wrapping for <track>+WebVTT  
that people are going to discover this and stop manually line-breaking  
their captions.

> On Tue, 18 Jan 2011, Robert O'Callahan wrote:
>>
>> One solution that could work here is to honour dynamic changes to
>> 'preload', so switching preload to 'none' would stop buffering. Then a
>> script could do that, for example, after the user has paused the video
>> for ten seconds. The script could also look at 'buffered' to make its
>> decision.
>
> If browsers want to do that I'm quite happy to add something explicitly to
> that effect to the spec. Right now the spec doesn't disallow it.

For now, Opera has made it impossible to change the internal preload state
from a higher state to a lower state specifically to prevent this. If
script authors could start and stop the buffering at will, it would  
certainly be abused to perform throttling using lots of small requests. If  
the buffering behavior of browsers is broken, I'd prefer to fix it (in  
spec or implementation) rather than to allow scripts to work around it.

> On Wed, 19 Jan 2011, Philip Jägenstedt wrote:
>>
>> The 3 preload states imply 3 simple buffering strategies:
>>
>> none: don't touch the network at all
>> metadata: buffer as little as possible while still reaching readyState
>> HAVE_METADATA
>> auto: buffer as fast and much as possible
>
> "auto" isn't "as fast and much as possible", it's "as fast and much as
> will make the user happy". In some configurations, it might be the same as
> "none" (e.g. if the user is paying by the byte and hates video).

The way I see it, that's just a matter of a user preference to limit the
internal preload state to "none" regardless of what the content attribute
says.

>> However, the state we're discussing is when the user has begun playing the
>> video. The spec doesn't talk about it, but I call it:
>>
>> invoked: buffer as little as possible without readyState dropping below
>> HAVE_FUTURE_DATA (in other words: being able to play from currentTime to
>> duration at playbackRate without waiting for the network)
>
> There's also a fifth state, let's call it "aggressive", where even while
> playing the video the UA is trying to download the whole thing in case the
> connection drops.

This is the same as "auto" for now, but sure, that could be improved.

>> If the available bandwidth exceeds the bandwidth of the resource, some
>> kind of throttling must eventually be used. There are mainly 2 options
>> for doing this:
>>
>> 1. Throttle at the TCP level by not reading data from the socket (not at all
>> to suspend, or at a controlled rate to buffer ahead)
>> 2. Use HTTP byte ranges, making many smaller requests with any kind of
>> throttling at the TCP level
>
> There's also option 3, to handle the fifth state above: don't throttle.
>
>
>> When HTTP byte ranges are used to achieve bandwidth management, it's
>> hard to talk about a single downloadBufferTarget that is the number of
>> seconds buffered ahead. Rather, there might be an upper and lower limit
>> within which the browser tries to stay, so that each request can be of a
>> reasonable size. Neither an author-provided minimum nor maximum value can
>> be followed particularly closely, but could possibly be taken as a hint
>> of some sort.
>
> Would it be a more useful hint than "preload"? I'm skeptical about adding
> many hints with no requirements. If there's some specific further
> information we can add, though, it might make sense to add more features
> to "preload".

I don't think that now is a good time to add more features to preload,  
given that what we have isn't interoperably implemented yet.

>> The above buffering strategies are still not enough, because users seem
>> to expect that in a low-bandwidth situation, the video will keep
>> buffering until they can watch it through to the end. These seem to be
>> the options for solving the problem:
>>
>> * Make sites that want this behavior set .preload='auto' in the 'paused'
>> event handler
>>
>> * Add an option in the context menu to "Preload Video" or some such
>>
>> * Cause an invoked (see dfn above) but paused video to behave like
>> preload=auto
>>
>> * As above, but only when the available bandwidth is limited
>>
>> I don't think any of these solutions are particularly good, so any input
>> on other options is very welcome!
>
> If users expect something, it seems logical that it should just happen. I
> don't have a problem with saying that it should depend on preload="",
> though. If you like I can make the spec explicitly describe what the
> preload="" hints mean while video is playing, too.

That would be a good start. In Opera, playing the video causes the  
internal preload state to go to "invoked".
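
To illustrate the "internal preload state" idea, here is a small Python
sketch. It is only a model, not Opera's code: the class and method names
are made up, and ranking "invoked" between "metadata" and "auto" is my
assumption about how the states would be ordered.

```python
from enum import IntEnum


class Preload(IntEnum):
    # Ordering is an assumption: a higher value means more buffering.
    NONE = 0
    METADATA = 1
    INVOKED = 2
    AUTO = 3


class PreloadTracker:
    """Hypothetical model of an internal preload state that never decreases."""

    def __init__(self, preload=Preload.METADATA):
        self.state = preload

    def set_preload(self, requested):
        # Scripts may raise the state, but a lower value is ignored, so
        # buffering cannot be started and stopped at will.
        self.state = max(self.state, requested)

    def play(self):
        # Starting playback raises the state to at least "invoked".
        self.state = max(self.state, Preload.INVOKED)
```

For example, a tracker created with Preload.NONE moves to INVOKED on
play(), and a later set_preload(Preload.NONE) leaves it at INVOKED.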

> On Thu, 20 Jan 2011, Philip Jägenstedt wrote:
>>
>> There have been two non-trivial changes to the seeking algorithm in the
>> last year:
>>
>> Discussed at
>> http://lists.w3.org/Archives/Public/public-html/2010Feb/0003.html
>> led to http://html5.org/r/4868
>>
>> Discussed at
>> http://lists.w3.org/Archives/Public/public-html/2010Jul/0217.html
>> led to http://html5.org/r/5219
>
> Yeah. In particular, sometimes there's no way for the UA to know
> asynchronously if the seek can be done, which is why the attribute is set
> after the method returns. It's not ideal, but the alternative is not
> always implementable.
>
>
>> With that said, it seems like there's nothing that guarantees that the
>> asynchronous section doesn't start running while the script is still
>> running.
>
> Yeah. It's not ideal, but I don't really see what we can do about it.

http://www.w3.org/Bugs/Public/show_bug.cgi?id=12267

By only updating the media state between tasks (or as tasks), the script  
that issued the seek could not see the state changed as a result of it.

> On Fri, 4 Feb 2011, Matthew Gregan wrote:
>>
>> For anyone following along, the behaviour has now been changed in the
>> Firefox 4 nightly builds.
>
> On Mon, 24 Jan 2011, Robert O'Callahan wrote:
>>
>> I agree. I think we should change behavior to match author expectations
>> and the other implementations, and let the spec change to match.
>
> How do you handle the cases where it's not possible?
>
>
> If all the browsers can do it, I'm all for going back to having
> currentTime change synchronously.

Changing currentTime synchronously doesn't mean that seeking to that  
position will actually succeed, so I don't see why that would be a  
problem. currentTime would just be updated again once it's been clamped in  
the asynchronous section of the seek algorithm.
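
A toy model of what I mean, in Python. The Player class and its task
queue are invented for illustration and gloss over everything a real
seek has to do; the point is only the order of the two currentTime
updates.

```python
from collections import deque


class Player:
    """Hypothetical media element with a simple task queue."""

    def __init__(self, duration):
        self.duration = duration
        self.current_time = 0.0
        self.tasks = deque()

    def seek(self, position):
        # Synchronous part: the script that set currentTime reads back
        # the value it just wrote.
        self.current_time = position
        # Asynchronous part: queued to run after the script has finished.
        self.tasks.append(lambda: self._finish_seek(position))

    def _finish_seek(self, position):
        # Clamp to the seekable range (here simply 0..duration).
        self.current_time = min(max(position, 0.0), self.duration)

    def run_tasks(self):
        while self.tasks:
            self.tasks.popleft()()
```

Seeking a 10 second resource to 25 reads back 25 immediately and 10 once
the queued task has run.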

> On Sat, 14 May 2011, Ojan Vafai wrote:
>>
>> If someone proposed a workable solution, browsers would likely implement
>> it. I can't think of a backwards-compatible solution to this, so I agree
>> that developers just need to learn that this is a bad pattern. I
>> could imagine browsers logging a warning to the console in these cases,
>> but I worry that it would fire too much in today's web.
>
> Indeed.
>
>
>> It's unfortunate that you need to use an inline event handler instead of
>> one registered via addEventListener to avoid the race condition.
>> Exposing something to the platform like jquery's live event handlers (
>> http://api.jquery.com/live/) could mitigate this problem in practice,
>> e.g. it would be just as easy or easier to register the event handler
>> before the element is created.
>
> You can also work around it by setting src="" from script after you've
> used addEventListener, or by checking the state manually after you've
> added the handler and calling the handler if it is too late (though you
> have to be aware of the situation where the event is actually already
> scheduled and you added the listener between the time it was scheduled and
> the time it fired, so your function really has to be idempotent).

A better fix would be http://www.w3.org/Bugs/Public/show_bug.cgi?id=12267  
so there is no window where scripts see state X even though the related  
transition event has not fired yet.

-- 
Philip Jägenstedt
Core Developer
Opera Software