[whatwg] Cue points in media elements

Tue May 1 13:57:29 PDT 2007

On Apr 30, 2007, at 7:15 PM, Ralph Giles wrote:

> Thanks for adding to the discussion. We're very interested in
> implementing support for presentations as well, so it's good
> to hear from someone with experience.

Thanks for responding, I'm glad to hear your input.

> On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:
>
>> in our language, you might see something like this:
>>
>>   (movie "Foo.mov" :name 'movie)
>>   (wait @movie (tc 2 3))
>>   (show @bullet-1)
>>   (wait @movie)
>>   (show @bullet-2)
>>
>> If the user skips to the end of the media clip, that simply causes
>> all WAITs on that media clip to return instantly. If they skip
>> forward in the media clip, without ending it, all WAITs before that
>> point will return instantly.
>
> How does this work if, for example, the user seeks forward, and then
> back to an earlier position? Would some of the 'show's be undone, or
> do they not seek backward with the media playback?

We don't expose arbitrary seeking controls to our users; just
play/pause, skip forward & back one card (which resets all state to a
known value) and skip past the current video/audio (which just causes
all waits on that media element to return instantly).
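
The skip behaviour described above can be sketched roughly in JavaScript. This is only an illustration under assumptions, not the actual implementation of the system: `WaitList`, `waitUntil`, `advanceTo`, and `skipToEnd` are hypothetical names, times are plain seconds, and the 2.0 s value is an arbitrary stand-in for the timecode in the example script.

```javascript
// Hypothetical sketch: the pending waits on one media clip.
class WaitList {
  constructor() {
    this.pending = []; // entries { time, resume }, kept sorted by time
  }

  // Register a callback to resume once playback reaches `time` (seconds).
  // A time of Infinity means "wait for the end of the clip".
  waitUntil(time, resume) {
    this.pending.push({ time, resume });
    this.pending.sort((a, b) => a.time - b.time);
  }

  // Release, in order, every wait at or before `position`.
  advanceTo(position) {
    while (this.pending.length > 0 && this.pending[0].time <= position) {
      this.pending.shift().resume();
    }
  }

  // Skipping past the clip releases every remaining wait immediately.
  skipToEnd() {
    this.advanceTo(Infinity);
  }
}

// Loosely follows the example script: one timed wait, one wait for the end.
const shown = [];
const waits = new WaitList();
waits.waitUntil(2.0, () => shown.push("bullet-1"));
waits.waitUntil(Infinity, () => shown.push("bullet-2"));
```

Calling `waits.skipToEnd()` here runs both callbacks in order, which matches the "build up the display state in sequence" behaviour rather than jumping straight to the final state.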

> Is the essential
> component of your system that all the shows be called in sequence
> to build up a display state, or only that the last state trigger before
> the current playback point has been triggered?

The former.

> Isn't this slow if a bunch
> of intermediate animations are triggered by a seek?

Yes, though this is more a bug in our animation API (which could be  
taught to skip directly to the end of an animation when associated  
video/audio ends, but that just hasn't been done yet).

Actually, that brings up another point, which is a bit more
speculative. It may be nice to have a way to register a callback that
is called at animation rates (at least 15 frames/second or so) with
the current play time of a media element. This would allow you to keep
animations in sync with video, even if the video stalls briefly or
seeks forward or backward for whatever reason. We haven't implemented
this in our current system (as I said, it still has the bug that
animations take their full time to play even when you skip video),
but it may be helpful for this sort of thing.
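
As a rough illustration of that idea, the sketch below derives animation progress from the media clock instead of wall-clock time. Everything in it is an assumption for illustration: `animationProgress` and `makeSyncedAnimation` are hypothetical names, and `fakeMedia` is a stand-in object with a numeric `currentTime` in seconds so that the code runs outside a browser.

```javascript
// Hypothetical sketch: animation progress as a function of media time.
function animationProgress(mediaTime, startTime, duration) {
  const t = (mediaTime - startTime) / duration;
  return Math.min(1, Math.max(0, t)); // clamp to [0, 1]
}

// Build a tick function for one animation. `media` only needs a numeric
// `currentTime` property; `render` receives the progress in [0, 1].
function makeSyncedAnimation(media, startTime, duration, render) {
  return function tick() {
    render(animationProgress(media.currentTime, startTime, duration));
  };
}

// Stand-in for a media element (an assumption, not a real DOM object).
const fakeMedia = { currentTime: 0 };
const frames = [];
const tick = makeSyncedAnimation(fakeMedia, 2, 4, (p) => frames.push(p));

// A host would call tick() at animation rate, e.g. setInterval(tick, 1000 / 15).
fakeMedia.currentTime = 3;  tick(); // a quarter of the way through
fakeMedia.currentTime = 10; tick(); // skipped past the end: progress is 1
fakeMedia.currentTime = 0;  tick(); // sought back before the start: progress is 0
```

Because each frame is computed from the current play time, a stall simply holds the animation in place and a skip jumps it to the matching state, with no separate timer to fall out of sync.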

> Does your system support live streaming as well? That complicates the
> design some when the presentation media updates appear dynamically.

No, we only support progressive download.

> Anyway I think you could implement your system with the currently
> proposed interface by checking the current playback position and
> clearing a separate list of waits inside your timeupdate callback.

I agree, it would be possible, but from my current reading of the
spec it sounds like some cue points might be missed until quite a bit
later (since timeupdate isn't guaranteed to be called every time
anything discontinuous happens with the media). In general, having to
do extra bookkeeping to keep track of the state of the media may be
fragile, so stronger guarantees about when cue points are fired are
better than trying to keep track of what's going on with timeupdate
events.

> I agree this should be clarified. The appropriate interpretation
> should be when the current playback position reaches the frame
> corresponding to the cue point, but digital media has quantized
> frames, while the cue points are floating point numbers. Triggering
> all cue point callbacks between the last current playback position
> and the current one (including during seeks) would be one option, and
> do what you want as long as you aren't seeking backward. I'd be more
> in favor of triggering any cue point callbacks that lie between the
> current playback position and the current playback position of the
> next frame (audio frame for <audio/> and video frame for <video/> I
> guess). That means more bookkeeping to implement your system, but is
> less surprising in other cases.

Sure, that would probably work. As I said, bookkeeping is generally a  
problem because it might get out of sync, but with stronger  
guarantees about when cue points are triggered, I think it could work.
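
To make the first option concrete (firing every cue point between the last playback position and the current one, including during seeks), here is a minimal sketch. It is not taken from the spec or from any implementation: `CueTracker` and `update` are hypothetical names, and the caller is assumed to report every position change reliably, which is exactly the guarantee under discussion.

```javascript
// Hypothetical sketch: fire every cue in the interval (last, current].
class CueTracker {
  constructor(cues) {
    // cues: array of { time, callback }; copied and sorted by time
    this.cues = [...cues].sort((a, b) => a.time - b.time);
    this.lastPosition = 0;
  }

  // Call with the new playback position after normal playback or a seek.
  update(position) {
    if (position > this.lastPosition) {
      for (const cue of this.cues) {
        if (cue.time > this.lastPosition && cue.time <= position) {
          cue.callback(cue.time);
        }
      }
    }
    // A backward seek fires nothing; it only moves the reference point.
    this.lastPosition = position;
  }
}

const fired = [];
const tracker = new CueTracker([
  { time: 5, callback: (t) => fired.push(t) },
  { time: 1, callback: (t) => fired.push(t) },
  { time: 3, callback: (t) => fired.push(t) },
]);
tracker.update(0.5); // nothing yet
tracker.update(4);   // forward seek over the cues at 1 and 3
tracker.update(2);   // backward seek: nothing fires
tracker.update(6);   // passes 3 again, then 5
```

Note that the cue at 3 fires twice in this run, once per forward pass. Whether that is desirable is part of the open question, and it is the kind of state an author would otherwise have to track by hand.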

>>                                                           If video
>> playback freezes for a second, and so misses a cue point, is that
>> considered to have been "reached"?
>
> As I read it, cue points are relative to the current playback
> position, which does not advance if the stream buffer underruns, but
> it would if playback restarts after a gap, as might happen if the
> connection drops, or in an RTP stream. My proposal above would need
> to be amended to handle that case, and the decoder dropping
> frames...finding the right language here is hard.

Yes, it's a tricky little problem. Our current system stays out of  
trouble because it makes quite a few simplifying assumptions (video  
is played forward only, progressive download, not streaming, etc).  
Obviously, in order to support a more general API, you're going to  
have to deal with some trickier issues.

I guess the main question to ask is what is the purpose of a cue  
point? Is it to specify "at the moment this is called, the media is  
at this point," and what does that mean when you have quantized  
frames and floating point times? Is it to specify "the media has  
passed this particular point in playback", and what does that mean  
when you're playing backwards or seeking?
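
The two readings can be contrasted in a small sketch. It assumes a constant frame rate (which real media does not guarantee), and the function names are hypothetical, chosen only for illustration.

```javascript
// Hypothetical sketch: which frame "contains" a floating point cue time?
// Assumes a constant frame rate and frames numbered from 0.
function frameIndexFor(cueTime, framesPerSecond) {
  return Math.floor(cueTime * framesPerSecond);
}

// Reading 1: "the media is at this point" means the frame containing
// the cue time is the one currently being shown.
function isAtCue(currentFrame, cueTime, framesPerSecond) {
  return currentFrame === frameIndexFor(cueTime, framesPerSecond);
}

// Reading 2: "the media has passed this point" means playback has
// reached or gone beyond that frame (forward playback only).
function hasPassedCue(currentFrame, cueTime, framesPerSecond) {
  return currentFrame >= frameIndexFor(cueTime, framesPerSecond);
}

const fps = 25;
const cue = 2.05; // falls inside frame 51, which covers [2.04, 2.08)
const cueFrame = frameIndexFor(cue, fps);
```

If the decoder drops frame 51 and shows frame 53 next, the first reading never becomes true, while the second does. Neither reading says what should happen when playing backwards.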

> I really like this idea. It would also be nice if, for example, the
> closed caption text were available through the DOM so it could be
> presented elsewhere, searched locally, and so on. But what about
> things like album art, which might be embedded in an audio stream?
> Should that be accessible? Should a video element expose a set of
> known cue points embedded in the file?
>
> A more abstract interface is necessary than just 'caption events'.

My instinct is to avoid trying to make a more general interface if  
possible. There are endless types of access you can build to  
information in underlying media elements, and I think it would put a  
large burden on implementors if they had to support accessing all of  
those types of information. Accessibility is one of the most  
important concerns in HTML, however, so I think that having special  
case support for accessibility without providing all of the other  
features would be an acceptable tradeoff.

> Here
> are some use cases worth considering:

<snip a bunch of interesting use cases>

> All of these can be handled by special server-side components and
> AJAX, for example, so the main question is whether the media elements
> should expose this sort of data through the DOM.

Special server-side components and AJAX drastically increase the
complexity of the system, increase the authoring burden, and make
it impossible to distribute stand-alone content, so if possible,
I'd prefer to be able to do everything we need with just plain old
JavaScript and the DOM.