[whatwg] Cue points in media elements

Tue May 1 19:41:20 PDT 2007

Hearing about cue points in media elements just sorta reminds me of 
keyTimes in SMIL.

I know SMIL seems funky to some people, but I do really love it! It is so 
way cool! So far as I know it doesn't do quite what you're talking about 
here, but it does similar stuff including non-linear distortions of timing 
elements and the like.

It's declarative (though I don't think it's Turing complete -- wager of 
virtual beans proposed) and its syntax is worthy of emulation in that 
classical "ontology recapitulates philology" sort of sense. It is so much a 
W3C standard that it has six or eight or twelve standards devoted to it.

David Dailey
(who is trying to learn how not to re-invent wheels)
http://srufaculty.sru.edu/david.dailey/copyright/dailey_on_copyright.htm

Damn bastard mutant wheels keep popping up around me like unwanted 
copyrighted utterances in a world where intellectual landfills are charged 
by the bit!
-- anonymous

----- Original Message ----- 
From: "Brian Campbell" <Brian.P.Campbell at Dartmouth.EDU>
To: "Ralph Giles" <giles at xiph.org>
Cc: <whatwg at whatwg.org>
Sent: Tuesday, May 01, 2007 4:57 PM
Subject: Re: [whatwg] Cue points in media elements

> On Apr 30, 2007, at 7:15 PM, Ralph Giles wrote:
>
>> Thanks for adding to the discussion. We're very interested in
>> implementing support for presentations as well, so it's good
>> to hear from someone with experience.
>
> Thanks for responding, I'm glad to hear your input.
>
>> On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:
>>
>>> in our language, you might see something like this:
>>>
>>>   (movie "Foo.mov" :name 'movie)
>>>   (wait @movie (tc 2 3))
>>>   (show @bullet-1)
>>>   (wait @movie)
>>>   (show @bullet-2)
>>>
>>> If the user skips to the end of the media clip, that simply causes
>>> all WAITs on that media clip to return instantly. If they skip
>>> forward in the media clip, without ending it, all WAITs before that
>>> point will return instantly.
>>
>> How does this work if, for example, the user seeks forward, and then
>> back to an earlier position? Would some of the 'show's be undone, or do
>> they not seek backward with the media playback?
>
> We don't expose arbitrary seeking controls to our users; just play/pause,
> skip forward & back one card (which resets all state to a known value)
> and skip past the current video/audio (which just causes all waits on
> that media element to return instantly).
>
>> Is the essential
>> component of your system that all the shows be called in sequence
>> to build up a display state, or that the last state trigger before the
>> current playback point have been triggered?
>
> The former.
>
>> Isn't this slow if a bunch
>> of intermediate animations are triggered by a seek?
>
> Yes, though this is more a bug in our animation API (which could be 
> taught to skip directly to the end of an animation when associated 
> video/audio ends, but that just hasn't been done yet).
>
> Actually, that brings up another point, which is a bit more speculative.
> It may be nice to have a way to register a callback that is invoked at
> animation rates (at least 15 frames/second or so) with the current play
> time of a media element. This would allow you to keep animations in sync
> with video, even if the video might stall briefly, or seek forward or
> backward for whatever reason. We haven't implemented this in our current
> system (as I said, it still has the bug that animations take their full
> time to play even when you skip video), but it may be helpful for this
> sort of thing.
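
A minimal sketch of the callback idea above, in TypeScript. It assumes the
animation is described by a start time and a duration in seconds, and that
some caller can supply the media element's current play time. The names
animationProgress and startSync are hypothetical and are not part of any
proposed interface.

```typescript
// Hypothetical helper: maps a media time to animation progress in [0, 1].
// Because progress is derived from the media clock, a stall or a seek in
// the media moves the animation with it.
function animationProgress(mediaTime: number, start: number, duration: number): number {
  if (duration <= 0) {
    // Degenerate animation: treat it as finished once its start is reached.
    return mediaTime >= start ? 1 : 0;
  }
  const raw = (mediaTime - start) / duration;
  return Math.min(1, Math.max(0, raw));
}

// Hypothetical driver: polls the media clock about 15 times per second and
// hands the derived progress to a render function. Returns a stop function.
function startSync(
  getMediaTime: () => number,
  render: (progress: number) => void,
  start: number,
  duration: number,
): () => void {
  const timer = setInterval(() => {
    render(animationProgress(getMediaTime(), start, duration));
  }, 1000 / 15);
  return () => clearInterval(timer);
}
```

Skipping past the video would then snap the animation to its end state,
since any media time beyond start + duration maps to a progress of 1.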
>
>> Does your system support live streaming as well? That complicates the
>> design some when the presentation media updates appear dynamically.
>
> No, we only support progressive download.
>
>> Anyway I think you could implement your system with the currently
>> proposed interface by checking the current playback position and
>> clearing a separate list of waits inside your timeupdate callback.
>
> I agree, it would be possible, but from my current reading of the spec it
> sounds like some cue points might be missed until quite a bit later
> (since timeupdate isn't guaranteed to be called every time anything
> discontinuous happens with the media). In general, having to do extra
> bookkeeping to keep track of the state of the media may be fragile, so
> stronger guarantees about when cue points are fired are better than trying
> to keep track of what's going on with timeupdate events.
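
The bookkeeping under discussion might look roughly like the following
TypeScript sketch: a separate list of waits that is drained whenever the
playback position is reported. The WaitList class and its method names are
assumptions made for illustration. The sketch only shows the list itself;
how reliably update() gets called is exactly the open question in this
thread.

```typescript
type Wait = { time: number; resume: () => void };

// Hypothetical list of pending waits, kept sorted by media time.
class WaitList {
  private pending: Wait[] = [];

  // Register a callback to run once playback has reached `time` (seconds).
  add(time: number, resume: () => void): void {
    this.pending.push({ time, resume });
    this.pending.sort((a, b) => a.time - b.time);
  }

  // Call with the current playback position. Every wait at or before that
  // position fires, in time order, so a forward skip releases all of the
  // waits it jumped over. Returns how many waits fired.
  update(currentTime: number): number {
    let fired = 0;
    while (this.pending.length > 0 && this.pending[0].time <= currentTime) {
      const wait = this.pending.shift()!;
      wait.resume();
      fired++;
    }
    return fired;
  }
}
```

In a page this would presumably be driven by calling update() with the
element's current playback position from a timeupdate handler, which is
where the fragility comes in if that event is delayed or skipped.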
>
>> I agree this should be clarified. The appropriate interpretation should
>> be when the current playback position reaches the frame corresponding to
>> the cue point, but digital media has quantized frames, while the cue
>> points are floating point numbers. Triggering all cue point callbacks
>> between the last current playback position and the current one
>> (including during seeks) would be one option, and do what you want as
>> long as you aren't seeking backward. I'd be more in favor of triggering
>> any cue point callbacks that lie between the current playback position
>> and the current playback position of the next frame (audio frame for
>> <audio/> and video frame for <video/> I guess). That means more
>> bookkeeping to implement your system, but is less surprising in other
>> cases.
>
> Sure, that would probably work. As I said, bookkeeping is generally a
> problem because it might get out of sync, but with stronger guarantees
> about when cue points are triggered, I think it could work.
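
To make the two triggering rules from the quoted proposal concrete, here is
a small TypeScript sketch. It assumes cue points are plain numbers in
seconds and that the frame duration is known. The function names
cuesBetween and cuesInFrame are invented for this example, and the choice
of which interval ends are inclusive is also an assumption, since the
thread leaves that open.

```typescript
// Option 1: fire every cue point crossed since the last known position.
// Uses the half-open interval (previous, current], so a backward seek
// (current < previous) yields nothing.
function cuesBetween(cues: number[], previous: number, current: number): number[] {
  return cues.filter((c) => c > previous && c <= current);
}

// Option 2: fire only the cue points that fall inside the frame now being
// presented, i.e. in [frameStart, frameStart + frameDuration).
function cuesInFrame(cues: number[], frameStart: number, frameDuration: number): number[] {
  return cues.filter((c) => c >= frameStart && c < frameStart + frameDuration);
}
```

For cue points at 2, 5 and 10.01 seconds and a seek from 0 to 10 seconds,
the first rule reports the cue points at 2 and 5, while the second (with
25 frames per second, so 0.04 seconds per frame) reports only the one at
10.01. That difference is the extra bookkeeping mentioned above: under the
second rule, skipped cue points have to be tracked by the script itself.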
>
>>> If video playback freezes for a second, and so misses a cue point, is
>>> that considered to have been "reached"?
>>
>> As I read it, cue points are relative to the current playback position,
>> which does not advance if the stream buffer underruns, but it would
>> if playback restarts after a gap, as might happen if the connection
>> drops, or in an RTP stream. My proposal above would need to be amended
>> to handle that case, and the decoder dropping frames...finding the right
>> language here is hard.
>
> Yes, it's a tricky little problem. Our current system stays out of
> trouble because it makes quite a few simplifying assumptions (video is
> played forward only, progressive download, not streaming, etc.).
> Obviously, in order to support a more general API, you're going to have
> to deal with some trickier issues.
>
> I guess the main question to ask is what is the purpose of a cue point?
> Is it to specify "at the moment this is called, the media is at this
> point," and what does that mean when you have quantized frames and
> floating point times? Is it to specify "the media has passed this
> particular point in playback", and what does that mean when you're
> playing backwards or seeking?
>
>> I really like this idea. It would also be nice if, for example, the
>> closed caption text were available through the DOM so it could be
>> presented elsewhere, searched locally, and so on. But what about things
>> like album art, which might be embedded in an audio stream? Should that
>> be accessible? Should a video element expose a set of known cue points
>> embedded in the file?
>>
>> A more abstract interface is necessary than just 'caption events'.
>
> My instinct is to avoid trying to make a more general interface if
> possible. There are endless types of access you can build to information
> in underlying media elements, and I think it would put a large burden on
> implementors if they had to support accessing all of those types of
> information. Accessibility is one of the most important concerns in HTML,
> however, so I think that having special case support for accessibility
> without providing all of the other features would be an acceptable
> tradeoff.
>
>> Here
>> are some use cases worth considering:
>
> <snip a bunch of interesting use cases>
>
>> All of these can be handled by special server-side components and AJAX,
>> for example, so the main question is whether the media elements should
>> expose this sort of data through the DOM.
>
> Special server-side components and AJAX drastically increase the
> complexity of the system, increase the authoring burden, and make it
> impossible to distribute stand-alone content, so if possible, I'd prefer
> to make it possible to do everything we need with just plain old
> JavaScript and the DOM.
>
>