[whatwg] Cue points in media elements
ddailey at zoominternet.net
Tue May 1 19:41:20 PDT 2007
Hearing about cue points in media elements. Just sorta reminds me of
keyTimes in SMIL.
I know SMIL seems funky to some people, but I do really love it! It is so
way cool! So far as I know it doesn't do quite what you're talking about
here, but it does similar stuff including non-linear distortions of timing
elements and the like.
It's declarative (though I don't think it's Turing complete -- wager of
virtual beans proposed) and its syntax is worthy of emulation in that
classical "ontology recapitulates philology" sort of sense. It is so much a
W3C standard that it has six or eight or twelve standards devoted to it.
(who is trying to learn how not to re-invent wheels)
Damn bastard mutant wheels keep popping up around me like unwanted
copyrighted utterances in a world where intellectual landfills are charged
by the bit!
----- Original Message -----
From: "Brian Campbell" <Brian.P.Campbell at Dartmouth.EDU>
To: "Ralph Giles" <giles at xiph.org>
Cc: <whatwg at whatwg.org>
Sent: Tuesday, May 01, 2007 4:57 PM
Subject: Re: [whatwg] Cue points in media elements
> On Apr 30, 2007, at 7:15 PM, Ralph Giles wrote:
>> Thanks for adding to the discussion. We're very interested in
>> implementing support for presentations as well, so it's good
>> to hear from someone with experience.
> Thanks for responding, I'm glad to hear your input.
>> On Sun, Apr 29, 2007 at 03:14:27AM -0400, Brian Campbell wrote:
>>> in our language, you might see something like this:
>>> (movie "Foo.mov" :name 'movie)
>>> (wait @movie (tc 2 3))
>>> (show @bullet-1)
>>> (wait @movie)
>>> (show @bullet-2)
>>> If the user skips to the end of the media clip, that simply causes
>>> all WAITs on that media clip to return instantly. If they skip
>>> forward in the media clip, without ending it, all WAITs before that
>>> point will return instantly.
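One way to model the WAIT semantics described above in a web page is a small scheduler whose waits resolve once playback reaches their time or is skipped past it. This is a minimal TypeScript sketch under stated assumptions: the class name `MediaWaits`, its methods, and the use of plain seconds for times are hypothetical, and nothing here is part of the proposed media element API.

```typescript
// Hypothetical sketch: each WAIT on a media clip is a pending promise that
// resolves once playback reaches (or skips past) its target time.
type Waiter = { time: number; resolve: () => void };

class MediaWaits {
  private pending: Waiter[] = [];

  // Like (wait @movie t): resolves when playback reaches time t (seconds).
  // With no argument, like (wait @movie): resolves at the end of the clip.
  wait(time: number = Infinity): Promise<void> {
    return new Promise<void>((resolve) => {
      this.pending.push({ time, resolve });
    });
  }

  // Called whenever the playback position moves forward, whether through
  // normal playback or a skip: every wait at or before the new position
  // returns immediately; later waits stay pending.
  advanceTo(position: number): void {
    const due = this.pending.filter((w) => w.time <= position);
    this.pending = this.pending.filter((w) => w.time > position);
    due.forEach((w) => w.resolve());
  }

  // Skipping to the end of the clip releases every outstanding wait,
  // including end-of-clip waits (Infinity <= Infinity is true).
  skipToEnd(): void {
    this.advanceTo(Infinity);
  }

  get pendingCount(): number {
    return this.pending.length;
  }
}
```

With it, the example script could read `await movie.wait(t); show(bullet1); await movie.wait(); show(bullet2);`, where `show` and the time `t` standing in for `(tc 2 3)` are placeholders whose exact units are not assumed here.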
>> How does this work if, for example, the user seeks forward, and then
>> back to an earlier position? Would some of the 'show's be undone, or do
>> they not seek backward with the media playback?
> We don't expose arbitrary seeking controls to our users: just play/pause,
> skip forward & back one card (which resets all state to a known value)
> and skip past the current video/audio (which just causes all waits on
> that media element to return instantly).
>> Is the essential
>> component of your system that all the shows be called in sequence
>> to build up a display state, or that the last state trigger before the
>> current playback point have been triggered?
> The former.
>> Isn't this slow if a bunch
>> of intermediate animations are triggered by a seek?
> Yes, though this is more a bug in our animation API (which could be
> taught to skip directly to the end of an animation when associated
> video/audio ends, but that just hasn't been done yet).
> Actually, that brings up another point, which is a bit more speculative.
> It may be nice to have a way to register a callback that is called at
> animation rates (at least 15 frames/second or so) with the current play
> time of a media element. This would allow you to keep
> animations in sync with video, even if the video might stall briefly, or
> seek forward or backward for whatever reason. We haven't implemented this
> in our current system (as I said, it still has the bug that animations
> take their full time to play even when you skip video), but it may
> be helpful for this sort of thing.
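The callback described above is most useful when each animation's state is computed from the media's playback time rather than from wall-clock time. A minimal sketch, assuming linear interpolation of a single numeric property; `TimedAnimation`, `valueAt`, and `onMediaTick` are hypothetical names. In a browser, one could approximate the proposed callback by calling `onMediaTick` from a `requestAnimationFrame` loop that reads the element's `currentTime`.

```typescript
// Hypothetical sketch: an animation whose state is a pure function of the
// media element's current playback time, not of elapsed wall-clock time.
interface TimedAnimation {
  start: number;    // media time (seconds) at which the animation begins
  duration: number; // length of the animation in media seconds
  from: number;     // property value before the animation (e.g. opacity)
  to: number;       // property value after the animation
}

// Returns the property value for a given media time. Because it depends only
// on the playback position, a stall freezes the animation, and a seek in
// either direction jumps straight to the matching state without replaying
// intermediate steps.
function valueAt(anim: TimedAnimation, mediaTime: number): number {
  if (mediaTime <= anim.start) return anim.from;
  if (mediaTime >= anim.start + anim.duration) return anim.to;
  const progress = (mediaTime - anim.start) / anim.duration;
  return anim.from + (anim.to - anim.from) * progress;
}

// What a periodic "current play time" callback might do on each tick:
// recompute every animation's value from the reported media time.
function onMediaTick(anims: TimedAnimation[], mediaTime: number): number[] {
  return anims.map((a) => valueAt(a, mediaTime));
}
```

Since the value depends only on the playback position, this style would also sidestep the slow replay of intermediate animations after a skip mentioned earlier in the thread.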
>> Does your system support live streaming as well? That complicates the
>> design some when the presentation media updates appear dynamically.
> No, we only support progressive download.
>> Anyway I think you could implement your system with the currently
>> proposed interface by checking the current playback position and
>> clearing a separate list of waits inside your timeupdate callback.
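Ralph's suggestion can be sketched as a handler that keeps its own sorted list of cues and runs every cue at or before `currentTime` whenever a `timeupdate` event arrives. This is a hypothetical TypeScript sketch: `MediaLike` is a structural stand-in for a media element (only `currentTime` and `addEventListener` are used), and `Cue` and `watchCues` are illustrative names.

```typescript
// Structural stand-in for the parts of a media element this sketch uses.
interface MediaLike {
  currentTime: number;
  addEventListener(type: "timeupdate", listener: () => void): void;
}

// A cue: a playback time in seconds and the action to run when it is reached.
interface Cue {
  time: number;
  fire: () => void;
}

// Hypothetical helper: on every timeupdate event, run (in time order) each
// cue whose time is at or before currentTime. Cues are only checked when the
// event fires, so a cue may run later than its nominal time, and cues are
// not re-run after a backward seek.
function watchCues(media: MediaLike, cues: Cue[]): void {
  const pending = cues.slice().sort((a, b) => a.time - b.time);
  media.addEventListener("timeupdate", () => {
    while (pending.length > 0 && pending[0].time <= media.currentTime) {
      const cue = pending.shift()!;
      cue.fire();
    }
  });
}
```

Because cues are only checked when `timeupdate` fires, a cue can be handled noticeably later than its nominal time, which is the concern raised in the reply that follows.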
> I agree, it would be possible, but from my current reading of the spec it
> sounds like some cue points might be missed until quite a bit later
> (since timeupdate isn't guaranteed to be called every time anything
> discontinuous happens with the media). In general, having to do extra
> bookkeeping to keep track of the state of the media may be fragile, so
> stronger guarantees about when cue points are fired are better than trying
> to keep track of what's going on with timeupdate events.
>> I agree this should be clarified. The appropriate interpretation should
>> be when the current playback position reaches the frame corresponding to
>> the cue point, but digital media has quantized frames, while the cue
>> points are floating point numbers. Triggering all cue point callbacks
>> between the last current playback position and the current one
>> (including during seeks) would be one option, and do what you want as
>> long as you aren't seeking backward. I'd be more in favor of triggering
>> any cue point callbacks that lie between the current playback position
>> and the current playback position of the next frame (audio frame for
>> <audio/> and video frame for <video/> I guess). That means more
>> bookkeeping to implement your system, but is less surprising in other
> Sure, that would probably work. As I said, bookkeeping is generally a
> problem because it might get out of sync, but with stronger guarantees
> about when cue points are triggered, I think it could work.
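The two triggering policies Ralph describes can be compared side by side. In this sketch, cue times are floating-point seconds and the frame duration (for example 1/25 s for 25 fps video) is a parameter; the function names and the choice of open versus closed interval endpoints are assumptions, not something the spec or the thread settles.

```typescript
// Policy A: fire every cue in the half-open interval (previous, current],
// i.e. everything playback has passed since the last check. A forward seek
// fires all skipped cues; a backward seek fires nothing.
function cuesPassed(cues: number[], previous: number, current: number): number[] {
  return cues.filter((t) => t > previous && t <= current);
}

// Policy B (Ralph's preference): fire only cues that fall within the frame
// about to be presented, [current, current + frameDuration). This maps
// floating-point cue times onto quantized frames; after a seek, only cues in
// the landing frame fire.
function cuesInFrame(cues: number[], current: number, frameDuration: number): number[] {
  return cues.filter((t) => t >= current && t < current + frameDuration);
}
```

Under the first policy a forward seek fires every skipped cue, which matches Brian's WAIT model; under the second, the page must rebuild any state earlier cues would have set up, which is the extra bookkeeping mentioned above.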
>>> If video
>>> playback freezes for a second, and so misses a cue point, is that
>>> considered to have been "reached"?
>> As I read it, cue points are relative to the current playback position,
>> which does not advance if the stream buffer underruns, but it would
>> if playback restarts after a gap, as might happen if the connection
>> drops, or in an RTP stream. My proposal above would need to be amended
>> to handle that case, and the decoder dropping frames...finding the right
>> language here is hard.
> Yes, it's a tricky little problem. Our current system stays out of
> trouble because it makes quite a few simplifying assumptions (video is
> played forward only, progressive download, not streaming, etc).
> Obviously, in order to support a more general API, you're going to have
> to deal with some trickier issues.
> I guess the main question to ask is what is the purpose of a cue point?
> Is it to specify "at the moment this is called, the media is at this
> point," and what does that mean when you have quantized frames and
> floating point times? Is it to specify "the media has passed this
> particular point in playback", and what does that mean when you're
> playing backwards or seeking?
>> I really like this idea. It would also be nice if, for example, the
>> closed caption text were available through the DOM so it could be
>> presented elsewhere, searched locally, and so on. But what about things
>> like album art, which might be embedded in an audio stream? Should that
>> be accessible? Should a video element expose a set of known cue points
>> embedded in the file?
>> A more abstract interface than just 'caption events' is necessary.
> My instinct is to avoid trying to make a more general interface if
> possible. There are endless types of access you can build to information
> in underlying media elements, and I think it would put a large burden on
> implementors if they had to support accessing all of those types of
> information. Accessibility is one of the most important concerns in HTML,
> however, so I think that having special case support for accessibility
> without providing all of the other features would be an acceptable
>> are some use cases worth considering:
> <snip a bunch of interesting use cases>
>> All of these can be handled by special server-side components and AJAX,
>> for example, so the main question is whether the media elements should
>> expose this sort of data through the DOM.
> Special server-side components and AJAX drastically increase the
> complexity of the system, increase the authoring burden, and make it
> impossible to distribute stand-alone content, so if possible,
> I'd prefer to make it possible to do everything we need with just plain