[whatwg] Implementation difficulties for MediaController

Wed Mar 30 11:09:41 PDT 2011

On Mar 29, 2011, at 9:05 PM, Ian Hickson wrote:

> On Tue, 29 Mar 2011, Jer Noble wrote:
>> 
>> Below is Eric's and my feedback on the difficulty of implementing 
>> this proposal in Apple's port of WebKit:
> 
> Thank you very much for your feedback. I'll look into it more tomorrow 
> when I update the spec, but in the meantime I had some additional 
> questions:
> 
> 
>>> * playing tracks synchronised at different offsets
>> 
>> However, if the in-band tracks will be played at different time 
>> offsets, or at different rates, playback becomes just as inefficient as 
>> playing independent files.  To implement this we will have to open two 
>> instances of a movie, enable different tracks on each, and then play the 
>> two instances in sync.
> 
> Is that acceptable? That is, are you ok with implementing multiple file 
> (or two instances of the same file at different offsets) synchronization?

Yes, this would be acceptable.

>>> * playing tracks at different rates
>> 
>> In addition to the limitation listed above, efficient playback of tracks 
>> at different rates will require all tracks to be played in the same 
>> direction.
> 
> Ah, interesting.
> 
> Is it acceptable to implement multiple playback at different rates if 
> they're all in the same direction, or would you (at least for now) be 
> significantly helped by forcing the playback rates to be the same for all 
> slaved media tracks?

It would be significantly easier to implement an across-the-board playback rate for all media elements in a media group.  This seems like a reasonable restriction for the first version of the API.

>>> * changing any of the above while media is playing vs when it is 
>>> stopped
>> 
>> Modifying the media groups while the media is playing is probably 
>> impossible to do without stalling.  The media engine may have thrown out 
>> unneeded data from disabled tracks and may have to rebuffer that data, 
>> even in the case of in-band tracks.
> 
> That makes sense. There are several ways to handle this; the simplest is 
> probably to say that when the list of synchronised tracks is changed, 
> or when the individual offsets of each track or the individual playback 
> rates of each track are changed, the playback of the entire group should 
> be automatically stopped. Is that sufficient?

I would say that, instead, it would be better to treat this as similar to seeking into an unbuffered region of a media file.  Some implementers will handle this case better than others, so this seems to be a Quality of Service issue.

> (In the future, if media frameworks optimise these cases, or if hardware 
> advances sufficiently that even inefficient implementations of this are 
> adequate, we could add a separate flag that controls whether or not this 
> automatic pausing happens.)

It seems that this could be determined on the authors' side by pausing before operations that may cause significant buffering delays, without the need for a new flag.
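
As a rough illustration of that author-side approach, here is a minimal TypeScript sketch. The `Pausable` interface and the `pauseForChange` helper are hypothetical names used only for this example; the interface simply mirrors the `paused`, `pause()`, and `play()` members that media elements already expose. The sketch assumes the author decides when to resume (for instance, once enough data has been buffered again), so it hands back a resume function rather than restarting playback itself.

```typescript
// Hypothetical minimal view of a media element; mirrors existing members only.
interface Pausable {
  readonly paused: boolean;
  pause(): void;
  play(): void;
}

// Pause every element that is currently playing, apply the change, and
// return a function that resumes only the elements this call paused.
function pauseForChange(elements: Pausable[], change: () => void): () => void {
  const wasPlaying = elements.filter((el) => !el.paused);
  wasPlaying.forEach((el) => el.pause());
  change();
  return () => wasPlaying.forEach((el) => el.play());
}
```

Elements that were already paused are left alone, so the helper does not override a pause the user requested.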

>> From a user's point of view, your proposal seems more complicated than 
>> the basic use cases merit.  For example, attempting to fix the 
>> synchronization of improperly authored media with micro-adjustments of 
>> the playback rate isn't likely to be very successful or accurate.  The 
>> metronome case, while an interesting experiment, would be better served 
>> through something like the proposed Audio API.
> 
> Indeed. The use cases you mention here aren't the driving factor in this 
> design, they're just enabled mostly as a side-effect. The driving factor 
> is to avoid the asymmetry problem described below:
> 
>> Slaving multiple media elements' playback rate and current time to a 
>> single master media element, Silvia and Eric's proposal, seems to 
>> achieve the needs of the broadest use cases.
> 
> The problem with this design is that it is highly asymmetric. The 
> implementation of a media element needs to have basically two modes: slave 
> and master, where the logic for both can be quite different. (Actually, 
> three modes if you count the lone media element case as a separate mode.) 
> This then also spills into the API, where the master is exposing both the 
> network state of its own media, as well as the overall state of playback. 
> We end up having to handle all kinds of special cases, such as what 
> happens when the master track is shorter than a slaved track, or what 
> happens when the master track is paused vs when a slaved track is paused. 
> It's not impossible to do, but it is significantly more messy than simply 
> having a distinct "master" object and having all the media elements only 
> deal with one "mode" (two if a lone media element counts as separate), 
> namely the "slave" mode. Any asymmetry is reflected as differences between 
> the controller and the media element. Each media element only has to deal 
> with its own networking state, etc.
> 
> For an example of why this matters, consider the use case of a movie site 
> with the option of playing movies with a director's commentary track. Some 
> director's commentaries are shorter than the movie (most, in fact, since 
> many directors stop commenting on the movie when the credits start!). Some 
> are longer (e.g. some Futurama commentaries, where the commentary is 
> basically a bunch of the cast and crew chatting away for a while and 
> sometimes they don't really care that the show is over, they still have 
> stuff to talk about). If we have to make a media element into the master, 
> then how do we handle this case without the site having to determine ahead 
> of time which is the longer track to decide which to use as the master?
> 
> With just a controller, it doesn't matter.
> 
> (Admittedly, as currently specified the controller object lacks a defined 
> "current time" and min and max times, and a Web app would have to 
> determine what the seek bar should display by looking at all the tracks. 
> But we can fix that in a later version. It's much harder to fix in the 
> case of one media element being promoted to a different state than the 
> others, since we already have defined what the media element API does.)
> 

The distinction between a master media element and a master media controller is, in my mind, mostly a distinction without a difference.  However, a welcome addition to the media controller would be convenience APIs for the above properties (as well as playbackState, networkState, seekable, and buffered).  The case for a master media element is that those APIs already exist and would simply need to be repurposed.  But adding new APIs to the MediaController to achieve the same functionality would, again in my mind, eliminate the remaining distinction between a master media element and a media controller.

From an author's POV, without these APIs, calculating the media group's buffered region (for example) is an extremely complicated task.
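
To give a sense of the work involved, here is a minimal TypeScript sketch of just one piece of it. It assumes each element's buffered ranges have already been copied into plain sorted `[start, end]` pairs, and it assumes the group's buffered region is defined as the intersection of the individual regions; neither assumption comes from the proposal. The names `TimeRange`, `intersectRanges`, and `groupBuffered` are hypothetical. The sketch also ignores per-track offsets and rates, which would make the real calculation harder still.

```typescript
// [start, end] in seconds; each list is sorted and non-overlapping.
type TimeRange = [number, number];

// Intersect two sorted range lists with a two-pointer sweep.
function intersectRanges(a: TimeRange[], b: TimeRange[]): TimeRange[] {
  const out: TimeRange[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    // Advance whichever range ends first.
    if (a[i][1] < b[j][1]) i++;
    else j++;
  }
  return out;
}

// Region buffered by every track in the group (empty for an empty group).
function groupBuffered(tracks: TimeRange[][]): TimeRange[] {
  if (tracks.length === 0) return [];
  return tracks.reduce((acc, track) => intersectRanges(acc, track));
}
```

An author would have to rerun something like this whenever any element's buffered ranges change, which is the kind of bookkeeping a convenience API on the controller could absorb.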

>> If adding independent playback rates becomes necessary later, adding 
>> this support in a future revision will be possible.
> 
> Individual playback control is definitely not a critical use case, it's 
> just something that falls out of the design when you have a separate 
> controller object.

Noted.

Thanks!

-Jer

 Jer Noble <jer.noble at apple.com>