[whatwg] Apple Proposal for Timed Media Elements

Wed Mar 21 18:44:52 PDT 2007

On Mar 21, 2007, at 6:16 PM, Ian Hickson wrote:

> On Wed, 21 Mar 2007, Maciej Stachowiak wrote:
>>
>> With the recent discussions about the <video> element, we've decided to
>> post our own proposal in this area. This proposal is a joint effort from
>> the Safari/WebKit team and some of Apple's top timed media experts, who
>> have experience with QuickTime and other media technologies.
>
> Great!
>
>
>> http://webkit.org/specs/HTML_Timed_Media_Elements.html
>
> Looking at this in the context of the current spec:
>
> * The <audio>, "controller", playback rate, start and end times, step(),
>   and looping features were left out of the current version of the spec in
>   the interests of simplicity. I understand that Apple wishes to implement
>   a larger set of features in one go than the spec currently describes;
>   naturally I am interested in making sure the spec covers your needs as
>   well. My concern is that we make the spec too complicated for other
>   browser vendors to implement interoperably in one go. Biting off more
>   than we can chew is a common mistake in Web specification development.
>   Starting with simple features, and adding features based on demand
>   rather than just checking off features for parity with other development
>   environments leads to a more streamlined API that is easier to use.
>
>   How should we approach this?

I'd like to hear from other browser vendors how they feel about this.  
We think many of the new features are needed to be able to fully  
replace plugin-based solutions. I think it would be reasonable to  
agree on a "please implement this first" subset, but I'd like to hear  
especially from Mozilla and Opera reps on this.

>   Regarding specific features: what are the use cases for start/end and
>   looping? People keep telling me they're important but the use cases I've
>   seen either don't seem like they would work well with a declarative
>   mechanism (being better implemented directly using cue marks and JS), or
>   are things that you wouldn't do using HTML anyway (like a user wanting
>   to bookmark into a video -- they're not going to be changing the markup
>   themselves, so this doesn't solve their use case).

Looping is useful for more presentational uses of video. Start and  
end times are useful when you want to package a bunch of small bits  
of video in one file and play just one segment at a time, much as  
content authors sometimes have one big image and use different  
subregions. The same applies to audio: consider looping audio, or a  
single audio file containing multiple sound effects.
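
A minimal sketch of the "many small segments in one file" idea, under
stated assumptions: the segment table, its names and times, and the
positionInSegment() helper are all hypothetical and are not part of the
proposal. It only shows how a start time, an end time, and a loop flag
would bound playback within one file.

```javascript
// Hypothetical table of named segments (times in seconds) inside one file.
const segments = {
  intro:   { start: 0.0, end: 2.5 },
  click:   { start: 2.5, end: 3.0 },
  fanfare: { start: 3.0, end: 6.0 },
};

// Returns the media position after `elapsed` seconds of playback that began
// at the segment's start time. Without looping, playback stops at the end
// time; with looping, it wraps back to the start time.
function positionInSegment(segment, elapsed, loop) {
  const length = segment.end - segment.start;
  if (elapsed < length) {
    return segment.start + elapsed;
  }
  if (!loop) {
    return segment.end; // clamp at the end time
  }
  return segment.start + (elapsed % length); // wrap within the segment
}
```

For example, positionInSegment(segments.click, 10, false) stays at the
click segment's end time instead of running on into the fanfare.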

>   For <audio> in general, there's been very little demand for <audio>
>   other than from people suggesting that it makes abstract logical sense
>   to introduce <audio> and <video> at the same time. But there is clearly
>   demand for something like this on the Web, e.g. internet radio, Amazon
>   track sampling, etc. I'm not sure how similar the APIs should be.

I think <audio> can use almost the exact same APIs for most things as  
<video>. This has the nice side benefit that new Audio() can just  
make an <audio> element and provide all the relevant useful API.

> * I'm concerned about the "type" attribute for content negotiation.

[... snip ...] I'll respond to this in a separate reply about this  
and broader codec issues.

> * The "mute" feature is IMHO better left at the UI level, with the API
>   only having a single volume attribute. This is because there are
>   multiple ways to implement muting, and it seems better to not bias the
>   API towards a particular method.
>
>   (I've seen three major muting interfaces: a mute button that sets a
>   temporary override which is independent of volume, a mute button that
>   simply sets the volume to zero, and a -20dB button that you hit two or
>   three times to get to 0.)
>
>   Having said that, without a mute API, the UA and the author UI can't
>   stay synchronised with respect to mute state.

As discussed on IRC, I think all three models can be implemented well  
with a mute API. I don't think the model where mute is independent of  
volume can be implemented quite right if multiple things can control  
the video and there is no mute API.
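
A rough sketch of how the three muting models could sit on top of one
shared state with a separate mute flag. The object shape, the property
names "volume" and "muted", and every function name here are
assumptions for illustration, not the proposal's API. The attenuation
button is simplified to a linear step instead of real decibels.

```javascript
// Stand-in for a media element's audio state (assumed shape).
function createAudioState(volume) {
  return { volume: volume, muted: false };
}

// What the listener actually hears.
function effectiveVolume(state) {
  return state.muted ? 0 : state.volume;
}

// Model 1: a mute override that is independent of volume. Any controller
// sharing `state` can read `muted` and keep its own button in sync.
function toggleMute(state) {
  state.muted = !state.muted;
}

// Model 2: a mute button that simply sets the volume to zero.
function muteByZeroingVolume(state) {
  state.volume = 0;
}

// Model 3: an attenuation button pressed repeatedly until volume reaches 0.
// `step` is a hypothetical linear decrement, not a dB value.
function attenuate(state, step) {
  state.volume = Math.max(0, state.volume - step);
}
```

With only a volume attribute, model 1 would force each controller to
remember the pre-mute volume privately, so a second controller could
not tell "muted" apart from "volume set to zero".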

> * What's the use case for hasAudio or hasVideo? Wouldn't the author know
>   ahead of time whether the content has audio or video?

That depends. If you are displaying one fixed piece of media, then  
sure. If you are displaying general user-selectable content, then not  
necessarily. You might want to hide or disable volume controls for a  
video with no soundtrack for instance. Or you might want to show some  
filler content for content with a video/* MIME type that does not in  
fact have a video track (which is valid per the relevant RFCs - video  
MIME types say video may be present, but do not promise it).
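
As a sketch of that use, assuming hasAudio and hasVideo are exposed as
boolean properties (they might just as well be methods), a generic
player could decide what to show like this. The controlsFor() helper
and its result fields are hypothetical.

```javascript
// Decide which parts of a generic player UI to show for arbitrary,
// user-selected content. `media` is any object with boolean hasAudio and
// hasVideo properties (an assumed shape).
function controlsFor(media) {
  return {
    showVolumeControls: media.hasAudio,  // hide volume UI for silent video
    showFillerContent: !media.hasVideo,  // e.g. video/* file with no video track
  };
}
```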

> * The states in this proposal are orthogonal to the states in the current
>   spec; both look useful, though, and maybe we should have both. Anybody
>   have any opinions on this?

I'll have to read over both sets of states more closely.

Regarding your states: In our proposal, we don't distinguish stopped  
and paused. A stop operation would just be "pause(); currentTime = 0;  
currentLoop = 0;". "AUTOPAUSED" would be the condition where you  
return to "PRESENTABLE" or "UNDERSTANDABLE" state from "PLAYABLE" or  
"PLAYTHROUGHOK" when isPaused is false. "PLAYING" would be the case  
where you are in "PLAYABLE", "PLAYTHROUGHOK" or "LOADED" state and  
isPaused is false.

So at first glance, I think our proposed states plus the isPaused  
boolean subsume yours, and are more immediately useful for a custom  
controller UI.
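
A sketch of that mapping, assuming the load state is available as a
string and that the caller remembers the previous state.
derivePlaybackState(), the "PAUSED" and "NOT_PLAYING" results, and the
two groupings are hypothetical names for this example only. stop()
just wraps the three statements given above.

```javascript
// Groupings introduced for this sketch only.
const CAN_PLAY = new Set(["PLAYABLE", "PLAYTHROUGHOK", "LOADED"]);
const STALLED = new Set(["PRESENTABLE", "UNDERSTANDABLE"]);

// Derive the current spec's PLAYING / AUTOPAUSED conditions from the
// proposal's load state plus the isPaused boolean.
function derivePlaybackState(state, previousState, isPaused) {
  if (isPaused) {
    return "PAUSED";
  }
  if (CAN_PLAY.has(state)) {
    return "PLAYING";
  }
  // AUTOPAUSED: fell back from a playable state while not paused.
  const fellBack =
    previousState === "PLAYABLE" || previousState === "PLAYTHROUGHOK";
  if (STALLED.has(state) && fellBack) {
    return "AUTOPAUSED";
  }
  return "NOT_PLAYING";
}

// Stop is pause plus a rewind, so it needs no separate state.
function stop(media) {
  media.pause();
  media.currentTime = 0;
  media.currentLoop = 0;
}
```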

> * Time triggers, or cue marks, are a useful feature that has currently
>   been left in the v2 list; I've heard some demand for this though and I
>   would not be opposed to putting this in v1 if people think we should.

I think it's pretty useful, since a lot of edge-case features (like  
triggering a URL navigation at a particular time in the video) can be  
handled by this.
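
A minimal sketch of time triggers, under the assumption that the
player reports each advance of the playback position. createCueList(),
add(), and advance() are hypothetical names, not a proposed API.

```javascript
// A tiny cue list: callbacks fire when playback crosses their time.
function createCueList() {
  const cues = [];
  return {
    // Register `callback` to run when playback reaches `time` (seconds).
    add(time, callback) {
      cues.push({ time: time, callback: callback });
    },
    // Report that playback moved forward from `from` to `to`; fires every
    // cue whose time lies in the half-open interval (from, to].
    advance(from, to) {
      for (const cue of cues) {
        if (cue.time > from && cue.time <= to) {
          cue.callback(cue.time);
        }
      }
    },
  };
}
```

A cue callback could then do things like navigate to a URL at a given
time, without that needing to be a dedicated feature.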

> * I have no objection to adding more events. Once we have a better idea
>   what should happen here I'll add the relevant events.

Sounds good.

Regards,
Maciej