[whatwg] Apple Proposal for Timed Media Elements
ian at hixie.ch
Wed Mar 21 18:16:51 PDT 2007
On Wed, 21 Mar 2007, Maciej Stachowiak wrote:
> With the recent discussions about the <video> element, we've decided to
> post our own proposal in this area. This proposal is a joint effort from
> the Safari/WebKit team and some of Apple's top timed media experts, who
> have experience with QuickTime and other media technologies.
Looking at this in the context of the current spec:
* The <audio>, "controller", playback rate, start and end times, step(),
and looping features were left out of the current version of the spec in
the interests of simplicity. I understand that Apple wishes to implement
a larger set of features in one go than the spec currently describes;
naturally I am interested in making sure the spec covers your needs as
well. My concern is that we make the spec too complicated for other
browser vendors to implement interoperably in one go. Biting off more
than we can chew is a common mistake in Web specification development.
Starting with simple features, and adding features based on demand
rather than just checking off features for parity with other development
environments leads to a more streamlined API that is easier to use.
How should we approach this?
Regarding specific features: what are the use cases for start/end and
looping? People keep telling me they're important but the use cases I've
seen either don't seem like they would work well with a declarative
mechanism (being better implemented directly using cue marks and JS), or
are things that you wouldn't do using HTML anyway (like a user wanting
to bookmark into a video -- they're not going to be changing the markup
themselves, so this doesn't solve their use case).
For <audio> in general, there's been very little demand for <audio>
other than from people suggesting that it makes abstract logical sense
to introduce <audio> and <video> at the same time. But there is clearly
demand for something like this on the Web, e.g. internet radio, Amazon
track sampling, etc. I'm not sure how similar the APIs should be.
* I'm concerned about the "type" attribute for content negotiation.
Historically, type attributes are very badly implemented and even less
reliably used. Conditional fallback in general is badly implemented and
bug-prone especially in the context of dynamic changes. In addition, I'm
not convinced there is much in the way of multi-codec data on the Web
that would be addressed by this. Most sites that have multiple codecs
available typically have different sizes available, as in, for example:
...or simply provide the user with links explaining why the user would
want one or the other:
I rarely see multiple levels of <object> fallback used. I think simply
showing the various options to the user would, while not being ideal, be
overall more reliably implemented and less likely to break. Having
magic UI isn't perfect if it doesn't work. :-)
We can further ensure that we don't have problems like this by requiring
a baseline codec like Ogg Theora to be implemented by all UAs (not, of
course, to the exclusion of anything else).
By not having automatic fallback, we sidestep a huge set of issues. (As
noted in the draft, if we really want this, maybe media queries is a
better way to do it.)
* The "mute" feature is IMHO better left at the UI level, with the API
only having a single volume attribute. This is because there are
multiple ways to implement muting, and it seems better to not bias the
API towards a particular method.
(I've seen three major muting interfaces: a mute button that sets a
temporary override which is independent of volume, a mute button that
simply sets the volume to zero, and a -20dB button that you hit two or
three times to get to 0.)
Having said that, without a mute API, the UA and the author UI can't
stay synchronised with respect to mute state.
* What's the use case for hasAudio or hasVideo? Wouldn't the author know
ahead of time whether the content has audio or video?
* The states in this proposal are orthogonal to the states in the current
spec; both look useful, though, and maybe we should have both. Anybody
have any opinions on this?
* Time triggers, or cue marks, are a useful feature that has currently
been left in the v2 list; I've heard some demand for this though and I
would not be opposed to putting this in v1 if people think we should.
* I have no objection to adding more events. Once we have a better idea
what should happen here I'll add the relevant events.
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
More information about the whatwg