[whatwg] Apple Proposal for Timed Media Elements

Wed Mar 21 18:16:51 PDT 2007

On Wed, 21 Mar 2007, Maciej Stachowiak wrote:
> 
> With the recent discussions about the <video> element, we've decided to 
> post our own proposal in this area. This proposal is a joint effort from 
> the Safari/WebKit team and some of Apple's top timed media experts, who 
> have experience with QuickTime and other media technologies.

Great!

> http://webkit.org/specs/HTML_Timed_Media_Elements.html

Looking at this in the context of the current spec:

* The <audio>, "controller", playback rate, start and end times, step(),
  and looping features were left out of the current version of the spec in 
  the interests of simplicity. I understand that Apple wishes to implement
  a larger set of features in one go than the spec currently describes;
  naturally I am interested in making sure the spec covers your needs as 
  well. My concern is that we might make the spec too complicated for other
  browser vendors to implement interoperably in one go. Biting off more
  than we can chew is a common mistake in Web specification development.
  Starting with simple features and adding features based on demand,
  rather than just checking off features for parity with other development
  environments, leads to a more streamlined API that is easier to use.

  How should we approach this?

  Regarding specific features: what are the use cases for start/end and 
  looping? People keep telling me they're important, but the use cases I've 
  seen either don't seem like they would work well with a declarative 
  mechanism (they would be better implemented directly using cue marks and
  JS), or are things that you wouldn't do using HTML anyway (like a user
  wanting to bookmark into a video -- they're not going to be changing the
  markup themselves, so this doesn't solve their use case).

  For <audio> in general, there's been very little demand for <audio> 
  other than from people suggesting that it makes abstract logical sense 
  to introduce <audio> and <video> at the same time. But there is clearly
  demand for something like this on the Web, e.g. internet radio, Amazon
  track sampling, etc. I'm not sure how similar the <audio> and <video>
  APIs should be.

* I'm concerned about the "type" attribute for content negotiation. 
  Historically, type attributes have been very badly implemented and even
  less reliably used. Conditional fallback in general is badly implemented
  and bug-prone, especially in the context of dynamic changes. In addition,
  I'm not convinced there is much multi-codec content on the Web that this
  would address. Most sites that offer multiple codecs also offer
  different sizes, as, for example:

    http://www.apple.com/iphone/hello/

  ...or simply provide the user with links explaining why they would 
  want one or the other:

    http://www.spacex.com/webcast.php

  I rarely see multiple levels of <object> fallback used. I think that
  simply showing the various options to the user, while not ideal, would
  be more reliably implemented overall and less likely to break. Having
  magic UI isn't perfect if it doesn't work. :-)

  We can further ensure that we don't have problems like this by requiring 
  a baseline codec like Ogg Theora to be implemented by all UAs (not, of 
  course, to the exclusion of anything else).

  By not having automatic fallback, we sidestep a huge set of issues. (As 
  noted in the draft, if we really want this, maybe using media queries
  would be a better way to do it.)

* The "mute" feature is IMHO better left at the UI level, with the API 
  only having a single volume attribute. This is because there are 
  multiple ways to implement muting, and it seems better to not bias the 
  API towards a particular method.

  (I've seen three major muting interfaces: a mute button that sets a 
  temporary override which is independent of volume, a mute button that
  simply sets the volume to zero, and a -20dB button that you hit two or 
  three times to get to 0.)

  Having said that, without a mute API, the UA and the author UI can't 
  stay synchronised with respect to mute state.

* What's the use case for hasAudio or hasVideo? Wouldn't the author know 
  ahead of time whether the content has audio or video?

* The states in this proposal are orthogonal to the states in the current 
  spec; both look useful, though, and maybe we should have both. Anybody 
  have any opinions on this?

* Time triggers, or cue marks, are a useful feature that is currently on
  the v2 list; I've heard some demand for this, though, and I would not be
  opposed to putting it in v1 if people think we should.

* I have no objection to adding more events. Once we have a better idea 
  what should happen here I'll add the relevant events.
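
The start/end and looping question above suggests that such behaviour can
be written in script against a time-based API rather than declared in
markup. The following TypeScript sketch assumes a hypothetical
`PlayableMedia` interface (a writable `currentTime` in seconds plus
`pause()`) and assumes the UA calls a handler on every time update; neither
name comes from the Apple proposal or the current spec. The handler keeps
playback inside a [start, end) window and either seeks back to `start` or
pauses at `end`.

```typescript
// Hypothetical minimal media interface, for illustration only; it is not
// the API of either the Apple proposal or the current spec draft.
interface PlayableMedia {
  currentTime: number; // playback position in seconds
  paused: boolean;
  pause(): void;
}

// Builds a handler to be called on every time update. It keeps playback
// inside [start, end): positions before `start` jump to `start`; reaching
// `end` either wraps back to `start` (loop) or stops at `end`.
function makeRangeHandler(
  media: PlayableMedia,
  start: number,
  end: number,
  loop: boolean,
): () => void {
  if (!(start >= 0 && end > start)) {
    throw new RangeError("expected 0 <= start < end");
  }
  return () => {
    if (media.currentTime < start) {
      media.currentTime = start;
    } else if (media.currentTime >= end) {
      if (loop) {
        media.currentTime = start;
      } else {
        media.currentTime = end;
        media.pause();
      }
    }
  };
}

// Usage with a stand-in object instead of a real media element.
const clip: PlayableMedia = {
  currentTime: 0,
  paused: false,
  pause() {
    this.paused = true;
  },
};
const onTimeUpdate = makeRangeHandler(clip, 10, 20, true);
clip.currentTime = 20.3; // the update arrived slightly past the end
onTimeUpdate();
console.log(clip.currentTime); // 10
```

Because a script handler only runs when a time update arrives, playback can
overshoot `end` by up to one update interval before it is corrected; tighter
looping would need UA support, which may be one argument for a declarative
mechanism.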
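
To make the content-negotiation concern concrete, type-based fallback
reduces to choosing the first listed source whose declared MIME type the UA
says it supports. The sketch below assumes a hypothetical UA-supplied
`canPlay` predicate and illustrative file names; neither proposal defines
these. Its result is only as reliable as the authors' declared types and the
UA's answer, which is the reliability problem raised above.

```typescript
// A candidate source as an author might list it; `type` is the declared
// MIME type, which may be missing or simply wrong.
interface SourceCandidate {
  url: string;
  type?: string;
}

// Returns the first candidate the UA claims it can play, in document order.
// In this sketch a candidate with no declared type is accepted, since the UA
// cannot rule it out without fetching it. Returns undefined when nothing
// matches, i.e. when the element's fallback content would be used.
function selectSource(
  candidates: readonly SourceCandidate[],
  canPlay: (mimeType: string) => boolean,
): SourceCandidate | undefined {
  for (const candidate of candidates) {
    if (candidate.type === undefined || canPlay(candidate.type)) {
      return candidate;
    }
  }
  return undefined;
}

// Usage: a hypothetical UA that only decodes Ogg. If it accepts a source
// here but later fails to decode it, selection has to be rerun, which is
// the dynamic-change case where fallback tends to be buggy.
const oggOnly = (mimeType: string): boolean => mimeType === "video/ogg";
const chosen = selectSource(
  [
    { url: "clip.mp4", type: "video/mp4" },
    { url: "clip.ogg", type: "video/ogg" },
  ],
  oggOnly,
);
console.log(chosen?.url); // clip.ogg
```

With a mandated baseline codec such as Ogg Theora, an author could instead
publish a single baseline source and skip this selection step entirely.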
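
The three muting interfaces listed above can each be layered on a single
linear volume attribute, which is the argument for leaving mute out of the
API. The sketch below assumes a hypothetical `VolumeOnly` object exposing
just `volume` in [0, 1]; the class and function names are illustrative. A
-20 dB step scales amplitude by 10^(-20/20) = 0.1, and the sketch snaps to
silence below an arbitrary floor of 0.01 (-40 dB), so two or three presses
reach 0. It also shows the synchronisation problem: the override-style mute
keeps its saved level in the author's UI object, so the UA's own controls
see only a volume of 0.

```typescript
// Hypothetical media object exposing only a linear volume in [0, 1].
interface VolumeOnly {
  volume: number;
}

// (1) Mute as a temporary override: remember the level, silence the media,
// and restore the level on the next toggle. The saved level lives only in
// this UI object; other controls just observe volume === 0.
class ToggleMute {
  private saved: number | null = null;
  private readonly media: VolumeOnly;

  constructor(media: VolumeOnly) {
    this.media = media;
  }

  toggle(): void {
    if (this.saved === null) {
      this.saved = this.media.volume;
      this.media.volume = 0;
    } else {
      this.media.volume = this.saved;
      this.saved = null;
    }
  }
}

// (2) Mute as "set the volume to zero": nothing is remembered.
function muteToZero(media: VolumeOnly): void {
  media.volume = 0;
}

// (3) A -20 dB button: scale amplitude by 10^(-20/20) = 0.1, snapping to 0
// once the result falls below `floor` (0.01, i.e. -40 dB, by default).
function stepDown20dB(media: VolumeOnly, floor = 0.01): void {
  const next = media.volume * Math.pow(10, -20 / 20);
  media.volume = next < floor ? 0 : next;
}

// Usage: starting from full volume, the third press reaches silence.
const player: VolumeOnly = { volume: 1 };
stepDown20dB(player);
stepDown20dB(player);
stepDown20dB(player);
console.log(player.volume); // 0
```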
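
Much of what the time triggers (cue marks) point involves is bookkeeping
that script could also do: on each time update, fire every cue whose time
lies in (previous position, current position], and treat a backwards jump
(a seek or a loop) as re-arming earlier cues. The sketch assumes the UA
delivers periodic time updates to a hypothetical `update(now)` method;
`CueScheduler` and its methods are illustrative names, not part of either
draft.

```typescript
type CueCallback = (time: number) => void;

interface Cue {
  time: number; // seconds from the start of the media
  callback: CueCallback;
}

// Fires each cue whose time lies in (last, now] as the playback position
// advances. Cues are kept sorted so callbacks run in time order.
class CueScheduler {
  private cues: Cue[] = [];
  private last = -Infinity; // no position seen yet

  add(time: number, callback: CueCallback): void {
    this.cues.push({ time, callback });
    this.cues.sort((a, b) => a.time - b.time);
  }

  update(now: number): void {
    if (now < this.last) {
      // Backwards seek or loop: move the reference point without firing,
      // so cues after `now` can fire again on later updates.
      this.last = now;
      return;
    }
    for (const cue of this.cues) {
      if (cue.time > this.last && cue.time <= now) {
        cue.callback(cue.time);
      }
    }
    this.last = now;
  }
}

// Usage: record which cues fire as the position moves 1 -> 3 -> 6,
// loops back to 1, then advances to 2.5.
const fired: number[] = [];
const scheduler = new CueScheduler();
scheduler.add(2, (t) => fired.push(t));
scheduler.add(5, (t) => fired.push(t));
for (const position of [1, 3, 6, 1, 2.5]) {
  scheduler.update(position);
}
console.log(fired); // [ 2, 5, 2 ]
```

As written, a forward seek fires every cue it skips over; whether skipped
cues should fire is one of the questions a v1 cue-mark API would need to
answer.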

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'