[whatwg] Video proposals

Ian Hickson ian at hixie.ch
Thu Mar 15 20:39:07 PDT 2007

Wow, what a lot of feedback on video! I've added a <video> element, with 
basic features, but really what we need is feedback from video experts.

In the meantime, here are replies to the comments I got. I haven't quoted 
all the e-mails, since many said the same thing or went in circles (well, 
they did! sorry!), but if I missed anything, let me know, and I'll address 
it separately.


On Mon, 30 Oct 2006, Maciej Stachowiak wrote:
> The main advantages for distinguished elements would be:
> 1) Better semantics. A search engine indexing documents to find "most 
> popular videos" or the like would be able to see from the source 
> document what is embedded as a video rather than having to guess based 
> on the type or URL an <object> points to. Similarly, screen readers 
> would know that a <video> element might still be partially accessible to 
> a [deaf] user whereas <audio> would not.
> 2) Potential to define a useful common API for controlling timed media; 
> right now each plugin exposes its own different API if it exposes one at 
> all.


On Mon, 30 Oct 2006, Charles Iliya Krempeaux wrote:
> #2: Video players.  (This would be embedding some kind of video screen 
> in a webpage... possibly with a play/pause button, stop button, etc.)
> #4: (Static or animated) thumbnails to videos.

I agree that we should provide the building blocks to build video players. 
I don't understand what you mean by #4 above though.

On Wed, 28 Feb 2007, Anne van Kesteren wrote:
> Opera has some internal experimental builds with an implementation of a 
> <video> element. The element exposes a simple API (for the moment) much 
> like the Audio() object:
>   play()
>   pause()
>   stop()
> The idea is that it works like <object> except that it has special 
> <video> semantics much like <img> has image semantics. In markup you 
> could prolly use it as follows:
>  <figure>
>    <video src=news-snippet.ogg>
>      ...
>    </video>
>    <legend>HTML5 in BBC News</legend>
>  </figure>
> I attached a proposal for the element and as you can see there are still 
> some open issues. The element and its API are of course open for debate. 
> We're not enforcing this upon the world ;-)

I have added such an element and its corresponding API (influenced by the 
other feedback received) to the specification. Thank you for the proposal 
and implementation experience!
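For readers comparing the three calls, here is a small TypeScript sketch that models them as a state machine. The class name and the `advance()` helper, which stands in for the decoder, are hypothetical. So is the assumption that stop() rewinds to the start while pause() keeps the current position. None of this is taken from Opera's proposal or from the spec text.

```typescript
// Hypothetical model of the play()/pause()/stop() calls; this is not
// Opera's implementation or the spec's interface.
type PlaybackState = "stopped" | "playing" | "paused";

class VideoPlaybackModel {
  state: PlaybackState = "stopped";
  position = 0; // seconds from the start of the clip

  play(): void {
    this.state = "playing";
  }

  pause(): void {
    // Pausing keeps the current position.
    if (this.state === "playing") {
      this.state = "paused";
    }
  }

  stop(): void {
    // Assumption: stop() differs from pause() by rewinding to the start.
    this.state = "stopped";
    this.position = 0;
  }

  // Stand-in for the decoder: time only advances while playing.
  advance(seconds: number): void {
    if (this.state === "playing") {
      this.position += seconds;
    }
  }
}
```

A separate stop() only earns its place if it differs from pause(). In this sketch, the rewind is that difference.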

On Wed, 28 Feb 2007, James Justin Harrell wrote:
> Can't such an API be provided for <object> elements that reference video?

Feedback from browser vendors is that overloading <object> is hard. The 
poor state of <object> implementations tends to support this argument.

On Sun, 4 Mar 2007, Maik Merten wrote:
> * Video support in browsers is important IMO. Otherwise the web may more 
> and more slip into dependency on Flash or similar formats ("We have to 
> use Flash anyway for video, so why not make the whole site with 
> Flash?").



On Mon, 30 Oct 2006, Shadow2531 wrote:
> I think the <video> element should support fallback content like 
> <object>


On Tue, 6 Mar 2007, Elliotte Harold wrote:
> Maik Merten wrote:
> > 
> > Well, I guess everybody here will hate me for proposing it... and I
> > think it's ugly... but well...
> > 
> > <video>
> > Perhaps a verbose description of what can be seen here?
> > <novideo>
> > D'oh, your browser is outdated... let's embed an <object> here
> > </novideo>
> > </video>
> I don't think we need a novideo element. This would work:
> <video>
>   <p>
>     Complete marked up transcript of the video.
>   </p>
> </video>
> This is much more accessible and great for search engine optimization.



On Mon, 30 Oct 2006, Shadow2531 wrote:
> I think *maybe* no attributes should count as params (only 
> param elements).

Well, if we design our own element for a specific purpose (video) then we 
know what the parameters are, so we can use attributes.

> In general, make <video> so there's only one way to do something. That 
> way you don't get:
> <video file="this"></video>
> on some pages and
> <video>
>    <param name="file" value="this">
> </video>
> on others.


On Wed, 1 Nov 2006, Charles Iliya Krempeaux wrote:
> Simplifying [the object element's type attribute] to allow type="video" 
> would make life a lot easier on web developers IMO.  And a lot of times, 
> when I asked web developers to do this, I didn't care what the subtype 
> was... I only cared whether it was a "video" or not.

Wouldn't this be better served by just having specific elements like <img> 
and <video> that mostly ignore MIME types?
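The check Charles describes classifies a MIME type by its top-level type alone and ignores the subtype and any parameters. A minimal sketch of that check follows; the function names are hypothetical.

```typescript
// Hypothetical helpers: classify a MIME type by its top-level type only,
// ignoring the subtype and any parameters.
function topLevelMediaType(mime: string): string {
  const essence = mime.split(";")[0].trim().toLowerCase();
  const slash = essence.indexOf("/");
  return slash === -1 ? essence : essence.slice(0, slash);
}

function isVideoType(mime: string): boolean {
  return topLevelMediaType(mime) === "video";
}
```

Under this check `video/ogg` counts as video, but a file served as `application/ogg` does not. That is one argument for an element that mostly ignores the MIME type.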


On Mon, 30 Oct 2006, Shadow2531 wrote:
> The handler should also support some type of playlist like 
> <http://www.xspf.org/>.

On Mon, 30 Oct 2006, Charles Iliya Krempeaux wrote:
> #3: Playlists.  (A single video file just won't cut it.)

These were the only requests for playlists. Could you elaborate on the use 
cases for playlists? What are the needs for playlists?
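One plausible use case is a page that plays several clips back to back. The minimal sketch below assumes the list of URLs has already been extracted, for example from an XSPF file; no playlist parsing is shown. The class and its `repeat` flag are hypothetical.

```typescript
// Hypothetical sequential playlist; parsing XSPF or any other
// playlist format is out of scope, so the URLs are given directly.
class SimplePlaylist {
  urls: string[];
  repeat: boolean;
  index = 0;

  constructor(urls: string[], repeat: boolean) {
    this.urls = urls;
    this.repeat = repeat;
  }

  // URL of the clip that should currently be loaded, if any.
  current(): string | undefined {
    return this.index < this.urls.length ? this.urls[this.index] : undefined;
  }

  // Called when the current clip ends; returns the next URL, or
  // undefined once the list is exhausted and repeat is off.
  next(): string | undefined {
    if (this.index < this.urls.length) {
      this.index += 1;
    }
    if (this.index >= this.urls.length && this.repeat && this.urls.length > 0) {
      this.index = 0;
    }
    return this.current();
  }
}
```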


On Mon, 30 Oct 2006, Charles Iliya Krempeaux wrote:
> #5: When to pre-fetch and when NOT to pre-fetch videos (and "download" 
> it at the last possible minute).

Could you elaborate on this?

> #6: JavaScript API for "playing", etc video.
> #7: Scrubbing through video
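Scrubbing usually means dragging along a progress bar while the video seeks to match. The sketch below covers only the geometry and assumes the player offers some way to seek to a time in seconds. The function and parameter names are hypothetical.

```typescript
// Hypothetical scrub-bar helper: turns a pointer's x coordinate on a
// horizontal progress bar into the time (in seconds) to seek to.
function scrubTime(pointerX: number, barLeft: number, barWidth: number, duration: number): number {
  if (barWidth <= 0 || !(duration > 0)) {
    return 0; // nothing sensible to seek to
  }
  // Clamp so dragging past either end of the bar stays inside the clip.
  const fraction = Math.min(Math.max((pointerX - barLeft) / barWidth, 0), 1);
  return fraction * duration;
}
```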


> #8: Alternate versions.

Could you elaborate on this?

> As I'm going to mention more in my list... I'd recommend that web developers
> can create their [own] UIs... create their own Video Players.


> The frame capturing would be cool (and useful).

Could you elaborate on the use case for this? Since the author will have 
the complete data on his end, there doesn't seem much use for actual frame 
capture on the client.

> Also... when implementing UIs, it's useful to have a "toggle()" 
> procedure. Something that makes it "pause" if it is "playing".  And 
> makes it "play" if it is "paused".  Without this you have to keep track 
> of the state of the player.

Interesting. I'll bear this in mind.
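If the element exposes its paused state, script can build toggle() from play() and pause() without keeping its own copy of the state. A sketch, assuming a boolean `paused` property; the interface, function, and fake player are hypothetical.

```typescript
// Hypothetical minimal player interface: just enough for a toggle.
interface PausablePlayer {
  paused: boolean;
  play(): void;
  pause(): void;
}

// toggle() written in script on top of play()/pause(): the state is
// read from the player itself, so the UI code keeps no copy of it.
function togglePlayback(player: PausablePlayer): void {
  if (player.paused) {
    player.play();
  } else {
    player.pause();
  }
}

// In-memory stand-in for a real player, for trying the helper out.
class FakePlayer implements PausablePlayer {
  paused = true;
  play(): void {
    this.paused = false;
  }
  pause(): void {
    this.paused = true;
  }
}
```

With a readable paused state, a built-in toggle() becomes a convenience rather than a necessity.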

On Thu, 1 Mar 2007, Shadow2531 wrote:
> [long list of desired features]

I took your suggestions into account when designing the API. I got feedback 
from a number of people (including some off-list from people who didn't 
want to express their interest publicly), some of which was contradictory, 
so the proposed API doesn't have everything you asked for. Let me know if 
there's anything that you think is missing that you really wanted.

> .loop, .startpos
> loop = false | true
> autostart = true | false
> startpos = 0 | specified pos

Could you elaborate on the use cases for these?
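Here is one way `loop` and `startpos` could interact. The option names and defaults come from the quoted list. The rule that a loop jumps back to `startpos` rather than to 0 is an assumption, and the function names are hypothetical.

```typescript
// Hypothetical option bag using the names and defaults suggested above.
interface PlaybackOptions {
  loop: boolean;      // default false
  autostart: boolean; // default true
  startpos: number;   // default 0, in seconds
}

const DEFAULT_PLAYBACK_OPTIONS: PlaybackOptions = { loop: false, autostart: true, startpos: 0 };

function resolveOptions(given: Partial<PlaybackOptions>): PlaybackOptions {
  return { ...DEFAULT_PLAYBACK_OPTIONS, ...given };
}

// Position (seconds) after `elapsed` seconds of playback of a clip of
// length `duration`. Assumption: a loop jumps back to startpos, not 0.
function positionAfter(opts: PlaybackOptions, duration: number, elapsed: number): number {
  const start = Math.min(Math.max(opts.startpos, 0), duration);
  const pos = start + elapsed;
  if (pos < duration) {
    return pos;
  }
  if (!opts.loop) {
    return duration; // playback stops at the end
  }
  const span = duration - start;
  return span > 0 ? start + ((pos - start) % span) : start;
}
```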

On Thu, 1 Mar 2007, Nicholas Shanks wrote:
> You may want to consider aspect ratio too:  ratio="preserve" being 
> default, ratio="1.333" could indicate 4:3 or get tricky and accept 
> "16:9" for precision reasons.

Wouldn't we simply always want to use the authored size?
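The sketch below shows what Nicholas's suggested attribute values would require a UA to parse. The attribute and the function names are hypothetical. "preserve" and anything unparseable yield null, so the video's own ratio is used.

```typescript
// Parses the hypothetical ratio attribute: "preserve" (the default)
// keeps the video's own ratio, a decimal like "1.333" gives the ratio
// directly, and "W:H" such as "16:9" avoids rounding.
// Returns null for "preserve" and for anything unparseable.
function parseRatio(value: string | null): number | null {
  const v = (value ?? "preserve").trim();
  if (v === "" || v === "preserve") {
    return null;
  }
  const parts = v.split(":");
  if (parts.length === 2) {
    const w = Number(parts[0]);
    const h = Number(parts[1]);
    return w > 0 && h > 0 ? w / h : null;
  }
  const r = Number(v);
  return r > 0 ? r : null; // NaN fails the comparison, giving null
}

// Height to render at a given width: the forced ratio if one was
// given, otherwise the video's intrinsic ratio.
function displayHeight(width: number, intrinsicRatio: number, attr: string | null): number {
  const ratio = parseRatio(attr) ?? intrinsicRatio;
  return width / ratio;
}
```

Returning null for "preserve" makes the authored size win, which is the behaviour the reply above argues for.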

On Thu, 1 Mar 2007, Benjamin Hawkes-Lewis wrote:
> Interesting. I just wanted to ask for a bit more detail on how this 
> works in practice and what it can be used for. How would this support 
> audio descriptions, captions, and subtitles? e.g. Can the captions be 
> displayed to match user preferences for fonts and so forth and exposed 
> to accessibility frameworks? Might it support any form of hyperfilm 
> (e.g. clicking on something in the film like one can click on parts of a 
> Flickr photograph, changing perspective etc) or is it intended only for 
> traditional linear video? (These capabilities look like potential 
> advantages of SMIL.)

Are you requesting these features? Or just curious as to whether they are 
supported in Opera's implementation?

On Thu, 1 Mar 2007, Benjamin Hawkes-Lewis wrote:
> Isn't it important that content authors know whether there will or won't 
> be an automatic UI provided, so that end users don't end up being 
> presented with two (possibly conflicting, certainly confusing) UIs? 
> That's why I suggested using an attribute to control [...] For most use-cases, 
> I suspect the minimum functionality would not only be more than enough, 
> but superior to anything the content producer would put together. This 
> would actually make it a lot easier for ordinary HTML authors to put 
> video on the web. If we could mandate captioning and audio description 
> exposure by UAs it would make putting video on the web in an accessible 
> manner much easier too. Which would be great, as it currently seems to 
> be a somewhat complicated task.

Could you elaborate on the captioning aspect?

Regarding the idea of default UI, I agree that it would be useful in the 
long run. The problem is one of feature creep: with just the API we have 
already added a lot, and adding a UI on top of that is asking for 
interoperability problems. Baby steps are probably wise here.

I've made the spec allow a UI if it doesn't interfere with an 
author-provided one.

On Thu, 1 Mar 2007, Spartanicus wrote:
> I strongly dislike audio and/or video that automatically downloads and 
> starts playing automatically, so much so that I've disabled media player 
> plugins altogether. Both audio and video files are often considerable in 
> size. I don't want my web browser to start making noise unless I've 
> explicitly chosen to play audio, this should not be the result of simply 
> loading a web page. I'd favour a spec requirement that a UA must offer 
> users a configuration option not to automatically download and start 
> audio and/or video and let the user decide first.

I have made sure the spec handles your use case.
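One way a UA could honour Spartanicus's request is to let user preferences override a page's wish to start playback. This is a sketch only. The preference names and decision values are hypothetical and do not come from the spec.

```typescript
// Hypothetical user preferences a UA might expose.
interface MediaPrefs {
  autoDownload: boolean;
  autoPlay: boolean;
}

type LoadDecision = "idle" | "download" | "download-and-play";

// What the UA does when a page containing a video loads: the page's
// request to autostart is honoured only if the user allows it.
function decideOnLoad(prefs: MediaPrefs, pageWantsAutostart: boolean): LoadDecision {
  if (!prefs.autoDownload) {
    return "idle"; // wait for an explicit user action
  }
  if (pageWantsAutostart && prefs.autoPlay) {
    return "download-and-play";
  }
  return "download";
}
```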

On Mon, 5 Mar 2007, Kornel Lesinski wrote:
> I think it's a good idea to provide API for controlling and monitoring 
> video playback and specifying that it should be possible to overlay HTML 
> elements on top of a movie. Probably one of the reasons for adoption of 
> Flash as a movie container is ability to create custom players, which 
> are consistent with websites' UI/branding, can add advertisements and 
> other features.



On Mon, 30 Oct 2006, Charles Iliya Krempeaux wrote:
> One of the biggest problems with video on the web (and probably video on 
> the Internet in general) right now is that there is no universally 
> supported video format.
> [good reasons why we need one codec]
> Having said all that, I believe that whatever video format is chosen 
> can NOT be encumbered.  By patents or anything else. [...]
> Given this, I would suggest Ogg Theora be the natively supported video 
> format common to all browsers.  It's designed from the beginning to be 
> unencumbered.  And implementations for it already exist under licenses 
> that should make everyone happy.

A number of other people said similar things about Ogg Theora.

For now, the spec says that UAs SHOULD support Theora for video and Vorbis 
for audio, and SHOULD support the Ogg container format (it's not a MUST 
because some vendors may have legal reasons why they can't or won't 
support it, and there's no point making them non-conforming when they have 
no choice in the matter).

On Thu, 1 Mar 2007, Shadow2531 wrote:
> I think it'd be cool if the video element *just* supported theora.

Supporting only one encoding is not going to fly: you can't stop browser 
vendors from adding features; and you want to allow the standard to evolve 
over time.

On Tue, 31 Oct 2006, Lachlan Hunt wrote:
> Defining which video format for browsers to support is out of scope of 
> the WHATWG and HTML5.

It doesn't have to be out of scope (HTML5 is assuming CSS and JS, for 

On Fri, 2 Mar 2007, Gervase Markham wrote:
> I think there's a strong driver for uptake. As I understand it, all 
> these video-sharing sites are paying mountains of cash to 
> Adobe/Macromedia for the backend software licences to support Flash 
> video streaming. If they could have 15 or 20% fewer servers doing that, 
> and stream to Firefox using Theora instead, the cost saving would be an 
> incentive for them to change their site. Particularly if we implemented 
> <video> in a way which gave them all the capabilities the flash player 
> has - e.g. fast forward, rewind, seek etc.
> Of course, I don't know how those costs compare to the bandwidth bill.

Henri cited my boss earlier in this thread as saying that YouTube uses 
Flash over Ogg Theora primarily due to bandwidth concerns. So...

On Fri, 2 Mar 2007, Elliotte Harold wrote:
> But there's one capability of Flash I don't want to give them: the 
> ability to block users from easily downloading, editing, and reusing the 
> content.
> You may be right, and I hope you are, but I suspect content hoarding may 
> be important enough to them to justify the extra 15% or 20% cost.

Google Video allows original format download if the uploader enabled it, 
so it seems that this isn't a feature that is necessarily desired.

On Fri, 2 Mar 2007, Magnus Gasslander wrote:
> We need to be 100% sure that the format is patent free (no more GIF).

It is unclear to me how we could do this.

On Thu, 1 Mar 2007, Spartanicus wrote:
> Another current common frustration amongst authors is how to get file
> based media files to play before they've been fully downloaded. This is
> currently achieved by using text based redirector files containing the
> url to the actual media file, but these redirector formats have only
> been defined for a limited number of media formats. That would suggest
> that a UA could by default employ progressive downloading.

I have ensured the spec mentions this.
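A common heuristic for progressive download is to start playback once the rest of the file is expected to arrive before playback reaches it. Assume a constant download rate and a constant-bitrate file. Then the amount downloaded and the amount played both grow linearly, so playback can only catch up with the download at the very end. Comparing the remaining download time with the clip's duration is therefore enough. Real streams rarely meet these assumptions, and the function name is hypothetical.

```typescript
// Sketch of a "can play through" estimate for progressive download,
// assuming a constant download rate and a constant-bitrate file.
function canStartPlayback(
  bytesLoaded: number,
  totalBytes: number,
  downloadBytesPerSecond: number,
  durationSeconds: number,
): boolean {
  if (bytesLoaded >= totalBytes) {
    return true; // fully downloaded
  }
  if (downloadBytesPerSecond <= 0) {
    return false; // no progress measured yet
  }
  const remainingDownloadSeconds = (totalBytes - bytesLoaded) / downloadBytesPerSecond;
  // Playing from the start takes durationSeconds; the download must
  // finish no later than that to avoid stalling.
  return remainingDownloadSeconds <= durationSeconds;
}
```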

On Sun, 4 Mar 2007, Maik Merten wrote:
> * Browser makers should negotiate on one base format. This format should 
> be free and available on all platforms. I don't say formats that need 
> patent licensing are evil by-itself, but I'm pretty sure Debian and 
> Fedora would have to remove video support from their browsers if that 
> functionality would depend on a format that needs such licensing. To my 
> knowledge only Ogg Vorbis+Theora are performing well enough and are 
> usually accepted to be "safe" and open.

It's not clear they actually are performing well enough. But yes.

> * I don't think the spec should require implementations to only support 
> one format. It should require at least one base format (see above) and 
> allow optional formats to keep track of codec development and to keep 
> political minds calm. I doubt Microsoft would ever implement a <video> 
> element if they weren't allowed to support their own formats as well (it 
> may be hard enough for them to support any base format not being theirs 
> anyway).
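Maik's proposal (a required baseline plus optional formats) can be modelled as a support list that always contains the baseline. The format labels and function names below are hypothetical illustrations, not MIME types or real UA behaviour.

```typescript
// Hypothetical format labels; not MIME types or a real registry.
const BASELINE_FORMATS: string[] = ["ogg/theora+vorbis"];

// A UA's support list: the required baseline plus whatever optional
// formats the vendor chooses to add.
function supportedFormats(optional: string[]): string[] {
  return BASELINE_FORMATS.concat(optional);
}

// An author offering several encodings, in order of preference, gets
// the first one the UA supports.
function pickFormat(offered: string[], supported: string[]): string | undefined {
  for (const format of offered) {
    if (supported.indexOf(format) !== -1) {
      return format;
    }
  }
  return undefined;
}
```

If the baseline were required and an author always included it, the lookup could never come back empty.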



On Mon, 5 Mar 2007, Elliotte Harold wrote:
> If we add a video element, should we for the same reasons add an audio 
> element? If not, why not?
> It seems to me these two cases are similar enough to justify similar 
> treatment. Is there any distinction between the two that would suggest 
> audio is inappropriate while video is appropriate or vice versa?

(Other people made similar comments or mentioned an <audio> element in 

We already have an Audio API; I'm not sure it makes much sense to have an 
<audio> element. What's the use case? Audio is either asynchronous or 
orthogonal to the presentation in most media. We need a <video> _element_ 
because in visual media, visual content has a place relative to the other 
content; but in aural media, aural content under the control of the page 
itself does not have such placement (if it's music, e.g., it can be played 
in the background, and if it's content, then you would alternate between 
playing it and playing the content of the rest of the page; in neither 
case would you simply treat the content of the media as inserted into the 
playback stream with no ability to pause it independently of the main 
document content).


On Tue, 31 Oct 2006, Bjoern Hoehrmann wrote:
> And there I thought <video> had already been introduced in 1998.

Actually the SMIL <video> element is more akin to the HTML <object> 
element than the proposal here. (SMIL <video> is defined to be 
semantically equivalent to SMIL <ref>.)

On Wed, 28 Feb 2007, Bjoern Hoehrmann wrote:
> May I suggest Opera does not implement features that are incompatible 
> with SMIL, the SMIL implementation in Internet Explorer, and SVG for no 
> extraordinarily good reason?

Could I ask you to reply to the various replies you received in response 
to the above comment? I can't really use your feedback without 
understanding it.

On Tue, 6 Mar 2007, Charles McCathieNevile wrote:
> At which point you start heading back to "object". It seems we should 
> either take the SMIL approach and make special containers for each kind 
> of media (how many kinds? What is a flash video that has interactive 
> bits? Or an SVG that is mostly video with a few interaction choices? Or 
> interactive SVG with some audio?), or fix object...

Actually the SMIL approach only has one kind of object, it just has many 
names. (As far as I can tell, at least.)

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
