[whatwg] <video> feedback

Ian Hickson ian at hixie.ch
Thu Mar 25 17:49:22 PDT 2010

On Mon, 8 Mar 2010, balachandar muruganantham wrote:
> I have heard from people that there have been a discussion on supporting 
> the fullscreen mode for HTML5 video element. can anyone share the 
> information on the conclusion we arrived at? i searched in the archive 
> but i could not come to any conclusion.

The conclusion was that it is a presentational issue and therefore should 
be handled in one of the CSSOM specs. Unfortunately we don't have anyone 
who has the bandwidth to edit a spec to specify how to make things go 
full-screen. WebKit is experimenting with some APIs in this space, I 

On Wed, 10 Feb 2010, Brian Campbell wrote:
> On Feb 9, 2010, at 9:03 PM, Ian Hickson wrote:
> > On Sat, 31 Oct 2009, Brian Campbell wrote:
> >> 
> >> As a multimedia developer, I am wondering about the purpose of the 
> >> timeupdate event on media elements.
> > 
> > Its primary use is keeping the UIs updated (specifically the timers 
> > and the scrubber bars).
> > 
> >> On first glance, it would appear that this event would be useful for 
> >> synchronizing animations, bullets, captions, UI, and the like.
> > 
> > Synchronising accompanying slides and animations won't work that well 
> > with an event, since you can't guarantee the timing of the event or 
> > anything like that. For anything where we want reliable 
> > synchronisation of multiple media, I think we need a more serious 
> > solution -- either something like SMIL, or the SMIL subset found in 
> > SVG, or some other solution.
> Yes, but that doesn't exist at the moment, so our current choices are to 
> use timeupdate and to use setInterval().

Yes, currently synchronising accompanying slides and animations isn't 
supported. I expect in the relatively near future we'll add something to 
address this problem.

> >> At 4 timeupdate events per second, it isn't all that useful. I can 
> >> replace it with setInterval, at whatever rate I want, query the time, 
> >> and get the synchronization I need, but that makes the timeupdate 
> >> event seem to be redundant.
> > 
> > The important thing with timeupdate is that it also fires whenever the 
> > time changes in a significant way, e.g. immediately after a seek, or 
> > when reaching the end of the resource, etc. Also, the user agent can 
> > start lowering the rate in the face of high CPU load, which makes it 
> > more user-friendly than setInterval().
> I agree, it is important to be able to reduce the rate in the face of 
> high CPU load, but as currently implemented in WebKit, if you use 
> timeupdate to keep anything in sync with the video, it feels fairly 
> laggy and jerky. This means that for higher quality synchronization, you 
> need to use setInterval, which defeats the purpose of making timeupdate 
> more user friendly.
> Perhaps this is just a bug I should file to WebKit, as they are choosing 
> an update interval at the extreme end of the allowed range for their 
> default behavior; but I figured that it might make sense to mention a 
> reasonable default value (such as 30 times per second, or once per frame 
> displayed) in the spec, to give some guidance to browser vendors about 
> what authors will be expecting.

Well, as mentioned, it's really not intended for keeping things synced up.
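
As a rough sketch of the setInterval() approach Brian describes, assuming a 
sorted list of cue start times: the cue list, the function names, and the 
33 ms polling interval are all hypothetical, not anything from the spec.

```javascript
// Hypothetical cue list: start times in seconds, sorted ascending.
const cueStarts = [0, 4.5, 12, 30];

// Return the index of the last cue whose start time is <= currentTime,
// or -1 if playback has not reached the first cue yet.
function activeCueIndex(starts, currentTime) {
  let index = -1;
  for (let i = 0; i < starts.length; i++) {
    if (starts[i] <= currentTime) index = i;
    else break;
  }
  return index;
}

// Poll at a rate the author picks instead of waiting for timeupdate.
// `video` is any object exposing currentTime (e.g. a <video> element);
// `onChange` is called only when the active cue changes.
function startCuePolling(video, starts, onChange, intervalMs = 33) {
  let last = null;
  const timer = setInterval(() => {
    const index = activeCueIndex(starts, video.currentTime);
    if (index !== last) {
      last = index;
      onChange(index);
    }
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop
}
```

Unlike timeupdate, this keeps polling at the same rate under CPU load, 
which is the trade-off discussed above.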

> Most (if not all) video formats supported by <video> in the various 
> browsers do not store alpha channel information. In order to composite 
> video against a dynamic background, authors may copy video data to a 
> canvas, then paint transparent to all pixels matching a given color.
> This use case would clearly be better served by video formats that 
> include alpha information, and implementations that support compositing 
> video over other content, but given that we're having trouble finding 
> any video format at all that the browsers can agree on, this seems to be 
> a long way off, so stop-gap measures may be useful in the interim.
> Compositing video over dynamic content is actually an extremely 
> important use case for rich, interactive multimedia, which I would like 
> to encourage browser vendors to implement, but I'm not even sure where 
> to start, given the situation on formats and codecs. I believe I've seen 
> this discussed in Theora, but it never went anywhere, and I don't have any 
> idea how I'd even start getting involved in the MPEG standardization 
> process.

If compositing is a use case we should address, we should address it 
explicitly, so that it is performant. We shouldn't have people copy each 
frame to a canvas, process it, then copy it back.

However, I don't think compositing video, as important as it is, is more 
important than some of the other missing features we have (such as 
subtitles), so let's wait a bit longer before adding it!
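
For reference, the stop-gap Brian describes looks roughly like the sketch 
below. The function name and the per-channel tolerance test are assumptions 
made for illustration; a real keyer would likely use a better colour 
distance.

```javascript
// Make every pixel close to the key colour fully transparent.
// `data` holds RGBA bytes, laid out like ImageData.data from getImageData().
// `key` is [r, g, b]; `tolerance` is the maximum per-channel difference.
function chromaKey(data, key, tolerance) {
  for (let i = 0; i + 3 < data.length; i += 4) {
    const match =
      Math.abs(data[i] - key[0]) <= tolerance &&
      Math.abs(data[i + 1] - key[1]) <= tolerance &&
      Math.abs(data[i + 2] - key[2]) <= tolerance;
    if (match) data[i + 3] = 0; // zero the alpha channel
  }
  return data;
}

// In a browser, per frame (ctx is a 2D canvas context):
//   ctx.drawImage(video, 0, 0, w, h);
//   const frame = ctx.getImageData(0, 0, w, h);
//   chromaKey(frame.data, [0, 255, 0], 40);
//   ctx.putImageData(frame, 0, 0);
```

Every frame is copied out of the canvas and back in, which is the cost 
objected to above.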

On Thu, 11 Feb 2010, Robert O'Callahan wrote:
> On Thu, Feb 11, 2010 at 8:19 AM, Brian Campbell 
> <lambda at continuation.org> wrote:
> > 
> > But no, this isn't something I would consider to be production 
> > quality. But perhaps if the WebGL typed arrays catch on, and start 
> > being used in more places, you might be able to start doing this with 
> > reasonable performance.
> With WebGL you could do the chroma-key processing on the GPU, and 
> performance should be excellent. In fact you could probably prototype 
> this today in Firefox.

That does seem like the way to go.

On Wed, 10 Feb 2010, Silvia Pfeiffer wrote:
> I'd like to address the issue video source selection where a content 
> provider wants to stream the best quality video to the user based either 
> on maximising use of the pipe between him and the user, or based on 
> explicit user choice.
> Firstly, I think that explicit user choice isn't a problem. [...]
> Secondly, choosing the best video encoding format for a given 
> user-server connection (and user device capabilities) is actually a 
> really difficult decision to make automatically.
> Let's say we declare the quality in the <source> elements in some form
> or other (either an additional attribute or by addition to the media
> queries). Now we have to take this information into account in the
> source selection algorithm, since we are asking the UA to make a
> choice of which media source to use based on the quality information.
> The source selection algorithm goes through the list of <source>
> elements from top to bottom and stops at the first one that it is able
> to play. It does not check whether in that list there would be a
> better choice. Thus, we have to require authors to build the list
> in a way that the highest quality content is put at the top of the
> list, while lower qualities are put further down.
> For example:
> <video>
>   <source src='video-hd.ogv' media='quality:1.0' type='video/ogg;
> codecs="theora, vorbis"'>
>   <source src='video-hq.ogv' media='quality:0.5' type='video/ogg;
> codecs="theora, vorbis"'>
>   <source src='video-sd.ogv' type='video/ogg; codecs="theora, vorbis"'>
> </video>
> Now, we need to devise an algorithm for UAs to determine which quality
> to choose based on the given computer/device and connection. This is
> not trivial, but let's assume we are able to do so and set
> * quality:1.0 to any connection >5Mbit, CPU >  2GHz, and
> * quality:0.5 to any connection > 1Mbit, CPU > 1.5GHz.
> This would be measured once during source selection and thus the choice 
> made. But it's actually not a guarantee that it will work. If your 
> connection degrades or your CPU gets busy with other applications, the 
> choice may need to be revised. YouTube doesn't currently allow for this, 
> so this kind of solution would replicate what YouTube does at this point 
> - which doesn't seem to be such a bad thing, since YouTube is acceptable 
> for most people.
> An improvement over this would be the introduction of an adaptive stream 
> scaling scheme over HTTP, similar to e.g. Microsoft's Smooth Streaming 
> and Apple's HTTP Live Streaming (also note: Adobe is in the process of 
> developing HTTP streaming support). There is no such thing available for 
> Ogg yet, but the Ogg community is interested in developing/using 
> something that is open and fulfills the needs for HTML5. It may well be 
> that an activity should be taken up by the WHATWG (or W3C? or IETF?) to 
> develop a media-format independent adaptive streaming standard over 
> HTTP. The point about adaptive streaming is that it does not require any 
> new HTTP headers to deliver the data or any new software on the HTTP 
> server - the choice is made client-side by switching between different 
> encodings of the same resource on the server. This requires declaration 
> of the available alternative files to the client - which could either be 
> done inside HTML5 or through some extra resource. Apple's scheme, for 
> example, uses m3u-based files (m3u8), while MS's scheme uses SMIL-like 
> files (ismv).
> Apple's scheme is already going through the IETF for standardisation as 
> an informal RFC, but not through a working group. Apple's scheme is 
> based on massive creation of small chunks (e.g. 10s duration) on the 
> server - an overhead that could possibly be avoided by using W3C Media 
> Fragment URIs. There are lots of things to discuss for such an activity 
> and the WHATWG may not be the best forum for discussing this - though in 
> the end it's up to the browser vendors to implement it, so maybe it 
> would.
> Note that adaptive HTTP streaming deliberately avoids introducing new 
> HTTP parameters and server requirements, because these are really 
> difficult to roll out, in particular since they also create new 
> requirements on HTTP proxy infrastructure.
> If we develop such an adaptive streaming approach, the source selection 
> algorithm would then select the default resource to stream from, while 
> being given the option for adaptive streaming through the extra 
> information (e.g. delivered through an extra attribute on the <source> 
> elements, e.g. @adaptive="alternatives.xml"). There could then be 
> dynamic switching between the files listed as alternatives in the 
> @adaptive file.

I think that having such a mechanism is probably the best way to go. It is 
orthogonal to the issue of format and media features, which is what 
<source> is really meant to choose between, and it means that the quality 
can be changed dynamically, which is far more likely to be necessary with 
quality than with format and media features.

This requires no changes to <video>; <video> is protocol agnostic.
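
To illustrate the client-side choice Silvia describes, here is a minimal 
sketch, assuming the client already has a bandwidth estimate and a list of 
alternative encodings. The bitrates, the 0.8 headroom factor, and the 
function name are illustrative assumptions; the file names are reused from 
the example above.

```javascript
// Hypothetical alternatives; the bitrates are made up for illustration.
const alternatives = [
  { src: "video-hd.ogv", kbps: 5000 },
  { src: "video-hq.ogv", kbps: 1000 },
  { src: "video-sd.ogv", kbps: 400 },
];

// Pick the highest-bitrate alternative that fits within the measured
// bandwidth, leaving some headroom. If nothing fits, fall back to the
// lowest-bitrate alternative rather than playing nothing.
function pickAlternative(list, measuredKbps, headroom = 0.8) {
  const budget = measuredKbps * headroom;
  const sorted = [...list].sort((a, b) => b.kbps - a.kbps);
  return sorted.find((alt) => alt.kbps <= budget) || sorted[sorted.length - 1];
}
```

An adaptive client would re-run this whenever its bandwidth estimate 
changes and switch at the next chunk boundary.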

> Incidentally, it may make more sense to expose the actual components of 
> "quality" explicitly in media queries, just like they are explicitly 
> exposed both in m3u8 and ismv, in particular bandwidth and resolution.

Resolution is already exposed. It might make sense to expose bandwidth; 
that is an issue for the CSSWG.

On Mon, 15 Feb 2010, Hugh Guiney wrote:
> I can *maybe* see this feature being a video player UI component (more 
> on why in a bit), though not a JS-based one. I imagine people with 
> slower computers/connections and/or in more restrictive environments, 
> who probably stand to benefit the most from this, would be more likely 
> to have JS off.

I'm not aware of such a correlation.

> Additionally, it would require document authors to take on the 
> responsibility of scripting their own content selection algorithms 
> (unless there's a standard library that everyone just copies and 
> pastes), which seems unnecessary given the fact that resource selection 
> is already capable of being done by the browser and/or server.

To be fair, making such an algorithm (mapping a user selection to a source 
URL) is pretty trivial, certainly far less work than the rest of the video 
player would be.
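
Something along these lines, as a sketch only: the labels, the lookup table, 
and the function name are hypothetical.

```javascript
// Hypothetical table mapping a user-visible quality label to a source URL.
const sourcesByLabel = {
  high: "video-hd.ogv",
  medium: "video-hq.ogv",
  low: "video-sd.ogv",
};

// Map the user's selection to a URL, falling back to a default label
// when the selection is not in the table.
function sourceForSelection(label, table, fallbackLabel = "low") {
  return Object.prototype.hasOwnProperty.call(table, label)
    ? table[label]
    : table[fallbackLabel];
}

// In a page, with `select` being the author's own quality picker:
//   video.src = sourceForSelection(select.value, sourcesByLabel);
//   video.load();
```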

> [Having users log in and express a preference] may be fine for video 
> portal sites, but not every page utilizes logins. Most people just want 
> to share a video they made or like with their audience, the same way 
> they would an image. And they may be using a free blogging service that 
> doesn't allow them to implement additional features.

Would such authors still have multiple video qualities though?

> > [The source selection algorithm doesn't check if a later video might 
> > be better.]
> Which is why I think it'd ultimately be best to update the source 
> selection algorithm to function non-linearly. I realize that may be 
> asking a lot at this stage in the game, but then again, it'd have to be 
> changed to incorporate quality values anyway.

Indeed, if there are multiple axes and videos for each combination, the 
current algorithm won't work. Is that going to be a common situation? I 
think we may be considering using the algorithm for purposes for which it 
wasn't intended here.
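
For clarity, the first-match behaviour under discussion amounts to the 
sketch below. It is a simplification, not the spec's algorithm: the real 
resource selection algorithm also attempts to load each candidate and fires 
error events. The `canPlay` predicate is a hypothetical stand-in for the 
UA's type and media checks.

```javascript
// Walk the <source> candidates in document order and stop at the first
// one the UA thinks it can play. Later, possibly better, candidates are
// never examined.
function selectFirstPlayable(sources, canPlay) {
  for (const source of sources) {
    if (canPlay(source)) return source;
  }
  return null; // nothing playable
}
```

Because the walk stops at the first match, the ordering chosen by the 
author is the only ranking there is.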

On Tue, 16 Feb 2010, Eric Carlson wrote:
> On Feb 15, 2010, at 11:30 PM, Tim Hutt wrote:
> > Anyway, with respect to the actual discussion. My vote is to add two 
> > optional tags to <video>
>   I assume you mean to add these to the <source> element rather than 
> <video>?
> > : bitrate="800" (in kb/s) and
>   If a UA is to use bitrate as a selection criterion, what data should it 
> base the selection on? Would you have it ping the server where the 
> resource is located? If so, how much data should it be required to read?

On Tue, 16 Feb 2010, Ashley Sheridan wrote:
> Well, if there were a choice of bitrates, mobile devices could choose 
> the lowest (assuming the other bitrate versions were all the same 
> format, etc). I'm not sure whether this is something that would be 
> better handled from the server though, to deliver media based on the 
> user agent it can see?

On Tue, 16 Feb 2010, Tim Hutt wrote:
> It's up to the UA. It can ping the server if it wants. If I were writing 
> the UI for Firefox, for example, I would have it do the following:
> 1. Display a drop-down of available video formats: "640x480, 400
> kbps", "800x600, 600 kbps", etc.
> 2. The default choice would be the option that is most similar to the
> previous value a user selected. There would also be an option in the
> preferences: "[x] Always use the highest available video quality"
> where highest is decided by bitrate, then resolution.
> 3. If the default isn't the highest quality, show a little "Better
> quality available" tooltip similar to YouTube's "Watch in HD".
> 4. If the video stutters a lot, and there is a lower quality video
> available, display a (non-modal) message along the lines of "Lower
> quality video is available, it may work better."

It seems like a better solution is to just have the quality be dynamically 
dialed up or down based on negotiation between the server and the client.

On Tue, 16 Feb 2010, Tim Hutt wrote:
> On 16 February 2010 16:08, Gregory Maxwell <gmaxwell at gmail.com> wrote:
> > On Tue, Feb 16, 2010 at 10:33 AM, Tim Hutt <tdhutt at gmail.com> wrote:
> >> It's up to the UA.
> >
> > Imagine that you are a user-agent. Place these streams in order of "quality":
> >
> > 1.  854x480 4:2:0 @  1mbit/sec. average rate.
> > 2. 1280x720 4:2:0 @  1mbit/sec. average rate.
> > 3.  640x360 4:4:4 @  2mbit/sec. average rate.
> My point exactly. There is no single 'quality' metric, so the best we 
> can do is give the user agent the relevant information and let it 
> decide.

Decide how?
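
To make the problem concrete, here is a sketch that applies the "bitrate, 
then resolution" rule proposed earlier in the thread to Gregory's three 
streams. The object shape and function name are assumptions, and 1mbit is 
taken as 1000 kbps.

```javascript
// Gregory's three streams, with 1mbit/sec taken as 1000 kbps.
const streams = [
  { name: "854x480 4:2:0", width: 854, height: 480, kbps: 1000 },
  { name: "1280x720 4:2:0", width: 1280, height: 720, kbps: 1000 },
  { name: "640x360 4:4:4", width: 640, height: 360, kbps: 2000 },
];

// Rank by bitrate first, then by pixel count, highest first.
// Returns a new array; the input is left untouched.
function rankByBitrateThenResolution(list) {
  return [...list].sort(
    (a, b) => b.kbps - a.kbps || b.width * b.height - a.width * a.height
  );
}
```

This rule puts the 640x360 stream first purely because of its bitrate. 
Whether that matches the ranking the video's creator has in mind is exactly 
what the UA cannot know.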

> > I don't think it's hard to imagine that in each of these cases there 
> > exists a real "quality" ranking which the creator of the videos could 
> > be well aware of, but that no user-agent could determine 
> > automatically.
> No I think the user agent is in the best position to decide. Let's think 
> about this logically. The only factors affecting the choice are:
> 1. Hardware specs of the player.
> 2. Bandwidth of the network connection.
> 3. Data cost of the connection.

Also quality of the encoder, the type of video (e.g. talking heads vs 
panning across fast-moving action), the other running apps on the client's 
hardware, etc.

> These are all best known by the UA (or the user for 3.). Consider the
> following examples:
> * A phone with hardware mpeg4 decoding (so it's not only video quality
> that comes into the decision; codec too).
> * A user with a slow computer (no 720p) but a very fast network
> connection (this was me until recently).
> * A user with a fast computer, but a monthly data cap.
> In each case the UA (or the user) is in a much better position to
> decide than the content author. There's probably no foolproof
> preference function, but that doesn't mean we shouldn't try to make an
> educated choice, and give the user the option to override it.

It seems like we either want the client to negotiate it dynamically, or we 
just want to expose the choices to the user independent of the <source> 
mechanism (e.g. in JS). I don't think we'd gain much by making the 
<source> elements have even more information on them -- I'm getting less 
and less convinced that even the media="" attribute is that useful.

On Tue, 16 Feb 2010, David Singer wrote:
> I am by no means convinced that automatic selection of sources other 
> than that based on the most obvious, automated, criteria, is wise or 
> needed.  We have had for many years, in QuickTime, this facility, and 
> quite a few sites opted not to use it and allow the user a manual choice 
> instead.

That is possibly the most useful information in this thread. :-)

On Sat, 20 Feb 2010, Philip Jägenstedt wrote:
> [...] I don't think it's realistic to change the resource selection 
> algorithm to support dynamic switching between sources and think this 
> kind of thing belongs at the protocol level or in a resource file.

I agree, at least for now. I think we should at least wait a while and see 
what authors actually do with <video>. Maybe they don't even need 
<source>, and prefer to just do everything using src="" and JS.

On Wed, 10 Feb 2010, Simon Pieters wrote:
> > +  <div class=example>
> > +   <p>If the author isn't sure if the user agents will all be able to
> > +   render the media resources provided, the author can listen to the
> > +   <code title=event-error>error</code> event on the last
> > +   <code><a href=#the-source-element>source</a></code> element and trigger
> > fallback behaviour:</p>
> > +   <pre><video controls autoplay>
> > + <source src='video.mp4' type='video/mp4; codecs="avc1.42E01E,
> > mp4a.40.2"'>
> > + <source src='video.ogv' type='video/ogg; codecs="theora, vorbis"'
> > +         onerror="fallback(parentNode)">
> > + ...
> > +</video>
> > +<script>
> > + function fallback(video) {
> > +   // replace <video> with its contents
> > +   while (video.hasChildNodes())
> > +     video.parentNode.insertBefore(video.firstChild, video);
> > +   video.parentNode.removeChild(video);
> > + }
> > +</script></pre>
> The script should probably be before the video, because it's possible 
> that a UA will fire the error event before having parsed the script 
> defining the function.


> Also, the script results in invalid HTML since it puts <source>s outside 
> <video>.


Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
