[whatwg] <video> element feedback

Mon Oct 8 21:04:23 PDT 2007

This e-mail replies to e-mails sent to both whatwg at whatwg.org and 
public-html at w3.org, as the thread in question ended up spilling over both 
mailing lists.

WHEN REPLYING TO THIS E-MAIL PLEASE PICK ONE MAILING LIST AND REPLY TO 
JUST THAT ONE. PLEASE DO NOT CROSS-POST THIS THREAD TO BOTH LISTS.

Also, please adjust the subject line appropriately to talk about just the 
issue you are responding to.

Thanks!

On Tue, 20 Mar 2007, Benjamin Hawkes-Lewis wrote:
> 
> Obviously a preferable solution would be for everyone to create 
> accessible content using open technologies in the first place, and we 
> must do everything we can to encourage and enable that. But falling 
> short of such revolutions, can anyone suggest an alternative way of 
> limiting the disillusion caused by inaccessible downloads?

In the case of multimedia content, the best place for content whose 
purpose is ensuring universal use is within the multimedia content itself, 
so that when the data is moved (e.g. when the user saves a video file to 
disk) the content remains universally useful ("accessible").

This also allows us to leverage industry expertise without reinventing 
any wheels ourselves.

> What would happen if the <video> element actually contained <audio> 
> elements for the audio, <audiodescription> elements for the audio 
> descriptions, <caption> elements for the captions, and <subtitle> 
> elements for the subtitles? Would it be technologically possible for 
> HTML elements to act as containers in that way?

The result would be that using video in an HTML context required far more 
work for accessibility purposes than using the same video in non-HTML 
contexts (e.g. on an iPod). This would be bad for the end-user.

> Alternatively (thinking of XSPF playlists), what if <video>'s src 
> attribute pointed to an XML (or text/html-esque) file which contained 
> these separate elements? It would be a powerful way of building a level 
> of transparent accessibility into the system, without requiring users to 
> download and play high-bandwidth content to find out if it has the 
> features they need.

Most video formats already have support for timed text and other features 
for accessible content. So effectively what the spec says today is pretty 
much what you describe here, except that we side-step the problem of 
having to invent a new format.

On Tue, 20 Mar 2007, Benjamin Hawkes-Lewis wrote:
> > 
> > I have seriously considered doing this. Unfortunately I don't think we 
> > can actually do it given the large amount of legacy content, e.g. 
> > tutorials for how to embed flash which encourage use of <object>.
> 
> In the unlikely event that <object> be in any way discouraged, can we 
> ensure we allow element level fallback content for <img> (or some 
> replacement element) as opposed to the alt attributes we're currently 
> lumbered with and the longdesc attribute that WHATWG has done away with?

I'll discuss <img> fallback as part of <img> feedback.

On Wed, 21 Mar 2007, Gareth Hay wrote:
>
> This is a bit of a sideways step here, but why not make tags reflect 
> MIME type,
> 
> e.g.
> <image>		image/*
> <video>		video/*
> <application> 	application/*
> <audio>		audio/*
> 
> That way we have a clear identification of what is going to be in the 
> tag, API's can be tailored sufficiently for each one. Each tag can have 
> appropriate fallback also. Just a thought, and it gets us out of the 
> <object> hole.

That's pretty much where we are today with HTML5, except for the vague 
category "application/*".
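
To make the comparison concrete, here is a minimal JavaScript sketch that dispatches on the top-level MIME type. It assumes `<img>`, `<video>` and `<audio>` for the three media categories and `<embed>` as the catch-all for plugin content. The function name and the lookup table are hypothetical. In practice the author, not the UA, picks the element.

```javascript
// Hypothetical mapping from a MIME type's top-level category to an element.
// Anything outside image/video/audio falls back to <embed> (plugin content).
const ELEMENT_FOR_TOP_LEVEL = new Map([
  ["image", "img"],
  ["video", "video"],
  ["audio", "audio"],
]);

function elementForMimeType(mimeType) {
  // Drop parameters such as "; codecs=..." and normalise case.
  const essence = mimeType.split(";")[0].trim().toLowerCase();
  const topLevel = essence.split("/")[0];
  return ELEMENT_FOR_TOP_LEVEL.get(topLevel) ?? "embed";
}

console.log(elementForMimeType("video/ogg; codecs=theora"));      // video
console.log(elementForMimeType("application/x-shockwave-flash")); // embed
```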

On Wed, 21 Mar 2007, Martin Atkins wrote:
> 
> What do you imagine "application" being used for?
> 
> The "application" type category is pretty-much just "miscellaneous".

Indeed.

On Tue, 20 Mar 2007, Robert Brodrecht wrote:
> Simon Pieters said:
> >
> > Oh. I thought <video> fallback would work pretty much like <object> 
> > fallback, but I see that's not the case. When I think about it it 
> > makes sense; <video> is pretty much like <iframe>, it never falls back 
> > in UAs that support it.
> 
> Oh, damn it.  I thought it'd work like <object>, too.  I'm not sure I 
> like the only-fallback-if-no-support idea.  I'm getting the feeling that 
> there won't be one common video format among the browsers.  I think not 
> having fallback to nested video elements to get at other formats would 
> ultimately be a bad thing.

We already have fallback for multiple formats through the use of multiple 
<source> elements.

> When PNG support sucked in IE6, I just didn't use alpha PNGs and opted 
> for some other format.  If there is no shared format, the only ways to 
> support multiple video formats for multiple browsers would be:
> 
> 1. Just have two video elements on screen (bad).
> 2. Swap the src with JavaScript (won't work if JS is off).
> 3. Delegate content on the server based on http-accept [?] (best of the 
> three, but not very fun).
> 4. Maybe conditional comments if IE is the oddball (we'll see, but I 
> don't like this option much either).

5. The <source> element. :-)

> Any thoughts on this or did I miss something?

Yup, the <source> element. :-)
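
Here is a minimal sketch of how that fallback resolves. It assumes each `<source>` is reduced to its `src` and `type` attributes, and it uses `canPlay` as a hypothetical predicate standing in for the UA's own format support. The point is that one `<video>` element can offer several formats, and the page needs no script to choose between them.

```javascript
// Sketch of <source> fallback: use the first candidate the UA can play.
// `sources` mirrors the <source src type> children in document order;
// `canPlay` is a hypothetical stand-in for the UA's format support check.
function selectSource(sources, canPlay) {
  for (const source of sources) {
    // Without a type the UA can't rule the file out up front,
    // so this sketch simply tries it.
    if (!source.type || canPlay(source.type)) {
      return source.src;
    }
  }
  return null; // no candidate is playable
}

const sources = [
  { src: "clip.mp4", type: "video/mp4" },
  { src: "clip.ogg", type: "video/ogg" },
];
const oggOnly = (type) => type.startsWith("video/ogg");
console.log(selectSource(sources, oggOnly)); // clip.ogg
```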

On Wed, 21 Mar 2007, Laurens Holst wrote:
> > 
> > <object> right now is overloaded to do at least four things:
> > 
> >    * inline images
> >    * plugins
> >    * nested browsing contexts (iframes)
> >    * fallback container
> > 
> > ...each of which has very distinct behaviour
> 
> I really don’t see how those are different, except for fallback 
> content.

They are very different from an implementation perspective.

> > (e.g. whether it has a scripting context, whether it shrinkwraps, 
> > whether it is replaced or not;
> 
> That is implemented today, so it is possible. Also, I think these 
> differences only apply to fallback content vs. other content…? The 
> problem here is: I don’t know. Nobody seems to know, except for you. 

I'm far from the only experienced person here, I assure you! However, this 
lack of clarity is one of the reasons we started the HTML5 project in the 
first place. Hopefully many of the differences are now documented in the 
spec, which should make this easier to approach.

> > Adding a fifth (inline video with an API) would increase the 
> > complexity yet again.
> 
> There is no ‘adding’. Video is already embedded via <object>, today. 
> Also, having video via <object> is no different from having images, so I 
> don’t see why you consider it a separate thing.

I understand that you do not consider <object> implementation complexity 
to be high; however, browser vendors have uniformly expressed their
despair at the difficulty of implementing <object> fully and correctly, 
which is a clear sign, to me at least, that adding further features to 
<object> is poor language design.

> > <object> is *very badly* implemented. It has been a decade since 
> > <object> was first created and browsers STILL don't do it right in all 
> > cases (or even in most cases, frankly). Adding more complexity to such 
> > a disaster zone is bad design.
> 
> If the existing problems with <object> are so severe that it can’t be 
> reused (which I somehow doubt), create a new element where you do it 
> right. However, don’t start separating it out into separate tags.

The problem, as far as I can tell, is that it is just one element. "Doing 
it right" means splitting the problem into different elements.

> You are using one argument, current implementation of object is broken 
> in several ways, to promote another idea, splitting up what is perceived 
> as different types of media into separate tags.

The argument I'm using is that the element's current overloaded design led 
to uniformly poor implementations. I am basing this on what browser
vendors have told me. If the problem is that the element is overloaded, 
splitting it seems like the best way forward.

> > It wouldn't be "simply", though. You'd need to define how to determine 
> > what the media group is, you'd need to define how to change from one 
> > type to another, you'd need to have browsers implement all this on top 
> > of all their existing bugs -- sometimes, it's just better to keep 
> > things separate, even if they seem like they could be abstracted out 
> > into one concept. We can't ignore our past experiences in designing 
> > HTML5.
> 
> That’s just nonsense. It is generic. Create drawing object, retrieve 
> source, check MIME type, invoke renderer depending on MIME type with 
> drawing object as parameter. Really, what is so difficult here?

I encourage you to try to implement <object> in a Web browser in a fully 
interoperable way; that will likely answer your question. :-)

> I find this argument really awkward. Especially since you’re saying 
> that anything that doesn’t fall in the <video> category *or* is 
> not-natively implemented could still use <object>. So apparently it does
> work ‘good enough’ after all.

We have <embed> for this case.

> You keep saying that <object> is a huge bag of problems. It would help 
> if someone who knows exactly what aspects of <object> are implemented 
> badly (you) would instead of proactively making changes to a 
> specification on his own judgement with input from others, create a 
> document that clearly describes the issues with <object> and what is 
> implemented consistently and where browser implementations differ. That 
> way everyone can consider what is wrong exactly, and how it can be 
> fixed.

We've somewhat tried that with <object> in HTML5. However, I have 
received strong feedback from implementors that they do not wish to add 
further features to <object>.

> Because without that, it is really just a guessing game. For a change 
> like this, there needs to be a clear overview of what is wrong first. 
> Otherwise, it is just people saying you should do this or that, and you 
> responding overloaded this browser authors that, and there is no real 
> way to verify that what you say is correct, to make a general estimate 
> of how big the problem is that is tried to be fixed, to provide 
> alternative suggestions, and to judge whether what you say is wrong 
> really warrants these changes (personally, I think not). I would like to 
> see a more structured approach, and frankly, a more open approach.

It's not clear to me how I could really be more open.

One way to verify what I've been saying is this: if implementing <video> 
in <object> was so easy, why wouldn't browser vendors have done it by now? 
Meanwhile, at least three browser vendors have indicated strong interest 
in implementing <video>.

On Wed, 21 Mar 2007, Sander Tekelenburg wrote:
> 
> I thought the idea of the Web was that the user is always in control 
> (because the author cannot know the user's browsing environment). Why 
> would authors ever have to be in control?

We've added controls="" now, and the spec suggests that some controls 
always be available. So this should be a non-issue now.

> If <video> is to be a first-class Netizen, it'd better not be
> javascript-dependant.

The spec now requires controls to be visible when scripting is disabled.

> Something else concerning first-class Netizenry: I'd like to see the 
> spec to require UAs support implicit anchors, so that one can link to a 
> specific startpoint: <URL:http://domain.example/movie.ogg#21:08>, to 
> mean "fetch the movie and start playing it at 21 minutes 8 seconds into 
> the movie". (Or better yet, if this can be achieved reliably, don't 
> fetch the entire movie, but only from 21:08 on.)

We have a start="" attribute for this now for inline videos. It does, 
however, prevent the user from seeking to before that point. The only 
current way to start somewhere other than the start of the video clip and 
be able to seek back is to use the API.
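
Here is a minimal sketch of that restriction. It assumes, for illustration only, that a request to seek before start="" is clamped to the start point rather than ignored. Times are in seconds, and the function name is hypothetical.

```javascript
// Hypothetical illustration: with a start="" point, seeks earlier than
// `start` are clamped to it, and seeks past the end are clamped to `duration`.
function clampSeek(target, start, duration) {
  return Math.min(Math.max(target, start), duration);
}

console.log(clampSeek(30, 755, 3600));  // 755: cannot seek before start
console.log(clampSeek(900, 755, 3600)); // 900
```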

When you link straight to a video (with no <video> element in use) the 
fragment identifier syntax is left up to the MIME type RFC for that video,
and is out of scope of HTML5.

> > I agree that <video> needs a standard UI (in v2, at least).
> 
> It needs it right from the start, in v1.0. Without it, it would be like 
> a browser without its own back button, relying on authors to provide 
> such functionality.

The spec now has this.

On Wed, 21 Mar 2007, Spartanicus wrote:
> 
> Recently I have begun to fear that the principle of not relying on CSS 
> [1] and/or Javascript for anything essential has also been abandoned by 
> various specifications.

I assure you that this is still a concern for me, at least; HTML5 for 
instance introduces irrelevant="" to help with this.

On Wed, 21 Mar 2007, MegaZone wrote:
> 
> Strongly agreed.  I know more than a few people who are (still) rabidly 
> anti-JavaScript as end-users, because of the repeated security issues in 
> various implementations - and how it keeps popping up in things like 
> Quicktime where you wouldn't necessarily expect it.

(There are security issues when you have scripting disabled just like when 
you have it enabled; I think disabling scripting is more of a good luck 
charm than a real security measure. But anyway.)

With JS disabled, <video> will now have controls, per spec.

On Fri, 23 Mar 2007, Silvia Pfeiffer wrote:
> 
> About 8 years ago, we had the idea of using fragment offsets to start 
> playing from offsets of media files. However, in discussions with the 
> URI standardisation team at W3C it turned out that fragment offsets are 
> only being seen by the UA that sends them, so they will never reach the 
> web server. This makes it impossible to use them for "play from this 
> offset" since obviously the offsetting should be done by the server and 
> avoid downloading the bunch of data that comes before the offset point.
> 
> The only solution was to use the query "?" identifier for defining 
> offsets.
> 
> This has been done and specified in 
> http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.txt, though 
> we never took it through full standardisation.
> 
> An implementation of that feature can be seen at 
> http://media.annodex.net/cmmlwiki/ where the videos are marked-up with 
> sections and the sections are referred to through URIs such as 
> http://media.annodex.net/cmmlwiki/SFD2005-Trailer?t=0:00:01.962.
> 
> Another example is used by the metavid guys: e.g. 
> http://metavid.ucsc.edu/overlay/video_player/webview?stream_name=house_proceeding_12-07-06&t=0:14:02/0:14:37 
> which provides a section out of the video.
> 
> Both of the above given examples use Ogg Theora as the video, though the 
> files are being served through a server plugin (in both cases it is 
> mod_annodex) that provides for the offset functionality without breaking 
> the file format.

The problem with such an approach is that while it lets the user start 
from that point, it doesn't let the user seek to before that point if he 
downloads the whole video and uses it offline.
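
The `t=` values in the URLs quoted above can be parsed with a small function. This sketch handles only h:mm:ss(.fff), m:ss and plain seconds, with an optional "/end". It does not implement the full syntax of the temporal-fragments draft, and the function names are hypothetical.

```javascript
// Parse "0:14:02" or "0:00:01.962" into seconds.
function parseClockTime(text) {
  const parts = text.split(":").map(Number);
  if (parts.length > 3 || parts.some((n) => !Number.isFinite(n) || n < 0)) {
    throw new Error(`bad time: ${text}`);
  }
  // Fold left to right: 0:14:02 -> (0 * 60 + 14) * 60 + 2.
  return parts.reduce((total, n) => total * 60 + n, 0);
}

// Parse "start" or "start/end"; end is null when absent.
function parseTimeRange(value) {
  const [startText, endText] = value.split("/");
  const start = parseClockTime(startText);
  const end = endText === undefined ? null : parseClockTime(endText);
  return { start, end };
}

const url = new URL(
  "http://metavid.ucsc.edu/overlay/video_player/webview" +
    "?stream_name=house_proceeding_12-07-06&t=0:14:02/0:14:37"
);
console.log(parseTimeRange(url.searchParams.get("t"))); // { start: 842, end: 877 }
```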

On Thu, 22 Mar 2007, Nicholas Shanks wrote:
> >
> > This makes it impossible to use them for "play from this offset" since 
> > obviously the offsetting should be done by the server and avoid 
> > downloading the bunch of data that comes before the offset point.
> 
> But it doesn't stop the UA from taking cues from the markup (such as my 
> gegenschein example from yesterday) and generating a query such as 
> ?start=17:33. They don't have to request the exact value of the src 
> attribute.

Don't they? Why not?

On Thu, 22 Mar 2007, Kornel Lesinski wrote:
> 
> I think we had in mind (at least I did) URL of the page that contains 
> the video, not the URL of the video file itself. Because of this 
> indirection it's completely up to UA to read fragment identifier and
> translate it into appropriate HTTP request for the video file (which 
> could use Range header that's more proxy-friendly than query string).

That wouldn't work very well on pages with multiple videos (which is 
common already today).

> Let's say there's http://example.com/example.html page which contains 
> embedded video: ...<video src="video.ogg">...
> 
> I'd like to be able to construct URL like: 
> http://example.com/example.html#@12:35 that would cause UA to start 
> playing the embedded video.ogg from 12:35.

One could certainly see script in the page supporting this.

On Fri, 23 Mar 2007, Silvia Pfeiffer wrote:
> 
> The Range header is providing for byteranges to be downloaded. There is 
> however no simple way to map a timerange to a byterange without finding 
> out about the filetype. So, in effect, if you are trying to get a 
> byterange, you will have to request the server to inspect the file and 
> to return to you a mapping of a timerange to a byterange before you can 
> undertake a byterange query, which can then be proxied.
> 
> This process is exactly what we suggested in 
> http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.txt.

Interesting stuff. Unfortunately this URI doesn't seem to work any more.
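
The two-step process Silvia describes can be sketched as follows. The server supplies a time-to-byte index, and the client turns it into an ordinary Range request that proxies can cache. The index format used here (entries sorted by time, such as keyframe positions) and the function names are assumptions. As noted further down the thread, a real client would also need the header bytes at the start of the file before it could decode anything.

```javascript
// Find the byte offset of the last index entry at or before `seconds`.
// Assumes a non-empty index sorted by time (hypothetical format).
function byteOffsetFor(index, seconds) {
  let lo = 0;
  let hi = index.length - 1;
  let best = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].time <= seconds) {
      best = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return index[best].byte;
}

// Build an open-ended HTTP Range header value starting at that offset.
function rangeHeaderFor(index, seconds) {
  return `bytes=${byteOffsetFor(index, seconds)}-`;
}

const index = [
  { time: 0, byte: 0 },
  { time: 600, byte: 48000000 },
  { time: 1200, byte: 97500000 },
];
console.log(rangeHeaderFor(index, 755)); // bytes=48000000-
```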

On Fri, 23 Mar 2007, Kornel Lesinski wrote:
> 
> If such altered behavior of play() is not unacceptable, then that might 
> work:
> 
> http://example.com/example.html#myvideo@12:35
> 
> Where "myvideo" part is interpreted as ID of element in the document 
> (and if there's no such element - assume document.body). If the element 
> is a <video>, then seek() that video. If it isn't, then seek first 
> <video> descendant of that element (something like: 
> (document.getElementById("myvideo") || 
> document.body).getElementsByTagName('video')[0].seek(12*60000+35000)).

This certainly can be implemented by script on the page.
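
Here is a minimal sketch of the parsing half of such page script, for the "#id@mm:ss" form proposed above. It returns the element id, where null means "use the first <video> in the page", together with the offset in seconds. It returns null if the fragment doesn't match. The function name is hypothetical. The element lookup and the seek itself would follow the snippet quoted above.

```javascript
// Parse "#myvideo@12:35" or "#@1:02:03" into { id, seconds }.
function parseVideoFragment(hash) {
  const match = /^#?([^@]*)@(\d+(?::\d{1,2}){0,2})$/.exec(hash);
  if (!match) return null;
  const seconds = match[2]
    .split(":")
    .reduce((total, part) => total * 60 + Number(part), 0);
  return { id: match[1] || null, seconds };
}

console.log(parseVideoFragment("#myvideo@12:35")); // { id: 'myvideo', seconds: 755 }
console.log(parseVideoFragment("#@12:35"));        // { id: null, seconds: 755 }
```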

> My rationale:
>
> * it doesn't require any changes to the document, so user can control 
> starting position in any document, even if author didn't think of such 
> possibility

Well, you can do that just by creating a new page with a <video> element 
pointing to the part you want. Embedding videos is quite common, there's 
no reason it couldn't also be common to embed with a start="" point.

> * It's part of document's URL, not URL of the video file, so user 
> doesn't have to extract video file URL from the document and can still 
> use the page (which provides controls for the video).

True.

> * it can be implemented in JavaScript with current <video> API (also in 
> User JavaScript, but I think for interoperability it's important to be 
> part of the spec).

I think this is something we can safely wait until v2 to add, though.

> * it's orthogonal to server-side support for seeking

Indeed.

> > Also, it could be interpreted by the UA only, since everything after 
> > "#" will not be transferred to the server.
> 
> Yes, that's intentional. It allows user to modify *any* URL without risk 
> of breaking it (some servers/applications may not like extra query 
> string). I think use of hash for this is appropriate - just like UA 
> scrolls HTML to given element, UA would "scroll" the video - it's just a 
> change of axis from Y to time :)

You can't do this today with, e.g., iframes, or images, though. Why are 
videos special?

On Sat, 24 Mar 2007, Silvia Pfeiffer wrote:
>
> How about the following idea:
> 
> Example.html contains:
> 
> <video id="myvideo_1" src="video.ogg">
>   to provide the full video
> 
> <video id="myvideo_2" src="video.ogg?t=0:12:35">
>   to provide the video from offset 12:35
> 
> <video id="myvideo_3" src="video.ogg?t=0:12:35/0:20:40">
>   to provide the video segment between offset 12:35 and 20:40
> 
> <video id="myvideo_4" src="video.ogg?id=section4">
>   to provide the video from named offset "section4"
> 
> These provide the Web page author with the power to do offsets.

The spec does this today with attributes. We don't want to require the URI 
to use a specific syntax, since that would severely limit the options on 
the server side (and would somewhat step outside the bounds on what 
HTML5's scope should be -- we're supposed to be URI agnostic).

> And example URLs relating to the webpage:
> 
> http://example.com/example.html#myvideo_1&t=0:12:35
>   to provide the Web page with the first video playing from offset 12:35,
>   offset action provided by  the UA (i.e. video gets fully downloaded)
>
> http://example.com/example.html#myvideo_1?t=0:12:35/0:20:40
>   to provide the Web page with the first video playing section 12:35-20:40,
>   UA resolves this to create a query for video.ogg?t=0:12:35/0:20:40,
>   offset action provided by the server
>
> These provide the user with control over the start points of the videos 
> on the Webpage.

I'm not sure they really do so in a particularly user-friendly way, 
though. (It also isn't immediately clear to me that there is a big demand 
for this kind of feature, given what we see generally on the Web with 
video today.) In any case, it seems like a feature we can safely introduce 
later in version 2 -- the current API is already probably too heavy for us 
to get decent interoperability in version 1.

On Fri, 23 Mar 2007, Sander Tekelenburg wrote:
> 
> While that might be useful, it's not at all obvious to me that it is a 
> *requirement*. What is so wrong with fetching the entire file, and start 
> playing it at the point referenced by the fragment identifier? That's 
> how fragment identifiers work for textual resources (and they fetch the 
> usual truckload of images along with the HTML file).

Well, video files are orders of magnitude bigger, as you point out.

On Fri, 23 Mar 2007, Gareth Hay wrote:
>
> In this case, there is a big difference between streamed data, which can 
> be played from various positions, and non-streamed data which requires a 
> complete download, or at least the start of the file.
> 
> Perhaps there should be some reflection of this in the tag?

It's not clear to me what we should do.

On Sat, 24 Mar 2007, Silvia Pfeiffer wrote:
>
> The difference between streaming and non-streaming is artificial and not 
> technically necessary - except for live content, where you cannot jump
> "into the future".

The spec currently doesn't distinguish these cases.

On Sat, 24 Mar 2007, Geoffrey Sneddon wrote:
> On 23 Mar 2007, at 03:15, liorean wrote:
> > 
> > Well, it would be nice to not have to download an hour long lecture to 
> > see the 30 second interval of interest starting at 47:26... 
> > However, as I understand the Ogg Theora format, it contains essential 
> > data for decoding in the start of the file, so unless the server has 
> > some format specific knowledge and handling the client must either 
> > have already gotten that information somehow, or must request the 
> > entire file. I have no idea whether the other codecs I've heard 
> > discussed (Dirac and H.264) have a similar issue or not.
> 
> That sort of info is held within the container, so everything within Ogg 
> (so both Theora and Dirac) will suffer from it.

Or benefit from it -- you can fetch the start, then jump to where the data 
you want is, without getting the whole file.

On Sat, 24 Mar 2007, Maik Merten wrote:
> 
> Well, with Ogg you can just fetch a bit of the start (seems that's 
> needed for MPEG, too - I just killed a few bytes from the beginning of a 
> .mp4 file and it won't play) and get an educated guess about bitrate
> etc. to directly jump to a position in the file (you there get a precise 
> timestamp). If you ended up jumping too far away from the destination 
> you can repeat once or twice and you're "close" enough.
> 
> That has been done before, works like a charm. 
> http://stream.fluendo.com/demos.php?stream=ondemand (that's a Java Ogg 
> Theora streaming applet)

Good to know. Hopefully implementations will use this.
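
Maik's approach is to guess an offset from the bitrate, read the timestamp found there, and repeat if the guess was too far off. This sketch implements that idea as an interpolation search. The `timestampAt(byte)` callback is hypothetical. A real client would fetch a little data at that offset and read the next timestamp it finds (in Ogg, from the next page). The step and tolerance defaults are arbitrary choices.

```javascript
// Interpolate a byte offset between two known (byte, time) points,
// read the timestamp there, and narrow the bracket until close enough.
function seekByInterpolation(target, fileSize, duration, timestampAt,
                             maxSteps = 8, tolerance = 1) {
  target = Math.min(Math.max(target, 0), duration);
  let lo = { byte: 0, time: 0 };
  let hi = { byte: fileSize, time: duration };
  let guess = 0;
  for (let step = 0; step < maxSteps; step++) {
    // Assume a constant bitrate between the two bracketing points.
    const fraction = (target - lo.time) / (hi.time - lo.time);
    guess = Math.round(lo.byte + fraction * (hi.byte - lo.byte));
    const time = timestampAt(guess);
    if (Math.abs(time - target) <= tolerance) break; // "close" enough
    if (time < target) lo = { byte: guess, time };
    else hi = { byte: guess, time };
  }
  return guess;
}

// Usage with a made-up variable-bitrate file: 50 s at 1000 B/s,
// then 50 s at 3000 B/s (200000 bytes, 100 s in total).
const timestampAt = (b) => (b < 50000 ? b / 1000 : 50 + (b - 50000) / 3000);
const offset = seekByInterpolation(75, 200000, 100, timestampAt);
console.log(offset, timestampAt(offset)); // lands within 1 s of 75 s
```

With a constant bitrate the first guess is already exact. With a variable bitrate, each extra step costs one more small request.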

On Sat, 24 Mar 2007, Silvia Pfeiffer wrote:
> 
> 1) The UA doesn't know what byterange a timecode or timerange maps to. 
> So, it has to request this information from the server, which has access
> to the file. For QuickTime movies, the UA would need to request the 
> offset table from the server and for AVI it would need to request the 
> chunking information.
> 
> 2) Just streaming from an offset of a video file often breaks the file 
> format. For nearly all video formats, there are headers at the beginning 
> of a video file which determine how to decode the video file. Lacking 
> this information, the video files cannot be decoded. Therefore, a simple 
> byterange request of a subpart of the video only results in undecodable 
> content. The server actually has to be more intelligent and provide a 
> re-assembled correct video file if it is to stream from an offset.

Why can't the client just get the start of the file and the middle of the 
file and do the work of seeking itself?

On Sat, 24 Mar 2007, Michael Dale wrote:
>
> There is no reason why both methods can't be supported. If people wanted 
> to use annodex for seeking they could just write a js function that will 
> remap the src of the video element on seek overriding the UA http offset 
> seek method.

Indeed.

> The UA http offset method evoked by stream_id#time as the src would be 
> more or less equivalent to the calling stream_id.seek(time); All that is 
> required of video element is that it be open to annodex content and not 
> freak out when the src element has a request string and the timestamps 
> for the video stream don't start with zero.

The spec doesn't currently say that the times in the API are in any way 
related to the times in the media file, other than requiring one unit of 
media time to be treated as one unit of wall clock time.

Should it say somewhere that times in the media file must be examined and 
processed such that if the file starts at a non-zero media time, an offset 
must be applied equal to that media time to obtain API time?
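
If the spec did say that, the rule would amount to a fixed offset. Here is a minimal sketch, assuming the first timestamp in the media data is already known. The function names are hypothetical. The example uses a segment cut at 12:35 (755 s) that keeps its original timestamps, like the `?t=0:12:35` URLs above.

```javascript
// Convert between timestamps in the media data and times exposed by the API.
function toApiTime(mediaTime, firstMediaTime) {
  return mediaTime - firstMediaTime;
}

function toMediaTime(apiTime, firstMediaTime) {
  return apiTime + firstMediaTime;
}

console.log(toApiTime(800, 755)); // 45
console.log(toMediaTime(0, 755)); // 755
```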

On Fri, 23 Mar 2007, Sander Tekelenburg wrote:
> 
> If the spec requires UAs to be able to return the movie's "duration" and 
> "current position", etc. (which I got the impression is the intention of 
> both Opera and Apple's proposals), to for instance allow, through 
> javascript, playing from a certain point, then I don't see why it would 
> not be possible to trigger the same event through a fragment identifier. 
> I don't see how this would require anything from the author.

It's not clear to me exactly how this could work, but I'm open to 
suggestions.

We could also make the seeking happen through the autostart="" attribute, 
or through a new attribute, or punt it to v2 for now.

> (That aside, a lot of what is being defined on this list is javascript, 
> not HTML. The popular term "HTML5" is misleading. The official name "Web
> Apps 1.0" is more descriptive.)

Indeed, the "HTML5" spec also covers DOM5 HTML, the API to the HTML5 
language.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'