[whatwg] Start position of media resources

Fri Jun 5 13:06:27 PDT 2009

On Fri, 1 May 2009, Silvia Pfeiffer wrote:
> David wrote:
> >
> > which is fine.  I don't see the problem;  given a fragment we
> > a) focus the user's attention on that fragment
> > b) attempt to optimize network traffic to display that fragment as quickly
> > as possible
> >
> > Neither of these stops
> > c) the user from casting his attention elsewhere
> > d) more network transactions being done to support this
> 
> 
> re c):
> It depends on how the UA displays it. If the UA displays the 5s offset
> as the beginning of the video, then the user cannot easily jump to 0s
> offset. I thought this was the whole purpose of the discussion:
> whether we should encourage UAs to display just the addressed segment
> in the timeline (which makes sense for a 5sec extract from a 2 hour
> video) or whether we encourage UAs to display the timeline of the full
> resource only. I only tried to clarify the differences for the UA and
> what the user gets, supporting an earlier suggestion that UAs may want
> to have a means for switching between full timeline and segment
> timeline display. Ultimately, it's a UA problem and not an HTML5
> problem.

I agree that this is more of a UI issue. The API exposes the entire clip.

On Fri, 1 May 2009, David Singer wrote:
> 
> I think we came to a slightly more abstract conclusion, that the UA 
> focuses the user's initial attention on the indicated fragment.
> 
> [And we are silent about how it does that, and also about how easy it is 
> to look elsewhere.]

The spec says that the UA has to seek to the indicated time, but it 
doesn't preclude anything else, certainly.

On Mon, 4 May 2009, Jonas Sicking wrote:
> 
> I think there are two use cases:
> 
> 1. Wanting to start the user at a particular point in a video, while 
> still showing the user that the full video is there. For example in a 
> political speech you may want to start off at a particularly interesting 
> part, while still allowing the viewer to "rewind" to any part of the 
> speech in order to gain more context if so desired. This is very similar 
> to how web pages work today if you include a fragment identifier. The UI 
> accounts for the full page, but the page is scrolled to a particular 
> part.

Right.

> 2. Wanting to only show a small part of a longer video. For example in 
> the video containing a movie, it would be possible to link to a 
> particular scene, with a given start and end time.

This can be done by providing a custom (JS) controller for the element, 
though nothing can ever prevent the user from overriding this. At the 
extreme, the user can just download the video and watch it in a separate 
player, which the browser is supposed to allow anyway.
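
As a rough sketch of what such a script-driven controller might do (this 
is an illustration, not something the spec defines), the helper below keeps 
the playhead inside a segment. MediaLike, Segment and enforceSegment are 
hypothetical names; MediaLike lists only the members the sketch uses.

```typescript
// Hypothetical minimal view of a media element: only the members used here.
interface MediaLike {
  currentTime: number;
  paused: boolean;
  pause(): void;
}

// Hypothetical segment bounds, in seconds on the full resource's timeline.
interface Segment {
  start: number;
  end: number;
}

// Intended to be called from a periodic time-update handler.
// Pulls the playhead forward to the segment start, and pauses at the end.
function enforceSegment(media: MediaLike, segment: Segment): void {
  if (media.currentTime < segment.start) {
    media.currentTime = segment.start;
  } else if (media.currentTime >= segment.end) {
    media.currentTime = segment.end;
    if (!media.paused) {
      media.pause();
    }
  }
}
```

In a page this would presumably be hooked to the element's timeupdate 
event. It only shapes the scripted UI; it does not stop the user from 
reaching the rest of the resource by other means.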

> The danger of only doing 2, even if it's possible somehow for the user 
> to switch to make the UI display the full range of the movie, is that 
> unless the UI is extremely obvious, most users are not going to see it.
> 
> Or to put it another way: I think there is a use case both for linking 
> to a specific point in a video file and for pointing to a range in it. 
> Probably even a combination of the two, where you point to a point inside 
> a range.

That's possible. I think we should probably wait until the next version of 
the API before introducing an explicit way of limiting the range, though.

On Thu, 7 May 2009, David Singer wrote:
> 
> Roughly, yes.  I am saying that
> 
> ? -- the author of the URI has to know that the server he points the URI 
> at supports the ? syntax.  The server essentially makes a resource using 
> the query instructions, and delivers it to the UA.
> 
> # -- the UA focuses the user's attention on, and optimizes the network 
> usage for that focus of, the indicated fragment.  It does this (a) 
> visually, using whatever indicator it likes (we don't specify what the 
> 'controller' looks like) and (b) using whatever network support it can 
> get from the server (time-range, byte-range, or no support at all).
> 
> A reason I say this is that technically I believe that # is stripped by 
> the UA;  we cannot then put a delivery requirement in, because that 
> would apply to the server, which doesn't even get to see the # in all 
> likelihood.

This seems to be an issue for the working group defining these syntaxes; 
the HTML5 spec just says that if "the address of the current media 
resource [indicates] a particular start time" then you seek to it, without 
elaborating.
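
To illustrate the point about # being stripped, here is a minimal sketch 
that splits a URL string into the part a UA would send to the server and 
the fragment it keeps to itself. splitFragment is a hypothetical helper, 
and the example URL and its query and fragment syntax are made up.

```typescript
// Splits a URL string at the first "#". Everything before it (including
// any "?" query) goes to the server; the fragment stays in the UA.
function splitFragment(url: string): { requestPart: string; fragment: string } {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return { requestPart: url, fragment: "" };
  }
  return {
    requestPart: url.slice(0, hashIndex),
    fragment: url.slice(hashIndex + 1),
  };
}

// Hypothetical URL, for illustration only.
const parts = splitFragment("http://example.com/video.ogv?t=10#time=10s-20s");
// parts.requestPart is "http://example.com/video.ogv?t=10"
// parts.fragment is "time=10s-20s"
```

The query survives into the request, so the server can act on it; the 
fragment does not, so anything it implies has to be done by the UA.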

On Fri, 8 May 2009, Conrad Parker wrote:
> 
> However I don't think this should change how we do data transport (ie. 
> the HTTP and the media container). I'd suggest that the scope that 
> should be covered in HTML5 is to suggest how much seek bar should be 
> displayed for the video. However when the user navigates on that seek 
> bar (by clicking on a random position), the UA may use whatever means 
> are appropriate for the media type [as suggested by the media fragments 
> wg] in order to retrieve that data.

The HTML5 spec doesn't even guarantee that there is a seek bar, but it 
does suggest that UAs allow the user to seek to any seekable position.

On Fri, 8 May 2009, David Singer wrote:
> 
> The reason I want clarity is that this has ramifications.  For example, 
> if a UA is asked to play a video with a fragment indication 
> #time="10s-20s", and then a script seeks to 5s, does the user see the 
> video at the 5s point of the total resource, or 15s?  I think it has to 
> be 5s.

This is consistent with the spec's current requirements, I believe, yes.
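
A small model of that reading, as a sketch only: the fragment sets the 
initial position, and later script seeks are on the timeline of the whole 
resource. The #time="10s-20s" form is just the example from the message 
above, not a settled syntax, and parseTimeFragment and PlayheadModel are 
hypothetical names.

```typescript
interface TimeFragment {
  start: number; // seconds on the full resource's timeline
  end: number;
}

// Parses the illustrative form time="10s-20s" (quotes optional).
// Returns null for anything else.
function parseTimeFragment(fragment: string): TimeFragment | null {
  const match = /^time="?(\d+(?:\.\d+)?)s-(\d+(?:\.\d+)?)s"?$/.exec(fragment);
  if (match === null) {
    return null;
  }
  const start = Number(match[1]);
  const end = Number(match[2]);
  return start <= end ? { start, end } : null;
}

// The fragment only chooses where playback starts; seek() takes an
// absolute time and is not offset by the fragment start.
class PlayheadModel {
  currentTime: number;

  constructor(fragment: TimeFragment | null) {
    this.currentTime = fragment === null ? 0 : fragment.start;
  }

  seek(seconds: number): void {
    this.currentTime = seconds;
  }
}

const playhead = new PlayheadModel(parseTimeFragment('time="10s-20s"'));
// playhead.currentTime is 10 here
playhead.seek(5);
// playhead.currentTime is now 5, not 15
```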

On Thu, 30 Apr 2009, Robert O'Callahan wrote:
> On Thu, Apr 30, 2009 at 12:00 AM, Ian Hickson <ian at hixie.ch> wrote:
> > On Thu, 30 Apr 2009, Robert O'Callahan wrote:
> > > > >
> > > > > So I think a safer design would be to interpret currentTime as 
> > > > > relative to the startTime, perhaps renaming startTime to 
> > > > > 'timeOffset' instead?
> > > >
> > > > I considered that, but it seems that in the streaming video 
> > > > ("DVR-like") case, in the steady state where the data in the 
> > > > buffer is being thrown away at the same rate as the video is being 
> > > > played you'd end up in a weird position of the currentTime not 
> > > > changing despite the video playing, which would likely be even 
> > > > more confusing.
> > >
> > > Why should the "start time" change in this case? I assume you mean 
> > > the server is streaming video and does not support sending any data 
> > > except the data for the current time, and the UA is caching a window 
> > > of data. Then I would expect the element to expose a fixed start 
> > > time (the time, relative to the start of the resource, at which the 
> > > UA first opened the stream). As the stream plays, 'duration' would 
> > > increase and the 'seekable' and 'buffered' TimeRanges would change 
> > > to reflect the data the UA has in its buffer.
> >
> > I mean, e.g., a TiVo-like interface, where the input is a TV tuner, or 
> > a streaming video service that doesn't let you seek but where the UA 
> > is buffering 15 minutes of content so that the user can seek within 
> > that. The way this is supported in the spec now, the start time (the 
> > "earliest possible position") continually changes, so that the UI 
> > doesn't show an increasingly long time frame, but instead only shows 
> > the last 30 minutes.
> 
> To me, "the earliest possible position" seems redundant with the 'seekable'
> TimeRanges.

Yes, generally speaking .startTime is a convenient way to obtain 
.seekable.start(0). The exception is that if the content is not seekable 
at all, the .startTime attribute will still return a value (either the 
same as currentTime, if the UA is playing live video without buffering at 
all, or zero, if it's just that the codec doesn't support seeking) whereas 
the .seekable attribute in those cases would return an empty TimeRanges 
object.
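
The relationship described above can be sketched as follows. This is a 
model of the paragraph, not spec text: Ranges is a plain-data stand-in for 
a TimeRanges object, and MediaState, isLiveUnbuffered and modelStartTime 
are hypothetical names.

```typescript
// Plain-data stand-in for TimeRanges: [start, end] pairs in seconds.
type Ranges = ReadonlyArray<readonly [number, number]>;

interface MediaState {
  seekable: Ranges;
  currentTime: number;
  // Hypothetical flag: live video played without any buffering.
  isLiveUnbuffered: boolean;
}

// Returns the value .startTime would have under the description above.
function modelStartTime(state: MediaState): number {
  const first = state.seekable[0];
  if (first !== undefined) {
    // Usual case: same as seekable.start(0).
    return first[0];
  }
  // Nothing is seekable: current time for unbuffered live video,
  // zero when the codec simply does not support seeking.
  return state.isLiveUnbuffered ? state.currentTime : 0;
}
```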

> I think your use-case can be exposed perfectly adequately with an 
> indefinite 'duration' and 'seekable' and 'buffered' TimeRanges changing 
> over time (which will happen in many non-streaming cases too).

I don't understand the difference between what you're proposing here and 
what the spec says, except that the spec exposes the first start point of 
the seekable object as .startTime.

> There is no need to have 'startTime' changing; that seems an unnecessary 
> complication. I think that dynamically changing the "time coordinate 
> system" in which 'currentTime' is interpreted is a bad idea.

If the user agent continually throws out data, keeping only a small 
amount buffered (as, for example, the BitGravity player seen here 
does):

   http://live.twit.tv/

...I don't see what else we can do but change the seekable range (and thus 
the startTime). The original start time is not of much interest (it's an 
arbitrary time -- when the user started watching) and the current playback 
time is definitely advancing.
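
As a sketch of that steady state, assuming times are counted from when the 
UA started receiving the stream and that the UA keeps a fixed-length 
buffer: slidingSeekableWindow is a hypothetical helper, and the 900 second 
window is only the 15 minute figure mentioned earlier in the thread.

```typescript
// Seekable range for a UA that keeps only the last windowSeconds of a
// live stream. elapsed is the time since the UA started receiving it.
function slidingSeekableWindow(
  elapsed: number,
  windowSeconds: number
): { start: number; end: number } {
  return {
    start: Math.max(0, elapsed - windowSeconds),
    end: elapsed,
  };
}

// Ten minutes in, nothing has been discarded yet.
const early = slidingSeekableWindow(600, 900); // start 0, end 600

// An hour in, the earliest position has moved forward with playback.
const later = slidingSeekableWindow(3600, 900); // start 2700, end 3600
```

Both ends of the range advance together once the buffer is full, while the 
current playback time keeps increasing.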

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'