[whatwg] [media] startOffsetTime, also add startTime?

Mon Apr 2 17:21:43 PDT 2012

On Fri, 9 Mar 2012, Philip Jägenstedt wrote:
> On Thu, 08 Mar 2012 19:16:40 +0100, Ian Hickson <ian at hixie.ch> wrote:
> > On Thu, 8 Mar 2012, Philip Jägenstedt wrote:
> > > 
> > > I suggest the property offsetTime, defined as the stream time in 
> > > seconds which currentTime and duration are relative to.
> > 
> > I don't understand what this means. The currentTime is relative to the 
> > media timeline, which is UA-defined and "should" be based on the media 
> > resource's own timeline.
> 
> The BBC wrote a blog post [1] about how currentTime varies between 
> Firefox and Chrome. Opera does the same as Firefox here. You're right, 
> however, that the way "media timeline" is defined doesn't make any guarantee that 
> currentTime starts at 0 or that duration is the duration. I think that 
> the implementations predate the "media timeline" concept, and I agree 
> with the BBC blog post that the Opera/Firefox behavior is better. 
> Controls written assuming that currentTime goes from 0 to duration won't 
> break and duration will actually mean duration.

Controls written assuming that currentTime goes from 0 to duration are 
going to look mighty ugly when dealing with infinite streams where the 
browser is only buffering the last 30 minutes, DVR-style. I don't think 
this is a sane assumption.

Or to put it another way: currentTime does always go from 0 to duration, 
and duration could be Infinity; but at any particular time, only a part of 
that is a seekable range.
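
As a rough illustration of that point, a controller could size its scrubber 
from the seekable range instead of assuming 0 to duration. This is only a 
sketch: SeekableRanges is a hypothetical stand-in that mirrors the shape of 
the TimeRanges interface, and scrubberBounds is an invented name.

```typescript
// Hypothetical stand-in mirroring the shape of the DOM TimeRanges interface.
interface SeekableRanges {
  length: number;
  start(index: number): number;
  end(index: number): number;
}

// Returns the span a scrubber could draw: the outer bounds of the seekable
// ranges when there are any, otherwise 0..duration if duration is finite,
// otherwise null (nothing sensible to draw yet).
function scrubberBounds(
  seekable: SeekableRanges,
  duration: number
): { start: number; end: number } | null {
  if (seekable.length > 0) {
    return {
      start: seekable.start(0),
      end: seekable.end(seekable.length - 1),
    };
  }
  return isFinite(duration) ? { start: 0, end: duration } : null;
}
```

For an infinite stream with the last 30 minutes buffered, this yields the 
buffered window rather than a bar stretching from 0 to Infinity.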

> > > In practice it would often be understood as the "time since the 
> > > server began streaming" and would be useful to sync live streams 
> > > with out-of-band content simply by letting the out-of-band content 
> > > be relative to the start of the stream.
> > 
> > That "should" be zero. I can change that to a "must" if you like; it's 
> > a "should" because in some cases (e.g. MJPEG) you don't know what the 
> > media timeline is or how to interpret it, so there's no way to do it.
> 
> Which "should" are you referring to here?

"If the media resource somehow specifies an explicit timeline whose origin 
is not negative, then the media timeline should be that timeline" and "In 
the absence of an explicit timeline, the zero time on the media timeline 
should correspond to the first frame of the media resource".

> I really don't know what startOffsetTime is intended for. AFAICT it's a 
> piece of metadata that you could just as well provide out-of-band, but 
> for convenience it is exposed via the DOM API. I think it could be handy 
> to have and would like to implement it, but I don't understand if it's 
> any different from other metadata like producer or location of a video.

The startOffsetTime is useful for controllers that want to display a UI 
with real times, e.g. like TiVo's DVR UI, even when the underlying media 
resource has some more or less arbitrary timeline.

e.g. if a TV station starts broadcasting on some Friday at 2pm, that would 
be its zero time for its timeline, but eight months later, a user joining 
that stream doesn't care that the stream is 21 megaseconds old -- they 
just want to see 14:20 as the time that corresponds to what was streaming 
at 2:20pm.
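
A minimal sketch of that mapping, assuming the controller has the date for 
the timeline's zero point as a Date and the current position in seconds. 
The function names are invented for illustration, and the formatting uses 
UTC only to keep the example deterministic; a real controller would 
presumably format in the user's time zone.

```typescript
// Maps a position on the media timeline (seconds) to a wall-clock Date,
// given the date that corresponds to time zero on that timeline.
function wallClockAt(zeroDate: Date, currentTime: number): Date {
  return new Date(zeroDate.getTime() + currentTime * 1000);
}

// Formats the UTC time of day as HH:MM for display in a controller.
function formatUtcClock(date: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return pad(date.getUTCHours()) + ":" + pad(date.getUTCMinutes());
}
```

With a zero point at 14:00 and a position 243 days and 20 minutes into the 
timeline, the controller would show 14:20 rather than a count of seconds.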

> > > However, knowing the date of a video is still useful, potentially 
> > > even for the streaming case, so we do want to expose the DateUTC 
> > > field from WebM. However, startOffsetTime is a bad name for it, 
> > > since it's not using the same unit as currentTime. I suggest 
> > > offsetDate, to go with offsetTime.
> > 
> > I don't mind renaming startOffsetTime if people think that would help. 
> > I don't think "offsetDate" is any clearer though.
> > 
> > How about "mediaTimelineOriginDate"?
> 
> Simply "originDate" or "startDate", perhaps?

Ok, I renamed it to startDate.

> It could also do with a good example. The spec says:
> 
> "If the media resource specifies an explicit start time and date, then 
> that time and date should be considered the zero point in the media 
> timeline; the timeline offset will be the time and date, exposed using 
> the startOffsetTime attribute."
> 
> I interpret this as a date at currentTime=0 in the spec's definition of 
> currentTime

Right.

> and currentTime=-initialTime (unless media fragments are used) in the 
> Opera/Firefox definition of currentTime.

Not sure what this means.

> However, there's a weird spec example which can lead one into thinking 
> otherwise:
> 
> "The startOffsetTime attribute would return a Date object with a time 
> corresponding to 2010-03-20 23:15:00 UTC. However, if a different user 
> agent connected five minutes later, it would (presumably) receive 
> fragments covering timestamps 2010-03-20 23:20:00 UTC to 2010-03-21 
> 00:05:00 UTC and 2010-02-12 14:25:00 UTC to 2010-02-12 14:35:00 UTC, and 
> would expose this with a media timeline starting at 0s and extending to 
> 3,300s (fifty five minutes)."
> 
> This seems like a rather atypical streaming scenario. It would be a lot 
> nicer if the single example of startOffsetTime was for the common 
> scenario where each client gets the same stream that thus has the same 
> timeline and the same startOffsetTime.

I've added another example and tried to clarify that one.

> > > Finally, what about initialTime? It can be set to a non-zero value 
> > > at two points in the spec:
> > > 
> > > "Establish the media timeline for the purposes of the current 
> > > playback position, the earliest possible position, and the initial 
> > > playback position, based on the media data."
> > > 
> > > "If either the media resource or the address of the current media 
> > > resource indicate a particular start time, then set the initial 
> > > playback position to that time and"
> > > 
> > > Does any format expose something like this in-band? I don't know of 
> > > any that do and how to implement this, so the only thing that 
> > > remains is exposing the start time of media fragments. This seems 
> > > rather useless to me, so unless someone has already implemented 
> > > initialTime and can explain what it means, I suggest dropping it from 
> > > the spec.
> > 
> > The address of the current media resource can indicate a particular 
> > start time if you implement media fragments.
> 
> Yes, but why do we need to expose that in the DOM API, what is the use 
> case?

Allows controllers to trivially implement UI to jump back to where the 
stream started, while still showing the full seekable range.

> > > We discussed the concatenation of two clips and how to represent the 
> > > date. At least chained WebM and chained Ogg should be able to 
> > > represent this.
> > 
> > The spec requires ("must") that in the case of chained clips with 
> > discontinuous timelines, the first clip's timeline be extended to 
> > cover the others, and any data regarding the timeline in the 
> > subsequent clips is dropped.
> 
> So the second and subsequent clips of a chain have their timelines 
> normalized, but not the first?

Right.

> > > To reduce the possibility for confusion about what date is 
> > > represented and to allow the recording date to be preserved in 
> > > editing, how about exposing currentDate instead?
> > 
> > What's the use case?
> 
> The use case is "don't be confusing", so let me first try to summarize 
> what I think the spec says:
> 
> * currentTime need not start at 0, for streams it will typically 
> represent how long the server has been serving a stream.

I don't really know what you mean by "start" here.

> * duration is not the duration, it is the last timestamp of a resource.

"duration", if it is not Infinity, is the last time that it would make 
sense to show on a seek bar / scrubber, possibly beyond the last seekable 
range. Whether it is a timestamp in the resource is hard to say.

> * startOffsetTime is the date at time 0, it's not an offset. It has 
> nothing to do with syncing live streams.

This is now startDate. Agreed that it has nothing to do with synchronising 
anything. I'm happy to say it's not an offset.

> * initialTime is the first timestamp of the stream or the start time of 
> a media fragment URL, if one is used.

Typically, yes.

> * For chained streams, the 2nd and subsequent clips have their timelines 
> normalized and appended to the first clip's timeline.

Right.
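
The chained case can be sketched roughly as follows. The Clip shape and the 
chainedStarts name are hypothetical, and the sketch assumes each clip's 
length is known; it only shows where each clip would begin on the combined 
timeline.

```typescript
// Hypothetical description of one clip in a chain: its own first timestamp
// and its length, both in seconds.
interface Clip {
  firstTimestamp: number;
  length: number;
}

// Returns, for each clip, where it starts on the combined media timeline.
// The first clip keeps its own timestamps; each later clip is appended
// directly after the previous one, and its own timeline data is ignored.
function chainedStarts(clips: Clip[]): number[] {
  const starts: number[] = [];
  let next = 0;
  clips.forEach((clip, i) => {
    const start = i === 0 ? clip.firstTimestamp : next;
    starts.push(start);
    next = start + clip.length;
  });
  return starts;
}
```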

On Tue, 13 Mar 2012, Philip Jägenstedt wrote:
> 
> "In the absence of an explicit timeline, the zero time on the media 
> timeline should correspond to the first frame of the media resource. For 
> static audio and video files this is generally trivial. For streaming 
> resources, if the user agent will be able to seek to an earlier point 
> than the first frame originally provided by the server, then the zero 
> time should correspond to the earliest seekable time of the media 
> resource; otherwise, it should correspond to the first frame received 
> from the server (the point in the media resource at which the user agent 
> began receiving the stream)."
> 
> There are multiple problems here, and I think it's responsible for some 
> of the confusion.
> 
> * What is an "explicit timeline"? For example, does an Ogg stream that 
> starts with a non-zero timestamp have an explicit timeline?

If there's a timestamp in the resource, then yes, it has an explicit 
timeline. That seems self-evident, but if you can think of a way that I 
could clarify this, I would be happy to do so.

An example of a video resource without an explicit timeline would be 
a multipart/x-replace JPEG stream. There, the time between the frames is 
determined by the server's transmission rate, and the data itself has no 
timing information.
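
For a resource like that, the only timing available is when each frame 
arrived. A sketch under that assumption (frameTimes is an invented name, 
and the arrival times are whatever clock the receiver happens to use, in 
milliseconds), with the first frame taken as the zero time:

```typescript
// Derives media-timeline positions (seconds) for frames that carry no
// timestamps, using only their arrival times (milliseconds). The first
// frame received is treated as time zero.
function frameTimes(arrivalsMs: number[]): number[] {
  if (arrivalsMs.length === 0) {
    return [];
  }
  const first = arrivalsMs[0];
  return arrivalsMs.map((t) => (t - first) / 1000);
}
```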

> * Does "For streaming resources ..." apply only in the absence of an 
> explicit timeline, or in general? In other words, what's the scope of 
> "In the absence of an explicit timeline"?

I've updated the second sentence to explicitly state that it also only 
applies in the absence of an explicit timeline.

> * Why does the spec differentiate between static and streaming resources 
> at all?

If you receive the entire file, there's no complication with respect to 
streaming to a point before the first rendered frame. The distinction is 
not intended to be normatively detectable, it's only intended to 
distinguish the easy case from the harder case. Again, if you think 
there's some way I could clarify that, please let me know.

> This is not a distinction Opera makes internally, the only "mode switch" 
> we have depends on whether or not a resource is seekable, which for HTTP 
> means support for byte-range requests. A static resource can be served 
> by a server without support for byte-range requests such that the size 
> and duration are known up front, and I certainly wouldn't call that 
> streaming.

If you can download the file in its entirety, then I would call that a 
static file. But I don't think that should be important for the spec.

> These definitions can be tweaked/clarified in one of two ways:
> 
> 1. currentTime always reflects the underlying timestamps, such that a 
> resource can start playing at a non-zero offset and seekable.start(0) 
> could be non-zero even for a fully seekable resource. This is what the 
> spec already says, modulo the "streaming resources" weirdness.
> 
> 2. Always normalize the timeline to start at 0 and end at duration.
> 
> I think that the BBC blog post is favoring option 2, and while that's 
> closest to our implementation I don't feel strongly about it. A benefit 
> of option 1 is that currentTime=300 represents the same thing on all 
> clients, which should solve the syncing problem without involving any 
> kinds of dates.

The spec definitely intends #1 if the format supports it. I don't think #2 
makes sense for many cases (e.g. broadcast TV, any case where you can 
seek to before the first rendered frame), and more importantly, if you 
connect to a stream and then later start discarding earlier data, you end 
up in #1 even if you started in #2 so I see no benefit to going out of our 
way to start in #2.
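
The difference between the two options can be put in a small sketch. The 
function names are invented, and "firstTimestampSeen" stands for whatever 
timestamp a given client treats as the start of its timeline.

```typescript
// Option 1: report the underlying stream timestamp unchanged, so every
// client sees the same value for the same frame.
function option1Time(streamTimestamp: number): number {
  return streamTimestamp;
}

// Option 2: report time relative to the first timestamp this client has,
// so the value for a given frame depends on when the client joined (or on
// how much earlier data it has since discarded).
function option2Time(
  streamTimestamp: number,
  firstTimestampSeen: number
): number {
  return streamTimestamp - firstTimestampSeen;
}
```

Two clients that joined at stream timestamps 100 and 250 both report 300 
for the same frame under option 1, but 200 and 50 under option 2.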

> Make it pedantically clear which of the above two options is correct, 
> preferably with a pretty figure of a timeline with all the values 
> clearly marked out.

I would be happy to add such a diagram, but I have no idea how to do it, 
given the bazillions of edge cases here.

If anyone wants to make such a diagram, I recommend doing it by writing 
code for this tool:

   http://software.hixie.ch/utilities/js/canvas/

...and then sending me the code. :-)

(Ideally, using little parameterised functions for any repeated bits, so 
it's really easy to adjust.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'