[whatwg] Timestamp from video source in order to sync (e.g. expose OGG timestamp to javascript)

Ian Hickson ian at hixie.ch
Tue Aug 17 18:13:21 PDT 2010


I've defined an explicit concept of a "media timeline" and defined how it 
is used, which should clear up a lot of <video>-related issues.

On Mon, 17 May 2010, Odin Omdal Hørthe wrote:
>
> I stream conferences using Ogg Theora+Vorbis using Icecast2. I have 
> built a site that shows the video and then automatically shows the 
> slides (as PNG files) as well. I use orbited (COMET) to have the server 
> PUSH my «next» presses on my keyboard.
> 
> The problem is that Icecast buffers heavily, as does the client, so 
> when I switch slides, the browser goes from slide 3 to 4 WAY too early 
> (by anywhere from 10 seconds to 1 minute).
> 
> If I could get the timestamp OR time-since-started-sending/recording 
> from the ogg file in javascript, I'd be able to sync everything.
> 
> There are multiple ways to sync this, maybe even a stream with the 
> slide data INSIDE the ogg file; however, AFAIK there's also no way of 
> getting out such arbitrary streams.

In theory the new timed track API should let you do this. However, I don't 
really understand why you can't do this in the first place. The 
currentTime shouldn't be changing while things are buffering, so why would 
you be going out of sync?


On Mon, 17 May 2010, David Singer wrote:
> 
> Buffering should not make any difference to how far into a stream a time 
> means.  If the transition from slide 3 to slide 4 happens at 10 minutes 
> in, then as the presentation time ticks from 9:59 to 10:00 you should 
> flip the slide.  It doesn't matter how much data is in any buffers, does 
> it?

Indeed.


On Mon, 17 May 2010, Nikita Eelen wrote:
>
> I think he means something similar to what QuickTime Broadcaster and 
> QuickTime Streaming Server do with a delay on a live stream, or Wowza 
> Media Server with Flash Media Encoder when using H.264, unless I am 
> misunderstanding something. Is that correct, Odin? Not sure how Icecast 
> deals with it, but I bet it's a similar issue.

On Tue, 18 May 2010, Odin Omdal Hørthe wrote:
> 
> Yes, I initially used Darwin Streaming Server, but found Icecast2 much 
> better for *my use*. So I use it in the same way. I'm having Icecast 
> buffer 1MB worth of data so that it can burst all that to the client 
> (the browser in this case) so that its own buffering can go faster. So 
> even there we're quite far behind.
> 
> Also, the browsers often stop for a few seconds, buffer a bit more, 
> and then continue playing (although they have already buffered more 
> than a few seconds ahead!), so they drift even further away from real 
> time.

I would understand it if you couldn't quite determine what the start of 
the stream was, but I don't understand why you couldn't determine what 
the current time was relative to the start. That is, I understand a fixed 
offset, but I don't see why it would get worse over time. Are you using 
ontimeupdate and currentTime to track the video? Or are you using 
something like setTimeout()?
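
To make the distinction concrete, here is a minimal sketch of the 
currentTime-driven approach. It assumes the slide changes are known as a 
sorted list of media-timeline times in seconds; the cue list, 
slideIndexAt() and showSlide() are hypothetical names for illustration 
only, not anything defined by the spec.

```javascript
// Hypothetical cue list: the media-timeline time (in seconds) at which
// each slide should appear, sorted ascending.
const slideCues = [0, 95, 240, 600];

// Return the index of the slide that should be visible at `currentTime`,
// or -1 if playback is still before the first cue.
function slideIndexAt(cues, currentTime) {
  let lo = 0;
  let hi = cues.length;
  // Binary search for the first cue that is strictly after currentTime.
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cues[mid] <= currentTime) lo = mid + 1;
    else hi = mid;
  }
  return lo - 1;
}

// Assumed wiring in a page (`video` and `showSlide` are placeholders):
//   video.addEventListener('timeupdate', () => {
//     showSlide(slideIndexAt(slideCues, video.currentTime));
//   });
```

Because the slide is derived from currentTime on every update, a stall 
for rebuffering delays the slide change by the same amount, whereas a 
setTimeout()-based schedule would keep running while playback is paused.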


> However, I think that it's rather hard to find out what the spec means. 
> Because *earliest POSSIBLE*. What is meant by possible? With live 
> streaming it is not possible to go further back in the stream. What do 
> you think? What is meant by this? If it does not help me, then adding a 
> field for getting the _real_ time code data from the video would be very 
> usable.

The "earliest possible position" is just the earliest position in the 
stream or resource that the user agent can ever obtain again.


> It's talked about in this example: 
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#dom-media-starttime>
> 
> > For example, if two clips have been concatenated into one video file, 
> > but the video format exposes the original times for the two clips, the 
> > video data might expose a timeline that goes, say, 00:15..00:29 and 
> > then 00:05..00:38. However, the user agent would not expose those 
> > times; it would instead expose the times as 00:15..00:29 and 
> > 00:29..01:02, as a single video.
> 
> That's well and good, but it would be nice to get the actual time code 
> data for live streaming and these syncing uses if startTime is not the 
> earliest time that exists.
> 
> Justin Dolske's idea looks rather nice:
> > This seems like a somewhat unfortunate thing for the spec, I bet 
> > everyone's going to get it wrong because it won't be common. :( I 
> > can't help but wonder if it would be better to have a startTimeOffset 
> > property, so that .currentTime et al all still have a timeline 
> > starting from 0, and if you want the "real" time you'd use 
> > .currentTime + .startTimeOffset.

I've added a startTimeOffset attribute that returns a Date representing 
the time corresponding to zero on the media element's timeline (as used by 
currentTime and startTime).
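
As a rough illustration of how a script might combine the two values, the 
sketch below adds currentTime to the Date for timeline zero. The attribute 
name is the one used in this message and may not match what a given 
browser exposes, so the function takes the Date as a plain argument; 
wallClockTime() and the sample values are hypothetical.

```javascript
// Wall-clock time of the current playback position.
// `startTimeOffset`: Date corresponding to zero on the media timeline
//   (passed in explicitly rather than read from an element).
// `currentTime`: position on the media timeline, in seconds.
// Returns null when no usable date is available.
function wallClockTime(startTimeOffset, currentTime) {
  if (!(startTimeOffset instanceof Date) || isNaN(startTimeOffset.getTime())) {
    return null;
  }
  return new Date(startTimeOffset.getTime() + currentTime * 1000);
}

// Hypothetical values: timeline zero at 10:30:00 UTC, playback 10 minutes in.
const zero = new Date(Date.UTC(2010, 4, 17, 10, 30, 0));
const now = wallClockTime(zero, 600);
console.log(now.toISOString()); // 2010-05-17T10:40:00.000Z
```

Two clients that joined the same stream at different moments would get 
the same wall-clock value for the same frame, which is what slide 
syncing needs.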

What should we do with video files that have multiple overlapping times? 
Just use the first timeline in the file/stream? I've done that for now.


> > I'd also suspect we'll want the default video controls to normalize 
> > everything to 0 (.currentTime - .startTime), since it would be really 
> > confusing otherwise.
> 
> from <https://bugzilla.mozilla.org/show_bug.cgi?id=498253#c3>

On Tue, 18 May 2010, Robert O'Callahan wrote:
> 
> That's exactly what I've advocated before. I lost the argument, but I 
> forget why, probably because I didn't understand the reasons.

Not sure I follow this bit.


On Tue, 18 May 2010, Silvia Pfeiffer wrote:
> 
> To be honest, it doesn't make much sense to display the "wrong" time in 
> a player. If a video stream starts at 10:30am and goes for 30 min, then 
> a person joining the stream 10 min in should see a time of 10min - or 
> better even 10:40am - which is in sync with what others see that joined 
> at the start. It would be rather confusing if the same position in a 
> video would be linked by one person as "at offset 10min" while another 
> would say "at offset 0min". And since the W3C Media Fragments WG is 
> defining temporal addressing, such diverging pointers will even end up 
> in a URL and how should that be interpreted then?

I've tried, using "should"-level requirements, to require that. (I didn't 
use "must" requirements because I don't know how to make it more precise 
without defining things explicitly in terms of specific video formats.)


On Mon, 24 May 2010, Robert O'Callahan wrote:
> 
> Here's how I think it should work:
> -- currentTime (and related times, such as times in TimeRanges) range from 0
> to 'duration'

It's a little more complicated than that, because the UA is allowed to 
discard old buffered data even if it might never get it back again (as, 
e.g., a DVR might in "live TV" mode), in which case currentTime ranges 
from some non-zero number to duration.

Also, duration can be +Infinity.

I've tried to define in more detail exactly what "zero" on the "current 
playback position" timeline means:

   http://www.whatwg.org/specs/web-apps/current-work/complete.html#defineTimeline

Please let me know if that's not what you meant or wanted.


> -- media resources are allowed to have a non-zero "initial playback time".
> This is what currentTime should be set to on media load. We could create a
> new DOM attribute to expose this.

Done (initialTime).


> -- media resources are allowed to have a "real time offset". This is an 
> optional date+time (in UTC) that corresponds to currentTime=0, exposed 
> as a DOM attribute. Players would be encouraged to use this to display 
> real times, when it's present.

Done, though I haven't included anything about encouraging people to use 
it... Did you mean for user agents or for scripted players? Or both?


> This would be similar in power to what the spec already has. In your 
> example you could either let currentTime=0 be the start of the stream 
> that the user's loading, and use the "real time offset" to get the 
> correct time displayed, or you could let 0 be the real "start", and set 
> the initial playback time to match where the user joined. However, I 
> think describing things the way I just did is simpler and avoids 
> weirdness like the "start time" changing dynamically. It also preserves 
> the invariant that currentTime ranges from 0 to 'duration', which I 
> think players will come to depend on if the cases where it's not true 
> are rare.

I'm not sure what currentTime=0 would mean in Silvia's example with this.


On Mon, 24 May 2010, Philip Jägenstedt wrote:
> 
> What concretely should we change? Should we drop startTime, or redefine 
> it?

startTime was just the earliest time you can seek... which means it's 
redundant with seekable.start(0), I guess, or currentTime if you can't 
seek at all. I removed it. Let me know if you have added support and think 
we should keep it; I'm happy to put it back in. Its primary purpose was 
to help people determine what part of the timeline to draw the seek bar 
for (there's no point showing old times you can never seek to), but 
seekable.start(0) does that too, albeit with slightly more work.
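
A minimal sketch of that "slightly more work", assuming an object shaped 
like a media element whose seekable attribute is a TimeRanges-style 
object (length, start(i), end(i)). The helper names and the stand-in 
object are hypothetical.

```javascript
// Return { start, end } in media-timeline seconds for drawing a seek bar.
function seekBarRange(media) {
  const s = media.seekable;
  if (s.length === 0) {
    // Cannot seek at all: the bar collapses to the current position.
    return { start: media.currentTime, end: media.currentTime };
  }
  return { start: s.start(0), end: s.end(s.length - 1) };
}

// Fraction (0..1) of the bar at which to draw the playhead.
function playheadFraction(media) {
  const { start, end } = seekBarRange(media);
  // An empty or unbounded range has no meaningful fraction.
  if (!(end > start) || !Number.isFinite(end)) return 0;
  const f = (media.currentTime - start) / (end - start);
  return Math.min(1, Math.max(0, f));
}

// Stand-in for a live stream whose oldest data has been discarded.
const fakeMedia = {
  currentTime: 150,
  seekable: { length: 1, start: () => 120, end: () => 180 },
};
```

With the stand-in above, the bar would span 120 to 180 seconds and the 
playhead would sit at its midpoint.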


On Mon, 24 May 2010, Robert O'Callahan wrote:
>
> So I would change:
> -- get rid of startTime and the concept of "earliest possible position",
> plus the related dispatching of timeupdate events

I think we still need this as a concept, but I agree we can drop 
startTime.


> -- create a new readonly DOM attribute, say call it "initialTime" that
> returns the default initial playback position for the media resource

Done.


> -- during media resource loading, when metadata loads set the current
> playback position to initialTime

Done.


> -- note that currentTime is always between 0 and 'duration' (if duration is
> known)

That was already the case. It might be further limited, though, to a 
range within that range, if there is an explicit timeline in the media 
resource (rare).


> > Is it necessary to have the offset as an absolute date, or could that 
> > probably odd case be handled in other ways? I can't really see a 
> > browser UI making use of it, so I'd be happy to put it in a data-* 
> > attribute or using microdata.
> 
> The "real time offset" is a property of the media resource (although I 
> suppose we could have it settable via a content attribute as well) so it 
> would need to be supported by the browser as an API on media elements. 
> The question is whether there's enough demand to justify it. I don't 
> know how widely supported this data is in media resource formats; Ogg 
> Skeleton supports it, but I don't know about others.

I have exposed this.


On Mon, 24 May 2010, Philip Jägenstedt wrote:
> 
> So from this I gather that either:
> 
> 1. initialTime is always 0
> 
> or
> 
> 2. duration is not the duration of resource, but the time at the end.
> 
> This seems to be what is already in the spec. Instead of guessing what
> everyone means, here's what I'd want:
> 
> 1. let currentTime always start at 0, regardless of what the timestamps or
> other metadata of the media resource says.
> 
> 2. let currentTime always end at duration.
>
> 3. expose an offset from 0 in startTime or a renamed attribute for cases like
> live streaming so that the client can e.g. sync slides.

I haven't done this, because it would mean requiring support for negative 
times, which seems like it would be a huge source of bugs. (Consider, 
e.g., a streaming server that starts sending you data half-way through the 
resource, but lets you seek back to the start.)


> The difference from what the spec says is that the concept of "earliest
> possible position" is dropped.

I don't think we can do that, unless we accept that in some cases the 
currentTime might _decrease_ even though playback is going _forward_, 
which seems like a very bad idea.
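
One way to read this concern, as a worked example: suppose the reported 
position were always measured from the earliest data the user agent 
still holds, so that it always "starts at 0". The model and the numbers 
below are hypothetical and only illustrate the arithmetic.

```javascript
// Hypothetical model: position reported relative to the earliest data
// that is still available, rather than to a fixed timeline zero.
function rebasedTime(absoluteTime, earliestAvailable) {
  return absoluteTime - earliestAvailable;
}

// Playback is 100 s into the stream and nothing has been discarded yet.
const before = rebasedTime(100, 0);

// One second later the user agent drops the oldest 30 s of buffered data.
const after = rebasedTime(101, 30);

console.log(before, after); // 100 71
```

Playback moved forward by one second, yet the reported value fell from 
100 to 71. Keeping a fixed zero and letting the earliest possible 
position rise instead avoids that.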


> > > Is it necessary to have the offset as an absolute date, or could 
> > > that probably odd case be handled in other ways? I can't really see 
> > > a browser UI making use of it, so I'd be happy to put it in a data-* 
> > > attribute or using microdata.
> > 
> > The "real time offset" is a property of the media resource (although I 
> > suppose we could have it settable via a content attribute as well) so 
> > it would need to be supported by the browser as an API on media 
> > elements. The question is whether there's enough demand to justify it. 
> > I don't know how widely supported this data is in media resource 
> > formats; Ogg Skeleton supports it, but I don't know about others.
> 
> I don't have a strong opinion, but would want to see a use case for it.

Imagine connecting to a streaming live TV service. You'd presumably want a 
DVR-like seek bar that gives you the actual time corresponding to the 
data, not the time relative to when you connected or when they started 
streaming.


On Mon, 24 May 2010, Robert O'Callahan wrote:
> 
> I think the current spec allows you to seek backwards from the starting 
> point. So would my proposal. Would yours? Would you allow 'seekable' to 
> contain negative times? I think it's slightly simpler to allow 
> currentTime to start at a non-zero position than to allow negative times 

I agree.


On Mon, 24 May 2010, Philip Jägenstedt wrote:
> 
> I think we both agree but aren't understanding each other very well, or I'm
> not thinking very clearly. People will write players assuming that currentTime
> starts at 0 and ends at duration. If this is not the case they will break, so
> an API which makes this not be the case in very few cases isn't very nice.

Agreed. I've made the API clearly say that "duration" is the time at the 
end, even in the case where the start is not actually zero, to sidestep 
this issue somewhat. (The start will almost always be zero, so the 
slightly misleading name seems like a non-issue.)


On Mon, 24 May 2010, David Singer wrote:
>
> I think it rather important that the format define "where you are" in 
> time, precisely so that temporal fragments, or syncing with other 
> material, can work.
> 
> For most video-on-demand, the program starts at zero and runs to its 
> duration.  But for 'streaming', knowing 'where you are' in a stream 
> depends on a lot of things.  The 3G HTTP streaming solution explicitly 
> anchors the timeline, so that two players playing the same program at 
> the same point in it will see the same time, no matter when they tuned 
> it.

Hopefully the spec as written now unambiguously makes use of this.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'