[whatwg] HTML5 video: frame accuracy / SMPTE

Gregory Maxwell gmaxwell at gmail.com
Fri Jan 21 14:48:45 PST 2011

On Fri, Jan 21, 2011 at 5:25 PM, Silvia Pfeiffer
<silviapfeiffer1 at gmail.com> wrote:
>> The language I'd prefer is "fast".  Fast may be exact, or it might
>> just be to the nearest keyframe, or something in between. It might
>> just start you over at the beginning of the stream.
> That is putting a restriction on how the browser has to seek -
> something I'd rather leave to the browser in the general case where

No more than 'best' is, I suppose. I think you missed the argument I'm
making: I'm saying that it's perfectly reasonable to assume that "best
effort" means "exact" in any seekable stream, because exact is best
and best is always possible.  This is the same line of reasoning that
lets you conclude that "fast" requires the browser to use the fastest
possible seek.

>> One question about inexact seeking is what should the client do when
>> the current playtime is closer to the requested time than what the
>> inexact seek would provide?
> In the case of "fastest", the browser must then not do a seek. In the
> case of "don't care", it's up to the browser if it does the seek or
> not.

That was my thinking, but I find the consistency point Glenn raised
concerning.
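A minimal TypeScript sketch of the "fastest" rule quoted above, under stated assumptions: the browser is assumed to know a list of positions it can reach cheaply (for example keyframe timestamps, in seconds). The function `fastSeekTarget` and that list are hypothetical, not part of any media API. Returning `null` means "do not seek".

```typescript
// Hypothetical helper: decide where a "fast" (inexact) seek should land.
// `cheapPoints` is an assumed list of positions (seconds) the browser can
// reach without decoding forward, e.g. keyframes; not a real API.
function fastSeekTarget(
  currentTime: number,
  requested: number,
  cheapPoints: readonly number[],
): number | null {
  // Find the cheap seek point nearest to the requested time
  // (on a tie, the earlier entry in the list wins).
  let best: number | null = null;
  for (const t of cheapPoints) {
    if (best === null || Math.abs(t - requested) < Math.abs(best - requested)) {
      best = t;
    }
  }
  if (best === null) {
    return null; // nothing cheap to seek to; leave playback where it is
  }
  // If the current position is already at least as close to the request,
  // an inexact seek would only move us further away, so skip it.
  if (Math.abs(currentTime - requested) <= Math.abs(best - requested)) {
    return null;
  }
  return best;
}
```

The outcome depends on where playback currently is: with cheap points at 0, 4, 8 and 12 s, a request for 10 s made from 1 s lands on 8 s, while the same request made from 9.5 s does not seek at all. That position dependence is one way the consistency concern shows up.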

>>> * KEYFRAME is keyframe-accurate seeking, so to the previous keyframe
>> What does this mean when a seekable stream doesn't have interior
>> keyframes? Should the client always seek to the beginning? Why is this
>> valuable over a "fast" option?
> Where no keyframes are available, this seek option simply doesn't do
> anything, since obviously there are no keyframes. The point is that
> where this concept exists and people want to take advantage of it,
> this seek should be possible.

I really feel that "keyframe" reaches far deeper into codec internals
than the rest of the video API currently does, or should.
I've frequently seen content authors and application developers make
incorrect assumptions about keyframes: that they always indicate
scene changes, that they always occur at an exact interval, that two
differently encoded files will have their keyframes in the same places,
and so on.  That these things are sometimes true only feeds the mistaken
impressions.  The frame types inside a codec are really deep internals
that we ought not encourage people to mess with directly.
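For concreteness, the KEYFRAME mode quoted above ("seek to the previous keyframe") can be sketched in TypeScript, again assuming a hypothetical sorted list of keyframe timestamps that real media elements do not expose. It returns `null` (no seek) when there is no keyframe at or before the target, as in the quoted reply. It also makes the earlier question concrete: if the only keyframe is the one at the start of the stream, every request maps back to time zero.

```typescript
// Hypothetical KEYFRAME seek: return the last keyframe at or before
// `requested`, or null (meaning "do not seek") if there is none.
// Assumes `keyframes` is sorted ascending, in seconds; no real media API
// exposes such a list.
function previousKeyframe(
  keyframes: readonly number[],
  requested: number,
): number | null {
  let result: number | null = null;
  for (const t of keyframes) {
    if (t > requested) {
      break; // sorted input: every later keyframe is also past the target
    }
    result = t;
  }
  return result;
}
```

Note how fragile the result is: the same call on two differently encoded copies of one video returns different times, because their keyframe lists differ.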

It seems surprising to me that we'd want to expose something so deeply
internal while the API fails to expose things like chapters and other
metadata that can actually be used to reliably map times to
meaningful high-level information about the video.
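By contrast, mapping a time to author-supplied chapter metadata is simple and does not depend on how the file happens to be encoded. A minimal sketch, assuming a hypothetical `Chapter` record (start and end in seconds, plus a title); this is an illustration, not an existing browser type.

```typescript
// Hypothetical chapter record supplied as metadata; not a browser type.
interface Chapter {
  start: number; // seconds, inclusive
  end: number; // seconds, exclusive
  title: string;
}

// Return the title of the chapter containing time t, or null if none does.
function chapterAt(chapters: readonly Chapter[], t: number): string | null {
  for (const c of chapters) {
    if (t >= c.start && t < c.end) {
      return c.title;
    }
  }
  return null;
}
```

Because chapter boundaries are chosen by the author rather than the encoder, re-encoding the video leaves this lookup unchanged.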
