[whatwg] HTML5 video: frame accuracy / SMPTE
coenen.rob at gmail.com
Tue Feb 15 13:46:19 PST 2011
Rather than trying to sum up all use cases, I think that the media asset
should be fully randomly accessible and frame accurate, so that playback can
jump to any point in time in any current asset.
That way a web developer (or implementers such as the guys of JWPlayer) can
come up with their own mechanisms for stuff such as "chapters" etc. I don't
believe that chapters should be part of the HTML5 spec.
The current spec, if implemented correctly, takes care of this AFAIK?
On Tue, Feb 15, 2011 at 9:10 PM, Kevin Marks <kevinmarks at gmail.com> wrote:
> Returning to this discussion, I think it is lacking in use cases.
> Consider the controllers we are used to - they tend to have frame step,
> chapter step and some kind of scrub bar.
> Frame stepping is used when you want to mark an accurate in or out point,
> or catch a still frame. This needs to be accurate, and it is always local.
> Chapter stepping means 'move me to the next meaningful break point in this
> media'. There is a very natural structure for this in almost all professional
> media, and it is definitely worth getting this right. This is a long range
> jump, but it is likely to be a key frame or start of new file segment.
> Scrubbing is when you are dragging the bar back and forth to find a
> particular point. It is intermediate in resolution between the previous two,
> but it needs to be responsive to work - the lag between moving the bar and
> showing something must be short. In many cases decoding only key frames in this state
> makes sense, as this is most responsive, and also likely to catch scene
> boundaries, which are commonly key frames anyway.
Yep, I agree, very important. Given the fact that the scrub bar has a fixed
width (at most the width of the screen, but usually just equal to the width
of the video or smaller, say 640 pixels wide), you could reason that it
is physically impossible to move the mouse more precisely than 640 positions
left or right while scrubbing. That means you have a resolution of 640 steps
to scrub through the entire asset. My guess is that 99% of all assets will
have way more than 640 keyframes. It would be fantastic if we could somehow
preload 640 keyframes, one taken at every (asset duration)/640th position. That
way we could provide instant and realtime feedback even while streaming: the
user would already see where the scrub cursor is going to end up even before
the video stream has started to sync.
I know that this is probably pretty hard to abstract into something that can
be specified as part of the HTML5 spec, but I have seen (and made myself)
mockups in Flash that take this approach, and it works pretty neatly.
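The sampling half of that idea is easy to sketch. Here is a minimal, hypothetical helper (the function name `previewTimes` and parameter names are mine, not anything from the spec) that computes one preview timestamp per pixel-wide slot of the scrub bar; in a browser, each timestamp could then be captured by seeking a hidden `<video>` element and drawing it onto a `<canvas>`:

```javascript
// Compute evenly spaced preview timestamps for a fixed-width scrub bar,
// one per pixel-wide slot. Sampling at the midpoint of each slot keeps
// the first and last previews away from t=0 and t=duration exactly.
function previewTimes(duration, barWidth) {
  const times = [];
  for (let i = 0; i < barWidth; i++) {
    times.push(((i + 0.5) / barWidth) * duration);
  }
  return times;
}

// Browser-side capture sketch (not runnable outside a document):
//   video.currentTime = t;
//   video.onseeked = () => ctx.drawImage(video, x, 0, 1, stripHeight);
```

With a 640-pixel bar and a 3600-second asset, this yields 640 timestamps roughly 5.6 seconds apart, which matches the "640 steps" resolution argument above.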
> The degenerate case of scrubbing is 'fast-forwarding', where the stream is
> fetched faster than realtime, but again only keyframes are shown.
> Are we sure all of these use cases are represented by the options mentioned?
> On Mon, Jan 24, 2011 at 12:49 PM, Robert O'Callahan <robert at ocallahan.org> wrote:
>> On Tue, Jan 25, 2011 at 9:34 AM, Philip Jägenstedt <philipj at opera.com> wrote:
>> > On Mon, 24 Jan 2011 21:10:21 +0100, Robert O'Callahan <
>> > robert at ocallahan.org> wrote:
>> >> Interesting. It doesn't in Firefox; script always sees a snapshot of a
>> >> consistent state until it returns to the event loop or does something
>> >> modal
>> >> (although audio, and soon video, will continue to play while script
>> >> runs). I'm not sure if the spec should require that ... overall our APIs
>> >> try pretty hard not to expose races to JS.
>> > How does that work? Do you take a copy of all properties that could
>> > possibly change during script execution, including ones that create a new
>> > object, like buffered and seekable?
>> All script-accessible state exists on the main thread (the thread that runs
>> the event loop), and is updated via asynchronous messages from decoder and
>> playback threads as necessary. 'buffered' is always in sync since data
>> arrival and eviction from the media data cache happen on the main thread.
>> (That cache can be read from other threads though.)
>> > If you instead only make a copy on the first read, isn't it still possible
>> > to get an inconsistent state, e.g. where currentTime isn't in the buffered
>> > ranges?
>> No, this wouldn't happen, although it might be possible for currentTime to
>> be outside the buffered ranges for other reasons.
>> > How about HTMLImageElement.complete, which the spec explicitly says can
>> > change during script execution?
>> Interesting, I didn't know about that.
>> > In any case, it sounds like either HTMLMediaElement is underspecified or one
>> > of us has interpreted it incorrectly; some interop on this point would be
>> > nice.
>> Maybe. If the spec is clarified to allow races when accessing media
>> state, I guess it won't be the end of the world, although I predict
>> difficulties. But that's always an easy prediction! :-)
>> > The biggest use case is clicking a seek bar and ending up somewhere close
>> > enough, but yes, being able to do fast relative seeking is a nice bonus.
>> > Maybe we should do what many media frameworks do and use a "reference"
>> > parameter, defining what the seek is relative to. Usually you can seek
>> > relative to the beginning, end and current position, but perhaps we could
>> > reduce that to just "absolute" and "relative". That's a bit less magic than
>> > inspecting currentTime when the method is called.
>> > So far:
>> > seek(t, ref, how);
>> > ref is "absolute" (default) or "relative"
>> > how is "accurate" (default) or "fast"
>> > (or numeric enums, if that's what DOM interfaces usually have)
>> That works.
>> "Now the Bereans were of more noble character than the Thessalonians, for
>> they received the message with great eagerness and examined the Scriptures
>> every day to see if what Paul said was true." [Acts 17:11]
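For reference, the seek(t, ref, how) shape Philip proposes above can be sketched as a plain wrapper over the existing currentTime attribute. This is purely illustrative, not the proposed spec behavior: a real "fast" mode would let the implementation land on a nearby keyframe, which script cannot express through currentTime, so both modes behave identically here:

```javascript
// Hypothetical wrapper over HTMLMediaElement.currentTime illustrating
// the proposed seek(t, ref, how) signature from the thread.
// ref: "absolute" (default) or "relative"
// how: "accurate" (default) or "fast" -- ignored in this sketch, since
//      script-level currentTime assignment is always an accurate seek.
function seek(media, t, ref = "absolute", how = "accurate") {
  const base = ref === "relative" ? media.currentTime : 0;
  const target = base + t;
  media.currentTime = target;
  return target;
}
```

Under this shape, `seek(video, -5, "relative", "fast")` would mean "jump back roughly five seconds, cheaply", without the caller having to read currentTime first.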