[whatwg] Proposal for a MediaSource API that allows sending media data to a HTMLMediaElement

Thu Jul 14 09:08:41 PDT 2011

On Wed, Jul 13, 2011 at 8:00 PM, Robert O'Callahan <robert at ocallahan.org> wrote:

> On Thu, Jul 14, 2011 at 4:35 AM, Aaron Colwell <acolwell at google.com> wrote:
>
>> I am open to suggestions. My intent was that the browser would not attempt
>> to cache any data passed into append(). It would just demux the buffers that
>> are sent in. When a seek is requested, it flushes whatever it has and waits
>> for more data from append().  If the web application wants to do caching it
>> can use the WebStorage or File APIs. If the browser's media engine needs a
>> certain amount of "preroll" data before it starts playback it can signal
>> this explicitly through new attributes or just use HAVE_FUTURE_DATA
>> & HAVE_ENOUGH_DATA readyStates to signal when it has enough.
>
>
> OK, I sorta get the idea. I think you're defining a new interface to the
> media processing pipeline that integrates with the demuxer and codecs at a
> different level to regular media resource loading. (For example, all the
> browser's built-in logic for seeking and buffering would have to be disabled
> and/or bypassed.)

Yes.

> As such, it would have to be carefully specified, potentially in a
> container- or codec-dependent way, unlike APIs like Blobs which work "just
> like" regular media resource loading and can thus work with any
> container/codec.
>

My hope is that the data passed to append() will basically look like the
"live streaming" form of containers like Ogg and WebM, so this isn't totally
foreign to the existing browser code. We'd probably have to spec the level
of support for Ogg chaining and multiple WebM segments, but I don't think
that should be too bad. Seeking is where the trickiness happens, and I was
just planning on making it look like a new "live" stream whose starting
timestamp indicates the point actually seeked to.
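
To make the model concrete, here is a rough TypeScript sketch of the
behaviour described above: nothing is cached beyond what has been appended
since the last seek, a seek flushes everything, and the first chunk appended
after a seek defines the actual start time of the new "live" stream. Only
append() and the readyState constant names come from this thread and from
HTMLMediaElement. The class name, the Chunk shape, seek(), actualStartTime,
and the two threshold parameters are hypothetical, and real demuxing is
replaced by pre-parsed chunk descriptions.

```typescript
// readyState values as defined on HTMLMediaElement.
const HAVE_METADATA: number = 1;
const HAVE_CURRENT_DATA: number = 2;
const HAVE_FUTURE_DATA: number = 3;
const HAVE_ENOUGH_DATA: number = 4;

// Hypothetical stand-in for a demuxed buffer; a real implementation
// would receive raw container bytes and demux them itself.
interface Chunk {
  startTime: number; // timestamp of the first frame, in seconds
  duration: number;  // amount of media in the chunk, in seconds
}

// Hypothetical model of the append-only source; not the proposed API.
class AppendOnlySource {
  readyState: number = HAVE_METADATA;
  private pending: Chunk[] = [];
  private streamStart: number | null = null;
  private prerollSeconds: number;
  private enoughSeconds: number;

  // Both thresholds are assumptions standing in for engine-specific needs.
  constructor(prerollSeconds: number, enoughSeconds: number) {
    this.prerollSeconds = prerollSeconds;
    this.enoughSeconds = enoughSeconds;
  }

  // Accept data; the first chunk after a flush fixes the stream start.
  append(chunk: Chunk): void {
    if (this.streamStart === null) {
      this.streamStart = chunk.startTime;
    }
    this.pending.push(chunk);
    this.updateReadyState();
  }

  // On seek, drop everything and wait for the application to append again.
  seek(): void {
    this.pending = [];
    this.streamStart = null;
    this.readyState = HAVE_METADATA;
  }

  // The point actually seeked to, known only once new data has arrived.
  get actualStartTime(): number | null {
    return this.streamStart;
  }

  private bufferedSeconds(): number {
    return this.pending.reduce((sum, c) => sum + c.duration, 0);
  }

  // Signal preroll status through readyState instead of new attributes.
  private updateReadyState(): void {
    const buffered = this.bufferedSeconds();
    if (buffered >= this.enoughSeconds) {
      this.readyState = HAVE_ENOUGH_DATA;
    } else if (buffered >= this.prerollSeconds) {
      this.readyState = HAVE_FUTURE_DATA;
    } else if (buffered > 0) {
      this.readyState = HAVE_CURRENT_DATA;
    } else {
      this.readyState = HAVE_METADATA;
    }
  }
}
```

In this sketch the application decides what to append after a seek, so the
reported start time can differ from the requested one, for example when the
nearest keyframe sits slightly earlier.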

I was tempted to create an API that just passed in compressed video/audio
frames and made JavaScript do all of the demuxing, but I thought people
might find that too radical.

>
> I'm not sure what the best way to do this is, to be honest. It comes down
> to the use-cases. If you want to experiment with different seeking
> strategies, can't you just do that in Chrome itself? If you want scriptable
> adaptive streaming (or even if you don't) then I think we want APIs for
> seamless transitioning along a sequence of media resources, or between
> resources loaded in parallel.
>
>
I think the best course of action is for me to get my prototype in a state
where others can play with it and I can demonstrate some of the uses that
I'm trying to enable. I think that will make this a little more concrete.
I'll keep this list posted on my progress.

Thanks for your help,
Aaron