[whatwg] Extending HTML 5 video for adaptive streaming

Aaron Colwell acolwell at google.com
Thu Jun 30 09:59:03 PDT 2011


Hi,

I've been working on an adaptive streaming prototype that uses JavaScript to
fetch chunks of media and feeds them to the video tag for decoding. The idea
is to let the adaptation algorithm and CDN interactions happen in JavaScript
so that they can evolve without the need for browser changes. I'm looking
for some guidance about the preferred method for adding this type of
functionality. I'm new to this process so please bear with me.

My initial implementation is built around WebM, but I believe this could
work for Ogg & MP4 as well. The basic idea is to initialize the video tag
with stream initialization data (i.e. the WebM Info & Tracks elements) via
the <video> src attribute and then send media chunks (i.e. WebM clusters)
to the tag via a new appendData() method on <video>. Here is a simple
example of what I'm talking about.

  <video id="v" autoplay> </video>
  <script>
    function needMoreData(e) {
      e.target.appendData(getNextCluster());
    }

    function onSeeking(e) {
      var video = e.target;
      video.appendData(findClusterForTime(video.currentTime));
    }

    var video = document.getElementById('v');

    video.addEventListener('loadstart', needMoreData);
    video.addEventListener('stalled', needMoreData);
    video.addEventListener('seeking', onSeeking);

    video.src = URL.createObjectURL(createStreamInitBlob());
  </script>

appendData() expects to receive a Uint8Array that contains WebM cluster
elements. The first cluster passed to appendData() initializes the starting
playback position. Likewise, after a seeking event fires, the first
appendData() call updates the current position to the seek point.
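
For completeness, here is a rough sketch of how the helper functions in the
example could be implemented. The media URL, byte offsets, and the
prefetch-cache approach are assumptions on my part; a real player would
build the cluster index by parsing the WebM Cues element and would drive
prefetching from its adaptation logic.

  // Hypothetical index of cluster byte ranges and start times. In practice
  // this would be built by parsing the WebM Cues element, not hard-coded.
  var clusterIndex = [
    { time: 0,  start: 4096,   end: 131071 },
    { time: 5,  start: 131072, end: 262143 },
    { time: 10, start: 262144, end: 393215 }
  ];
  var nextCluster = 0;    // next cluster to hand to appendData()
  var clusterCache = {};  // cluster number -> Uint8Array
  var initSegmentBytes;   // Uint8Array with the EBML header + Info & Tracks,
                          // fetched or generated elsewhere

  // Fetch one cluster with an HTTP Range request and cache its bytes.
  function prefetchCluster(n) {
    var entry = clusterIndex[n];
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'movie.webm', true);
    xhr.responseType = 'arraybuffer';
    xhr.setRequestHeader('Range', 'bytes=' + entry.start + '-' + entry.end);
    xhr.onload = function() { clusterCache[n] = new Uint8Array(xhr.response); };
    xhr.send();
  }

  // Returns the next cluster, assuming prefetchCluster() has stayed ahead
  // of playback.
  function getNextCluster() {
    return clusterCache[nextCluster++];
  }

  // Returns the cluster containing the seek target: the last cluster whose
  // start time is <= the requested time.
  function findClusterForTime(seconds) {
    var n = 0;
    for (var i = 1; i < clusterIndex.length; i++) {
      if (clusterIndex[i].time <= seconds)
        n = i;
    }
    nextCluster = n + 1;
    return clusterCache[n];
  }

  // Wraps the stream initialization data in a Blob so it can be handed to
  // the <video> src attribute via createObjectURL().
  function createStreamInitBlob() {
    return new Blob([initSegmentBytes], { type: 'video/webm' });
  }

The point is just that all of the fetching, indexing, and adaptation logic
stays in script; the tag only ever sees appendData() calls.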

I've also been looking at the WebRTC MediaStream API and was wondering if it
makes more sense to create an object similar to the LocalMediaStream object.
This has the benefit of unifying how media streams are handled regardless of
whether they come from a camera or a JavaScript-based streaming algorithm.
It could also enable sending the media stream through a peer-to-peer
connection instead of only allowing a camera as a source. Here is an example
of the type of object I'm talking about.

interface GeneratedMediaStream : MediaStream {
  void init(in DOMString type, in Uint8Array init_data);
  void appendData(in DOMString trackId, in Uint8Array data);
  void endOfStream();

  readonly attribute MultipleTrackList audioTracks;
  readonly attribute ExclusiveTrackList videoTracks;
};

type - identifies the type of stream being generated (e.g.
video/x-webm-cluster-stream or video/ogg-page-stream)
init_data - provides initialization data that indicates the number of
tracks, codec configs, etc. (e.g. the WebM Info & Tracks elements or Ogg
header pages)
trackId - indicates which track the data is for. If this is an empty string
then multiplexed data is being passed in; otherwise trackId matches the id
of a track in the TrackList objects.
data - a chunk of media data (e.g. a WebM cluster or Ogg page). The data is
expected to have monotonically increasing timestamps, no gaps, etc.
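
To make the comparison with the first example concrete, here is roughly how
a page might drive this kind of object. None of this exists today, so the
constructor, the getStreamInitData() helper, and the assumption that a
generated stream can be attached to a video element via createObjectURL()
are all placeholders.

  <video id="v" autoplay> </video>
  <script>
    var stream = new GeneratedMediaStream();
    // getStreamInitData() stands in for code that produces the WebM
    // Info & Tracks bytes as a Uint8Array.
    stream.init('video/x-webm-cluster-stream', getStreamInitData());

    function needMoreData(e) {
      // Empty trackId: the chunks are multiplexed WebM clusters.
      stream.appendData('', getNextCluster());
    }

    var video = document.getElementById('v');
    video.addEventListener('loadstart', needMoreData);
    video.addEventListener('stalled', needMoreData);

    // Assuming a generated stream attaches to <video> the same way other
    // MediaStreams do.
    video.src = URL.createObjectURL(stream);

    // When the last cluster has been appended:
    // stream.endOfStream();
  </script>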

Here are my questions:
- Is there a preference for appendData() vs a new MediaStream object?
- If the MediaStream object is preferred, should it be constructed through
Navigator.getUserMedia()? I'm unclear on what the criteria are for adding
something to Navigator vs allowing direct object construction.
- Are there existing efforts along these lines? If so, please point me to
them.

Thanks for your help,

Aaron

