[whatwg] Thoughts on the media stream bootstrap mechanism

Thu Mar 24 09:03:53 PDT 2011

Hi Rich,

On Mar 15, 2011, at 16:24 , Rich Tibbett wrote:
> Secondly, getUserMedia is restricted to only handle audio/video
> streams. In the original proposal there was potential for us to
> connect and disconnect other device classes, such as USB or RS232
> device types.

I'm mostly interested in audio/video, but even considering only those I've been thinking along similar lines: the current API lacks sufficient hooks for extensibility. Most notably, some devices might offer ways of controlling them, and exposing those controls on a GeneratedStream seems clunky.

> The IDL is as follows:
> 
> [NoInterfaceObject]
> interface Device {
>   const unsigned short WAITING = 0;
>   const unsigned short CONNECTED = 1;
>   const unsigned short DISCONNECTED = 2;
>   const unsigned short ERROR = 3;
>   readonly attribute unsigned short readyState;
> 
>   // event handler attributes
>            attribute Function onconnect;
>            attribute Function ondisconnect;
>            attribute Function onerror;
> 
>   readonly attribute any data;
> 
>   void disconnect();
> }
> 
> // Specific Device Classes with independent constructors
> 
> [Constructor(in DOMString options)]
> interface UserMedia : Device {}
> 
> Here's a quick example for obtaining user media:
> 
> var m = new UserMedia('audio, video');
> m.onconnect = function( evt ) {
>   var ... = evt.target.data; // ... is a GeneratedStream object in a UserMedia context
> }

In the examples below I'll stick to the callback-based method of obtaining the object since that minimises the delta with what's currently in the spec, but I don't have a strong opinion between that and your proposal. So for clarity, here's how you get a Device object:

var device;
navigator.getUserMedia("whatever", function (d) { device = d; });

Once you have it, there are a couple of improvements that can be made over GeneratedStream.

• It's an EventTarget. This is primarily for the purpose of listening to devicemotion and deviceorientation events (they currently only target window, but that's not a big deal to change). This could work with GeneratedStream, but it seems more logical for events such as "I moved the camera" (and possibly others such as "I changed the focal length" or "autofocus acquired at 2.77m") to fire on a different object from events such as "stream paused".

device.addEventListener("deviceorientation", function (ev) {
  // ... move that AR code around
}, true);
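
To make that separation concrete, here is a minimal sketch that runs outside a browser. SketchDevice and SketchStream are hypothetical stand-ins invented for illustration (nothing like them is in the spec), and the events are dispatched by hand rather than by real hardware.

```javascript
// Hypothetical stand-in for a GeneratedStream: it only carries stream events.
class SketchStream extends EventTarget {}

// Hypothetical stand-in for the proposed Device: it is an EventTarget in its
// own right and merely exposes the stream it produces.
class SketchDevice extends EventTarget {
  constructor() {
    super();
    this.stream = new SketchStream();
  }
}

const sketchDevice = new SketchDevice();
const seen = [];

// Hardware-level events are heard on the device...
sketchDevice.addEventListener("deviceorientation", () => {
  seen.push("device moved");
});

// ...while stream-level events are heard on the stream.
sketchDevice.stream.addEventListener("pause", () => {
  seen.push("stream paused");
});

// Simulate both kinds of event by hand.
sketchDevice.dispatchEvent(new Event("deviceorientation"));
sketchDevice.stream.dispatchEvent(new Event("pause"));
```

Each listener only sees the events dispatched on its own object, which is the whole point of keeping the two apart.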

• It provides an extension point for device control. Say you're streaming from a camera and you want to take a picture. The chances are high that the camera can take a much better picture than the frame you can grab off its view-finding video stream.

// device is a CameraDevice
device.captureStill(function (file) {
  // ... got my picture
});

We might not be there yet and would probably want to wait a little, but there's plenty more that can be added there.

// silly examples
device.zoom = 2;
device.flash = true;

Again, these could go on GeneratedStream, but that seems to conflate too much. Given that a device exposes a stream, the coding cost is a minimal switch to:

video.src = device.stream;
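
Pulling the pieces above together, a rough sketch of a device object that carries its own controls next to its stream might look like the following. SketchCamera, its properties and the shape of the "file" it hands back are all assumptions made up for illustration; a real captureStill would presumably be asynchronous and return an actual File.

```javascript
// Hypothetical camera device: controls live on the device, not on the stream.
class SketchCamera {
  constructor() {
    this.stream = { kind: "video" }; // stand-in for a GeneratedStream
    this.zoom = 1;
    this.flash = false;
  }

  // Invented signature mirroring captureStill(callback) from the example
  // above. The callback is invoked synchronously here to keep the sketch
  // simple; a real implementation would call it later.
  captureStill(callback) {
    const file = { type: "image/jpeg", zoom: this.zoom, flash: this.flash };
    callback(file);
  }
}

const camera = new SketchCamera();
camera.zoom = 2;
camera.flash = true;

let still = null;
camera.captureStill((file) => {
  still = file; // the still reflects the device settings at capture time
});

// Stand-in for a <video> element; playback only needs the stream.
const videoElement = { src: null };
videoElement.src = camera.stream;
```

The stream object stays untouched by zoom, flash and still capture, so code that only wants to play video never has to know about them.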

Additionally, I wonder if it wouldn't be useful to make it possible for the getUserMedia callback to return an array of devices in one go. If you're making a 3D movie (or just 3D videoconferencing) you probably want multiple cameras returned at once (alternatively, it could be a single device exposing two streams). Likewise if you have a sound setup more advanced than just the one mike. Of course, the user could make multiple requests and grant access to each device one by one, but UI-wise, it's probably a lot simpler to allow her to do it all at once. Especially considering the following:

  1. User wants to add a camera, clicks a button that calls getUserMedia()
  2. Infobar of some kind shows, user picks camera source, checks [always allow]
  3. User wants to add second camera, clicks the same button: same camera is picked
  4. Failure

Multiple simultaneous inputs aren't science fiction, nor are they limited to professional contexts. I could easily want to use both the back and front cameras on my phone: one to film what's going on around me in a documentary, the other to insert a small view of myself as I comment on what I'm seeing. 3D home videos are probably not that far off (yes, it scares me too). It's likely that laptops will ship with arrays of mikes in order to better figure out where you're talking from (spatially) and eliminate all other sources; accessing those would be sweet.

I don't much care about the syntax, but I guess we could be looking at something like

navigator.getUserMedia("video multiple", function (devices) {
  // ... show each different view
});
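
For what it's worth, here is one way such a call could behave, sketched against a fake device list. sketchGetUserMedia, the availableDevices table and the treatment of "multiple" as just another space-separated token are all assumptions for illustration; the actual option syntax is exactly the part left open above.

```javascript
// Fake inventory standing in for whatever the user agent can see.
const availableDevices = [
  { kind: "video", label: "front camera" },
  { kind: "video", label: "back camera" },
  { kind: "audio", label: "microphone" },
];

// Hypothetical: with "multiple" the callback gets an array of every matching
// device; without it, it gets a single device, as in the earlier examples.
function sketchGetUserMedia(options, callback) {
  const tokens = options.split(/\s+/).filter(Boolean);
  const multiple = tokens.includes("multiple");
  const kinds = tokens.filter((token) => token !== "multiple");
  const matches = availableDevices.filter((d) => kinds.includes(d.kind));
  callback(multiple ? matches : matches[0]);
}

let views = [];
sketchGetUserMedia("video multiple", (devices) => {
  // ... show each different view
  views = devices.map((d) => d.label);
});

let single = null;
sketchGetUserMedia("video", (d) => {
  single = d;
});
```

In a real user agent the user would of course choose which devices to grant, ideally all in one prompt; the sketch simply hands back everything that matches.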

And I guess that's enough braindump for today :)

-- 
Robin Berjon - http://berjon.com/