[whatwg] PeerConnection, MediaStream, getUserMedia(), and other feedback

Wed Jul 13 15:39:40 PDT 2011

In response to off-list feedback, I've renamed StreamTrack to 
MediaStreamTrack to be clearer about its relationship to the other 
interfaces.

On Wed, 1 Jun 2011, Tommy Widenflycht (ᛏᚮᛘᛘᚤ) wrote:
> 
> We are having a bit of discussion regarding the correct behaviour when 
> mandatory arguments are undefined, see this webkit bug for history: 
> https://bugs.webkit.org/show_bug.cgi?id=60622
> 
> Could we have some clarification for the below cases, please: [...]

Hopefully Aryeh and Cameron have sufficiently clarified this; please let 
me know if not.

On Wed, 8 Jun 2011, Per-Erik Brodin wrote:
> 
> The TrackList feature seems to be a good way to control the different 
> components of a Stream. Although it is said that tracks provide a way to 
> temporarily disable a local camera, due to the nature of the 
> ExclusiveTrackList it is still not possible to disable video altogether, 
> i.e. to 'pull down the curtain' in a video conference. I noticed that 
> there is a bug filed on this issue but I do not think the proposed 
> solution there is quite right. There is a state in which no tracks are 
> selected in an ExclusiveTrackList, when the selected index returned is 
> -1. A quick fix would be to allow also setting the active track to -1 in 
> order to deselect all the other tracks.

This is fixed now, hopefully. Let me know if the fix is not sufficient.

(I replaced the videoTracks and audioTracks lists with a single tracks 
list in which you can enable and disable individual tracks.)
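
To make the single-list model concrete, here is a minimal sketch in 
TypeScript. It is not the spec's IDL: TrackSketch, StreamSketch, 
setKindEnabled() and enabledLabels() are hypothetical names chosen for 
illustration, and the only behaviour assumed is that each track carries 
its own enabled flag.

```typescript
// Illustrative model only: the class and member names are hypothetical, not spec text.
type TrackKind = "audio" | "video";

class TrackSketch {
  kind: TrackKind;
  label: string;
  enabled: boolean = true; // tracks start out enabled in this sketch

  constructor(kind: TrackKind, label: string) {
    this.kind = kind;
    this.label = label;
  }
}

class StreamSketch {
  tracks: TrackSketch[];

  constructor(tracks: TrackSketch[]) {
    this.tracks = tracks;
  }

  // Enable or disable every track of one kind; no track has to stay selected.
  setKindEnabled(kind: TrackKind, enabled: boolean): void {
    for (const track of this.tracks) {
      if (track.kind === kind) {
        track.enabled = enabled;
      }
    }
  }

  // Labels of the tracks that are currently enabled.
  enabledLabels(): string[] {
    return this.tracks.filter((t) => t.enabled).map((t) => t.label);
  }
}

// "Pull down the curtain": disable all video while leaving audio alone.
const call = new StreamSketch([
  new TrackSketch("audio", "microphone"),
  new TrackSketch("video", "front camera"),
  new TrackSketch("video", "back camera"),
]);
call.setKindEnabled("video", false);
```

Because nothing forces one video track to remain selected, the "no video 
at all" state needs no special index such as -1.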

> I think a note would be appropriate that although the label on a 
> GeneratedStream is guaranteed to be unique for the conceptual stream, 
> there are situations where one ends up with multiple Stream objects with 
> the same label. For example, if the remote peer adds a stream, then 
> removes it, then adds the same stream again, you would end up with two 
> Stream objects with the same label if a reference to the removed Stream 
> is kept. Also, if the remote peer takes a stream that it receives and 
> sends it back you will end up with a Stream object that has the same 
> label as a local GeneratedStream object.

Done.

> We prefer having a StreamRecorder that you have to stop in order to get the
> recorded data (like the previous one, but with asynchronous Blob retrieval)
> and we do not understand the use cases for the current proposal where
> recording continues until the recorder is garbage collected (or the Stream
> ends) and you always get the data from the beginning of the recording. This
> also has to be tied to application quota in some way.

The current design is just the result of needing to define what happens 
when you call getRecordedData() twice in a row. Could you elaborate on 
what API you think we should have?

> The recording example does not seem correct either, it never calls 
> record() and then it calls getRecordedData() directly on the 
> GeneratedStream object.

Fixed.

> Instead of blob: we would like to use stream: for the Stream URLs so 
> that we very early on in the media engine selection can use the protocol 
> scheme to determine how the URL will be handled. Blobs are typically 
> handled in the same way as other media playback. The definition of 
> stream: could be the same as for blob:.

Why can't the UA know which blob: URLs point to streams and which point to 
blobs?

> In addStream(), the readyState of the Stream is not checked to see if it is
> ENDED, in which case adding a stream should fail (perhaps throwing a TypeError
> exception like when passing null).

The problem is that if we do that there'd be a race condition: what 
happens if the stream is ended between the time the script tests whether 
the stream is ended or not and the time the stream is passed to the 
object? I would rather that not be unreliable.

Actually, the spec doesn't currently say anything happens when a stream 
that is being transmitted just ends, either. I guess I should spec that...

...ok, now the spec is clear that an ended stream transmits blackness and 
silence. The same applies if some tracks are disabled. (Blackness only if 
there's a video track; silence only if there's an audio track.)
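
The rule can be sketched as a small pure function. This is a TypeScript 
illustration under stated assumptions, not spec text: TrackState, Sent 
and transmittedFor() are hypothetical names, and the sketch assumes that 
an ended stream and a disabled track are treated identically.

```typescript
// Illustrative only: these names are hypothetical and do not come from the spec.
type Kind = "audio" | "video";
type Sent = "live media" | "blackness" | "silence";

interface TrackState {
  kind: Kind;
  enabled: boolean;
}

// What the remote peer receives for one track in this sketch.
function transmittedFor(streamEnded: boolean, track: TrackState): Sent {
  if (!streamEnded && track.enabled) {
    return "live media";
  }
  // Ended stream or disabled track: video goes black, audio goes silent.
  return track.kind === "video" ? "blackness" : "silence";
}

// An already-ended stream is just another input to the same rule, so script
// never has to test the state first and race against the stream ending.
const camera: TrackState = { kind: "video", enabled: true };
const microphone: TrackState = { kind: "audio", enabled: false };
```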

> When a received Stream is removed its readyState is not set to ENDED 
> (and no 'ended' event is dispatched).

I've clarified this so that it is clear that the state change and event do 
happen.

> PeerConnection is an EventTarget but it still uses a callback for the 
> signaling messages and this mixture of events and callbacks is a bit 
> awkward in my opinion. If you would like to change the function that 
> handles signaling messages after calling the constructor you would have 
> to wrap a function call inside the callback to the actual signal 
> handling function, instead of just (re-)setting an onsignal (or 
> whatever) attribute listener (the event could reuse the MessageEvent 
> interface).

When would you change the callback?

My concern with making the callback an event handler is that it leads to a 
set of poor failure modes and complications in the API:

 - It's easy to not register a callback, which makes no sense. There's 
   literally never a use for creating a PeerConnection without a signaling 
   channel, as far as I can tell, so making it easier to create one 
   without a callback than with seems like a bad design.

 - It's easy to register multiple callbacks. This equally makes no sense, 
   and would likely only ever be the source of bugs.

 - It makes getting the data more complicated. Instead of passing the 
   callback the string to send, we end up passing an object which has on 
   it one attribute that contains the string.
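
The three failure modes above can be seen in a small TypeScript sketch 
that contrasts the two shapes. Both classes are hypothetical stand-ins, 
not the real PeerConnection: produceSignal(), addSignalListener() and 
the placeholder configuration string are assumptions made purely for 
illustration.

```typescript
// Illustrative only: both classes and all member names are hypothetical.
type SignalingCallback = (message: string) => void;

// Constructor-callback shape: exactly one handler, and it receives the string directly.
class CallbackConnectionSketch {
  configuration: string;
  private signalingCallback: SignalingCallback;

  constructor(configuration: string, signalingCallback: SignalingCallback) {
    this.configuration = configuration;
    this.signalingCallback = signalingCallback;
  }

  // Stand-in for the connection producing a signaling message.
  produceSignal(message: string): number {
    this.signalingCallback(message);
    return 1; // always delivered exactly once
  }
}

interface SignalEventSketch {
  data: string;
}

// Event-handler shape: zero listeners or several listeners are both easy to end up with.
class EventConnectionSketch {
  private listeners: Array<(event: SignalEventSketch) => void> = [];

  addSignalListener(listener: (event: SignalEventSketch) => void): void {
    this.listeners.push(listener);
  }

  // Returns how many listeners saw the message.
  produceSignal(message: string): number {
    for (const listener of this.listeners) {
      listener({ data: message }); // the string arrives wrapped in an object
    }
    return this.listeners.length;
  }
}

const sent: string[] = [];
const viaCallback = new CallbackConnectionSketch("placeholder configuration", (message) => {
  sent.push(message);
});
```

In the callback shape, omitting the handler is a type error at 
construction time; in the event shape, a message produced before any 
listener is registered is simply dropped.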

> Perhaps signalingMessage() could be renamed to 
> add/handle/process/SignalingMessage() or similar to better indicate that 
> it is used to input signaling messages received from the other peer.

processSignalingMessage() works, I guess. I'm not a huge fan of overly 
long names, but Anant suggested making it clearer too, so ok. Done.

> There is a potential problem in the exchange of SDPs in that glare 
> conditions can occur if both peers add streams simultaneously, in which 
> case there will be two different outstanding offers that none of the 
> peers are allowed to respond to according to the SDP offer-answer model. 
> Instead of using one SDP session for all media as the specification 
> suggests, we are handling the offer-answer for each stream separately to 
> avoid such conditions.

Why isn't this handled by the ICE role conflict processing rules? It seems 
like simultaneous ICE restarts would be trivially resolvable by just 
following the rules in the ICE spec. Am I missing something?

On Mon, 13 Jun 2011, Tommy Widenflycht (ᛏᚮᛘᛘᚤ) wrote:
>
> as WebIDL has been extended with a "is nullable operator" ( 
> http://www.w3.org/TR/WebIDL/#idl-nullable-type) I wonder if the 
> MediaStreams draft can make use of it, please?
> 
> Especially navigator.getUserMedia(options, successCallback [, 
> errorCallback ] ) and PeerConnection(configuration, signalingCallback).

This is done now (thanks to heycam).

On Tue, 14 Jun 2011, Per-Erik Brodin wrote:
> 
> Maybe the null-check in step 3 under "When the PeerConnection() 
> constructor is invoked .." should not be there anymore since 
> signalingCallback is not nullable.

Fixed.

On Wed, 22 Jun 2011, Arun Ranganathan wrote:
> > 
> > Summing up, the problem with the current implementation of Blobs is 
> > that once a URI has been generated for them, by design changes are no 
> > longer reflected in the object URL. In a streaming scenario, this is 
> > not what is needed, rather a long-living Blob that can be appended is 
> > needed and 'streamed' to other parts of the browser, e.g. the <video> 
> > or <audio> element.
> >
> > The original use case was:  make an application which will download 
> > media files from a server and cache them locally, as well as playing 
> > them without making the user wait for the entire file to be 
> > downloaded, converted to a blob, then saved and played, however such 
> > an API covers many other use cases such as on-the-fly on-device 
> > decryption of streamed media content (ie live streams either without 
> > end or static large files that to download completely would be a waste 
> > when only the first couple of seconds need to be buffered and 
> > decrypted before playback can begin)
> > 
> > Some suggestions were to modify or create a new type of Blob, the 
> > StreamingBlob which can be changed without its object url changing and 
> > appended to as new data is downloaded or decoded, and using a similar 
> > process to how large files may start to be decoded/played by a browser 
> > before they are fully downloaded. Other suggestions suggested using a 
> > pull API on the Blob so browsers can request for new data 
> > asynchronously, such as in 
> > <http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-January/029998.html>
> > 
> > Some problems however that a browser may face is what to do with urls 
> > which are opened twice, and whether the object url should start from 
> > the beginning (which would be needed for decoding encrypted, on-demand 
> > audio) or start from the end (similar to `tail`, for live streaming 
> > events that need decryption, etc.).

I haven't added support for this in the spec, but based on feedback from 
roc recently I renamed Stream to MediaStream to allow Stream to be used 
for precisely the purpose you describe. That would be something to add to 
the File API spec(s), though.

> I actually think moving to a streaming mode for file reads in general is 
> desirable, but I'm not entirely sure extending Blobs is the way to go 
> for *that* use case, which honestly is the main use case I'm interested 
> in.  We may improve upon ideas after this API goes to Last Call for 
> streaming file reads; hopefully we'll do a better job than other 
> non-JavaScript APIs out there :) [1].  Blob objects as they are 
> currently specified live "in memory" and represent "in memory" File 
> objects as well.  A change to the underlying file isn't captured in the 
> Blob snapshot; moreover, if the file moves or is no longer present at 
> time of read, an error event is fired while processing a read operation.  
> The object URL may be dereferenced, but will result in a 404.

Makes sense.

> The Streaming API explored by WHATWG uses the Object URL scheme for 
> videoconferencing use cases [2], and so the scheme itself is suitable 
> for "resources" that are more dynamic than memory-resident Blob objects. 
> Segment-plays/segment dereferencing in general can be handled through 
> media fragments; the scheme can naturally be accompanied by fragment 
> identifiers.

Indeed.

On Thu, 30 Jun 2011, Aaron Colwell wrote:
> 
> I've been working on an adaptive streaming prototype that uses 
> JavaScript to fetch chunks of media and feeds them to the video tag for 
> decoding. The idea is to let the adaptation algorithm and CDN 
> interactions happen in JavaScript so that they can evolve without the 
> need for browser changes. I'm looking for some guidance about the 
> preferred method for adding this type of functionality. I'm new to this 
> process so please bear with me.

Currently this isn't supported.

Can you elaborate on why you'd rather do this by hand instead of using the 
browsers' streaming features?

> I've also been looking at the WebRTC MediaStream API and was wondering 
> if it makes more sense to create an object similar to the 
> LocalMediaStream object. This has the benefits of unifying how media 
> streams are handled independent of whether they come from a camera or a 
> JavaScript based streaming algorithm. This could also enable sending the 
> media stream through a Peer-to-peer connection instead of only allowing 
> a camera as a source. Here is an example of the type of object I'm 
> talking about.
> 
> interface GeneratedMediaStream : MediaStream {
>   void init(in DOMString type, in UInt8Array init_data);
>   void appendData(in DOMString trackId, in UInt8Array data);
>   void endOfStream();
> 
>   readonly attribute MultipleTrackList audioTracks;
>   readonly attribute ExclusiveTrackList videoTracks;
> };

I imagine we'll support something like this eventually, probably by having 
a new interface for author-generated streams, probably something like what 
roc suggests below, but we should probably get basic video and P2P streams 
working before adding something like this.

On Fri, 1 Jul 2011, Robert O'Callahan wrote:
> 
> I think MediaStreams should not be dealing with compressed data except as an
> optimization when access to decoded data is not required anywhere in the
> stream pipeline.

Indeed.

> If you want to do processing of decoded stream data (which I do --- see 
> http://hg.mozilla.org/users/rocallahan_mozilla.com/specs/raw-file/tip/StreamProcessing/StreamProcessing.html), 
> then introducing a decoder inside the stream processing graph creates 
> all sorts of complications.

Right.

> I think the natural way to support the functionality you're looking for 
> is to extend the concept of Blob URLs. Right now you can create a binary 
> Blob, mint a URL for it and set that URL as the source for a media 
> element. The only extension you need is the ability to append data to 
> the Blob while retaining the same URL; you would need to initially mark 
> the Blob as "open" to indicate to URL consumers that the data stream has 
> not ended. That extension would be useful for all sorts of things 
> because you can use those Blob URLs anywhere. An alternative would be to 
> create a new kind of object representing an appendable sequence of Blobs 
> and create an API to mint URLs for it.

Right.
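
For illustration only, the "appendable Blob that stays open" idea quoted 
above might look roughly like the following TypeScript sketch. Nothing 
here exists in the File API: AppendableBlobSketch, append(), close() and 
readFrom() are hypothetical names, and the sketch models only the 
open/ended state and the growing data, not URL minting.

```typescript
// Illustrative only: this class and its members are hypothetical, not part of the File API.
class AppendableBlobSketch {
  private chunks: Uint8Array[] = [];
  private isOpen: boolean = true; // "open" means more data may still arrive

  // Append more bytes; the object (and so any URL minted for it) stays the same.
  append(data: Uint8Array): void {
    if (!this.isOpen) {
      throw new Error("cannot append after the blob has been closed");
    }
    this.chunks.push(data);
  }

  // Mark the data stream as ended.
  close(): void {
    this.isOpen = false;
  }

  // A consumer reads from its own offset and learns whether more data may follow.
  readFrom(offset: number): { bytes: number[]; ended: boolean } {
    const all: number[] = [];
    for (const chunk of this.chunks) {
      for (let i = 0; i < chunk.length; i++) {
        all.push(chunk[i]);
      }
    }
    return { bytes: all.slice(offset), ended: !this.isOpen };
  }
}

// A consumer that starts early sees partial data and is told to keep waiting.
const media = new AppendableBlobSketch();
media.append(new Uint8Array([1, 2]));
const early = media.readFrom(0);
media.append(new Uint8Array([3]));
media.close();
const rest = media.readFrom(early.bytes.length);
```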

On Wed, 6 Jul 2011, Shwetank Dixit wrote:
> On Fri, 18 Mar 2011 19:32:49 +0530, Lachlan Hunt wrote:
> > On 2011-03-18 05:45, Ian Hickson wrote:
> > > On Thu, 16 Sep 2010, Jonathan Dixon wrote:
> > > > Further, it could be useful to provide a way to query the video 
> > > > source as to whether the camera is oriented relative to the screen 
> > > > (if the underlying system knows; consider a phone device with both 
> > > > a main camera and self-view camera). This is needed to drive the 
> > > > decision on whether to do this horizontal flip or not. In fact, 
> > > > such an application may want to somehow indicate a preference for 
> > > > the self-view camera when multiple cameras are present in the 
> > > > selection list. c.f. a movie-making app which would prefer the 
> > > > outward facing camera.
> > > 
> > > In getUserMedia() the input is extensible; we could definitely add 
> > > "prefer-user-view" or "prefer-environment-view" flags to the method 
> > > (with better names, hopefully, but consider that 'rear' and 'front' 
> > > are misleading terms -- the front camera on a DSLR faces outward 
> > > from the user, the front camera on a mobile phone faces toward the 
> > > user). The user still has to OK the use of the device, though, so 
> > > maybe it should just be left up to the user to pick the camera? 
> > > They'll need to be able to switch it on the fly, too, which again 
> > > argues to make this a UA feature.
> > 
> > We could just add flags to the options string like this:
> > 
> > "video;view=user, audio" or "video;view=environment, audio"
> > 
> > It's worth pointing out that The HTML Media Capture draft from the DAP 
> > WG uses the terms "camera" and "camcorder" for this purpose, but I 
> > find these terms to be very ambiguous and inappropriate, and so we 
> > should not use them here.
>
> Just wanted to know whether there is any consensus on this or not? 
> Mobile phones are coming out with dual cameras (front and back facing) 
> and depending on the use case, the developer might want access to either 
> the front or back one. (For example, for a simple camera app, a back 
> facing will do, but for a web conferencing app, the front facing will be 
> required). At least, the developer should be able to specify which one 
> to enable by default, which then can be changed by the user if needed.

The spec currently uses "video user" and "video environment".
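
As a rough illustration of how such an options string could be read, 
here is a TypeScript sketch. The grammar is an assumption based only on 
the strings quoted in this thread (comma-separated entries, with 
space-separated modifiers after "video"); parseOptionsSketch() and 
ParsedOptionsSketch are hypothetical names, not part of the spec.

```typescript
// Illustrative only: the result shape and function name are hypothetical, and the
// comma/space grammar is an assumption based on the strings quoted in this thread.
interface ParsedOptionsSketch {
  audio: boolean;
  video: boolean;
  facing: "user" | "environment" | null; // null leaves the choice to the UA and user
}

function parseOptionsSketch(options: string): ParsedOptionsSketch {
  const result: ParsedOptionsSketch = { audio: false, video: false, facing: null };
  for (const entry of options.split(",")) {
    const tokens = entry.trim().split(/\s+/);
    const kind = tokens[0];
    if (kind === "audio") {
      result.audio = true;
    } else if (kind === "video") {
      result.video = true;
      for (const modifier of tokens.slice(1)) {
        if (modifier === "user" || modifier === "environment") {
          result.facing = modifier;
        }
        // Unknown modifiers are ignored so the string stays extensible.
      }
    }
    // Unknown kinds are ignored as well.
  }
  return result;
}

// A conferencing app would ask for the user-facing camera; a camera app for the other.
const conferencing = parseOptionsSketch("audio, video user");
const cameraApp = parseOptionsSketch("video environment");
```

Either way the result is only a preference: the user still has to 
approve the device and can pick a different camera.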

> Another question is flash. As far as I have seen, there seems to be no 
> option to specify whether the camera needs to use flash or not. Is this 
> decision left up to the device? (If someone is making an app which is 
> just clicking a picture of the person, then it would be nice to have the 
> camera use flash in low light conditions).

getUserMedia() returns a video stream, so it wouldn't use a flash.

On Wed, 6 Jul 2011, Rob Manson wrote:
>
> There are also tablet devices with stereo cameras on the back and single 
> on the front too.  Stereo will become increasingly common.

getUserMedia() is intended to support returning multiple video tracks for 
this kind of thing. A video track could also just be natively 3D.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'