[whatwg] PeerConnection, MediaStream, getUserMedia(), and other feedback

Tue Jul 19 01:17:55 PDT 2011

On 2011-07-14 00:39, Ian Hickson wrote:
> In response to off-list feedback, I've renamed StreamTrack to
> MediaStreamTrack to be clearer about its relationship to the other
> interfaces.

Perhaps, now that there is no longer any relation to tracks on the media 
elements, we could also rename Track to something else, maybe Component. 
People have complained to me that Track is not really a good name here.

> On Wed, 8 Jun 2011, Per-Erik Brodin wrote:
>>
>> The TrackList feature seems to be a good way to control the different
>> components of a Stream. Although it is said that tracks provide a way to
>> temporarily disable a local camera, due to the nature of the
>> ExclusiveTrackList it is still not possible to disable video altogether,
>> i.e. to 'pull down the curtain' in a video conference. I noticed that
>> there is a bug filed on this issue but I do not think the proposed
>> solution there is quite right. There is a state in which no tracks are
>> selected in an ExclusiveTrackList, when the selected index returned is
>> -1. A quick fix would be to allow also setting the active track to -1 in
>> order to deselect all the other tracks.
>
> This is fixed now, hopefully. Let me know if the fix is not sufficient.
>
> (I replaced the videoTracks and audioTracks lists with a single tracks
> list in which you can enable and disable individual tracks.)

Good. Could we still keep audio and video in separate lists, though? That 
makes it easier to check the number of audio or video components, and you 
can avoid loops that have to check the kind on every iteration when you 
only want to operate on one media type. I also think it would be easier 
to construct new MediaStream objects from individual components than to 
temporarily disable the ones you do not want to copy to the new 
MediaStream object and re-enable them afterwards. It is also unclear to 
me what happens to a LocalMediaStream object that is currently being 
consumed while its tracks are temporarily disabled in that way.

Why should the label of the newly constructed MediaStream object be the 
same as that of its parent? If you send two MediaStream objects 
constructed from the same LocalMediaStream over a PeerConnection, there 
needs to be a way to tell them apart on the receiving side. I also think 
it is a bit unfortunate that we now have a 'label' property on the track 
objects that means something different from the 'label' property on 
MediaStream; perhaps 'description' would be a more suitable name for the 
former.
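To make the two suggestions above concrete, here is a minimal TypeScript sketch under stated assumptions: SketchMediaStream, Component, MediaKind and freshLabel are hypothetical names, not anything in the current draft. It keeps audio and video in separate lists, builds a derived stream from chosen components instead of toggling tracks on the parent, and gives every new stream its own label.

```typescript
// Hypothetical model only: none of these names come from the spec draft.
type MediaKind = "audio" | "video";

interface Component {
  readonly kind: MediaKind;
  readonly description: string; // what the draft currently calls a track's 'label'
  enabled: boolean;
}

let labelCounter = 0;
function freshLabel(): string {
  labelCounter += 1; // any unique string would do; a counter keeps this deterministic
  return `stream-${labelCounter}`;
}

class SketchMediaStream {
  readonly label: string = freshLabel(); // never copied from a parent stream
  readonly audioComponents: Component[];
  readonly videoComponents: Component[];

  // Built from an explicit list of components; the component objects are shared.
  constructor(components: Component[]) {
    this.audioComponents = components.filter((c) => c.kind === "audio");
    this.videoComponents = components.filter((c) => c.kind === "video");
  }
}

const camera: Component = { kind: "video", description: "Front camera", enabled: true };
const mic: Component = { kind: "audio", description: "Built-in microphone", enabled: true };
const local = new SketchMediaStream([camera, mic]);

// Audio-only copy: nothing on `local` is disabled and re-enabled to build it.
const audioOnly = new SketchMediaStream(local.audioComponents.slice());
console.log(local.videoComponents.length, audioOnly.videoComponents.length); // 1 0
console.log(local.label !== audioOnly.label); // true
```

Because the derived stream has a distinct label, a receiver could distinguish two streams cut from the same LocalMediaStream.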

>> We prefer having a StreamRecorder that you have to stop in order to get the
>> recorded data (like the previous one, but with asynchronous Blob retrieval)
>> and we do not understand the use cases for the current proposal where
>> recording continues until the recorder is garbage collected (or the Stream
>> ends) and you always get the data from the beginning of the recording. This
>> also has to be tied to application quota in some way.
>
> The current design is just the result of needing to define what happens
> when you call getRecordedData() twice in a row. Could you elaborate on
> what API you think we should have?

What I am thinking of is something similar to what was proposed in
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-March/030921.html
although that proposal does not take quota into account. Preferably, 
quota should be expressed in media time, but how much media time fits in 
a given quota depends heavily on the format being used, so regardless of 
any codecs I still think the format has to be specified somehow. Perhaps 
it would be best to push recording to v2, since it does not seem to be 
the primary use case for the people currently showing the most interest 
in this part of the spec.

>> Instead of blob: we would like to use stream: for the Stream URLs so
>> that we very early on in the media engine selection can use the protocol
>> scheme to determine how the URL will be handled. Blobs are typically
>> handled in the same way as other media playback. The definition of
>> stream: could be the same as for blob:.
>
> Why can't the UA know which blob: URLs point to streams and which point to
> blobs?

I was not saying that it would be impossible to keep track of which 
blob: URLs point to blobs and which point to streams, just that we want 
to avoid doing so at the early stage of media engine selection. In my 
opinion, a stream is quite the opposite of a blob (unknown, perhaps 
infinite length vs. fixed length), so when printing URLs for debugging 
purposes it would also be much nicer to have two different protocol 
schemes. If I remember correctly, the discussions leading up to the 
renaming of createBlobURL to createObjectURL assumed that there would be 
stream: URLs.
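For illustration, a hypothetical dispatch function showing what we mean by deciding early on the protocol scheme alone. selectPipeline and the pipeline names are made up, stream: is only the proposed scheme, and routing http:/https: together with blob: is an assumption; the point is that no lookup of what the URL refers to is needed.

```typescript
// Hypothetical early dispatch in a media engine: only the scheme is inspected.
type Pipeline = "live-stream" | "file-playback" | "unsupported";

function selectPipeline(url: string): Pipeline {
  const colon = url.indexOf(":");
  if (colon <= 0) return "unsupported";
  const scheme = url.slice(0, colon).toLowerCase();
  switch (scheme) {
    case "stream":
      return "live-stream"; // unknown, possibly infinite length
    case "blob":
    case "http":
    case "https":
      return "file-playback"; // handled like other media playback
    default:
      return "unsupported";
  }
}

console.log(selectPipeline("stream:3f2a9c1e")); // live-stream
console.log(selectPipeline("blob:3f2a9c1e"));   // file-playback
```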

> Actually, the spec doesn't currently say what happens when a stream
> that is being transmitted just ends, either. I guess I should spec that...
>
> ...ok, now the spec is clear that an ended stream transmits blackness and
> silence. Same with if some tracks are disabled. (Blackness only if there's
> a video track; silence only if there's an audio track.)

OK, I guess that could work. In that case you have to manually stop 
sending the stream over a PeerConnection if it ends unexpectedly. Should 
playing back an ended stream locally in a video element also produce 
silence and blackness, then? We have been assuming a network error, which 
has a nice consequence: ended streams that are only reachable through 
their stream: URLs can be discarded even if the URLs have not been 
revoked, since trying to play back such a stream: URL is then 
indistinguishable from trying to play back an invalid stream: URL.
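A small sketch of the network-error behaviour we have been assuming. StreamUrlRegistry, LiveStream and their members are hypothetical; the sketch only shows that once ended streams resolve exactly like unknown URLs, a UA can drop them without waiting for revocation.

```typescript
// Hypothetical URL registry for stream: URLs.
interface LiveStream { ended: boolean }

class StreamUrlRegistry {
  private entries = new Map<string, LiveStream>();
  private counter = 0;

  create(stream: LiveStream): string {
    this.counter += 1;
    const url = `stream:${this.counter}`;
    this.entries.set(url, stream);
    return url;
  }

  // Drop ended streams even though their URLs were never revoked.
  sweep(): void {
    this.entries.forEach((s, url) => {
      if (s.ended) this.entries.delete(url);
    });
  }

  // null means "fail with a network error", whether the URL never existed
  // or its stream has ended.
  resolve(url: string): LiveStream | null {
    const s = this.entries.get(url);
    return s !== undefined && !s.ended ? s : null;
  }
}

const reg = new StreamUrlRegistry();
const cam: LiveStream = { ended: false };
const mic: LiveStream = { ended: false };
const camUrl = reg.create(cam);
const micUrl = reg.create(mic);
cam.ended = true;
reg.sweep();
console.log(reg.resolve(camUrl) === reg.resolve("stream:999")); // true
console.log(reg.resolve(micUrl) === mic); // true
```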

>> PeerConnection is an EventTarget but it still uses a callback for the
>> signaling messages and this mixture of events and callbacks is a bit
>> awkward in my opinion. If you would like to change the function that
>> handles signaling messages after calling the constructor you would have
>> to wrap a function call inside the callback to the actual signal
>> handling function, instead of just (re-)setting an onsignal (or
>> whatever) attribute listener (the event could reuse the MessageEvent
>> interface).
>
> When would you change the callback?

For example, if you want to send the signaling messages peer-to-peer 
over the data channel once it has been established.

>
> My concern with making the callback an event handler is that it leads to a
> set of poor failure modes and complications in the API:
>
>   - It's easy to not register a callback, which makes no sense. There's
>     literally never a use for creating a PeerConnection without a signaling
>     channel, as far as I can tell, so making it easier to create one
>     without a callback than with seems like a bad design.

I think this is the case for many APIs. For example, creating an 
EventSource without registering any listener for incoming events is 
equally pointless, yet by the same reasoning we would have to require 
such a handler to be passed to the EventSource constructor. I think 
authors will learn rather quickly which events need to be handled.

>   - It's easy to register multiple callbacks. This equally makes no sense,
>     and would likely only ever be the source of bugs.

Yes, it would be possible, but I am not sure that such a mistake would be 
any easier to make than any other mistake that causes the application to 
malfunction. I was under the impression that most people use attribute 
event listeners anyway.

>   - It makes getting the data more complicated. Instead of passing the
>     callback the string to send, we end up passing an object which has on
>     it one attribute that contains the string.

Only if getting 'event.data' is more complicated than just getting 
'data'. To get the incoming stream from a StreamEvent (which should 
probably be renamed MediaStreamEvent for consistency), the author has to 
know how to do this anyway.
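To show the event handler attribute style I have in mind, here is a hypothetical TypeScript sketch. SketchPeerConnection, onsignal and SignalEvent are illustrative names only (SignalEvent mirrors the 'data' field of MessageEvent), and deliverSignal() stands in for the UA producing an outgoing signaling message. Switching from a server relay to the data channel is just a reassignment.

```typescript
// Hypothetical names; not the draft PeerConnection API.
interface SignalEvent { readonly data: string }

class SketchPeerConnection {
  // Event handler attribute; may be (re)assigned at any time.
  onsignal: ((event: SignalEvent) => void) | null = null;

  // Stand-in for the ICE/SDP machinery producing a signaling message.
  deliverSignal(message: string): void {
    if (this.onsignal !== null) this.onsignal({ data: message });
  }
}

const viaServer: string[] = [];
const viaDataChannel: string[] = [];
const pc = new SketchPeerConnection();

// Initially, relay signaling through the application's server.
pc.onsignal = (e) => { viaServer.push(e.data); };
pc.deliverSignal("initial offer");

// Once the peer-to-peer data channel is established, just swap the handler.
pc.onsignal = (e) => { viaDataChannel.push(e.data); };
pc.deliverSignal("later update");
```

With a constructor callback, the same switch would need a wrapper function that forwards to a mutable target.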

>> There is a potential problem in the exchange of SDPs in that glare
>> conditions can occur if both peers add streams simultaneously, in which
>> case there will be two different outstanding offers that none of the
>> peers are allowed to respond to according to the SDP offer-answer model.
>> Instead of using one SDP session for all media as the specification
>> suggests, we are handling the offer-answer for each stream separately to
>> avoid such conditions.
>
> Why isn't this handled by the ICE role conflict processing rules? It seems
> like simultaneous ICE restarts would be trivially resolvable by just
> following the rules in the ICE spec. Am I missing something?

This problem is not related to ICE but rather to the SDP offer-answer 
model, which is separate from ICE processing. The problem is that SDP 
offer-answer does not allow you to respond to an offer while you have an 
outstanding offer of your own for the same set of streams. We avoid this 
by sending SDP fragments (only the SDP lines related to the affected 
stream) rather than the full SDP each time.
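A simplified model of the per-stream approach, assuming offer-answer state is tracked for each stream rather than for the whole session. PerStreamNegotiator, the state names and the offer:/answer: strings are placeholders rather than real SDP, and receiving an offer is modelled as answering immediately. With a single session-wide state, the two simultaneous offers below would block each other.

```typescript
// Hypothetical per-stream offer-answer bookkeeping; fragments are placeholders.
type OfferState = "stable" | "have-local-offer";

class PerStreamNegotiator {
  private states = new Map<string, OfferState>();

  private stateOf(streamId: string): OfferState {
    return this.states.get(streamId) ?? "stable";
  }

  // Offer an SDP fragment covering only the affected stream.
  createOffer(streamId: string): string {
    if (this.stateOf(streamId) !== "stable") {
      throw new Error(`offer for ${streamId} already outstanding`);
    }
    this.states.set(streamId, "have-local-offer");
    return `offer:${streamId}`;
  }

  // Answer, or return null when this side has its own offer outstanding
  // for the same stream (glare is only possible per stream).
  receiveOffer(streamId: string): string | null {
    return this.stateOf(streamId) === "have-local-offer" ? null : `answer:${streamId}`;
  }

  receiveAnswer(streamId: string): void {
    this.states.set(streamId, "stable");
  }
}

const alice = new PerStreamNegotiator();
const bob = new PerStreamNegotiator();

// Both peers add a stream at the same moment.
alice.createOffer("alice-cam");
bob.createOffer("bob-cam");

// Each offer concerns a different stream, so both can be answered.
const fromBob = bob.receiveOffer("alice-cam");   // "answer:alice-cam"
const fromAlice = alice.receiveOffer("bob-cam"); // "answer:bob-cam"
alice.receiveAnswer("alice-cam");
bob.receiveAnswer("bob-cam");
```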

//Per-Erik