[whatwg] PeerConnection, MediaStream, getUserMedia(), and other feedback

Tue Aug 2 02:05:55 PDT 2011

On 2011-07-26 07:30, Ian Hickson wrote:
> On Tue, 19 Jul 2011, Per-Erik Brodin wrote:
>>
>> Perhaps now that there is no longer any relation to tracks on the media
>> elements we could also change Track to something else, maybe Component.
>> I have had people complaining to me that Track is not really a good name
>> here.
>
> I'm happy to change the name if there's a better one. I'm not sure
> Component is any better than Track though.

OK, let's keep Track until someone comes up with a better name then.

>> Good. Could we still keep audio and video in separate lists though? It
>> makes it easier to check the number of audio or video components and you
>> can avoid loops that have to check the kind for each iteration if you
>> only want to operate on one media type.
>
> Well in most (almost all?) cases, there'll be at most one audio track and
> at most one video track, which is why I didn't put them in separate lists.
> What use cases did you have in mind where there would be enough tracks
> that it would be better for them to be separate lists?

Yes, you're right, but even with zero or one track it's more convenient 
to have them separate because that way you can more easily check if the 
stream contains any audio and/or video tracks and check the number of 
tracks of each kind. I also think it will be problematic if we would 
like to add another kind at a later stage if all tracks are in the same 
list since people will make assumptions that audio and video are the 
only kinds.
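
To make the difference concrete, here is a rough sketch of the two
shapes. The Track type, the stream shapes and the helper names are made
up for illustration and are not taken from the spec draft.

```typescript
// Hypothetical shapes for illustration; not the interfaces from the spec draft.
type Kind = "audio" | "video";

interface Track {
  kind: Kind;
  label: string;
  enabled: boolean;
}

// Design A: one combined list; callers check the kind on every iteration.
interface CombinedStream {
  tracks: Track[];
}

function countByKind(stream: CombinedStream, kind: Kind): number {
  return stream.tracks.filter((t) => t.kind === kind).length;
}

// Design B: one list per kind; no per-iteration kind check is needed.
interface SplitStream {
  audioTracks: Track[];
  videoTracks: Track[];
}

function split(stream: CombinedStream): SplitStream {
  return {
    audioTracks: stream.tracks.filter((t) => t.kind === "audio"),
    videoTracks: stream.tracks.filter((t) => t.kind === "video"),
  };
}

const combined: CombinedStream = {
  tracks: [
    { kind: "audio", label: "Microphone", enabled: true },
    { kind: "video", label: "Webcam", enabled: true },
  ],
};

const separated = split(combined);
// With separate lists, "does the stream have video?" is a length check.
const hasVideo = separated.videoTracks.length > 0;
```

With the combined list, code that only ever handles "audio" and "video"
would silently miscount if a third kind were added later.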

>> I also think that it would be easier to construct new MediaStream
>> objects from individual components rather than temporarily disabling the
>> ones you do not want to copy to the new MediaStream object and then
>> re-enabling them again afterwards.
>
> Re-enabling them afterwards would re-include them in the copies, too.

Why is this needed? If a new MediaStream object is constructed from 
another MediaStream I think it would be simpler to just let that be a 
clone of the stream with all tracks present (with the enabled/disabled 
states independently set).

> The main use case here is temporarily disabling a video or audio track in
> a video conference. I don't understand how your proposal would work for
> that. Can you elaborate?

A new MediaStream object is created from the video track of a 
LocalMediaStream to be used as self-view. The LocalMediaStream can then 
be sent over PeerConnection and the video track disabled without 
affecting the MediaStream being played back locally in the self-view. In 
addition, my proposal opens up further use cases that require
combining tracks from different streams, such as recording a
conversation (a number of audio tracks from various streams, local and
remote, combined into a single stream).
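
A minimal sketch of that idea, assuming a stream can be constructed from
a list of tracks and that the enabled state is kept per stream. The
class and member names below are hypothetical and not the API from the
spec draft.

```typescript
// Hypothetical model; class and member names are assumptions, not the spec API.
type Kind = "audio" | "video";

class SourceTrack {
  kind: Kind;
  label: string;
  constructor(kind: Kind, label: string) {
    this.kind = kind;
    this.label = label;
  }
}

// Each stream keeps its own enabled flag per track, so disabling a track
// in one stream does not affect another stream that uses the same source.
class SketchStream {
  entries: { source: SourceTrack; enabled: boolean }[];

  constructor(sources: SourceTrack[]) {
    this.entries = sources.map((source) => ({ source, enabled: true }));
  }

  sourcesOfKind(kind: Kind): SourceTrack[] {
    return this.entries
      .filter((e) => e.source.kind === kind)
      .map((e) => e.source);
  }

  setEnabled(kind: Kind, enabled: boolean): void {
    for (const e of this.entries) {
      if (e.source.kind === kind) e.enabled = enabled;
    }
  }

  anyEnabled(kind: Kind): boolean {
    return this.entries.some((e) => e.source.kind === kind && e.enabled);
  }
}

const local = new SketchStream([
  new SourceTrack("audio", "Microphone"),
  new SourceTrack("video", "Webcam"),
]);

// Self-view: a new stream built from the video track only.
const selfView = new SketchStream(local.sourcesOfKind("video"));

// Disable video in the stream that is sent to the peer.
local.setEnabled("video", false);

// Recording: audio tracks from a local and a remote stream in one stream.
const remote = new SketchStream([new SourceTrack("audio", "Remote peer")]);
const recording = new SketchStream([
  ...local.sourcesOfKind("audio"),
  ...remote.sourcesOfKind("audio"),
]);
```

Here the self-view keeps playing video after the outgoing stream has
disabled it, and no track has to be temporarily disabled to build the
new streams.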

>> It is also unclear to me what happens to a LocalMediaStream object that
>> is currently being consumed in that case.
>
> Not sure what you mean. Can you elaborate?

I was under the impression that, if a stream of audio and video is being 
sent to one peer and then another peer joins but only audio should be 
sent, then video would have to be temporarily disabled in the first 
stream in order to construct a new MediaStream object containing only 
the audio track. Again, it would be simpler to construct a new 
MediaStream object from just the audio track and send that.

>> Why should the label be the same as the parent on the newly constructed
>> MediaStream object?
>
> The label identifies the source of the media. It's the same source, so,
> same label.

I agree, but usually you have more than one source in a MediaStream and 
if you construct a new MediaStream from it which doesn't contain all of 
the sources from the parent I don't think the label should be the same. 
By the way, what happens if you call getUserMedia() twice and get the 
same set of sources both times, do you get the same label then? What if 
the user selects different sources the second time?

>> If you send two MediaStream objects constructed from the same
>> LocalMediaStream over a PeerConnection there needs to be a way to
>> separate them on the receiving side.
>
> What's the use case for sending the same feed twice?

If the labels are the same then that should indicate that it's 
essentially the same stream and there should be no need to send it 
twice. If the streams are not composed of the same underlying sources 
then you may want to send them both and the labels should differ.

>> I also think it is a bit unfortunate that we now have a 'label' property
>> on the track objects that means something else than the 'label' property
>> on MediaStream, perhaps 'description' would be a more suitable name for
>> the former.
>
> In what sense do they mean different things? I don't understand the
> problem here. Can you elaborate?

As Tommy pointed out, label on MediaStream is an identifier for the 
stream whereas label on MediaStreamTrack is a description of the source.

>>> The current design is just the result of needing to define what
>>> happens when you call getRecordedData() twice in a row. Could you
>>> elaborate on what API you think we should have?
>>
>> What I am thinking of is something similar to what was proposed in
>> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-March/030921.html
>
> That doesn't answer the question of what happens if you call stop() twice.

Nothing will happen the second time since recording has already stopped.

> (Also, having to call a method and hook an event so that you can read an
> attribute seems like a rather round-about way of getting data. Is calling
> a method with a callback not simpler?)

When the event has been fired you can read the attribute whenever you
want to get the blob, as many times as you want. I prefer that over
having stop() take a callback argument.
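
Roughly what I have in mind, as a sketch only. The recorder class, the
onstopped handler and the recordedData attribute are hypothetical names,
a string stands in for the blob, and the event is fired synchronously
here just to keep the example short.

```typescript
// Hypothetical recorder; names are illustrative and not from any spec draft.
class SketchRecorder {
  // Stands in for the Blob attribute; null until recording has stopped.
  recordedData: string | null = null;
  // Event handler attribute, called once when recording stops.
  onstopped: (() => void) | null = null;

  recording = true;
  chunks: string[] = [];

  write(chunk: string): void {
    if (this.recording) this.chunks.push(chunk);
  }

  stop(): void {
    // Second and later calls do nothing: recording has already stopped.
    if (!this.recording) return;
    this.recording = false;
    this.recordedData = this.chunks.join("");
    if (this.onstopped) this.onstopped();
  }
}

const recorder = new SketchRecorder();
let stoppedEvents = 0;
recorder.onstopped = () => {
  stoppedEvents += 1;
};

recorder.write("a");
recorder.write("b");
recorder.stop();
recorder.stop(); // no effect, no second event

// The attribute can be read any number of times after the event.
const firstRead = recorder.recordedData;
const secondRead = recorder.recordedData;
```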

> Quota doesn't seem particularly important here. It's not like you can
> really do lasting damage. It would just be a DOS attack, like creating a
> Web page with an infinite number of 10000x10000 canvases. We can just let
> the "hardware limitation" clause handle it.

In a video blog recording application it would be nice to be able to 
present to the user how much more can be recorded and not just handle it 
as a hardware limitation, since that could mean dropping the entire 
recording.

>> I was not saying that it would not be possible to keep track of which
>> blob: URLs point to blobs and which point to streams, just that we
>> want to avoid doing that in the early stage of the media engine
>> selection. In my opinion a stream is quite the opposite of a blob
>> (unknown, perhaps infinite length vs. fixed length) so when printing the
>> URLs for debugging purposes it would also be much nicer to have two
>> different protocol schemes. If I remember correctly the discussions
>> leading up to the renaming of createBlobURL to createObjectURL assumed
>> that there would be stream: URLs.
>
> You wouldn't be able to remove that logic, since http: URLs would still
> have the same needs. You can have finite and infinite http: resources,
> just like you can have finite and infinite blob: resources. I don't really
> see the problem here. Indeed, with blob:, it's trivial to find out if the
> resource is finite or not; with http: you might not know until the whole
> finite resource is downloaded.
>
> If there is something I'm missing here please do let me know.

The differentiation is not between finite and infinite resources but 
rather between playback media resources and conversational media 
resources. blob: and http: are both handled by the playback media engine 
whereas stream: is handled by the conversational media engine. We would 
like to be able to determine which engine to use by simply looking at 
the URL.
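
As a sketch, the selection could then be as simple as the following.
This assumes a dedicated stream: scheme, which is what I am proposing
and not existing behaviour; the function name and the example URLs are
made up.

```typescript
type Engine = "playback" | "conversational";

// Assumes a dedicated "stream:" scheme (the proposal here, not existing
// behaviour). Everything else, including blob: and http:, goes to the
// playback engine.
function selectEngine(url: string): Engine {
  const scheme = url.slice(0, url.indexOf(":") + 1).toLowerCase();
  return scheme === "stream:" ? "conversational" : "playback";
}

// Hypothetical example URLs.
const forStream = selectEngine("stream:3f2a");
const forBlob = selectEngine("blob:3f2a");
const forHttp = selectEngine("http://example.com/clip.webm");
```

If streams were exposed as blob: URLs instead, the same decision would
need a lookup of what the URL points to before an engine can be chosen.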

>>>> PeerConnection is an EventTarget but it still uses a callback for
>>>> the signaling messages and this mixture of events and callbacks is a
>>>> bit awkward in my opinion. If you would like to change the function
>>>> that handles signaling messages after calling the constructor you
>>>> would have to wrap a function call inside the callback to the actual
>>>> signal handling function, instead of just (re-)setting an onsignal
>>>> (or whatever) attribute listener (the event could reuse the
>>>> MessageEvent interface).
>>>
>>> When would you change the callback?
>>
>> If you would like to send the signaling messages peer-to-peer over the
>> data channel, once it is established.
>
> That seems like a disaster waiting to happen. The UDP data channel is
> unreliable, the signaling channel has to be reliable. Worse, the UDP data
> channel might go down at any second, and then the user agent would try to
> re-establish it using the signaling channel.

You can provide a reliable channel on top of the unreliable channel and 
monitor the PeerConnection state so that you know when to fall back to 
server-relayed signaling. One reason to do this would be to improve the
signaling latency, which can be of importance in applications that, for
example, trigger format renegotiation due to a change in video display
size.
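
A sketch of the fallback logic only, assuming the application tracks
whether its reliable layer on top of the data channel is currently
usable. The router class and its members are hypothetical, and the
reliable layer itself (acknowledgements, retransmission) is not shown.

```typescript
type Path = "peer-to-peer" | "server-relayed";

// Hypothetical router: picks the signaling path for each message.
class SignalingRouter {
  // Set by the application while monitoring the PeerConnection state.
  dataChannelUp = false;
  // Log of what was sent where, for illustration.
  sent: { path: Path; message: string }[] = [];

  send(message: string): Path {
    // Prefer the low-latency peer-to-peer path; fall back to the server.
    const path: Path = this.dataChannelUp ? "peer-to-peer" : "server-relayed";
    this.sent.push({ path, message });
    return path;
  }
}

const router = new SignalingRouter();
const beforeConnect = router.send("initial offer");

router.dataChannelUp = true; // data channel established
const whileConnected = router.send("renegotiate video size");

router.dataChannelUp = false; // data channel went down
const afterDrop = router.send("re-establish connection");
```

With an onsignal-style attribute the handler could simply be reassigned
when the path changes; with a constructor callback this routing has to
live inside the callback.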

>>>    - It's easy to not register a callback, which makes no sense. There's
>>>      literally never a use for creating a PeerConnection without a signaling
>>>      channel, as far as I can tell, so making it easier to create one
>>>      without a callback than with seems like a bad design.

>> For example, creating an EventSource without registering any listener
>> for incoming events equally does not make sense.
>
> Actually, it does. One operation mode for EventSource is to have events
> with different names, each triggering a different event listener.

An EventSource without any event listener seems rather useless to me. 
Even if you can assign multiple handlers for events with different 
names, all those handlers could still be provided as arguments to the 
constructor, right? That would ensure that nobody can create an 
EventSource without registering at least one event listener.

>>>> There is a potential problem in the exchange of SDPs in that glare
>>>> conditions can occur if both peers add streams simultaneously, in
>>>> which case there will be two different outstanding offers that neither
>>>> peer is allowed to respond to according to the SDP
>>>> offer-answer model. Instead of using one SDP session for all media
>>>> as the specification suggests, we are handling the offer-answer for
>>>> each stream separately to avoid such conditions.
>>>
>>> Why isn't this handled by the ICE role conflict processing rules? It
>>> seems like simultaneous ICE restarts would be trivially resolvable by
>>> just following the rules in the ICE spec. Am I missing something?
>>
>> This problem is not related to ICE but rather to the SDP offer-answer
>> model which is separate from the ICE processing. The problem is that SDP
>> offer-answer does not allow you to respond to an offer when you have an
>> outstanding offer for the same set of streams.
>
> As far as I can tell, your interpretation is incorrect. This is entirely
> related to ICE, and ICE, as far as I can tell, defines this exact case in
> its role conflict resolution.
>
> The only time this can happen is if you have both ends do an ICE restart
> at exactly the same time. The offer from each ICE agent will be received
> by the other as if it was the response, and thus there will be a role
> conflict and the ICE role conflict resolution process will kick in. No?

No, an ICE role conflict is not the same thing as a glare condition in 
SDP offer-answer.
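
To illustrate, here is a toy model of the offer-answer state on each
side. It deliberately contains no ICE at all. The class, the state names
and the "glare" result are made up for the sketch, and real SDP
processing is of course much more involved.

```typescript
// Toy offer-answer agent; names and states are hypothetical.
class OfferAnswerAgent {
  state: "idle" | "offer-outstanding" = "idle";

  // Adding a stream makes the agent send a new offer.
  createOffer(): string {
    this.state = "offer-outstanding";
    return "offer";
  }

  // An offer can only be answered when no own offer is outstanding.
  receiveOffer(): "answer" | "glare" {
    return this.state === "offer-outstanding" ? "glare" : "answer";
  }

  // Receiving the answer completes the exchange.
  receiveAnswer(): void {
    this.state = "idle";
  }
}

// Normal case: only one side offers.
const offerer = new OfferAnswerAgent();
const answerer = new OfferAnswerAgent();
offerer.createOffer();
const normalResult = answerer.receiveOffer();
offerer.receiveAnswer();

// Glare: both sides add a stream at the same time, so each receives an
// offer while its own offer is still outstanding.
const a = new OfferAnswerAgent();
const b = new OfferAnswerAgent();
a.createOffer();
b.createOffer();
const resultAtA = a.receiveOffer();
const resultAtB = b.receiveOffer();
```

In the glare case neither side can answer, which is why we run the
offer-answer exchange separately for each stream.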

//Per-Erik