[whatwg] PeerConnection feedback
Ian Hickson
ian at hixie.ch
Fri Dec 2 16:00:50 PST 2011
I include below, for posterity, some feedback to which I will not be
replying, as it relates to the PeerConnection and media streams section of
the specification which has since been moved to the WebRTC working group
at the W3C.
I encourage anyone who is interested in that particular topic to follow
the aforementioned group.
On Tue, 26 Jul 2011, Mark Callow wrote:
> On 26/07/2011 14:30, Ian Hickson wrote:
> > On Thu, 14 Jul 2011 04:09:40 +0530, Ian Hickson <ian at hixie.ch> wrote:
> > > > > >
> > > > > > Another question is flash. As far as I have seen, there seems
> > > > > > to be no option to specify whether the camera needs to use
> > > > > > flash or not. Is this decision left up to the device? (If
> > > > > > someone is making an app which is just clicking a picture of
> > > > > > the person, then it would be nice to have the camera use flash
> > > > > > in low light conditions).
> > > > >
> > > > > getUserMedia() returns a video stream, so it wouldn't use a
> > > > > flash.
> > >
> > > Wouldn't it make sense to have a provision for flash separately
> > > then? I think a lot of apps would like just a picture instead of
> > > video, and in those cases, flash would be required. Maybe a separate
> > > provision in the spec which defines whether to use flash, and if so,
> > > for how many milliseconds. Is that doable?
>
> There is a lot more that could be done than simply triggering the flash.
> See /The Frankencamera: An Experimental Platform for Computational
> Photography/ <http://graphics.stanford.edu/papers/fcam/> and The FCAM
> API <http://fcam.garage.maemo.org/>.
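For context, a minimal TypeScript sketch of the capture path being discussed,
written against the promise-based getUserMedia that browsers eventually
shipped rather than the 2011 callback draft; note that there is no flash or
still-photo control at this level:

  // Request a live camera stream and attach it to a <video> element.
  // Flash/torch control is not exposed here; the API yields a video stream.
  async function startCamera(video: HTMLVideoElement): Promise<MediaStream> {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    video.srcObject = stream;
    await video.play();
    return stream;
  }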
On Tue, 26 Jul 2011, Tommy Widenflycht wrote:
> On Tue, Jul 26, 2011 at 07:30, Ian Hickson <ian at hixie.ch> wrote:
> > >
> > > If you send two MediaStream objects constructed from the same
> > > LocalMediaStream over a PeerConnection there needs to be a way to
> > > separate them on the receiving side.
> >
> > What's the use case for sending the same feed twice?
>
> There's no proper use case as such but the spec allows this.
> >
> > > I also think it is a bit unfortunate that we now have a 'label'
> > > property on the track objects that means something else than the
> > > 'label' property on MediaStream, perhaps 'description' would be a
> > > more suitable name for the former.
> >
> > In what sense do they mean different things? I don't understand the
> > problem here. Can you elaborate?
>
> label on a MediaStream is a unique identifier, while the label on a
> MediaStreamTrack is just a description like "Logitech Vision Pro", "Line
> In" or "Built-in Mic". I too find this a bit odd.
>
> [...]
>
> If I may make an analogy to the real world: plumbing.
>
> Each fork of a MediaStream is a new joint in the pipe; my suggestion
> introduces a tap at each joint. No matter how you open and close the tap
> at the end (or middle), if any previous tap is closed there's nothing
> coming through. The spec currently removes and adds the entire pipe after
> the changed joint.
>
> > > Also some follow-up questions regarding the new TrackLists:
> > >
> > > What should happen when a track fails? Should the entire stream
> > > fail, the MSTrack silently be removed or the MSTrack disassociated
> > > with the track (and thus becoming a do-nothing object)?
> >
> > What do you mean by "fails"?
>
> Yanking the USB cable to the camera for example. This should imho stop
> the MS, not just silently send black video.
>
> > > What should happen when a stream with two or more video tracks is
> > > associated to a <video> tag? Just render the first enabled one?
> >
> > Same as if you had a regular video file with multiple tracks.
>
> And that is? Sorry, this might be written down somewhere and I have
> missed it.
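To make the label distinction concrete, a short sketch using the names that
eventually shipped (the stream's unique identifier became MediaStream.id,
while MediaStreamTrack.label remained the human-readable device description;
the 2011 draft called both 'label'):

  function describeStream(stream: MediaStream): void {
    console.log('stream identifier:', stream.id); // opaque, unique per stream
    for (const track of stream.getTracks()) {
      // kind is 'audio' or 'video'; label is e.g. "Logitech Vision Pro"
      console.log(` ${track.kind}: "${track.label}" enabled=${track.enabled}` +
                  ` state=${track.readyState}`); // 'ended' e.g. after unplugging
    }
  }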
On Thu, 28 Jul 2011, Stefan Håkansson LK wrote:
> >On Tue, Jul 26, 2011 at 07:30, Ian Hickson <ian at hixie.ch> wrote:
> >>
> >> > If you send two MediaStream objects constructed from the same
> >> > LocalMediaStream over a PeerConnection there needs to be a way to
> >> > separate them on the receiving side.
> >>
> >> What's the use case for sending the same feed twice?
> >
> >There's no proper use case as such but the spec allows this.
>
> The question is how serious a problem this is. If you want to fork, and
> make both (all) versions available at the peer, would you not transmit
> the full stream and fork at the receiving end for efficiency reasons?
> And if you really want to fork at the sender, one way to separate them
> is to use one PeerConnection per fork.
On Tue, 2 Aug 2011, Per-Erik Brodin wrote:
> On 2011-07-26 07:30, Ian Hickson wrote:
> > On Tue, 19 Jul 2011, Per-Erik Brodin wrote:
> > >
> > > Perhaps now that there is no longer any relation to tracks on the
> > > media elements we could also change Track to something else, maybe
> > > Component. I have had people complaining to me that Track is not
> > > really a good name here.
> >
> > I'm happy to change the name if there's a better one. I'm not sure
> > Component is any better than Track though.
>
> OK, let's keep Track until someone comes up with a better name then.
>
> > > Good. Could we still keep audio and video in separate lists though?
> > > It makes it easier to check the number of audio or video components
> > > and you can avoid loops that have to check the kind for each
> > > iteration if you only want to operate on one media type.
> >
> > Well in most (almost all?) cases, there'll be at most one audio track
> > and at most one video track, which is why I didn't put them in
> > separate lists. What use cases did you have in mind where there would
> > be enough tracks that it would be better for them to be separate
> > lists?
>
> Yes, you're right, but even with zero or one track it's more convenient
> to have them separate because that way you can more easily check if the
> stream contains any audio and/or video tracks and check the number of
> tracks of each kind. I also think it will be problematic if we would
> like to add another kind at a later stage if all tracks are in the same
> list since people will make assumptions that audio and video are the
> only kinds.
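Both shapes are straightforward to express; the API that eventually shipped
keeps a combined list but adds per-kind convenience accessors, roughly the
compromise argued for here. A sketch:

  function countTracks(stream: MediaStream) {
    // Per-kind accessors avoid looping over the combined list ...
    const audio = stream.getAudioTracks().length;
    const video = stream.getVideoTracks().length;
    // ... while the combined list stays open to kinds added later.
    const other = stream.getTracks()
      .filter(t => t.kind !== 'audio' && t.kind !== 'video').length;
    return { audio, video, other };
  }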
>
> > > I also think that it would be easier to construct new MediaStream
> > > objects from individual components rather than temporarily disabling
> > > the ones you do not want to copy to the new MediaStream object and
> > > then re-enabling them again afterwards.
> >
> > Re-enabling them afterwards would re-include them in the copies, too.
>
> Why is this needed? If a new MediaStream object is constructed from
> another MediaStream I think it would be simpler to just let that be a
> clone of the stream with all tracks present (with the enabled/disabled
> states independently set).
>
> > The main use case here is temporarily disabling a video or audio track
> > in a video conference. I don't understand how your proposal would work
> > for that. Can you elaborate?
>
> A new MediaStream object is created from the video track of a
> LocalMediaStream to be used as self-view. The LocalMediaStream can then
> be sent over PeerConnection and the video track disabled without
> affecting the MediaStream being played back locally in the self-view. In
> addition, my proposal opens up additional use cases that require
> combining tracks from different streams, such as recording a
> conversation (a number of audio tracks from various streams, local and
> remote combined to a single stream).
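A sketch of the self-view use case under the semantics that eventually
shipped, where constructing a MediaStream from an existing track shares that
track; an independently disableable self-view therefore requires cloning the
track first, which approximates the independence being asked for here:

  function setUpSelfView(local: MediaStream, pc: RTCPeerConnection,
                         selfView: HTMLVideoElement): void {
    const camera = local.getVideoTracks()[0];
    // Independent copy for local rendering; disabling `camera` below
    // does not affect this clone.
    selfView.srcObject = new MediaStream([camera.clone()]);

    // Send the original tracks to the peer ...
    for (const track of local.getTracks()) {
      pc.addTrack(track, local);
    }
    // ... and later mute only what is being sent.
    camera.enabled = false;
  }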
>
> > > It is also unclear to me what happens to a LocalMediaStream object
> > > that is currently being consumed in that case.
> >
> > Not sure what you mean. Can you elaborate?
>
> I was under the impression that, if a stream of audio and video is being
> sent to one peer and then another peer joins but only audio should be
> sent, then video would have to be temporarily disabled in the first
> stream in order to construct a new MediaStream object containing only
> the audio track. Again, it would be simpler to construct a new
> MediaStream object from just the audio track and send that.
>
> > > Why should the label the same as the parent on the newly constructed
> > > MediaStream object?
> >
> > The label identifies the source of the media. It's the same source,
> > so, same label.
>
> I agree, but usually you have more than one source in a MediaStream and
> if you construct a new MediaStream from it which doesn't contain all of
> the sources from the parent I don't think the label should be the same.
> By the way, what happens if you call getUserMedia() twice and get the
> same set of sources both times, do you get the same label then? What if
> the user selects different sources the second time?
>
> > > If you send two MediaStream objects constructed from the same
> > > LocalMediaStream over a PeerConnection there needs to be a way to
> > > separate them on the receiving side.
> >
> > What's the use case for sending the same feed twice?
>
> If the labels are the same then that should indicate that it's
> essentially the same stream and there should be no need to send it
> twice. If the streams are not composed of the same underlying sources
> then you may want to send them both and the labels should differ.
>
> > > I also think it is a bit unfortunate that we now have a 'label'
> > > property on the track objects that means something else than the
> > > 'label' property on MediaStream, perhaps 'description' would be a
> > > more suitable name for the former.
> >
> > In what sense do they mean different things? I don't understand the
> > problem here. Can you elaborate?
>
> As Tommy pointed out, label on MediaStream is an identifier for the
> stream whereas label on MediaStreamTrack is a description of the
> source.
>
> > > > The current design is just the result of needing to define what
> > > > happens when you call getRecordedData() twice in a row. Could you
> > > > elaborate on what API you think we should have?
> > >
> > > What I am thinking of is something similar to what was proposed in
> > > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-March/030921.html
> >
> > That doesn't answer the question of what happens if you call stop()
> > twice.
>
> Nothing will happen the second time since recording has already stopped.
>
> > (Also, having to call a method and hook an event so that you can read
> > an attribute seems like a rather round-about way of getting data. Is
> > calling a method with a callback not simpler?)
>
> When the event has been fired you can read the attribute whenever you
> want to get the blob, as many times as you want. I prefer that over having
> stop() take a callback argument.
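For comparison, the recording API that eventually shipped (MediaRecorder)
settled on delivering the data through events, much as discussed here, though
a second stop() on an already-inactive recorder throws rather than doing
nothing. A sketch:

  function recordFor(stream: MediaStream, ms: number): Promise<Blob> {
    return new Promise(resolve => {
      const recorder = new MediaRecorder(stream);
      const chunks: Blob[] = [];
      recorder.ondataavailable = e => chunks.push(e.data);
      recorder.onstop = () =>
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      recorder.start();
      // Guard on state so we never call stop() on an inactive recorder.
      setTimeout(() => {
        if (recorder.state === 'recording') recorder.stop();
      }, ms);
    });
  }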
>
> > Quota doesn't seem particularly important here. It's not like you can
> > really do lasting damage. It would just be a DOS attack, like creating
> > a Web page with an infinite number of 10000x10000 canvases. We can
> > just let the "hardware limitation" clause handle it.
>
> In a video blog recording application it would be nice to be able to
> present to the user how much more can be recorded and not just handle it
> as a hardware limitation, since that could mean dropping the entire
> recording.
>
> > > I was not saying that it would not be possible to keep track of
> > > which blob: URLs point to blobs and which point to streams, just
> > > that we want to avoid doing that in the early stage of the media
> > > engine selection. In my opinion a stream is quite the opposite of a
> > > blob (unknown, perhaps infinite length vs. fixed length) so when
> > > printing the URLs for debugging purposes it would also be much nicer
> > > to have two different protocol schemes. If I remember correctly the
> > > discussions leading up to the renaming of createBlobURL to
> > > createObjectURL assumed that there would be stream: URLs.
> >
> > You wouldn't be able to remove that logic, since http: URLs would
> > still have the same needs. You can have finite and infinite http:
> > resources, just like you can have finite and infinite blob: resources.
> > I don't really see the problem here. Indeed, with blob:, it's trivial
> > to find out if the resource is finite or not; with http: you might not
> > know until the whole finite resource is downloaded.
> >
> > If there is something I'm missing here please do let me know.
>
> The differentiation is not between finite and infinite resources but
> rather between playback media resources and conversational media
> resources. blob: and http: are both handled by the playback media engine
> whereas stream: is handled by the conversational media engine. We would
> like to be able to determine which engine to use by simply looking at
> the URL.
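The engine-selection idea amounts to a prefix check on the URL; a
hypothetical sketch (the stream: scheme never shipped, and browsers later
dropped object URLs for streams in favour of srcObject):

  type Engine = 'playback' | 'conversational';

  // Hypothetical dispatch: pick a media pipeline from the URL scheme alone.
  function chooseEngine(url: string): Engine {
    if (url.startsWith('stream:')) return 'conversational'; // proposed, never shipped
    return 'playback'; // blob:, http:, https: all use the regular playback engine
  }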
>
> > > > > PeerConnection is an EventTarget but it still uses a callback
> > > > > for the signaling messages and this mixture of events and
> > > > > callbacks is a bit awkward in my opinion. If you would like to
> > > > > change the function that handles signaling messages after
> > > > > calling the constructor you would have to wrap a function call
> > > > > inside the callback to the actual signal handling function,
> > > > > instead of just (re-)setting an onsignal (or whatever) attribute
> > > > > listener (the event could reuse the MessageEvent interface).
> > > >
> > > > When would you change the callback?
> > >
> > > If you would like to send the signaling messages peer-to-peer over
> > > the data channel, once it is established.
> >
> > That seems like a disaster waiting to happen. The UDP data channel is
> > unreliable, the signaling channel has to be reliable. Worse, the UDP
> > data channel might go down at any second, and then the user agent
> > would try to re-establish it using the signaling channel.
>
> You can provide a reliable channel on top of the unreliable channel and
> monitor the PeerConnection state so that you know when to fall back to
> server-relayed signaling. One reason to do this would be to improve the
> signaling latency which can be of importance in applications that, for
> example, trigger format renegotiation due to change in video display
> size.
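A hypothetical sketch of the kind of shim being described: sequence numbers
plus retransmission layered over an unreliable send function. The
sendUnreliable hook is an assumption, not part of any spec (the data channels
that eventually shipped can simply be configured as reliable):

  function makeReliableSender(sendUnreliable: (payload: string) => void) {
    const pending = new Map<number, ReturnType<typeof setInterval>>();
    let nextSeq = 0;
    return {
      send(msg: string): void {
        const seq = nextSeq++;
        const packet = JSON.stringify({ seq, msg });
        sendUnreliable(packet);
        // Retransmit every 500 ms until the peer acknowledges this sequence number.
        pending.set(seq, setInterval(() => sendUnreliable(packet), 500));
      },
      ack(seq: number): void {
        const timer = pending.get(seq);
        if (timer !== undefined) { clearInterval(timer); pending.delete(seq); }
      },
    };
  }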
>
> > > > - It's easy to not register a callback, which makes no sense.
> > > > There's literally never a use for creating a PeerConnection without
> > > > a signaling channel, as far as I can tell, so making it easier to
> > > > create one without a callback than with seems like a bad design.
> > >
> > > For example, creating an EventSource without registering any
> > > listener for incoming events equally does not make sense.
> >
> > Actually, it does. One operation mode for EventSource is to have
> > events with different names, each triggering a different event
> > listener.
>
> An EventSource without any event listener seems rather useless to me.
> Even if you can assign multiple handlers for events with different
> names, all those handlers could still be provided as arguments to the
> constructor, right? That would ensure that nobody can create an
> EventSource without registering at least one event listener.
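For reference, the EventSource operation mode being referred to: named events
routed to different listeners, none of which is a constructor argument:

  const source = new EventSource('/notifications');
  // "event: price" and "event: news" lines on the wire are dispatched by name.
  source.addEventListener('price', e =>
    console.log('price:', (e as MessageEvent).data));
  source.addEventListener('news', e =>
    console.log('news:', (e as MessageEvent).data));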
>
> > > > > There is a potential problem in the exchange of SDPs in that
> > > > > glare conditions can occur if both peers add streams
> > > > > simultaneously, in which case there will be two different
> > > > > outstanding offers that none of the peers are allowed to respond
> > > > > to according to the SDP offer-answer model. Instead of using one
> > > > > SDP session for all media as the specification suggests, we are
> > > > > handling the offer-answer for each stream separately to avoid
> > > > > such conditions.
> > > >
> > > > Why isn't this handled by the ICE role conflict processing rules?
> > > > It seems like simultaneous ICE restarts would be trivially
> > > > resolvable by just following the rules in the ICE spec. Am I
> > > > missing something?
> > >
> > > This problem is not related to ICE but rather to the SDP
> > > offer-answer model which is separate from the ICE processing. The
> > > problem is that SDP offer-answer does not allow you to respond to an
> > > offer when you have an outstanding offer for the same set of
> > > streams.
> >
> > As far as I can tell, your interpretation is incorrect. This is
> > entirely related to ICE, and ICE, as far as I can tell, defines this
> > exact case in its role conflict resolution.
> >
> > The only time this can happen is if you have both ends do an ICE
> > restart at exactly the same time. The offer from each ICE agent will
> > be received by the other as if it was the response, and thus there
> > will be a role conflict and the ICE role conflict resolution process
> > will kick in. No?
>
> No, an ICE role conflict is not the same thing as a glare condition in
> SDP offer-answer.
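A hypothetical sketch of glare handling reduced to a tie-break: if an offer
arrives while our own offer is outstanding, exactly one pre-agreed side
yields, roughly the "polite peer" pattern that later WebRTC negotiation
guidance adopted. The applyRemoteOffer and rollbackLocalOffer hooks are
assumptions, not spec API:

  function handleIncomingOffer(
    offerOutstanding: boolean, // do we have an unanswered local offer?
    polite: boolean,           // pre-agreed role so exactly one side yields
    applyRemoteOffer: () => void,
    rollbackLocalOffer: () => void,
  ): void {
    if (offerOutstanding && !polite) {
      return; // glare: the impolite side ignores the colliding offer
    }
    if (offerOutstanding && polite) {
      rollbackLocalOffer(); // glare: the polite side abandons its own offer
    }
    applyRemoteOffer();
  }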
On Wed, 27 Jul 2011, Rob Manson wrote:
>
> This is definitely not intended as criticism of any of the work going
> on. It's intended as constructive feedback that hopefully provides
> clarification on a key use case and its supporting requirements.
>
> "Access to live/raw audio and video stream data from both local
> and remote sources in a consistent way"
>
> I've spent quite a bit of time trying to follow a clear thread of
> requirements/solutions that provide API access to raw stream data (e.g.
> audio, video, etc.). But I'm a bit concerned this is falling in the gap
> between the DAP and RTC WGs. If this is not the case then please point
> me to the relevant docs and I'll happily get back in my box 8)
>
> Here's how the thread seems to flow at the moment based on public
> documents.
>
> On the DAP page [1] the mission states:
> "the Device APIs and Policy Working Group is to create
> client-side APIs that enable the development of Web Applications
> and Web Widgets that interact with devices services such as
> Calendar, Contacts, Camera, etc"
>
> So it seems clear that this is the place to start. Further down that
> page the "HTML Media Capture" and "Media Capture" APIs are listed.
>
> HTML Media Capture (camera/microphone interactions through HTML forms)
> initially seems like a good candidate, however the intro in the latest
> PWD [2] clearly states:
> "Providing streaming access to these capabilities is outside of
> the scope of this specification."
>
> Followed by a NOTE that states:
> "The Working Group is investigating the opportunity to specify
> streaming access via the proposed <device> element."
> The link on the "proposed <device> element" [3] links to a "no
> longer maintained" document that then redirects to the top level of the
> whatwg "current work" page [4]. On that page the most relevant link is
> the video conferencing and peer-to-peer communication section [5].
> More about that further below.
>
> So back to the DAP page to explore the other Media Capture API
> (programmatic access to camera/microphone) [1] and its latest PWD [6].
>
> The abstract states:
>
> "This specification defines an Application Programming Interface
> (API) that provides access to the audio, image and video capture
> capabilities of the device."
>
> And the introduction states:
>
> "The Capture API defines a high-level interface for accessing
> the microphone and camera of a hosting device. It completes the
> HTML Form Based Media Capturing specification [HTMLMEDIACAPTURE]
> with a programmatic access to start a parametrized capture
> process."
>
> So it seems clear that this is not related to streams in any way either.
>
> The Notes column for this API on the DAP page [1] also states:
> "Programmatic API that completes the form based approach
> Need to check if still interest in this
> How does it relate with the Web RTC Working Group?"
>
> Is there an updated position on this?
>
> So if you then head over to the WebRTC WG's charter [7] it states:
> "...to define client-side APIs to enable Real-Time
> Communications in Web browsers.
>
> These APIs should enable building applications that can be run
> inside a browser, requiring no extra downloads or plugins, that
> allow communication between parties using audio, video and
> supplementary real-time communication, without having to use
> intervening servers..."
> So this is clearly focused upon peer-to-peer communication
> "between" systems and the stream related access is naturally just
> treated as an ancillary requirement. The scope section then states:
> "Enabling real-time communications between Web browsers require
> the following client-side technologies to be available:
>
> - API functions to explore device capabilities, e.g. camera,
> microphone, speakers (currently in scope for the Device APIs &
> Policy Working Group)
> - API functions to capture media from local devices (camera and
> microphone) (currently in scope for the Device APIs & Policy
> Working Group)
> - API functions for encoding and other processing of those media
> streams,
> - API functions for establishing direct peer-to-peer
> connections, including firewall/NAT traversal
> - API functions for decoding and processing (including echo
> cancelling, stream synchronization and a number of other
> functions) of those streams at the incoming end,
> - Delivery to the user of those media streams via local screens
> and audio output devices (partially covered with HTML5)"
>
> So this is where I really start to feel the gap growing. The DAP is
> pointing to the RTC group, saying it is not sure whether its
> Camera/Microphone APIs are being superseded by the work in RTC... and
> the RTC then points back to say it will be relying on work in the DAP.
> However, the RTC's Recommended Track Deliverables list does include:
> "Media Stream Functions, Audio Stream Functions and Video Stream
> Functions"
>
> So then it's back to the whatwg MediaStream and LocalMediaStream current
> work [8]. Following this through you end up back at the <audio> and
> <video> media element with some brief discussion about media data [9].
>
> Currently the only API that I'm aware of that allows live access to the
> audio data through the <audio> tag is the relatively proprietary Mozilla
> Audio Data API [10].
>
> And while the video stream data can be accessed by rendering each frame
> into a canvas 2d graphics context and then using getImageData to extract
> and manipulate it from there [11], this seems more like a work around
> than an elegantly designed solution.
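The canvas work-around referred to in [11], in sketch form: draw the current
video frame into a 2D context and read the raw RGBA pixels back with
getImageData:

  function grabFrame(video: HTMLVideoElement): ImageData {
    const canvas = document.createElement('canvas');
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    const ctx = canvas.getContext('2d');
    if (ctx === null) throw new Error('2d context unavailable');
    ctx.drawImage(video, 0, 0);            // copy the current frame
    return ctx.getImageData(0, 0, canvas.width, canvas.height); // raw RGBA pixels
  }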
>
> As I said above, this is not intended as a criticism of the work that
> the DAP WG, WebRTC WG or WHATWG are doing. It's intended as
> constructive feedback to highlight that the important use case of
> "Access to live/raw audio and video stream data from both local and
> remote sources" appears to be falling in the gaps between the groups.
>
> From my perspective this is a critical use case for many advanced web
> apps that will help bring them in line with what's possible in the
> native single vendor stack based apps at the moment (e.g. iPhone &
> Android). And it's also critical for the advancement of web standards
> based AR applications and other computer vision, hearing and signal
> processing applications.
>
> I understand that a lot of these specifications I've covered are in very
> formative stages and that requirements and PWDs are just being drafted
> as I write. And that's exactly why I'm raising this as a single and
> consolidated perspective that spans all these groups. I hope this goes
> some way towards "Access to live/raw audio and video stream data from
> both local and remote sources" being treated as an essential and core
> use case that binds together the work of all these groups. With a clear
> vision for this and a little consolidated work I think this will then
> also open up a wide range of other app opportunities that we haven't
> even thought of yet. But at the moment it really feels like this is
> being treated as an assumed requirement and could end up as a poorly
> formed second class bundle of semi-related API hooks.
>
> For this use case I'd really like these clear requirements to be
> supported:
> - access the raw stream data for both audio and video in similar ways
> - access the raw stream data from both remote and local streams in
> similar ways
> - ability to inject new data or the transformed original data back into
> streams and presented audio/video tags in a consistent way
> - all of this be optimised for performance to meet the demands of live
> signal processing
>
> PS: I've also cc'd in the mozilla dev list as I think this directly
> relates to the current "booting to the web" thread [12]
>
> [1] http://www.w3.org/2009/dap/
> [2] http://www.w3.org/TR/2011/WD-html-media-capture-20110414/#introduction
> [3] http://dev.w3.org/html5/html-device/
> [4] http://www.whatwg.org/specs/web-apps/current-work/complete/#devices
> [5] http://www.whatwg.org/specs/web-apps/current-work/complete/#auto-toc-9
> [6] http://www.w3.org/TR/2010/WD-media-capture-api-20100928/
> [7] http://www.w3.org/2011/04/webrtc-charter.html
> [8] http://www.whatwg.org/specs/web-apps/current-work/complete/video-conferencing-and-peer-to-peer-communication.html#mediastream
> [9] http://www.whatwg.org/specs/web-apps/current-work/complete/the-iframe-element.html#media-data
> [10] https://wiki.mozilla.org/Audio_Data_API
> [11] https://developer.mozilla.org/En/Manipulating_video_using_canvas
> [12] http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/7668a9d46a43e482#
On Fri, 12 Aug 2011, Darin Fisher wrote:
>
> Putting implementation details aside, I agree that it is a bit
> unfortunate to refer to a stream as a blob. So far, blobs have always
> referred to static, fixed-size things.
>
> This function was originally named createBlobURL, but it was renamed
> createObjectURL precisely because we imagined it being useful to pass
> things that were not blobs to it. It seems reasonable that passing a
> Foo object to createObjectURL might mint a different URL type than what
> we would mint for a Bar object.
>
> It could also be the case that using blob: for referring to Blobs was
> unfortunate. Maybe we do not really need separate URL schemes for
> static, fixed size things and streams.
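For reference, a sketch of how this eventually played out: object URLs stayed
for static Blobs, while live streams bypassed URLs entirely via srcObject, so
no stream: scheme was ever minted:

  // Static, fixed-size data still goes through an object URL ...
  const blob = new Blob(['hello'], { type: 'text/plain' });
  const blobUrl = URL.createObjectURL(blob); // "blob:..." URL
  console.log(blobUrl);

  // ... whereas a live stream is attached directly, with no URL minted at all.
  async function attachCamera(video: HTMLVideoElement): Promise<void> {
    video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  }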
On Mon, 15 Aug 2011, Harald Alvestrand wrote:
>
> Back in ancient history (late 90s, I think), when I wrote the first
> version of stuff that eventually merged into RFC 4395, "New URI
> schemes", I thought the set of operations an URI supported was pretty
> important.
>
> In fact the text of RFC 4395 says:
>
> 2.4. Definition of Operations
>
> As part of the definition of how a URI identifies a resource, a URI
> scheme definition SHOULD define the applicable set of operations that
> may be performed on a resource using the URI as its identifier. A
> model for this is HTTP; an HTTP resource can be operated on by GET,
> POST, PUT, and a number of other operations available through the
> HTTP protocol. The URI scheme definition should describe all
> well-defined operations on the URI identifier, and what they are
> supposed to do.
>
> Some URI schemes don't fit into the "information access" paradigm of
> URIs. For example, "telnet" provides location information for
> initiating a bi-directional data stream to a remote host; the only
> operation defined is to initiate the connection. In any case, the
> operations appropriate for a URI scheme should be documented.
>
> Note: It is perfectly valid to say that "no operation apart from GET
> is defined for this URI". It is also valid to say that "there's only
> one operation defined for this URI, and it's not very GET-like". The
> important point is that what is defined on this scheme is described.
>
> So if that consideration is still of concern, the next question is of
> course "are there operations that make sense for a stream that don't
> make sense for (current uses of) blob:, or vice versa"?
>
> If "blob:" was intended to mean "reference to internal object, hand it
> to APIs, the APIs will tell you if they don't like them", that
> consideration may not be that important.
--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'