[whatwg] Peer-to-peer communication, video conferencing, <device>, and related topics

Thu Mar 17 21:45:06 PDT 2011

When replying to this e-mail please only quote the parts to which you are 
responding, and adjust the subject line accordingly.

This e-mail is a reply to about a year's worth of feedback collected on 
the topics of peer-to-peer communication, video conferencing, the <device> 
element, and related topics. This feedback was used to update the spec 
recently, greatly expanding on the placeholder that had previously 
sketched a proposal for how these features should work. (This e-mail does 
not include replies to most of the feedback received after the change to 
the spec. I'll be replying to the bulk of this more recent feedback in a 
separate e-mail soonish.)

Here is a high-level overview of the changes; for specific rationales, 
please see the detailed responses to the e-mails below.

 * <device> has been replaced with a Geolocation-style API for requesting 
   user access to local media devices (such as cameras).

 * locally-generated streams can be paused and resumed.

 * the ConnectionPeer interface has been replaced with a PeerConnection 
   interface that interacts directly with ICE and its dependencies.

 * PeerConnection has been streamlined (compared to ConnectionPeer), e.g. 
   there is no longer a feature for direct file transfer or for reliable 
   text messaging.

 * the wire format for the unreliable data channel has been specified.

 * the spec has been brought up to date with recent developments in other 
   Web specs such as File API and WebIDL.

On Mon, 22 Mar 2010, Mark Frohnmayer wrote:
> 
> I am currently engaged by InstantAction to develop a minimum-footprint 
> web API prototype plugin for real-time networked games.  The purpose of 
> this work is to propose to this standards process a path for real-time 
> networked client/server and peer-to-peer games and applications to live 
> as first-class citizens in the browser app sandbox.

With the introduction of PeerConnection and WebSockets, it looks like 
we'll soon be there!

On Mon, 3 May 2010, Mark Frohnmayer wrote:
> 
> For the purposes of discussion there seem to be two distinct issues - 
> peer introduction, where two clients establish a direct connection via 
> some trusted third party, and data transmission protocol which could 
> range from raw UDP to higher-level protocols like XML-RPC over HTTP.

I'm not sure how XML-RPC over HTTP would work in this context, but yes.

For peer introduction I've allowed scripts to use whatever existing 
mechanisms they would like to use (e.g. XMLHttpRequest) to communicate 
via a server component. For data transmission, I've used a combination of 
ICE and a custom trivial UDP-based format specified in the spec.

> For real-time games, specific concerns include flow control, protocol 
> overhead and retransmission policy; thus most real-time games implement 
> custom network protocols atop UDP.  Peer introduction is also important 
> - responsiveness can often be improved and hosting bandwidth costs 
> reduced by having peers connect directly.  For other p2p apps (chat, 
> etc), specific control of flow and data retransmission may be less (or 
> not) important, but peer introduction is still relevant.

Agreed.

> In reading of the current state of the spec's p2p section, it appears to 
> be poorly suited to real-time gaming applications, as well as 
> potentially over-scoped for specific p2p applications.

Could you elaborate on the specific concerns you had and on whether the 
latest changes have addressed them?

> This leads me to wonder about (1) the viability of including peer 
> introduction into WebSocket as an alternative to a high-level peer to 
> peer interface in the spec

As far as I can tell, WebSockets isn't really relevant here. The wire 
format for WebSockets has a very different set of design criteria. The API 
for WebSockets is similar to the API I've used for PeerConnection, but 
that's just a matter of keeping Web APIs consistent and isn't really 
specific to PeerConnection or WebSockets.

> (2) including a lower-level unreliable protocol mode, either as part or 
> distinct from WebSocket,

Done.

> and (3) who, if anyone, is currently driving the p2p section of the 
> spec.

The browser vendors are always the people who drive all these things. 
Without implementations, specs are irrelevant.

On Tue, 4 May 2010, Erik Möller wrote:
> 
> I'm an old gamedev recently turned browserdev so this is of particular 
> interest to me, especially as I'm currently working on WebSockets. 
> WebSockets is a nice step towards multiplayer games in browsers and will 
> be even better once binary frames are speced out but as Mark says 
> (depending on the nature of the game) gamedevs are most likely going to 
> want to make their own UDP based protocol (in client-server models as 
> well). Have there been any discussions on how this would fit under 
> WebSockets?

There has not, as far as I'm aware.

PeerConnection could be used by a server as well, of course.

> Opera Unite can be mentioned as an interesting side note, it does peer 
> introduction as well as subnet peer detection, but again that's TCP 
> only.

I'm not familiar with Opera Unite; does it have anything to teach us here?

On Tue, 4 May 2010, Julien Cayzac wrote:
> 
> I've been reading lately about the new proposed <device> element, and 
> was wondering if it was needed at all.
>
> IMHO, a video originating from an attached camera is not different from 
> a video originating from the network, so <video> could be used here.

<video> doesn't represent a video originating from the network, it 
represents the "sink" (as opposed to "source") which displays the video. 
The video from the network is represented by a URL. URLs aren't the ideal 
mechanism to manipulate objects in JS, which is why I went with <device> 
originally, and now getUserMedia().

> Displaying the webcam in a page could be done like this:
> 
> <video autoplay controls>
>     <source src="webcam:640,480,25" /> <!-- 640x480, 25fps -->
>     <source src="webcam:320,240,*" /> <!-- will be tried if the webcam
> doesn't support the above settings -->
>     <source src="mire.mp4" /> <!-- no webcam attached? show this video
> instead -->
> </video>

The problem with this is that it doesn't have very clear security 
properties. How do you get permission to use the webcam: protocol from 
the user? How do you pass such permissions to other origins?

> Same could be done with <audio> for adding microphone support

(Note that <audio> and <video> are the same element except for the minor 
detail of how they are rendered. Both can handle both audio and video.)

> in both cases the browser should notify the user the page is requesting 
> permission to access these devices.

How?

> Now, I am aware HTMLMediaElement doesn't offer any methods to actually 
> query the data it serves or to get notified as more incoming data gets 
> received, which makes my proposal useless. Still, such methods could be 
> used in other scenarios, like a browser-based video editing app, so 
> adding them would make sense in my opinion.

Video editing is going to be an interesting case to deal with, but I think 
a convincing case could be made arguing that while <video> is the display 
component of such a use case, the editing component has to be a separate 
object that is then plugged into <video>. Otherwise, how can you display 
multiple views on the same edited video, for instance?

On Tue, 4 May 2010, Julien Cayzac wrote:
> 
> It was in my message: "in both cases the browser should notify the user 
> the page is requesting permission to access these devices". The same is 
> done today with the geolocation feature, for instance. The user has to 
> give access permission to the page, in a browser-dependent way.

Geolocation works via a JS API, though, not a URL. I don't understand how 
they could be treated in equivalent ways.

On Fri, 21 May 2010, Nicklas Sandgren wrote:
>
> As mentioned in the draft, the peer-to-peer API must rely on underlying 
> protocols/mechanisms to establish the connections and to transport the 
> streams. What are the thoughts regarding these protocols, and has there 
> been any discussion around this topic?

Over the past year there has been remarkably little discussion on this 
topic on this list. From discussing the issue with various people, reading 
comments on lists such as rtc-web, and listening to comments at events 
such as the RTC workshop that Google hosted last year, I came to the 
conclusion that those with an opinion on the matter mostly wanted to see a 
model that was as close to SIP as makes sense, so as to enable gatewaying 
from the Web to legacy (and future) SIP devices. So I ended up going with 
ICE implemented pretty close to what SIP does, with some extension hooks 
for the future if we need them. This allows pretty straight-forward 
gatewaying to SIP while supporting the key use cases for the Web.

> An alternative approach could be to define APIs for managing streams 
> only, and leave session set up as well as additional functionality 
> (file, text, image share) to the application using the means already 
> available such as XMLHttpRequest and WebSocket. The session set up would 
> in this scenario not rely on a third party server, but rather be handled 
> by the server that serves the current web application. This would remove 
> the need for agreeing on formats for client and server configuration 
> strings or protocols to talk to third-party servers.

Relaying through a server is hugely expensive for the service provider, 
and it introduces high latency, which is quite problematic. I think it's 
important that we support direct peer-to-peer video if we can.

> You could also debate how often peer-to-peer media streams will actually 
> work. Aren't FWs and NATs going to give problems in many cases?

ICE + TURN handles most of these cases, thankfully.

> Maybe it would be better to design for a situation where the media 
> always go via a server. Additional benefits are that WS could be used 
> for media transport, and that the media could be transcoded if the codec 
> capabilities of the clients do not match.

Transcoding is even more expensive and introduces even more latency.

On Thu, 27 May 2010, Mark Frohnmayer wrote:
> 
> To answer the question of problem in p2p regarding FWs and NATs, the 
> libjingle folks report that 92% of participants are able to connect 
> directly: 
> http://code.google.com/apis/talk/libjingle/important_concepts.html#connections 
> with the remainder using simple message relay servers.

That's very helpful data, thanks.

On Thu, 27 May 2010, James Salsman wrote:
> 
> Why is relying on TCP for reliable delivery inferior to asking 
> applications to re-implement reliable transmission?

For much of the data discussed here -- media streams and game data in 
particular -- reliable transmission is not a requirement. In fact, low 
latency is far more important, and TCP does not provide for optimally low 
latency compared to solutions based on UDP.

On Thu, 27 May 2010, James Salsman wrote:
> 
> Would it be appropriate to allow selection between reliable delivery 
> involving delay and unreliable delivery with the shorter delay 
> characteristics of UDP by allowing the user to select between the 
> TCP-based asynchronous HTTP/HTTPS multipart/form-encoded POST of input 
> type=file accept=audio as per http://www.w3.org/TR/device-upload and use 
> UDP for synchronous or asynchronous device element I/O?

On Sat, 29 May 2010, Mark Frohnmayer wrote:
> 
> I can see use cases for both methods -- a voice mail, server based 
> application could use a simple form submit upload, but a live voice 
> conferencing app would need real-time access more like in the Capture 
> API that the W3C DAP group has published: 
> http://www.w3.org/TR/capture-api/ .  As they've laid it out, capture of 
> audio/video is decoupled from the network transmission/streaming of the 
> captured data, which makes sense.  The media file data captured could 
> then be sliced into blobs and bounced off a server via WebSocket or sent 
> to peers via the peer to peer API.  Again here it would make sense to me 
> to pattern the p2p API more closely to WebSocket (i.e. send packets of 
> bytes) than a higher level approach that tries to multiplex streams of 
> data.

As the spec stands, both are indeed possible.

On Sun, 30 May 2010, James Salsman wrote:
>
> It's hard for me to take http://www.w3.org/TR/capture-api/#formatdata 
> seriously.  There are no references to open codecs or codec parameters; 
> the only audio codec specified is audio/x-wav, which is a 
> Microsoft-defined union type (RIFF) with a huge number of different 
> possible instance types, including only a few poor quality open vocoders 
> and audio codecs by contemporary performance/bandwidth standards.  
> Where is speex or ogg vorbis?  Where are their quality and bit rate 
> parameters?  Why is http://www.w3.org/TR/capture-api/#future empty when 
> most of the normative sections say, "No exceptions"?  Where is the 
> compatibility with existing file transfer standards?  The security 
> section doesn't contemplate permissions revocation.
> 
> If audio were segmented into separate files as per 
> http://www.w3.org/TR/capture-api/#captureaudiooptions how would that 
> affect real-time performance on mobile devices?  Are these files 
> required to have sequence numbers?  With phase vocoder time shifting, 
> UDP delivery as per http://dev.w3.org/html5/html-device/#stream-api 
> would be far superior in quality and intelligibility under packet loss 
> or delay, assuming they went with an open audio codec (or, even better, 
> allowed a choice of speex or ogg vorbis.)

I fear the PeerConnection stuff doesn't mention phase vocoder time 
shifting, codecs (open or otherwise), or sequence numbers, so it may not 
meet your needs either! If there's anything specific you would like 
defined, please let me know. Bear in mind though that we currently cannot 
realistically mandate a specific modern codec, since user agent vendors 
are still in disagreement regarding which codec to implement.

On Mon, 31 May 2010, Robin Berjon wrote:
> 
> When a specification is fully complete, mature, and stable, we tend to 
> release it.

Do you keep it captive before then? :-)

On Mon, 31 May 2010, Mark Frohnmayer wrote:
> 
> To be clear I'm not advocating for one particular capture API or codec; 
> rather I'm advocating that capture and record not be tied to network 
> transport, and separately that the p2p network transport be flexible, 
> low-level, low-overhead and have a minimal attack surface (suitable for 
> real-time game data as well as audio/video).

Does the new text address this to your satisfaction?

On Tue, 1 Jun 2010, Erik Möller wrote:
>
> The majority of the on-line games of today use a client/server model 
> over UDP and we should try to give game developers the tools they 
> require to create browser based games. For many simpler games a TCP 
> based protocol is exactly what's needed but for most real-time games a 
> UDP based protocol is a requirement. Games typically send small updates 
> to their server at 20-30Hz over UDP and can with the help of entity 
> interpolation and if required entity extrapolation cope well with 
> intermittent packet loss.

Does PeerConnection address this use case to your satisfaction?

Note that currently it does not support binary data, but I've built in an 
extension mechanism to make this easy to add in the future.
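
To make the entity interpolation mentioned above concrete, here is a 
minimal sketch in Python of how a client might cope with a dropped update. 
The Snapshot type, its fields, and the choice to hold the last known 
position rather than extrapolate are all assumptions made for 
illustration; none of this comes from the spec.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One position update from the server (hypothetical layout)."""
    time: float  # timestamp of the update, in seconds
    x: float
    y: float


def interpolate(snapshots, render_time):
    """Return the (x, y) position at render_time.

    snapshots must be sorted by time. A lost packet just leaves a wider
    gap between two snapshots; the position is still interpolated
    linearly across that gap.
    """
    if not snapshots:
        raise ValueError("no snapshots received yet")
    times = [s.time for s in snapshots]
    i = bisect_right(times, render_time)
    if i == 0:
        # Earlier than anything we have: use the oldest snapshot.
        return (snapshots[0].x, snapshots[0].y)
    if i == len(snapshots):
        # Newer than anything we have: hold the last known position.
        # (Extrapolation would go here if the game wanted it.)
        return (snapshots[-1].x, snapshots[-1].y)
    a, b = snapshots[i - 1], snapshots[i]
    t = (render_time - a.time) / (b.time - a.time)
    return (a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t)


# The update for time 1.0 was lost; the client only has 0.0 and 2.0.
received = [Snapshot(0.0, 0.0, 0.0), Snapshot(2.0, 10.0, 4.0)]
position = interpolate(received, 1.0)
```

Nothing here needs reliable delivery, which is the point: a missing 
update costs some accuracy but never stalls the stream.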

On Tue, 1 Jun 2010, John Tamplin wrote:
> 
> But there is so much infrastructure that would have to be enabled to use 
> UDP from a web app.  How would proxies be handled?  Even if specs were 
> written and implementations available, how many years would it be before 
> corporate proxies/firewalls supported WebSocket over UDP?
>
> I am all for finding a way to get datagram communication from a web app, 
> but I think it will take a long time and shouldn't hold up current 
> WebSocket work.

Agreed, the two are independent problems.

On Wed, 2 Jun 2010, Erik Möller wrote:
> 
> No it can't be UDP, it'll have to be something layered on top of UDP. 
> One of the game guys I spoke to last night said "Honestly, I wish we 
> just had real sockets.  It always seems like web coding comes down to 
> reinventing a very old wheel in a far less convenient or efficient 
> manner." To some extent I agree with him, but there's the security 
> aspect we have to take into account or we'll see someone hacking the CNN 
> website and injecting a little javascript and we'll have the DDOS attack 
> of the century on our hands.

For the data UDP media stream in PeerConnection I tried to make it as pure 
UDP as I could, while still being safe and still being extensible. The 
packets are (doubly) obfuscated to prevent cross-protocol attacks, and you 
can only send data to an end-point that negotiated a key via SDP 
offer/answer and participated in ICE to select how the packets are routed, 
but beyond that it's as raw as I could make it. Hopefully it's enough.

> The reason I put down "Socket is bound to one address", "Reliable 
> handshake", "Reliable close handshake" and "Sockets open sequentially" 
> was for that exact reason, to try to make it "DOS and tamper safe". The 
> "Sockets open sequentially" means that if you allocate two sockets to 
> the same server the second socket will wait for the first one to 
> complete its handshake before attempting to connect.

I haven't done this, but since the other server has to participate in the 
ICE processing, and can delay the start of that indefinitely, it seems 
that we're safe here.

On Tue, 1 Jun 2010, Ben Garney wrote:
> 
> To be clear, for games, the key win is the lossy delivery. That is what 
> enables the game to make intelligent decisions about dealing with packet 
> loss, out of order delivery, etc.

Specifically, the key win is low latency. Lossy delivery is just the 
acceptable cost that that implies.

On Tue, 1 Jun 2010, Philip Taylor wrote:
>
> There are lots of features that seem very commonly desired in games: a
> mixture of reliable and unreliable and reliable-but-unordered channels
> (movement updates can be safely dropped but chat messages must never
> be), automatic fragmentation of large messages, automatic aggregation
> of small messages, flow control to avoid overloading the network,
> compression, etc. And there's lots of libraries that build on top of
> UDP to implement protocols halfway towards TCP in order to provide
> those features:
> http://msdn.microsoft.com/en-us/library/bb153248(VS.85).aspx,
> http://opentnl.sourceforge.net/doxydocs/fundamentals.html,
> http://www.jenkinssoftware.com/raknet/manual/introduction.html,
> http://enet.bespin.org/Features.html, etc.

I guess the question is how much of this we want to build into the 
platform, versus letting libraries build it on top (as those above do).

> UDP sockets seem like a pretty inadequate solution for the use case of 
> realtime games - everyone would have to write their own higher-level 
> networking libraries (probably poorly and incompatibly) in JS to provide 
> the features that they really want. Browsers would lose the ability to 
> provide much security, e.g. flow control to prevent 
> intentional/accidental DOS attacks on the user's network, since they 
> would be too far removed from the application level to understand what 
> they should buffer or drop or notify the application about.

I've designed the UDP data channel to be extensible so that we can easily 
add this kind of thing in the future if we find that it would be useful in 
many cases, but I haven't added it yet because it seems premature to 
design a whole protocol for this kind of thing without having tested the 
basics first.

On Tue, 1 Jun 2010, Scott Hess wrote:
> 
> Unix domain sockets allow you to pass file descriptors between 
> processes.  It might be interesting to pass a WebSocket endpoint across 
> a WebSocket.  If the clients can punch through NATs, it becomes a direct 
> peer-to-peer connection, otherwise it gets proxied through the server.  
> Probably makes implementations excessively complicated, though.  
> UDP-style would be easier (no need to worry about data received by the 
> server after it initiates pushing the endpoint to the other client - 
> just drop it on the floor).

I don't really know how that would work, but it sounds intriguing. :-) 
Maybe in a future version we can work out a way to transmit a MessagePort 
object (from the MessageChannel feature) over the network...

On Thu, 3 Jun 2010, Erik Möller wrote:
> On Wed, 02 Jun 2010 19:48:05 +0200, Philip Taylor wrote:
> >
> > So they seem to suggest things like:
> > - many games need a combination of reliable and unreliable-ordered and
> > unreliable-unordered messages.
> 
> One thing to remember here is that browsers have other means for 
> communication as well. I'm not saying we shouldn't support reliable 
> messages over UDP, but just pointing out the option. I believe for 
> example World of Warcraft uses this strategy and sends reliable traffic 
> over TCP while movement and other real-time data goes over UDP.

That would indeed make sense.

> > - many games need to send large messages (so the libraries do 
> > automatic fragmentation).
> 
> Again, this is probably because games have no other means of 
> communication than the NW-library. I'd think these large reliable 
> messages would mostly be files that need to be transferred 
> asynchronously for which browsers already have the tried and tested 
> XMLHttpRequest.

Are the large messages always reliable messages?

> > - many games need to efficiently send tiny messages (so the libraries 
> > do automatic aggregation).
> 
> This is probably true for many other use-cases than games, but at least 
> in my experience games typically use a bit-packer or range-coder to 
> build the complete packet that needs to be sent. But again, it's a 
> matter of what level you want to place the interface.

This seems relatively easy to layer on top of the current protocol in the 
spec, but if we find it commonly used we can also add it explicitly as an 
extension.
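
As a rough illustration of what such layering could look like, here is a 
sketch in Python of a library-level aggregator that packs small messages 
into as few datagrams as possible. The function names, the two-byte 
length prefix, and the 1200-byte default budget are assumptions of mine, 
not anything in the spec. It also works on bytes for simplicity, whereas 
the data channel currently carries text only, so a real library would 
need a text-safe encoding.

```python
import struct

PREFIX = struct.Struct("!H")  # 2-byte big-endian length prefix (assumed)


def aggregate(messages, max_datagram=1200):
    """Pack byte strings into as few datagrams as possible, in order."""
    datagrams = []
    current = bytearray()
    for msg in messages:
        framed_len = PREFIX.size + len(msg)
        if framed_len > max_datagram:
            raise ValueError("message too large to fit in one datagram")
        if len(current) + framed_len > max_datagram:
            # Current datagram is full: flush it and start a new one.
            datagrams.append(bytes(current))
            current = bytearray()
        current += PREFIX.pack(len(msg)) + msg
    if current:
        datagrams.append(bytes(current))
    return datagrams


def split(datagram):
    """Recover the individual messages from one aggregated datagram."""
    messages = []
    offset = 0
    while offset < len(datagram):
        (length,) = PREFIX.unpack_from(datagram, offset)
        offset += PREFIX.size
        messages.append(datagram[offset:offset + length])
        offset += length
    return messages


updates = [b"move:1,2", b"fire", b"move:3,4"]
packed = aggregate(updates)
unpacked = [m for d in packed for m in split(d)]
```

Each datagram is self-describing, so losing one loses only the messages 
it carried.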

> > Perhaps also:
> > - Cap or dynamic limit on bandwidth (you don't want a single web page
> > flooding the user's network connection and starving all the TCP
> > connections)

Not really sure what the spec should say about this.

> > - Protection against session hijacking
> 
> Great

The spec uses an encryption mechanism to prevent this.

> > - Protection against an attacker initiating a legitimate socket with a 
> > user and then redirecting it (with some kind of IP (un)hijacking) to a 
> > service behind the user's firewall (which isn't a problem when using 
> > TCP since the service will ignore packets when it hasn't done the TCP 
> > handshake; but UDP services might respond to a single packet from the 
> > middle of a websocket stream, so every single packet will have to be 
> > careful not to be misinterpreted dangerously by unsuspecting 
> > services).

The packets are masked so that you couldn't do anything but DOS attacks in 
this kind of scenario. (And you can do those already with TCP.)

On Fri, 4 Jun 2010, James May wrote:
>
> Couldn't SCTP/DCCP (or a variant) over UDP (for NAT compatibility) work?
> 
> They both seem to be session oriented while loosening the other 
> restrictions of TCP,

Wouldn't that be overkill? I guess it depends on what the use cases are 
exactly.

On Thu, 10 Jun 2010, Erik Möller wrote:
>
> As discussed the following features/limitations are suggested: -Same API 
> as WebSockets

I don't see how that would work. I've made them as similar as possible, 
but I don't think it makes sense to go further.

> with the possible addition of an attribute that allows the 
> application developer to find the path MTU of a connected socket.

What's the use case?

> -Max allowed send size is 65,507 bytes.

Currently 65470, to handle the various headers used (see the spec).
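
For context, the 65,507 figure is just the IPv4 arithmetic, and the 
difference between it and the current limit is the room left for headers. 
A quick check in Python (the constant names are mine; what the remaining 
bytes are actually used for is defined in the spec and not reproduced 
here):

```python
IPV4_MAX_TOTAL_LENGTH = 65535  # largest value of the 16-bit total-length field
IPV4_MIN_HEADER = 20           # IPv4 header without options
UDP_HEADER = 8                 # fixed-size UDP header

# Largest UDP payload that fits in a single IPv4 datagram.
MAX_UDP_PAYLOAD = IPV4_MAX_TOTAL_LENGTH - IPV4_MIN_HEADER - UDP_HEADER

SPEC_MAX_SEND = 65470          # the limit quoted above
HEADER_BUDGET = MAX_UDP_PAYLOAD - SPEC_MAX_SEND

print(MAX_UDP_PAYLOAD)  # 65507
print(HEADER_BUDGET)    # 37
```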

> -Socket is bound to one remote address at creation and stays connected 
> to that host for the duration of its lifetime.

I've specced it in such a way that ICE could rebind the connection later; 
is that ok?

> -IP Broadcast/Multicast addresses are not valid remote addresses and 
> only a set range of ports are valid.

I've left this up to the ICE layer.

> -Reliable handshake with origin info (Connection timeout will trigger 
> close event.)

Not sure what the handshake should do here. Could you elaborate?

Also there's currently no origin protection for peer-to-peer stuff (there 
is for the STUN/TURN part; the origin is the long-term credential). We 
could certainly add something; how should it work? What are the attack 
scenarios we should consider?

> -Automatic keep-alives (to detect force close at remote host and keep 
> NAT traversal active)

I've left that up to the ICE layer.

> -Reliable close handshake

This can be done over the signaling layer independent of the UDP channel.

> -Sockets open sequentially (like current DOS protection in WebSockets) 
> or perhaps have a limit of one socket per remote host.
> -Cap on number of open sockets per host and global user-agent limit.

UDP doesn't really have sockets, so I don't really know how to do this.

> Some additional points that were suggested on this list were: -Key 
> exchange and encryption If you do want to have key exchange and 
> encryption you really shouldn't reinvent the wheel but rather use a 
> secure WebSocket connection in addition to the UDP-WebSocket. Adding key 
> exchange and encryption to the UDP-WebSocket is discouraged.

Not really sure what this means.

> -Client puzzles to reduce connection depletion/CPU depletion attacks to 
> the handshake. If the goal is to prevent DOS attacks on the accepting 
> server this seems futile. Client puzzles only raises the bar ever so 
> slightly for an attacker so this is also discouraged.

Could you elaborate on this?

> -Packet delivery notification to be a part of the API. Again this is 
> believed to be better left outside the UDP-WebSockets spec and 
> implemented in javascript if the application developer requires it.

Agreed.

On Thu, 10 Jun 2010, Mark Frohnmayer wrote:
> 
> I'd recommend doing some real-world testing for max packet size.  Back 
> when the original QuakeWorld came out it started by sending a large 
> connect packet (could be ~8K) and a good number of routers would just 
> drop those packets unconditionally.  The solution (iirc) was to keep all 
> packet sends below the Ethernet max of 1500 bytes.  I haven't verified 
> this lately to see if that's still the case, but it seems real-world 
> functionality should be considered.

Indeed. If there is a real-world limit to UDP packet size beyond what the 
specs suggest, then we should definitely adjust the spec accordingly.

> > -Packet delivery notification to be a part of the API.  Again this is 
> > believed to be better left outside the UDP-WebSockets spec and 
> > implemented in javascript if the application developer requires it.
> 
> I'd propose that doing this in the javascript level would result in 
> unnecessary extra overhead (sequence numbering, acknowledgements) that 
> could easily be a part of the underlying protocol.  Having implemented 
> multiple iterations of a high-level networking API, the notification 
> function is a critical, low-overhead tool for making effective 
> higher-level data guarantees possible.

I don't understand why it makes a difference if it's part of the JS or the 
underlying protocol in this case.
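
For reference, the kind of bookkeeping under discussion looks roughly 
like this when done at the application level. This is a sketch in Python 
rather than JS, and DeliveryTracker, its methods, and the rule that an 
acknowledgement marks all older unacknowledged packets as lost are 
assumptions for illustration, not anything from the spec. The rule is a 
deliberate simplification: it would misreport packets that are merely 
reordered.

```python
class DeliveryTracker:
    """Assigns sequence numbers and reports each packet as delivered or lost."""

    def __init__(self, on_result):
        self._next_seq = 0
        self._pending = {}           # seq -> payload awaiting a verdict
        self._on_result = on_result  # called as on_result(seq, payload, delivered)

    def send(self, payload, transmit):
        """Number the payload, remember it, and hand it to transmit()."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = payload
        transmit(seq, payload)
        return seq

    def ack(self, seq):
        """Handle an acknowledgement from the peer for packet seq."""
        if seq not in self._pending:
            return  # duplicate or unknown acknowledgement
        for s in sorted(k for k in self._pending if k <= seq):
            payload = self._pending.pop(s)
            # Anything older than the acknowledged packet is assumed lost.
            self._on_result(s, payload, s == seq)


results = []
sent = []
tracker = DeliveryTracker(lambda seq, payload, ok: results.append((seq, ok)))
for update in ("a", "b", "c"):
    tracker.send(update, lambda seq, payload: sent.append((seq, payload)))
tracker.ack(0)  # packet 0 arrived
tracker.ack(2)  # packet 2 arrived, so packet 1 is treated as lost
```

The per-packet cost is a sequence number on the way out and an 
acknowledgement on the way back, wherever the logic lives.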

On Fri, 11 Jun 2010, Erik Möller wrote:
> > 
> > I'd recommend doing some real-world testing for max packet size.  
> > Back when the original QuakeWorld came out it started by sending a 
> > large connect packet (could be ~8K) and a good number of routers would 
> > just drop those packets unconditionally.  The solution (iirc) was to 
> > keep all packet sends below the Ethernet max of 1500 bytes.  I haven't 
> > verified this lately to see if that's still the case, but it seems 
> > real-world functionality should be considered.
> 
> Absolutely, that's why the path-MTU attribute was suggested. The ~64k 
> limit is an absolute limit though at which sends can be rejected 
> immediately without even trying.

Could you elaborate on this use case?

> > If WebSocket supports an encrypted and unencrypted mode, why would the 
> > real-time version not support data security and integrity?
> 
> The reasoning was that if you do need data security and integrity the secure
> websocket over TCP uses the same state-of-the-art implementation as the
> browsers already have implemented. Secure connections over UDP would either
> require a full TCP over UDP implementation (to use TLS) or a second
> implementation that would need to be maintained. That implementation would be
> either a very complex piece of software or clearly inferior to what users are
> accustomed to.
> So what's a good use-case where you want a secure connection over UDP and
> cannot use a second TLS connection?

Games, if you want to prevent some forms of cheating. I don't necessarily 
agree that we have to do anything as complex as TLS (or DTLS) though. 
Encrypting the data stream gets us a long way there; we can add some 
integrity protection and replay protection reasonably easily too. Since we 
have a (presumed secure) signaling channel, a lot of the complexity of 
(e.g.) DTLS is unnecessary.

On Wed, 9 Jun 2010, Rob Evans wrote:
>
> We currently run a news service that requires users log on to access our 
> data and market reports (practically all major banking institutions in 
> the world use us). I could envisage either a thumb-print reader allowing 
> us to authenticate the user biometrically, or providing each user with a 
> USB thumb stick that contains a unique identifier of some sort that when 
> read using the device element could be used to authenticate them like a 
> sort-of "web-dongle".
> 
> One of our big issues is plagiarism and password sharing amongst our 
> clients. This type of access would allow us to lock down secure content 
> without having to install applications on the client computers which as 
> you can imagine, is a no-no when dealing with banks!

That's an interesting use case, but I'm not sure <device> would make sense 
as a way to address it. Probably best to do this as some kind of <input 
type=password> extension, or reuse the personal device features that some 
browsers have for user certs.

On Mon, 27 Dec 2010, Seth Brown wrote:
>
> I'm currently working on integrating serial port connected hardware with 
> a web application I'm developing. The only solution is for the user to 
> install local adapter software. This defeats the purpose of using a web 
> app over desktop software.
> 
> In order for web applications to gain traction over desktop software, 
> they must be able to interface with usb/RS232. I believe the security 
> trade off is worth it.

I encourage you to try going through the process for adding features:

   http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F

In particular, I'm not sure there is enough browser vendor interest in 
RS232 access currently to make it viable for the spec to support that 
case. (Much as I would personally love to see it.)

> I also believe that the working group should make the device element
> spec a high priority. If you don't google will probably implement
> their own version for chrome OS(it will be necessary in a browser
> based OS model).

Actually browser extensions of that nature are an important part of the 
process. If Google's Chrome OS team do add such features, that would be 
very helpful in evaluating what the right direction is for the spec.

(I haven't included responses here for the many e-mails discussing various 
ways to expose devices to Web pages involving file:// URLs, ACLs, and 
various other ideas. I strongly recommend following the steps in the wiki 
page above, starting with collecting use cases and following that up with 
getting some browsers to implement some experiments in this space. So 
far, it seems the browser vendors are quite reluctant.)

On Sat, 12 Jun 2010, Bjartur Thorlacius wrote:
>
> What is the use case for <device>? I've searched the archives, read a 
> thread, Googled around and read a document on the subject, but still I 
> haven't found a use case for it other than it is going to magically 
> increase user privacy.
> 
> <input type=file accept="audio/*"> is a standard and compatible way to 
> request audio stream(s). I can't see any drawbacks to using <input> for 
> audiovisual streams at least.

<input type=file> is something you can submit in a form submission. It has 
to be finite. I don't think that really works for video conferencing 
streams, which is the use case here.

On Wed, 16 Jun 2010, Bjartur Thorlacius wrote:
>
> Are file inputs defined to be more buffered than <device>s? Where?

The <input> spec, by virtue of the way it is defined to return finite 
complete files, is entirely buffered. <device> is gone, but getUserMedia() 
is defined to return an infinite stream, which cannot be entirely 
buffered.

> IMO a streaming capability should rather be added to <form> than adding 
> a brand-new <device> element.

I don't understand how this fits with <form>. It seems more similar to 
Geolocation; we tried adding that to <form> 5 or so years ago, ended up 
giving up, and the use case was instead addressed using an API.

On Tue, 6 Jul 2010, James Salsman wrote:
> 
> Is there any reason not to protect both them with the same privacy and 
> security authorization dialogs, and render them mostly the same, except 
> for audio/* and video/* <input> you might have record/pause/play/submit 
> while <device> would have record/pause?  For image/* the differences are 
> less clear to me: perhaps <input> would have a viewfinder, (expandable 
> on mobiles) shutter button, a filesystem browser button, and an 
> (optional?) submit button, but an image/* <device> might only have a 
> viewfinder and a shutter button. For the case of a camera, it would seem 
> to me that the buffered approach is also superior, but the unbuffered 
> <device> approach would allow audio and video teleconferencing.

In practice it seems authors would like more control over the UI than 
that. For example, video conferencing systems don't want to show any of 
those buttons, typically.

> Also, someone said it would be a good idea to mute audio input and 
> output from any hidden tab.  I think this may be a reasonable user 
> option (people who listen to podcasts might not like it), and wonder if 
> microphone or other audio input should be muted from any window and tab 
> without top-most focus and exposure.  Does anyone have thoughts on that 
> question?

That seems like a UA UI issue.

On Tue, 6 Jul 2010, Bjartur Thorlacius wrote:
>
> What about <form autosubmit> to request data to be automagically sent as 
> soon as data is input and thus eliminate the buffering problem?

It's not clear to me how that would work. In a video conferencing 
environment, you really want to have the video stream be sent to the peer, 
not to a server.

On Wed, 15 Sep 2010, Nicklas Sandgren wrote:
> 
> A typical video chat application would contain some view finder code 
> similar to the example in the Working Draft document:
> 
> <p>To start chatting, select a video camera: <device type=media onchange="update(this.data)"></p>
> <video autoplay></video>
> <script>
> function update(stream) {
>    document.getElementsByTagName('video')[0].src = stream.url;  }
> </script>
> 
> But assuming that the Stream is a combination of both audio and video media this is
> actually not what you want in a video chat, because you will also play back your own 
> audio to yourself.
> 
> To solve this in our implementation we defined two fragments, "audio" and "video", 
> for the Stream url. The application then can address a specific media component 
> in the Stream like this:
> 
> document.getElementsByTagName('video')[0].src = stream.url + "#video";
> 
> Is there some other way to solve this?

Just mute the <video> element.
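
A minimal sketch of that approach, reusing the element lookup and 
stream.url from the quoted example. The helper name showSelfView is 
hypothetical; muted is the existing media element attribute.

```javascript
// Hypothetical helper: show a local stream in a self-view <video>
// without playing the user's own audio back to them.
function showSelfView(video, stream) {
  video.muted = true;      // silences local playback only; the stream is untouched
  video.src = stream.url;  // same assignment as in the quoted example
  return video;
}

// In a page, something like:
//   showSelfView(document.getElementsByTagName('video')[0], stream);
```

Muting on the element rather than on the stream means the same stream can 
still be sent to the peer with its audio intact.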

On Thu, 16 Sep 2010, Jonathan Dixon wrote:
>
> On a related note, another requirement for a view finder window in a 
> chat application would be to horizontally flip the self view video 
> stream, so the user sees a mirror image. I'm not sure where this might 
> fit in the proposed APIs.

That's a rendering issue, not an API issue (style="transform:scaleX(-1)" 
should do it once that spec is more widely implemented).
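
As a rough sketch, the same thing done from script. The helper name 
mirrorSelfView is hypothetical, and some browsers may only support a 
vendor-prefixed form of the property for now.

```javascript
// Hypothetical helper: flip the self-view horizontally so the user
// sees a mirror image. This only changes rendering, not the stream.
function mirrorSelfView(video) {
  video.style.transform = 'scaleX(-1)';
  return video;
}

// In a page, something like:
//   mirrorSelfView(document.getElementsByTagName('video')[0]);
```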

> Further, it could be useful to provide a way to query the video source 
> as to whether the camera is oriented relative to the screen (if the 
> underlying system knows; consider a phone device with both a main camera 
> and self-view camera). This is needed to drive the decision on whether 
> to do this horizontal flip or not. In fact, such an application may want 
> to somehow indicate a preference for the self-view camera when multiple 
> cameras are present in the selection list. c.f. a movie-making app which 
> would prefer the outward facing camera.

Interesting.

In getUserMedia() the input is extensible; we could definitely add 
"prefer-user-view" or "prefer-environment-view" flags to the method (with 
better names, hopefully, but consider that 'rear' and 'front' are 
misleading terms -- the front camera on a DSLR faces outward from the 
user, the front camera on a mobile phone faces toward the user). The user 
still has to OK the use of the device, though, so maybe it should just be 
left up to the user to pick the camera? They'll need to be able to switch 
it on the fly, too, which again argues to make this a UA feature.

Similarly for exposing the kind of stream: we could add to GeneratedStream 
an attribute that reports this kind of thing. What is the most useful way 
of exposing this information?

On Wed, 22 Sep 2010, Rich Tibbett wrote:
>
> Would it be possible to provide JS-based method to capture an individual 
> frame from a <video> element?

This is possible via <canvas>.
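
A minimal sketch of what that could look like, using the 2D context's 
drawImage() and the canvas's toDataURL(). The helper name captureFrame is 
hypothetical.

```javascript
// Hypothetical helper: copy the frame currently shown by a <video>
// into a <canvas> and return it as a PNG data: URL.
function captureFrame(video, canvas) {
  // Size the canvas to the video's intrinsic dimensions.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const context = canvas.getContext('2d');
  context.drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL('image/png');
}
```

Note that toDataURL() throws a security error if the canvas has been 
tainted by cross-origin content, so this only works for video the page is 
allowed to read.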

On Tue, 23 Nov 2010, Anne van Kesteren wrote:
> On Fri, 19 Nov 2010 19:50:42 +0100, Per-Erik Brodin 
> <per-erik.brodin at ericsson.com> wrote:
> > We are about to start implementing stream.record() and StreamRecorder. 
> > The spec currently says that "the file must be in a format supported 
> > by the user agent for use in audio and video elements" which is a 
> > reasonable restriction. However, there is currently no way to set the 
> > output format of the resulting File that you get from recorder.stop(). 
> > It is unlikely that specifying a default format would be sufficient if 
> > you in addition to container formats and codecs consider resolution, 
> > color depth, frame rate etc. for video and sample size and rate, 
> > number of channels etc. for audio.
> > 
> > Perhaps an argument should be added to record() that specifies the 
> > output format from StreamRecorder as a MIME type with parameters? 
> > Since record() should probably throw when an unsupported type is 
> > supplied, it would perhaps be useful to have a canRecordType() or 
> > similar to be able to test for supported formats.
> 
> But if we want interoperability for streams, also taking into account 
> P2P messaging, we need a single format. Otherwise users with different 
> browsers could end up not being able to communicate.

We are indeed going to eventually need a single format. That's mostly why 
I haven't added anything regarding formats yet. (Well, we need at least 
one common format per browser pair, for every browser pair. If we get that 
instead of a single format, we would need to add something for sure.)

On Sat, 27 Nov 2010, Kevin Marks wrote:
>
> For Audio at least, supporting uncompressed should be possible and 
> uncontroversial, as there are clearly no patent issues here. Anyone 
> serious about recording and processing audio would not consider 
> recording compressed audio nowadays.

If you want to record audio, as opposed to streaming it, your best plan is 
probably the <input type=file accept=audio/wave> solution. Anything based 
on the streaming solution runs the very real risk of lost packets, at 
which point the amount of compression is unlikely to matter much.

However, we will probably end up supporting uncompressed recording using 
the getUserMedia() mechanism also.

On Fri, 26 Nov 2010, Per-Erik Brodin wrote:
>
> A Stream can be treated as an abstract representation of a media stream. 
> When a Stream is to be transported over a peer-to-peer connection, the 
> format can be negotiated between the peers. In the current 
> ConnectionPeer API, such format negotiation would be transparent to the 
> API. If we would specify a single resolution for video, for example, 
> that resolution may be too high for some mobile devices to encode in 
> real-time. A mismatch in supported formats is just one reason why a 
> peer-to-peer transport may fail, but that doesn't mean that the peers 
> can't communicate. When relaying through a server you can interoperate 
> with anything.

Indeed. As specced, the PeerConnection API uses SDP Offer/Answer to 
negotiate this kind of thing.

> If you are referring to sendFile(file) on ConnectionPeer, the file may 
> just as well come from the user's hard drive via <input type=file> and 
> thus it will be up to the application to ensure that whatever is sent to 
> the other peer is usable there.

Indeed. Incidentally, I dropped that feature for now, since you can just 
relay a file through the server, as latency in that case is far less 
important than reliability.

On Wed, 1 Dec 2010, Saurabh Jain wrote:
> 
> We need access to Bluetooth devices using the Device element. Without 
> Bluetooth access some of the use cases, specially in the mobile device 
> domain would not be achievable.

On Thu, 2 Dec 2010, Diogo Resende wrote:
>
> What about having the possibility to "use" a device other than a video? 
> Maybe a specific hardware. I agree about not having a distinction on the 
> hardware stack being used, but there should be a way for an app to be 
> able to access an USBx/BT/FW device.

What is the use case you had in mind, specifically?

On Fri, 3 Dec 2010, Silvia Pfeiffer wrote:
>
> IMO it's not so much about how the device is connected, but rather
> what the device is: e.g. if it's a storage device then it should come
> up as a storage device and not as a USB or FW device - the latter
> doesn't really say what its use is.
> 
> It would be more interesting to hear more about what uses we are
> seeing for the <device> element about what external devices it should
> support than about what types of ports external devices should be able
> to be hooked up through.
> 
> I can, e.g. think of input devices such as microphone, scanner,
> camera, and output devices such as headphones/speakers (no need to
> distinguish, probably), external displays, storage.
> 
> There probably are two dimensions to think about: is it an
> input/output device, and what type of data does it provide/take.
> Possibly also what format that data comes in/is given in.

On Thu, 2 Dec 2010, Anne van Kesteren wrote:
> 
> That is only interesting for devices that are commonly used. For the 
> long tail you need some kind of open-ended API.

As with the serial port suggestion earlier, I recommend experimenting with 
this in browsers. We need implementation experience before we can really 
say what the right way to do this is (and whether to do it at all).

On Thu, 2 Dec 2010, Diogo Resende wrote:
> 
> For example, a medical device may have no interest to the OS but a web 
> app (just like a desktop app) could get some interesting information and 
> perhaps send some instructions. Maybe some API like the geolocation..

The Geolocation API is very geolocation-specific; it seems unlikely 
browsers would want to make a "medical device API". How do we expose 
devices to the Web otherwise?

On Mon, 20 Dec 2010, Stephen Bannasch wrote:
> 
> But I think I can make a powerful case that being able to create 
> web-applications that can integrate easily with I/O devices that extend 
> your senses is a wonderful area for innovation.

Agreed, if we can do this it would be great. It's not clear how to do it, 
though.

On Mon, 24 Jan 2011, Anne van Kesteren wrote:
> 
> There is a plan of allowing direct assigning to IDL attributes besides 
> creating URLs.
> 
> I.e. being able to do:
> 
>  audio.src = blob
> 
> (The src content attribute would then be something like "about:objecturl".)
> 
> I am not sure if that API should work differently from creating URLs and 
> assigning those, but we could consider it.

Could you elaborate on this plan?

On Tue, 25 Jan 2011, David Flanagan wrote:
> 
> Adam's use case--to be able to download, play and cache audio data at 
> the same time--seems like a pretty compelling one.  I think this is 
> fundamentally an issue with the Blob API, not the URL API.  Blobs just 
> seem like they ought to stream.  When you get a blob in the onprogress 
> handler of an XHR2, you ought to be able to fire up a FileReader on it 
> and have it automatically read from the blob as the XHR2 writes to the 
> blob.  But currently (I think) you have to slice the blob to get only 
> the new bytes and start a new FileReader each time onprogress is called.  
> (Or wait for onload, of course.)

A Blob has a fixed size determined when the Blob is created. However, one 
could use Stream the way you describe, if we adjust XHR2 to expose a 
Stream as well.
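
For reference, the slicing workaround described above might look roughly 
like this. It is only a sketch: createIncrementalSlicer and takeNewBytes 
are hypothetical names, and it assumes each progress event hands the page 
a Blob covering everything received so far.

```javascript
// Remembers how many bytes have already been handed out, and returns
// only the bytes that arrived since the previous call.
function createIncrementalSlicer() {
  let consumed = 0;
  return function takeNewBytes(blob) {
    if (blob.size <= consumed) return null;        // nothing new yet
    const fresh = blob.slice(consumed, blob.size); // a new, fixed-size Blob
    consumed = blob.size;
    return fresh;
  };
}

// In a progress handler, something like:
//   const fresh = takeNewBytes(xhr.response);
//   if (fresh) new FileReader().readAsArrayBuffer(fresh);
```

Each slice is itself a fixed-size Blob, which is why a new reader is 
needed per chunk.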

On Wed, 26 Jan 2011, Patrik Persson J wrote:
>
> We have done some experimentation with the ConnectionPeer API. We have 
> an initial implementation of a subset of the API, using ICE (RFC 5245) 
> for the peer-to-peer handshaking.  Our implementation is 
> WebKit/GTK+/gstreamer-based, and we of course intend to submit it to 
> WebKit, but the implementation is not quite ready for that yet.
> 
> More information about our work so far can be found here: 
> https://labs.ericsson.com/developer-community/blog/beyond-html5-peer-peer-conversational-video

Great!

> However, we have bumped into some details that we'd like to discuss
> here right away.  The following is our mix of proposals and questions.
> 
> 1. We propose adding a readyState attribute, to decouple the
>    onconnect() callback from any observers (such as the UI).
> 
>       const unsigned short CONNECTING = 0;
>       const unsigned short CONNECTED = 1;
>       const unsigned short CLOSED = 2;
>       readonly attribute unsigned short readyState;

Done, though with not quite the states you describe.

> 2. We propose replacing the onstream event with custom events of type
>    RemoteStreamEvent, to distinguish between adding and removing
>    streams.
> 
>       attribute Function onstreamadded;   // RemoteStreamEvent
>       attribute Function onstreamremoved; // RemoteStreamEvent
>       ...
>       interface RemoteStreamEvent : Event {
>          readonly attribute Stream stream;
>       };
> 
>    The 'stream' attribute indicates which stream was added/removed.

Done.

> 3. We propose renaming addRemoteConfiguration to
>    setRemoteConfiguration.  Our understanding of the ConnectionPeer is
>    that it provides a single-point-to-single-point connection; hence,
>    only one remote peer configuration is to be set, rather than many
>    to be added.
> 
>       void setRemoteConfiguration(in DOMString configuration, in optional DOMString remoteOrigin);

ICE can send many messages over time, so it seems to make sense to support 
a more general signaling channel mechanic than either what the spec used 
to have or what you propose above. I've tried to make the spec more 
closely align with what ICE needs here.
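
To illustrate the difference from a one-shot "set configuration" call, 
here is a toy model of a signaling channel that carries any number of 
messages in both directions. Everything in it (createMockPeer, emit, 
processSignalingMessage, connectViaRelay) is a hypothetical stand-in for 
illustration, not the spec's interface, and the direct function call in 
the middle stands in for whatever server relay the page uses.

```javascript
// Hypothetical stand-in for a peer object.
function createMockPeer(name, sendToOtherSide) {
  const received = [];
  return {
    name,
    received,
    // Called whenever this side has something to tell the other side.
    emit(message) { sendToOtherSide(message); },
    // Called by the page for every message relayed from the other side.
    processSignalingMessage(message) { received.push(message); },
  };
}

// Wire two mock peers together through an in-memory "relay".
function connectViaRelay() {
  let a, b;
  a = createMockPeer('a', (m) => b.processSignalingMessage(m));
  b = createMockPeer('b', (m) => a.processSignalingMessage(m));
  return { a, b };
}
```

The point is only that the page relays opaque messages as they come, 
rather than exchanging a single configuration string up front.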

> 4. We propose swapping the ConnectionPeerConfigurationCallback
>    callback parameters. The current example seems to use only one (the
>    second one).  Swapping them allows clients that care about 'server'
>    to do so, and clients that ignore it (such as the current example)
>    to do so too.
> 
>       [Callback=FunctionOnly, NoInterfaceObject]
>       interface ConnectionPeerConfigurationCallback {
>          void handleEvent(in DOMString configuration, in ConnectionPeer server);
>       };

Done.

> 5. Should a size limit to text messages be specified? Text messages
>    with UDP-like behavior (unimportant=true) can't really be reliably
>    split into several UDP packets.  For such long chunks of data, file
>    transfer seems like a better option anyway.

I've added a limit based on theoretical UDP limits, but I expect we will 
be bringing this down to match reality soon.
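
As an illustration of where a theoretical limit comes from (the exact 
number in the spec may differ once framing overhead is accounted for), 
the largest payload of a single UDP datagram over IPv4 can be computed 
as below. The function name fitsInOneDatagram and its overheadBytes 
parameter are hypothetical.

```javascript
// Largest payload that fits in one UDP datagram over IPv4:
// 65535 (maximum IP total length) - 20 (minimal IPv4 header) - 8 (UDP header).
const MAX_UDP_PAYLOAD_BYTES = 65535 - 20 - 8;

// Returns true if the text, encoded as UTF-8, plus any per-message
// overhead, fits in a single datagram.
function fitsInOneDatagram(text, overheadBytes = 0) {
  const encoded = new TextEncoder().encode(text); // UTF-8 bytes
  return encoded.length + overheadBytes <= MAX_UDP_PAYLOAD_BYTES;
}
```

The check is on encoded bytes, not characters, so non-ASCII text hits the 
limit sooner. A realistic limit is likely to be far lower, closer to a 
typical path MTU.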

> In addition to the above there is a need to add support for identifying 
> streams (so that the receiving end can use the right element for 
> rendering)

Done.

> and for influencing the media format.

I have punted on this for now, but I expect we'll have to revisit this.

On Thu, 27 Jan 2011, Adam Malcontenti-Wilson wrote:
>
> I was noticing how you were suggesting to change addRemoteConfiguration 
> to setRemoteConfiguration as it appears as a 
> single-point-to-single-point connection, is this part of the current 
> specification or could single-point-to-multiple-points connections (or 
> "clouds") be implemented using the same API in the future? This would be 
> a big bandwidth saver for users in "group chats" that would make some 
> sense to use add rather than set (and perhaps have another optional 
> parameter to replace rather than append or add).

Could you elaborate on your use case and possibly what you expect it to 
look like on the wire?

On Mon, 31 Jan 2011, Stefan Håkansson LK wrote this use case:
>
> A simple video chat service has been developed. In the service the users 
> are logged on to the same chat web server. The web server publishes 
> information about user login status, pushing updates to the web apps in 
> the browsers. By clicking on an online peer user name, a 1-1 video chat 
> session between the two browsers is initiated. The invited peer is 
> presented with a choice of joining or rejecting the session.
> 
> The web author developing the application has decided to display a 
> self-view as well as the video from the remote side in rather small 
> windows, but the user can change the display size during the session. 
> The application also supports if a participant (for a longer or shorter 
> time) would like to stop sending audio (but keep video) or video (keep 
> audio) to the other peer ("mute").
>
> Any of the two participants can at any time end the chat by clicking a 
> button.
> 
> In this specific case two users are using lap-tops in their respective 
> homes. They are connected to the public Internet with a desktop browser 
> using WiFi behind NATs. One of the users has an ADSL connection to the 
> home, and the other fiber access. Most of the time headsets are used, 
> but not always.

All of this except selectively muting audio vs video is currently 
possible in the proposed API.

The simplest way to make selective muting possible too would be to change 
how the pause/resume thing works in GeneratedStream, so that instead of 
pause() and resume(), we have individual controls for audio and video. 
Something like:

   void muteAudio();
   void resumeAudio();
   readonly attribute boolean audioMuted;
   void muteVideo();
   void resumeVideo();
   readonly attribute boolean videoMuted;

Alternatively, we could just have mutable attributes:

   attribute boolean audioEnabled;
   attribute boolean videoEnabled;

Any opinions on this?
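
To make the comparison concrete, here is a mock (not a real 
GeneratedStream, and not a proposal for exact names beyond those above) 
in which the method-style variant is expressed on top of the two mutable 
attributes.

```javascript
// Mock only: stands in for a GeneratedStream to compare the two shapes.
class MockGeneratedStream {
  constructor() {
    // Attribute-style variant: plain mutable booleans.
    this.audioEnabled = true;
    this.videoEnabled = true;
  }
  // Method-style variant, layered on the attributes.
  muteAudio() { this.audioEnabled = false; }
  resumeAudio() { this.audioEnabled = true; }
  get audioMuted() { return !this.audioEnabled; }
  muteVideo() { this.videoEnabled = false; }
  resumeVideo() { this.videoEnabled = true; }
  get videoMuted() { return !this.videoEnabled; }
}

// Page code under each variant:
//   stream.muteVideo();            // methods
//   stream.videoEnabled = false;   // attributes
```

The two are equivalent in expressive power; the attribute form is just 
less API surface.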

> !Requirement. The user must:            !Comment                               !
> --------------------------------------------------------------------------------
> !give explicit consent before a device  !                                      !
> !can be used to capture audio or video  !                                      !
> --------------------------------------------------------------------------------
> !be able to in an intuitive way revoke  !                                      !
> !and change capturing permissions       !                                      !
> --------------------------------------------------------------------------------
> !be able to easily understand that audio!                                      !
> !or video is being captured             !                                      !
> --------------------------------------------------------------------------------
> !be informed about that an invitation to!                                      !
> !a peer video chat session has been     !                                      !
> !received                               !                                      !
> --------------------------------------------------------------------------------
> !be able to accept or reject an         !                                      !
> !invitation to a peer video chat session!                                      !
> --------------------------------------------------------------------------------
> !be able to stop a media stream from    !                                      !
> !being transmitted                      !                                      !
> --------------------------------------------------------------------------------

All of this seems possible in the current API.

> !It must be possible to update presence !Event. Out of scope for RTC-Web?      !
> !info from web server and make web      !                                      !
> !application aware.                     !                                      !
> --------------------------------------------------------------------------------
> !It must be possible to propagate       !Out of scope for RTC-Web?             !
> !intention to start a chat session from !                                      !
> !one web app (via server), and make     !                                      !
> !receiving web application aware.       !                                      !
> !Likewise, the receiving web application!                                      !
> !must be able to propagate its accept/  !                                      !
> !reject to the initiating web app.      !                                      !

These seem out of scope for HTML, and more something the app would 
implement.

> !The web application must be able to use!Provided the user has given consent.  !
> !cams and mics as input devices.        !                                      !
> --------------------------------------------------------------------------------
> !The web application must be able to    !I.e. how they are routed. To e.g. both!
> !control how streams generated by input !a self-view and a peer                !
> !devices are used                       !                                      !
> --------------------------------------------------------------------------------
> !The web application must be able to    !Use the audio and video elements?     !
> !control how streams are rendered and   !                                      !
> !displayed                              !                                      !
> --------------------------------------------------------------------------------
> !The web application must be able to    !                                      !
> !initiate sending of streams to a peer  !                                      !

These are all supported.

> !The web application must be able to    !If the video is going to be displayed !
> !define the media format to be used for !in a large window, use higher bit-    !
> !the streams sent to a peer.            !rate/resolution. Should media settings!
> !                                       !be allowed to be changed during a     !
> !                                       !session (at e.g. window resize)?      !

Shouldn't this be automatic and renegotiated dynamically via SDP 
offer/answer?

> !The web application must be made aware !Event.                                !
> !of whether set up of stream sending was!                                      !
> !successful or not                      !                                      !
> --------------------------------------------------------------------------------
> !The web application must be made aware !Event. To be able to (with or without !
> !when a stream from a peer is received  !user involvement) accept or reject,   !
> !                                       !and to connect the stream to the right!
> !                                       !display/rendering element.            !
> --------------------------------------------------------------------------------
> !The web application must be made aware !Event.                                !
> !of when a stream from a peer is not    !                                      !
> !received any more                      !                                      !
> --------------------------------------------------------------------------------
> !The web application in a session must  !                                      !
> !be able to terminate all incoming and  !                                      !
> !outgoing streams                       !                                      !

All possible.

> !The browser must be able to have an    !Out of scope for RTC-Web? Use WS or   !
> !always on connection with the web      !S-SE?                                 !
> !server to be able to receive presence  !                                      !
> !updates and chat initiations           !                                      !

That's just "AJAX".

> !The browser must be able to use mics   !                                      !
> !and cams as input devices              !                                      !
> --------------------------------------------------------------------------------
> !The browser must be able to send       !                                      !
> !streams (includes the associated       !                                      !
> !processing like coding, framing, etc.) !                                      !
> !to a peer in presence of NATs.         !                                      !
> --------------------------------------------------------------------------------
> !The browser must be able to receive    !                                      !
> !streams (associated processing) from   !                                      !
> !peers and render them                  !                                      !

That's all possible with the API.

> !Streams being transmitted must be      !Do not starve other traffic (e.g. on  !
> !subject to rate control                !ADSL link)                            !

Not sure whether this requires anything special. Could you elaborate?

> !When there is both incoming and        !Headsets not always used              !
> !outgoing audio streams, echo           !                                      !
> !cancellation must be provided to avoid !                                      !
> !disturbing echo during conversation    !                                      !

That seems like a UA quality-of-implementation issue. I wouldn't want to 
require Web apps to have to implement this!

> !Synchronization between audio and video!                                      !
> !must be supported                      !                                      !

If there's one stream, that's automatic, no?

> !The user must be informed that the     !                                      !
> !communication has ceased               !                                      !

Both the UA and the Web app have the option to do this.

> !The web application must be made aware !To be able to inform user and take    !
> !of that the connection with the server !action. Out of scope for RTC-Web?     !
> !has been dropped                       !                                      !
> --------------------------------------------------------------------------------
> !The web application must be made aware !To be able to inform user and take    !
> !of when streams from a peer are no     !action (one of the peers still has    !
> !longer received                        !connection with the server)           !
> --------------------------------------------------------------------------------
> !The browser must detect when no streams!                                      !
> !are received from a peer               !                                      !

These aren't really yet supported in the API, but I intend for us to add 
this kind of thing at the same time as we add similar metrics to <video> 
and <audio>. To do this, though, it would really help to have a better 
idea what the requirements are. What information should be available? 
"Packets received per second" (and "sent", maybe) seems like an obvious 
one, but what other information can we collect?

On Wed, 2 Feb 2011, Tab Atkins Jr. wrote:
>
> The file input gained the @accept attribute a little while ago, to 
> indicate what type of file should be accepted.  It has three special 
> values, "image/*", "video/*", and "audio/*".
> 
> I believe one intent of these special values is that browsers may offer 
> the user the ability to capture an image/video/audio with the webcam/mic 
> and automatically set it as the value of the <input>, without the user 
> having to create an intermediary file themselves.
> 
> The spec doesn't give any indication of this, though, and I've surprised 
> some people (browser devs, internally) when I tell them about @accept 
> after they ask me about access the webcam/mic.
> 
> Could we get a note added to the File Input section describing this 
> intention?

Done.

On Tue, 8 Feb 2011, Rich Tibbett wrote:
>
> [1] http://www.w3.org/TR/capture-api/

The parameter part of this seems unnecessary. Why would we not just always 
offer a camera?

The API part of this seems reasonable, but should probably be merged with 
the File API spec. Having lots of small specs makes the platform feel very 
fragmented and makes it much harder for us to update things consistently.

On Tue, 15 Feb 2011, Leandro Graciá Gil wrote:
> 
> Looking at the current state of the specification I see there is no 
> mention about the expected lifetime of the stream objects, or to say it 
> in another way, the period in which a page can access the selected 
> device data. We would like to propose that the user can explicitly 
> invalidate an existing stream so that any further access would require a 
> new confirmation by the user.

The spec is hopefully clear that this is indeed allowed now.

On Wed, 16 Feb 2011, Anne van Kesteren wrote:
> 
> This is just a thought. Instead of acquiring a Stream object 
> asynchronously there always is one available showing transparent black 
> or some such. E.g. navigator.cameraStream. It also inherits from 
> EventTarget. Then on the Stream object you have methods to request 
> camera access which triggers some asynchronous UI. Once granted an 
> appropriately named event is dispatched on Stream indicating you now 
> have access to an actual stream. When the user decides it is enough and 
> turns off the camera (or something else happens) some other appropriately 
> named event is dispatched on Stream again turning it transparent black 
> again.

This is a very interesting idea.

On Wed, 16 Feb 2011, Andrei Popescu wrote:
> 
> I thought we were all trying to avoid asynchronous UI (dialogs, 
> infobars, popups, etc), which is a solution that does not scale very 
> well when many different APIs require it. This was one of the main 
> reasons for trying a different approach.

Whatever we do with this we'll have some sort of async UI, I think.

On Wed, 23 Feb 2011, John Knottenbelt wrote:
> 
> I agree that clicking on the <device> element to bring up an async 
> authorisation request works well because it corresponds strongly to the 
> user's will to start and stop access to the device. However, I think 
> that we should not be trying to save the user a click, because that 
> would risk bothering the user with a dialog before they have made their 
> intention to grant access to the device explicit. The principal 
> application of <device> is to grant access to webcam devices which, I 
> think, is sensitive enough to warrant the user having to initiate the 
> authorisation process.

While I think this is in principle true (and is the original design behind 
<device>), in practice I think it makes UIs feel rather unnatural. Video 
conferencing systems today don't require the user to explicitly select the 
camera each time, for instance. Instead, the model of prompting the user 
with a non-modal bubble prompt (a la the Firefox 4 Geolocation prompt) 
allows the user to permanently grant a trusted site permission, while 
still allowing the user to ignore an annoying site's requests without 
having to click anywhere to dismiss the request.

(A number of people wrote e-mails with proposals but without providing 
rationales or use cases for particular design decisions. I examined the 
proposals, but have not included them here since I could not determine 
which parts of these proposals were intended to be substantial and which 
were intended to be merely supporting infrastructure. If there are 
specific ideas that were proposed that I have not addressed, please do 
reraise them, pointing out the relevant parts.)

On Mon, 28 Feb 2011, Harald Alvestrand wrote:
> 
> I would very much want to avoid having the "record to file/buffer" be a 
> fundamental part of the microphone abstraction, since it's irrelevant to 
> my application (if anything should be recorded, it's the conversation, 
> not the output from the microphone), so I think we should try to find a 
> model where a microphone is an object that provides a data stream, and 
> that data stream can be connected to a different object that acts as a 
> recorder; if I don't need recording, I should not have to instantiate a 
> recorder.

I have used this model in the spec (separating the recorder from the 
stream).
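
As a rough illustration of that separation (not the spec's actual IDL), 
the sketch below models a microphone as an object that only produces 
data, with a recorder as a separate object that is attached on demand. 
MicStream, Recorder, connect() and push() are hypothetical names chosen 
for this example.

```typescript
// Hypothetical model only; these are not the interfaces in the spec.
type Chunk = Uint8Array;
type Sink = (chunk: Chunk) => void;

// The microphone abstraction: a source of data and nothing else.
class MicStream {
  private sinks: Sink[] = [];

  // Attach a consumer; returns a function that detaches it again.
  connect(sink: Sink): () => void {
    this.sinks.push(sink);
    return () => {
      this.sinks = this.sinks.filter((s) => s !== sink);
    };
  }

  // Called by the (simulated) device whenever new audio data arrives.
  push(chunk: Chunk): void {
    this.sinks.forEach((sink) => sink(chunk));
  }
}

// A recorder is just one possible consumer of a stream.
class Recorder {
  private chunks: Chunk[] = [];
  private disconnect: (() => void) | null = null;

  start(stream: MicStream): void {
    this.disconnect = stream.connect((c) => {
      this.chunks.push(c);
    });
  }

  // Detach from the stream and return everything captured so far.
  stop(): Chunk[] {
    if (this.disconnect) {
      this.disconnect();
      this.disconnect = null;
    }
    return this.chunks;
  }
}
```

An application that only forwards the stream to a peer never 
instantiates Recorder at all, which is the property asked for above.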

On Mon, 14 Mar 2011, Lachlan Hunt wrote:
> 
> The IDL for GeneratedStream should indicate that it inherits from the 
> Stream interface.

Thanks, fixed.

> The API includes both readystatechange event, as well as independent 
> events for play, paused and ended.  This redundancy is unnecessary. This 
> is also inconsistent with the design of the HTMLMediaElement API, which 
> does not include a readystatechange event in favour of separate events 
> only.

I've dropped readystatechange.

I expect to drop play and pause events if we move to the model described 
above that pauses and resumes audio and video separately.

> The API does not provide any way for the author to indicate a preferred 
> camera.  Some devices include multiple cameras, particularly phones with 
> front and rear facing cameras, and some use cases are better suited to 
> one or the other.
>
> For example, an augmented reality application that takes advantage of 
> geolocation and device orientation in order to overlay the video stream 
> with things (e.g. push pins to identify points of interest, like on 
> Google Street View). Such an application would be better suited for 
> working with the rear facing camera.  But with a video conferencing 
> application, the front facing camera would be more suitable.
> 
> It might therefore be useful for the author to provide a hint to this 
> effect, which would allow the UA to more intelligently select the 
> default camera for the user.

I've commented on this earlier in this mail. I would very much like to 
hear whether the proposal above is satisfactory for this use case.

> There are some use cases for which it would be useful to know the 
> precise orientation of the camera, such as augmented reality 
> applications.  The camera orientation may be independent of the device's 
> orientation, and so the existing device orientation API may not be 
> sufficient.
> 
> In the simple case, the front and rear cameras face in opposite 
> directions, and so if an augmented reality application was built 
> assuming it was using the rear camera, but the user instead granted 
> access to the front camera, the calculations would be 180° out.
> 
> Some devices may also provide cameras that may allow the camera to be 
> rotated independently of the device itself, and so camera orientation 
> information would need to allow for this.  I don't know the best way to 
> provide this, and there are likely to be issues about whether the camera 
> orientation should be relative to the device itself, or relative to 
> fixed Earth coordinates (North, East, Up), like the existing device 
> orientation API.

It seems like the best way to extend this would be to have the Device 
Orientation API apply to GeneratedStream objects, either by just having 
the events also fire on GeneratedStream objects, or by having the API be 
based on a pull model rather than a push model and exposing an object on 
GeneratedStream objects as well as Window objects.
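
The two options can be compared with a small model. This is only a 
sketch under assumed names: OrientationSource, read(), addListener() and 
the Fake* classes are hypothetical, and the alpha/beta/gamma fields 
merely mirror the angles used by the Device Orientation API.

```typescript
// Hypothetical model only; not an API defined by any spec.
interface Orientation {
  alpha: number;
  beta: number;
  gamma: number;
}
type OrientationListener = (o: Orientation) => void;

class OrientationSource {
  private latest: Orientation | null = null;
  private listeners: OrientationListener[] = [];

  // Pull model: the page asks for the most recent reading when needed.
  read(): Orientation | null {
    return this.latest;
  }

  // Push model: the page is notified whenever a new reading arrives.
  addListener(listener: OrientationListener): void {
    this.listeners.push(listener);
  }

  // Called by the (simulated) sensor.
  update(o: Orientation): void {
    this.latest = o;
    this.listeners.forEach((l) => l(o));
  }
}

// The same kind of object hangs off both the window and each stream, so
// a camera can report an orientation that differs from the device's.
class FakeWindow {
  readonly orientation = new OrientationSource();
}
class FakeGeneratedStream {
  readonly orientation = new OrientationSource();
}
```

With this shape, an augmented reality page would read the orientation 
from the stream it was actually granted rather than from the window, 
which covers both the front/rear case and independently rotating 
cameras.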

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'