[whatwg] Speech input element

Wed May 19 02:11:58 PDT 2010

Has anyone spent any time imagining what a microphone/video-camera API that
supports the video conference use case might look like?  If so, it'd be
great to see a link.

My guess is that it's going to be much more complicated and much more
invasive security-wise.  Looking at Bjorn's proposal, it seems as though it
fairly elegantly supports the use cases while avoiding the need for explicit
permission requests (e.g. infobars, modal dialogs, etc.) since permission is
implicitly granted every time it's used and permission is revoked when, for
example, the window loses focus.

I'd be very excited if a WG took a serious look at
microphone/video-camera/etc, but I suspect that speech to text is enough of
a special case (in terms of how it's often implemented in hardware and in
terms of security) that it won't be possible to fold into a more general
microphone/video-camera/etc API without losing ease of use, which is pretty
central to the use cases listed in Bjorn's doc.

J

On Wed, May 19, 2010 at 9:30 AM, Anne van Kesteren <annevk at opera.com> wrote:

> On Wed, 19 May 2010 10:22:54 +0200, Satish Sampath <satish at google.com>
> wrote:
>
>>> I don't really see how the problem is the same as with synchronous
>>> XMLHttpRequest. When you do a synchronous request nothing happens to the
>>> event loop so an alert() dialog could never happen. I think you want
>>> recording to continue though. Having a simple dialog stop video
>>> conferencing for instance would be annoying. It's only script execution
>>> that needs to be paused. I'm also not sure if I'd really want recording
>>> to stop while looking at a page in a different tab. Again, if I'm in a
>>> conference call I'm almost always doing tasks on the side. E.g. looking
>>> up past discussions, scrolling through a document we're discussing, etc.
>>>
>>
>> Can you clarify how the speech input element (as described in the current
>> API sketch) is related to video conferencing or a conference call, since
>> it doesn't really stream audio to any place other than potentially a speech
>> recognition server and feeds the result back to the element?
>>
>
> Well, as indicated in the other thread I'm not sure whether this is the
> best way to do it. Usually we start with a lower-level API (i.e. microphone
> input) and build up from there. But maybe I'm wrong and speech input is a
> case that needs to be considered separately. It would still not be like
> synchronous XMLHttpRequest though.
>
>
>
> --
> Anne van Kesteren
> http://annevankesteren.nl/
>