I agree. The application should be able to choose a source for speech

commands, or give the user a choice of options for a speech source. Keeping

microphone access apart from speech recognition also gives a much cleaner split between the APIs, allowing the

development of a speech API that doesn't depend on or interfere in any

way with the development of a microphone/audio input device API.<br><br><div class="gmail_quote">2009/12/3 Diogo Resende <span dir="ltr">&lt;<a href="mailto:dresende@thinkdigital.pt">dresende@thinkdigital.pt</a>&gt;</span><br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I agree 100%. Still, I think the access to the mic and the speech<br>

recognition could be separated.<br>

<br>

--<br>

<div class="im">Diogo Resende &lt;<a href="mailto:dresende@thinkdigital.pt">dresende@thinkdigital.pt</a>&gt;<br>

</div>ThinkDigital<br>

<div><div></div><div class="h5"><br>

On Thu, 2009-12-03 at 12:06 +0000, Bjorn Bringert wrote:<br>

> On Wed, Dec 2, 2009 at 10:20 PM, Jonas Sicking &lt;jonas@sicking.cc&gt; wrote:<br>

> > On Wed, Dec 2, 2009 at 11:17 AM, Bjorn Bringert &lt;<a href="mailto:bringert@google.com">bringert@google.com</a>&gt; wrote:<br>

> >> I agree that being able to capture and upload audio to a server would<br>

> >> be useful for a lot of applications, and it could be used to do speech<br>

> >> recognition. However, for a web app developer who just wants to<br>

> >> develop an application that uses speech input and/or output, it<br>

> >> doesn't seem very convenient, since it requires server-side<br>

> >> infrastructure that is very costly to develop and run. A<br>

> >> speech-specific API in the browser gives browser implementors the<br>

> >> option to use on-device speech services provided by the OS, or<br>

> >> server-side speech synthesis/recognition.<br>

> ><br>

> > Again, it would help a lot if you could provide use cases and<br>

> > requirements. This helps both with designing an API, as well as<br>

> > evaluating if the use cases are common enough that a dedicated API is<br>

> > the best solution.<br>

> ><br>

> > / Jonas<br>

><br>

> I'm mostly thinking about speech web apps for mobile devices. I think<br>

> that's where speech makes most sense as an input and output method,<br>

> because of the poor keyboards, small screens, and frequent hands/eyes<br>

> busy situations (e.g. while driving). Accessibility is the other big<br>

> reason for using speech.<br>

><br>

> Some ideas for use cases:<br>

><br>

> - Search by speaking a query<br>

> - Speech-to-speech translation<br>

> - Voice Dialing (could open a tel: URI to actually make the call)<br>

> - Dialog systems (e.g. the canonical pizza ordering system)<br>

> - Lightweight JavaScript browser extensions (e.g. Greasemonkey /<br>

> Chrome extensions) for using speech with any web site, e.g., for<br>

> accessibility.<br>

><br>

> Requirements:<br>

><br>

> - Web app developer side:<br>

>    - Allows both speech recognition and synthesis.<br>

>    - Easy-to-use API. Makes simple things easy and advanced things possible.<br>

>    - Doesn't require web app developer to develop / run his own speech<br>

> recognition / synthesis servers.<br>

>    - (Natural) language-neutral API.<br>

>    - Allows developer-defined application-specific grammars / language models.<br>

>    - Allows multilingual applications.<br>

>    - Allows easy localization of speech apps.<br>

><br>

> - Implementor side:<br>

>    - Easy enough to implement that it can get wide adoption in browsers.<br>

>    - Allows implementor to use either client-side or server-side<br>

> recognition and synthesis.<br>

><br>

</div></div></blockquote></div><br>
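To make the separation discussed in this thread concrete, here is a minimal sketch of a recognizer that depends only on an abstract audio source and accepts a developer-defined grammar. Every name in it (AudioSource, SpeechRecognizer, RecognitionResult, FixedAudioSource, FakeRecognizer, recognizeFrom) is hypothetical and not part of any existing browser API, and the "recognizer" is a stand-in that performs no real recognition.

```typescript
// Hypothetical sketch: none of these names come from a real browser API.

// A source of audio samples. A microphone API would be one implementation,
// a recorded file or a network stream could be others.
interface AudioSource {
  read(): Float32Array; // returns the next chunk of samples
}

// A recognition result: the recognized text plus a confidence score.
interface RecognitionResult {
  text: string;
  confidence: number;
}

// The recognizer depends only on the AudioSource interface, so the speech
// API and the audio input API can be developed independently.
interface SpeechRecognizer {
  recognize(source: AudioSource, grammar: string[]): RecognitionResult;
}

// Stand-in source that always returns the same fixed samples.
class FixedAudioSource implements AudioSource {
  private readonly samples: Float32Array;

  constructor(samples: Float32Array) {
    this.samples = samples;
  }

  read(): Float32Array {
    return this.samples;
  }
}

// Stand-in recognizer: it does no real recognition. It picks a grammar
// entry from the chunk length so that the example is deterministic.
class FakeRecognizer implements SpeechRecognizer {
  recognize(source: AudioSource, grammar: string[]): RecognitionResult {
    if (grammar.length === 0) {
      return { text: "", confidence: 0 };
    }
    const chunk = source.read();
    const index = chunk.length % grammar.length;
    return { text: grammar[index], confidence: 1 / grammar.length };
  }
}

// The application chooses the source and hands it to the recognizer.
function recognizeFrom(
  recognizer: SpeechRecognizer,
  source: AudioSource,
  grammar: string[]
): string {
  return recognizer.recognize(source, grammar).text;
}

const grammar = ["pizza", "salad", "soup"];
const result = recognizeFrom(
  new FakeRecognizer(),
  new FixedAudioSource(new Float32Array(4)),
  grammar
);
console.log(result);
```

Because the recognizer only sees the AudioSource interface, an implementor could back it with on-device or server-side recognition without changing the application code.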