[whatwg] Web API for speech recognition and synthesis
workmad3 at gmail.com
Thu Dec 3 05:16:25 PST 2009
I agree. The application should be able to choose a source for speech
commands, or give the user a choice of options for a speech source. It also
provides a much better separation of APIs, allowing the development of a
speech API that doesn't depend on or interfere in any way with the
development of a microphone/audio input device API.
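To make the separation concrete, here is a rough sketch of what I have in mind. Every name in it (AudioSource, SpeechRecognizer, BufferSource, EchoRecognizer, pickSource) is made up for illustration and is not part of any proposed or existing API. The stand-in classes only exist so the sketch runs without a real capture or recognition backend.

```typescript
// Hypothetical names throughout; none of this is a proposed or existing browser API.

// Anything that can hand over audio: a microphone, a file, a network stream.
interface AudioSource {
  readonly label: string;
  read(): Float32Array; // one chunk of PCM samples
}

// The speech side only sees the AudioSource interface, never the device behind it.
interface SpeechRecognizer {
  recognize(source: AudioSource): string;
}

// Stand-in source so the sketch runs without any real capture API.
class BufferSource implements AudioSource {
  readonly label: string;
  private samples: Float32Array;

  constructor(label: string, samples: Float32Array) {
    this.label = label;
    this.samples = samples;
  }

  read(): Float32Array {
    return this.samples;
  }
}

// Stand-in recognizer: reports what it was given instead of doing real recognition.
class EchoRecognizer implements SpeechRecognizer {
  recognize(source: AudioSource): string {
    return `${source.read().length} samples from ${source.label}`;
  }
}

// The application (or the user) picks the source; the recognizer does not care which.
function pickSource(sources: AudioSource[], wanted: string): AudioSource {
  const found = sources.find((s) => s.label === wanted);
  if (!found) {
    throw new Error(`no audio source named ${wanted}`);
  }
  return found;
}

const sources: AudioSource[] = [
  new BufferSource("microphone", new Float32Array(160)),
  new BufferSource("uploaded-file", new Float32Array(320)),
];

const result = new EchoRecognizer().recognize(pickSource(sources, "uploaded-file"));
console.log(result);
```

The point is only that the recognizer depends on an interface, so a microphone API could be designed and shipped on its own schedule and plugged in later as one more source.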
2009/12/3 Diogo Resende <dresende at thinkdigital.pt>
> I agree 100%. Still, I think the access to the mic and the speech
> recognition could be separated.
> Diogo Resende <dresende at thinkdigital.pt>
> On Thu, 2009-12-03 at 12:06 +0000, Bjorn Bringert wrote:
> > On Wed, Dec 2, 2009 at 10:20 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> > > On Wed, Dec 2, 2009 at 11:17 AM, Bjorn Bringert <bringert at google.com> wrote:
> > >> I agree that being able to capture and upload audio to a server would
> > >> be useful for a lot of applications, and it could be used to do speech
> > >> recognition. However, for a web app developer who just wants to
> > >> develop an application that uses speech input and/or output, it
> > >> doesn't seem very convenient, since it requires server-side
> > >> infrastructure that is very costly to develop and run. A
> > >> speech-specific API in the browser gives browser implementors the
> > >> option to use on-device speech services provided by the OS, or
> > >> server-side speech synthesis/recognition.
> > >
> > > Again, it would help a lot if you could provide use cases and
> > > requirements. This helps both with designing an API, as well as
> > > evaluating if the use cases are common enough that a dedicated API is
> > > the best solution.
> > >
> > > / Jonas
> > I'm mostly thinking about speech web apps for mobile devices. I think
> > that's where speech makes most sense as an input and output method,
> > because of the poor keyboards, small screens, and frequent hands/eyes
> > busy situations (e.g. while driving). Accessibility is the other big
> > reason for using speech.
> > Some ideas for use cases:
> > - Search by speaking a query
> > - Speech-to-speech translation
> > - Voice Dialing (could open a tel: URI to actually make the call)
> > - Dialog systems (e.g. the canonical pizza ordering system)
> > Chrome extensions) for using speech with any web site, e.g., for
> > accessibility.
> > Requirements:
> > - Web app developer side:
> >   - Allows both speech recognition and synthesis.
> >   - Easy to use API. Makes simple things easy and advanced things
> >     possible.
> >   - Doesn't require web app developer to develop / run his own speech
> >     recognition / synthesis servers.
> >   - (Natural) language-neutral API.
> >   - Allows developer-defined application-specific grammars / language
> >     models.
> >   - Allows multilingual applications.
> >   - Allows easy localization of speech apps.
> > - Implementor side:
> >   - Easy enough to implement that it can get wide adoption in browsers.
> >   - Allows implementor to use either client-side or server-side
> >     recognition and synthesis.