[whatwg] Web API for speech recognition and synthesis
Bjorn Bringert
bringert at google.com
Thu Dec 3 04:06:05 PST 2009
On Wed, Dec 2, 2009 at 10:20 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> On Wed, Dec 2, 2009 at 11:17 AM, Bjorn Bringert <bringert at google.com> wrote:
>> I agree that being able to capture and upload audio to a server would
>> be useful for a lot of applications, and it could be used to do speech
>> recognition. However, for a web app developer who just wants to
>> develop an application that uses speech input and/or output, it
>> doesn't seem very convenient, since it requires server-side
>> infrastructure that is very costly to develop and run. A
>> speech-specific API in the browser gives browser implementors the
>> option to use on-device speech services provided by the OS, or
>> server-side speech synthesis/recognition.
>
> Again, it would help a lot if you could provide use cases and
> requirements. This helps both with designing an API, as well as
> evaluating if the use cases are common enough that a dedicated API is
> the best solution.
>
> / Jonas
I'm mostly thinking about speech web apps for mobile devices. I think
that's where speech makes the most sense as an input and output method,
because of the poor keyboards, small screens, and frequent hands-busy or
eyes-busy situations (e.g. while driving). Accessibility is the other big
reason for using speech.
Some ideas for use cases:

- Search by speaking a query
- Speech-to-speech translation
- Voice dialing (could open a tel: URI to actually make the call)
- Dialog systems (e.g. the canonical pizza ordering system)
- Lightweight JavaScript browser extensions (e.g. Greasemonkey /
  Chrome extensions) for using speech with any web site, e.g. for
  accessibility.
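
To illustrate the voice dialing case, here is a minimal sketch in
TypeScript of the step after recognition: turning the recognized text
into a tel: URI that the page could then open. The helper name toTelUri
is made up for this example, and it assumes the recognizer returns the
number as a digit string. It only accepts global numbers (a leading "+"),
since local numbers in a tel: URI need extra context that this sketch
does not handle.

```typescript
// Hypothetical helper, not part of any browser API.
// Turns recognized text such as "+44 20 7946 0958" into a tel: URI,
// or returns null if the text does not look like a global phone number.
function toTelUri(recognized: string): string | null {
  // Drop visual separators a recognizer or user might include.
  const compact = recognized.replace(/[\s().-]/g, "");
  // Accept only "+" followed by 1 to 15 digits (the E.164 length limit).
  if (!/^\+\d{1,15}$/.test(compact)) {
    return null;
  }
  return `tel:${compact}`;
}

// Example with a made-up number; the page could navigate to the result.
const uri = toTelUri("+44 20 7946 0958");
console.log(uri);
```

Returning null for anything that does not parse keeps the app from
dialing on a misrecognition; it could ask the user to repeat instead.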
Requirements:

- Web app developer side:
  - Allows both speech recognition and synthesis.
  - Easy to use API. Makes simple things easy and advanced things possible.
  - Doesn't require web app developer to develop / run his own speech
    recognition / synthesis servers.
  - (Natural) language-neutral API.
  - Allows developer-defined application specific grammars / language models.
  - Allows multilingual applications.
  - Allows easy localization of speech apps.
- Implementor side:
  - Easy enough to implement that it can get wide adoption in browsers.
  - Allows implementor to use either client-side or server-side
    recognition and synthesis.
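
To make the requirements above a bit more concrete, here is a rough
sketch in TypeScript of the shape such an API could take. Every name in
it (SpeechBackend, SpeechSession, RecognitionOptions, FakeBackend) is
hypothetical and invented for illustration; this is not an existing or
proposed browser API. The grammar is simplified to a list of phrases,
and the fake backend treats its "audio" argument as text so the example
can run without a real speech engine.

```typescript
// Hypothetical sketch: none of these names belong to a real browser API.

// A language tag such as "en-GB" or "sv-SE"; nothing is tied to one language.
type LanguageTag = string;

interface RecognitionOptions {
  lang: LanguageTag;
  // Developer-defined, application-specific grammar, simplified to a phrase list.
  grammar?: string[];
}

// Implementor side: may wrap an on-device engine or a server-side service.
interface SpeechBackend {
  recognize(
    audio: string,
    options: RecognitionOptions,
    onResult: (text: string | null) => void
  ): void;
  synthesize(text: string, lang: LanguageTag, onDone: (spoken: string) => void): void;
}

// Web app developer side: one small object for both recognition and synthesis.
class SpeechSession {
  private backend: SpeechBackend;

  constructor(backend: SpeechBackend) {
    this.backend = backend;
  }

  listen(
    audio: string,
    options: RecognitionOptions,
    onResult: (text: string | null) => void
  ): void {
    this.backend.recognize(audio, options, onResult);
  }

  speak(text: string, lang: LanguageTag, onDone: (spoken: string) => void): void {
    this.backend.synthesize(text, lang, onDone);
  }
}

// Stand-in backend for this example only: the "audio" is already text.
class FakeBackend implements SpeechBackend {
  recognize(
    audio: string,
    options: RecognitionOptions,
    onResult: (text: string | null) => void
  ): void {
    const heard = audio.trim().toLowerCase();
    if (options.grammar === undefined) {
      onResult(heard);
      return;
    }
    // With a grammar, only phrases the developer listed can be returned.
    const match = options.grammar.find((phrase) => phrase.toLowerCase() === heard);
    onResult(match === undefined ? null : match);
  }

  synthesize(text: string, lang: LanguageTag, onDone: (spoken: string) => void): void {
    // A real backend would produce audio; here we just tag the text.
    onDone(`[${lang}] ${text}`);
  }
}

// The canonical pizza ordering example, using the fake backend.
const session = new SpeechSession(new FakeBackend());
const pizzaGrammar = ["small pizza", "large pizza"];
session.listen("Large pizza", { lang: "en-US", grammar: pizzaGrammar }, (text) => {
  if (text !== null) {
    session.speak(`You ordered a ${text}`, "en-US", (spoken) => console.log(spoken));
  }
});
```

The point of the split is that the web app only ever talks to
SpeechSession, so an implementor can swap a client-side engine for a
server-side one without the app changing. Localizing the app would then
mostly mean supplying a different language tag and grammar.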
--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902