[whatwg] Web API for speech recognition and synthesis

Bjorn Bringert bringert at google.com
Fri Dec 11 06:05:00 PST 2009


Thanks for the discussion - it's cool to see more interest today as well
(http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024453.html)

I've hacked up a proof-of-concept JavaScript API for speech
recognition and synthesis. It adds a navigator.speech object with
these functions:

void listen(ListenCallback callback, ListenOptions options);
void speak(DOMString text, SpeakCallback callback, SpeakOptions options);
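
For example, a page might use it like this (just a sketch - the exact
callback arguments and option fields shown here are illustrative; see
the plugin source for the real shapes):

navigator.speech.listen(function (result) {
  // Assuming the recognizer hands back an object containing the
  // recognized text; the 'transcript' property name is made up.
  document.getElementById('q').value = result.transcript;
}, { language: 'en-US' });

navigator.speech.speak('Hello, world!', function () {
  // Called once synthesis has finished playing.
}, { language: 'en-US' });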

The implementation uses an NPAPI plugin for the Android browser that
wraps the existing Android speech APIs. The code is available at
http://code.google.com/p/speech-api-browser-plugin/

There are some simple demo apps in
http://code.google.com/p/speech-api-browser-plugin/source/browse/trunk/android-plugin/demos/
including:

- English to Spanish speech-to-speech translation (see the sketch below)
- Google search by speaking a query
- The obligatory pizza ordering system
- A phone number dialer
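
The translation demo, for instance, is essentially just the two calls
chained together (again a sketch - translate() stands in for whatever
translation service the demo actually calls, and the option names are
illustrative):

navigator.speech.listen(function (result) {
  // translate() is a placeholder for an asynchronous call to a
  // translation service (e.g. via XMLHttpRequest).
  translate(result.transcript, 'en', 'es', function (spanish) {
    navigator.speech.speak(spanish, function () {}, { language: 'es' });
  });
}, { language: 'en-US' });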

Comments appreciated!

/Bjorn

On Fri, Dec 4, 2009 at 2:51 PM, Olli Pettay <Olli.Pettay at helsinki.fi> wrote:
> Indeed the API should be something significantly simpler than X+V.
> Microsoft has (had?) support for SALT. That API is pretty simple and
> provides speech recognition and TTS.
> The API could probably be even simpler than SALT.
> IIRC, there was an extension for Firefox to support SALT (well, there was
> also an extension to support X+V).
>
> If the platform/OS provides ASR and TTS, adding a JS API for it should
> be pretty simple. X+V tries to handle some logic using the VoiceXML FIA,
> but I think it would be more web-like to provide a pure JS API (similar
> to SALT). Integrating visual and voice input could be done in scripts.
> I'd assume there would be some script libraries to handle multimodal
> input integration - especially if there will be touch and gesture events
> too. (Classic multimodal map applications will become possible on the web.)
>
> But all of this is something that should possibly be designed in or with
> the W3C multimodal working group. I know their current architecture is
> way more complex, but X+V, SALT and even Multimodal-CSS have been
> discussed in that working group.
>
>
> -Olli
>
>
>
> On 12/3/09 2:50 AM, Dave Burke wrote:
>>
>> We're envisaging a simpler programmatic API, one that looks familiar to
>> the modern Web developer but avoids the legacy of dialog-system
>> languages.
>>
>> Dave
>>
>> On Wed, Dec 2, 2009 at 7:25 PM, João Eiras <joaoe at opera.com
>> <mailto:joaoe at opera.com>> wrote:
>>
>>    On Wed, 02 Dec 2009 12:32:07 +0100, Bjorn Bringert
>>    <bringert at google.com <mailto:bringert at google.com>> wrote:
>>
>>        We've been watching our colleagues build native apps that use speech
>>        recognition and speech synthesis, and would like to have JavaScript
>>        APIs that let us do the same in web apps. We are thinking about
>>        creating a lightweight and implementation-independent API that lets
>>        web apps use speech services. Is anyone else interested in that?
>>
>>        Bjorn Bringert, David Singleton, Gummi Hafsteinsson
>>
>>
>>    This exists already, but only Opera supports it, although there are
>>    problems with the library we use for speech recognition.
>>
>>    http://www.w3.org/TR/xhtml+voice/
>>
>>  http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/
>>
>>    Would be nice to revive that specification and get vendor buy-in.
>>
>>
>>
>>    --
>>
>>    João Eiras
>>    Core Developer, Opera Software ASA, http://www.opera.com/
>>
>>
>
>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902


