[whatwg] Web API for speech recognition and synthesis

Fri Dec 11 13:45:32 PST 2009

(Sending this a second time. Hopefully the whatwg list doesn't bounce it back.)

On 12/11/09 6:05 AM, Bjorn Bringert wrote:
> Thanks for the discussion - cool to see more interest today also
> (http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024453.html)
>
> I've hacked up a proof-of-concept JavaScript API for speech
> recognition and synthesis. It adds a navigator.speech object with
> these functions:
>
> void listen(ListenCallback callback, ListenOptions options);
> void speak(DOMString text, SpeakCallback callback, SpeakOptions options);
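
The post gives only these two signatures, not the shapes of the callbacks or options. Below is a minimal TypeScript sketch of how the listen/speak pair could be typed and exercised. Every field name (utterance, confidence, language, maxResults) and the FakeSpeech stand-in are hypothetical, chosen for illustration, and are not taken from the plugin's code.

```typescript
// Hypothetical shapes: the proposal only gives the two signatures, so the
// result fields and option names below are illustrative assumptions.
interface ListenResult {
  utterance: string;  // recognized text
  confidence: number; // recognizer score in [0, 1]
}
type ListenCallback = (results: ListenResult[]) => void;
interface ListenOptions {
  language?: string;   // e.g. "en-US"
  maxResults?: number; // how many n-best hypotheses to return
}
type SpeakCallback = () => void; // assumed to fire when playback finishes
interface SpeakOptions {
  language?: string;
}

// The two functions from the proposal, as a TypeScript interface.
interface Speech {
  listen(callback: ListenCallback, options: ListenOptions): void;
  speak(text: string, callback: SpeakCallback, options: SpeakOptions): void;
}

// In-memory stand-in so the API shape can be exercised without a real
// recognizer. It calls back synchronously for simplicity; a real engine
// would call back later, after audio capture and recognition.
class FakeSpeech implements Speech {
  readonly spoken: string[] = [];
  private readonly cannedResults: ListenResult[];

  constructor(cannedResults: ListenResult[]) {
    this.cannedResults = cannedResults;
  }

  listen(callback: ListenCallback, options: ListenOptions): void {
    const max = options.maxResults ?? this.cannedResults.length;
    callback(this.cannedResults.slice(0, max));
  }

  speak(text: string, callback: SpeakCallback, _options: SpeakOptions): void {
    this.spoken.push(text); // record the text instead of synthesizing audio
    callback();
  }
}

// Usage: repeat the top recognition result back to the user.
const fake = new FakeSpeech([
  { utterance: "large pepperoni pizza", confidence: 0.92 },
  { utterance: "large pepperoni pizzas", confidence: 0.41 },
]);
const speech: Speech = fake;

speech.listen((results) => {
  if (results.length === 0) return;
  speech.speak(`You said: ${results[0].utterance}`, () => {}, { language: "en-US" });
}, { language: "en-US", maxResults: 1 });
```

Keeping recognition and synthesis as two independent callback-based calls lets a page chain them freely, as in the echo example above.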

So if I read the examples correctly, you're not using grammars anywhere.
I wonder how well that works in real-world cases. Of course, if
the speech recognizer can handle everything well without grammars, the
result could be validated in JS after it has been received from the
recognizer. But I think having support for grammars simplifies coding
and can make speech dialogs somewhat more manageable.
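
Without a grammar, the validation described above happens after recognition. A minimal sketch, assuming the recognizer returns an n-best list of (utterance, confidence) pairs (a hypothetical shape), that accepts the best hypothesis matching the phrases valid at the current dialog step:

```typescript
// One entry in the recognizer's n-best list (hypothetical shape).
interface Hypothesis {
  utterance: string;
  confidence: number;
}

// Return the highest-confidence hypothesis whose normalized text is one of
// the phrases accepted at this dialog step, or null if none qualifies.
function pickValid(
  nbest: Hypothesis[],
  allowed: ReadonlySet<string>,
  minConfidence = 0.3,
): Hypothesis | null {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...nbest].sort((a, b) => b.confidence - a.confidence);
  for (const h of sorted) {
    const normalized = h.utterance.trim().toLowerCase();
    if (h.confidence >= minConfidence && allowed.has(normalized)) {
      return { utterance: normalized, confidence: h.confidence };
    }
  }
  return null;
}

// Usage: the top hypothesis is not a valid size, so the second one wins.
const sizes = new Set(["small", "medium", "large"]);
const choice = pickValid(
  [
    { utterance: "large fish", confidence: 0.7 },
    { utterance: "Large", confidence: 0.6 },
    { utterance: "charge", confidence: 0.2 },
  ],
  sizes,
);
```

The script can only filter what the recognizer chose to return, while a grammar lets the recognizer restrict its search to valid phrases in the first place.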

W3C has already standardized things like
http://www.w3.org/TR/speech-grammar/ and
http://www.w3.org/TR/semantic-interpretation/
and the latter one works quite nicely with JS.
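
SRGS grammars can carry SISR tags, small ECMAScript fragments that build a semantic result in an `out` object. The TypeScript sketch below is not an SRGS processor; it only mimics that idea with a deliberately tiny, hypothetical rule format (a fixed sequence of rules, each a list of word alternatives with a tag function) to show how a recognized phrase could be turned into structured data.

```typescript
// Semantic result filled in by tags, loosely modeled on SISR's `out` object.
type Semantics = Record<string, string>;

// One alternative of a rule: the words it matches plus a tag that writes
// into the semantic result when that alternative is chosen.
interface Alternative {
  words: string;
  tag: (out: Semantics) => void;
}
type Rule = Alternative[];

// Match the utterance against one alternative from each rule, in order.
// A deliberately tiny subset of SRGS: no optional items, repeats, weights
// or nested rule references, and the first matching alternative wins.
function interpret(utterance: string, rules: Rule[]): Semantics | null {
  let rest = utterance.trim().toLowerCase();
  const out: Semantics = {};
  for (const rule of rules) {
    const match = rule.find(
      (alt) => rest === alt.words || rest.startsWith(alt.words + " "),
    );
    if (match === undefined) return null;
    match.tag(out);
    rest = rest.slice(match.words.length).trim();
  }
  // Reject utterances with leftover words the grammar did not cover.
  return rest === "" ? out : null;
}

// Usage: a two-rule pizza grammar.
const size: Rule = [
  { words: "small", tag: (out) => { out.size = "S"; } },
  { words: "large", tag: (out) => { out.size = "L"; } },
];
const topping: Rule = [
  { words: "cheese pizza", tag: (out) => { out.topping = "cheese"; } },
  { words: "pepperoni pizza", tag: (out) => { out.topping = "pepperoni"; } },
];
const order = interpret("Large pepperoni pizza", [size, topping]);
// order is { size: "L", topping: "pepperoni" }
```

Because the tags run as script, the page receives an object rather than raw text, which is what makes the SISR approach fit naturally with JS.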

Again, I think this kind of discussion should happen in the W3C Multimodal
Interaction WG, though I'm not sure how actively or how openly that working
group operates at the moment.

-Olli

>
> The implementation uses an NPAPI plugin for the Android browser that
> wraps the existing Android speech APIs. The code is available at
> http://code.google.com/p/speech-api-browser-plugin/
>
> There are some simple demo apps in
> http://code.google.com/p/speech-api-browser-plugin/source/browse/trunk/android-plugin/demos/
> including:
>
> - English to Spanish speech-to-speech translation
> - Google search by speaking a query
> - The obligatory pizza ordering system
> - A phone number dialer
>
> Comments appreciated!
>
> /Bjorn
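
The post does not describe how the phone number dialer demo processes its input. As a hypothetical sketch only, a recognized utterance could be turned into a tel: URI like this, accepting spoken digit words and digit strings and rejecting anything else:

```typescript
// Spoken digit words and the characters they stand for ("oh" is a common
// way of saying zero in phone numbers).
const DIGIT_WORDS = new Map<string, string>([
  ["zero", "0"], ["oh", "0"], ["one", "1"], ["two", "2"], ["three", "3"],
  ["four", "4"], ["five", "5"], ["six", "6"], ["seven", "7"],
  ["eight", "8"], ["nine", "9"],
]);

// Convert a recognized utterance into a tel: URI, or return null if any
// token is neither a digit word nor a run of digits.
function utteranceToTelUri(utterance: string): string | null {
  const tokens = utterance
    .toLowerCase()
    .split(/[\s-]+/)
    .filter((t) => t !== "");
  let digits = "";
  for (const token of tokens) {
    const word = DIGIT_WORDS.get(token);
    if (word !== undefined) {
      digits += word;
    } else if (/^\d+$/.test(token)) {
      digits += token; // the recognizer already produced numerals
    } else {
      return null; // e.g. "call mom": not something we can dial
    }
  }
  return digits === "" ? null : `tel:${digits}`;
}

// Usage: a page could then navigate to the URI, e.g. via location.href.
const uri = utteranceToTelUri("five five five 1234"); // "tel:5551234"
```

Rejecting the whole utterance on any unknown word is a conservative choice; a real dialer would likely ask the user to repeat instead of dialing a partial number.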
>
> On Fri, Dec 4, 2009 at 2:51 PM, Olli Pettay <Olli.Pettay at helsinki.fi> wrote:
>> Indeed, the API should be significantly simpler than X+V.
>> Microsoft has (had?) support for SALT. That API is pretty simple and
>> provides speech recognition and TTS.
>> The API could probably be even simpler than SALT.
>> IIRC, there was an extension for Firefox to support SALT (well, there was
>> also an extension to support X+V).
>>
>> If the platform/OS provides ASR and TTS, adding a JS API for it should
>> be pretty simple. X+V tries to handle some logic using the VoiceXML Form
>> Interpretation Algorithm (FIA), but I think it would be more web-like to
>> provide a pure JS API (similar to SALT). Integrating visual and voice input
>> could be done in scripts. I'd assume there would be script libraries to
>> handle multimodal input integration, especially if there will also be touch
>> and gesture events. (Classic multimodal map applications will become
>> possible on the web.)
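
A minimal sketch of the kind of script-level integration described here, under hypothetical event shapes: a spoken deictic command such as "zoom in here" is paired with the pointer or touch input closest to it in time, within an assumed 1.5 s window. The phrase table and window size are illustrative choices; a real fusion library would handle far more (gestures, ambiguity, partial results).

```typescript
// Hypothetical records a page script might collect: pointer/touch events
// from DOM handlers and final results from a speech recognition callback.
interface PointerInput { x: number; y: number; timeMs: number; }
interface SpeechInput { utterance: string; timeMs: number; }

type MapAction = "zoom-in" | "zoom-out" | "mark";
interface MapCommand { action: MapAction; x: number; y: number; }

// Deictic phrases the page understands; "here"/"this" need a location.
const PHRASES = new Map<string, MapAction>([
  ["zoom in here", "zoom-in"],
  ["zoom out here", "zoom-out"],
  ["mark this", "mark"],
]);

// Pair a spoken command with the pointer input closest to it in time,
// provided that input falls within windowMs of the speech timestamp.
function fuse(
  speech: SpeechInput,
  pointers: PointerInput[],
  windowMs = 1500,
): MapCommand | null {
  const action = PHRASES.get(speech.utterance.trim().toLowerCase());
  if (action === undefined) return null;

  let best: PointerInput | null = null;
  let bestGap = Infinity;
  for (const p of pointers) {
    const gap = Math.abs(p.timeMs - speech.timeMs);
    if (gap <= windowMs && gap < bestGap) {
      best = p;
      bestGap = gap;
    }
  }
  return best === null ? null : { action, x: best.x, y: best.y };
}

// Usage: a tap at t=1000 ms and "zoom in here" recognized at t=1400 ms.
const taps: PointerInput[] = [
  { x: 10, y: 20, timeMs: 1000 },
  { x: 300, y: 40, timeMs: 4000 },
];
const command = fuse({ utterance: "Zoom in here", timeMs: 1400 }, taps);
// command is { action: "zoom-in", x: 10, y: 20 }
```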
>>
>> But all of this should possibly be designed in, or together with, the W3C
>> Multimodal working group. I know their current architecture is way more
>> complex, but X+V, SALT and even Multimodal-CSS have been discussed in that
>> working group.
>>
>>
>> -Olli
>>
>>
>>
>> On 12/3/09 2:50 AM, Dave Burke wrote:
>>>
>>> We're envisaging a simpler programmatic API that looks familiar to the
>>> modern Web developer but one which avoids the legacy of dialog system
>>> languages.
>>>
>>> Dave
>>>
>>> On Wed, Dec 2, 2009 at 7:25 PM, João Eiras <joaoe at opera.com> wrote:
>>>
>>>     On Wed, 02 Dec 2009 12:32:07 +0100, Bjorn Bringert
>>>     <bringert at google.com> wrote:
>>>
>>>         We've been watching our colleagues build native apps that use speech
>>>         recognition and speech synthesis, and would like to have JavaScript
>>>         APIs that let us do the same in web apps. We are thinking about
>>>         creating a lightweight and implementation-independent API that lets
>>>         web apps use speech services. Is anyone else interested in that?
>>>
>>>         Bjorn Bringert, David Singleton, Gummi Hafsteinsson
>>>
>>>
>>>     This exists already, but only Opera supports it, although there are
>>>     problems with the library we use for speech recognition.
>>>
>>>     http://www.w3.org/TR/xhtml+voice/
>>>
>>>     http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/
>>>
>>>     Would be nice to revive that specification and get vendor buy-in.
>>>
>>>
>>>
>>>     --
>>>
>>>     João Eiras
>>>     Core Developer, Opera Software ASA, http://www.opera.com/