[whatwg] Speech input element

Tue May 18 01:52:53 PDT 2010

On Tue, May 18, 2010 at 8:02 AM, Anne van Kesteren <annevk at opera.com> wrote:
> On Mon, 17 May 2010 15:05:22 +0200, Bjorn Bringert <bringert at google.com>
> wrote:
>>
>> Back in December there was a discussion about web APIs for speech
>> recognition and synthesis that saw a decent amount of interest
>>
>> (http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-December/thread.html#24281).
>> Based on that discussion, we would like to propose a simple API for
>> speech recognition, using a new <input type="speech"> element. An
>> informal spec of the new API, along with some sample apps and use
>> cases can be found at:
>>
>> http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en.
>>
>> It would be very helpful if you could take a look and share your
>> comments. Our next steps will be to implement the current design, get
>> some feedback from web developers, continue to tweak, and seek
>> standardization as soon as it looks mature enough and/or other vendors
>> become interested in implementing it.
>
> I wonder how it relates to the <device> proposal already in the draft. In
> theory that supports microphone input too.

It would be possible to implement speech recognition on top of a
microphone input API. The most obvious approach would be to use
<device> to get an audio stream, and send that audio stream to a
server (e.g. using WebSockets). The server runs a speech recognizer
and returns the results.
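
As a rough sketch of the client side of that approach: the code below
splits captured PCM samples into fixed-size frames and hands each frame
to a transport. Everything named here (AudioTransport, streamAudio, the
320-sample frame size) is made up for illustration and is not part of
the <device> draft or any other spec. A WebSocket would fit the
transport interface, since its send() accepts typed arrays.

```typescript
// Hypothetical transport: anything with a send() that accepts binary data.
// A browser WebSocket satisfies this shape.
interface AudioTransport {
  send(data: ArrayBufferView): void;
}

// Split 16-bit PCM samples into frames of `frameSize` samples and send each
// frame over the transport. The last frame may be shorter than the rest.
// Returns the number of frames sent.
function streamAudio(
  samples: Int16Array,
  transport: AudioTransport,
  frameSize: number = 320, // assumption: 20 ms of audio at 16 kHz
): number {
  if (frameSize <= 0) {
    throw new RangeError("frameSize must be positive");
  }
  let frames = 0;
  for (let offset = 0; offset < samples.length; offset += frameSize) {
    // slice() copies, so each frame owns its own buffer.
    const frame = samples.slice(offset, offset + frameSize);
    transport.send(frame);
    frames++;
  }
  return frames;
}
```

The server side (the recognizer itself and the result protocol) is the
part each web app developer would have to build and run, and is left
out here.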

Advantages of the speech input element:

- Web app developers do not need to build and maintain a speech
recognition service.

- Implementations can choose to use client-side speech recognition.
This could reduce network traffic and latency (though probably at the
cost of recognition accuracy and language support). Implementations
could also use server-side recognition by default, switching to local
recognition in offline or low-bandwidth situations.

- Using a general audio capture API would require APIs for things like
audio encoding and audio streaming. Judging from the past results of
specifying media features, this may be non-trivial. The speech input
element turns all audio processing concerns into implementation
details.

- Implementations can have special UI treatment for speech input,
which may be different from that for general audio capture.
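
To illustrate the server-by-default, local-as-fallback idea from the
second point: an implementation might pick a recognizer along these
lines. This is only a sketch; NetworkStatus, chooseRecognizer and the
64 kbps threshold are assumptions for the example, not anything in the
proposal.

```typescript
type RecognizerKind = "server" | "local";

// Hypothetical snapshot of connectivity as seen by the browser.
interface NetworkStatus {
  online: boolean;
  bandwidthKbps: number; // estimated uplink bandwidth
}

// Prefer server-side recognition; fall back to the local recognizer when
// offline or when the estimated bandwidth is below `minKbps`.
function chooseRecognizer(
  net: NetworkStatus,
  minKbps: number = 64, // assumed threshold, chosen arbitrarily
): RecognizerKind {
  if (!net.online || net.bandwidthKbps < minKbps) {
    return "local";
  }
  return "server";
}
```

Because the choice is made inside the implementation, the page using
the speech input element would not need to know which recognizer ran.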

Advantages of using a microphone API:

- Web app developers get complete control over the quality and
features of the speech recognizer. This is a moot point for most
developers, though, since they do not have the resources to run their
own speech recognition service.

- Fewer features to implement in browsers (assuming that a microphone
API would be added anyway).

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902