[whatwg] Speech input element

Anne van Kesteren annevk at opera.com
Wed May 19 00:50:36 PDT 2010

On Tue, 18 May 2010 10:52:53 +0200, Bjorn Bringert <bringert at google.com> wrote:
> On Tue, May 18, 2010 at 8:02 AM, Anne van Kesteren <annevk at opera.com>  
> wrote:
>> I wonder how it relates to the <device> proposal already in the draft.  
>> In theory that supports microphone input too.
> It would be possible to implement speech recognition on top of a
> microphone input API. The most obvious approach would be to use
> <device> to get an audio stream, and send that audio stream to a
> server (e.g. using WebSockets). The server runs a speech recognizer
> and returns the results.
> Advantages of the speech input element:
> - Web app developers do not need to build and maintain a speech
> recognition service.
> - Implementations can choose to use client-side speech recognition.
> This could give reduced network traffic and latency (but probably also
> reduced recognition accuracy and language support). Implementations
> could also use server-side recognition by default, switching to local
> recognition in offline or low bandwidth situations.
> - Using a general audio capture API would require APIs for things like
> audio encoding and audio streaming. Judging from the past results of
> specifying media features, this may be non-trivial. The speech input
> element turns all audio processing concerns into implementation
> details.
> - Implementations can have special UI treatment for speech input,
> which may be different from that for general audio capture.

I guess I don't really see why this cannot be added on top of the <device>
element, though maybe it is indeed better to separate the two. I'm mostly
asking because one reason we went with <device> rather than <input> is
that the result of the user operation is not something that will partake
in form submission. Obviously many uses of form controls today never
involve form submission and are handled by script instead, but every
control that exists can be used as part of form submission.
<input type=speech> does not seem like it can.

> Advantages of using a microphone API:
> - Web app developers get complete control over the quality and
> features of the speech recognizer. This is a moot point for most
> developers though, since they do not have the resources to run their
> own speech recognition service.
> - Fewer features to implement in browsers (assuming that a microphone
> API would be added anyway).

Right, and I am pretty positive we will add a microphone API. One thing
that could be done, for example, is to have a speech recognition object of
some sort that you feed the audio stream that comes out of <device>. (Or
indeed you feed the stream to a server via WebSocket.)
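
A minimal sketch of that arrangement, assuming nothing about the eventual
browser API: it only models the data flow in Python. Every name here
(SpeechRecognizer, LocalRecognizer, RemoteRecognizer, recognize, and the
send/receive callables standing in for a WebSocket-like transport) is
hypothetical and comes from neither the <device> draft nor the speech
input proposal. The point it illustrates is that the code consuming the
audio stream need not care whether recognition happens locally or on a
server.

```python
from typing import Callable, Iterable, Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical interface: anything that can be fed audio chunks."""

    def feed(self, chunk: bytes) -> None: ...

    def finish(self) -> str: ...


class LocalRecognizer:
    """Stand-in for a recognizer object provided by the user agent."""

    def __init__(self, decode: Callable[[bytes], str]) -> None:
        self._decode = decode        # assumed: turns raw audio into text
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> None:
        # Accumulate audio until the stream ends.
        self._buffer.extend(chunk)

    def finish(self) -> str:
        return self._decode(bytes(self._buffer))


class RemoteRecognizer:
    """Stand-in for streaming the audio to a server-side recognizer."""

    def __init__(self, send: Callable[[bytes], None],
                 receive: Callable[[], str]) -> None:
        self._send = send            # assumed: pushes one message to the server
        self._receive = receive      # assumed: blocks until the server replies

    def feed(self, chunk: bytes) -> None:
        # Forward each chunk as soon as it arrives.
        self._send(chunk)

    def finish(self) -> str:
        # An empty message marks end-of-stream in this toy protocol.
        self._send(b"")
        return self._receive()


def recognize(stream: Iterable[bytes], recognizer: SpeechRecognizer) -> str:
    """Feed every chunk of an audio stream to a recognizer, return the text."""
    for chunk in stream:
        recognizer.feed(chunk)
    return recognizer.finish()
```

With a fake decoder or a fake transport, the same recognize() call works
against either class, which is the separation argued for above: the
microphone API supplies the stream, and recognition is a separate consumer
of it.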

Anne van Kesteren
