[whatwg] Some questions and ideas about the "Speech for HTML Input Elements" proposal.

Mon Jun 21 04:25:34 PDT 2010

Hi James,

> 1. I'm thinking about the possibility of a UA offloading the speech recognition
> task to an external local service or application, such as an input method,
> a text service or even a browser extension. I'm just wondering if the user
> interaction flow would still be the same when it happens outside the UA. For
> example, if an input method supports voice input, it may generate voice input
> results as well as some other events (e.g. fake keyboard or mouse events)
> during a speech session.

Are there any reasons to have fake keyboard events in this case? As an
example, consider how copy/paste works in various UAs: for each key that
the user presses, the input element receives a 'keypress' event and an
'input' event, whereas when the user copies a full word and pastes it
into the input element, a single 'input' event is fired with no
'keypress' events.
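The event behaviour described above can be modelled in a few lines. This
is a simplified sketch, not UA code: it assumes only 'keypress' and
'input' events matter (real UAs also fire keydown, keyup and others), and
the names EventType, InsertionKind and classifyInputs are hypothetical.
It shows how a page could tell per-key typing apart from bulk insertion
(a paste or, by analogy, a recognized phrase) without needing any fake
keyboard events.

```typescript
// Simplified model of the event streams described above.
// Assumption: only 'keypress' and 'input' are considered; real UAs
// also fire keydown, keyup and other events.
type EventType = "keypress" | "input";
type InsertionKind = "typed" | "bulk";

// Classify each 'input' event by whether a 'keypress' came right before it.
function classifyInputs(events: readonly EventType[]): InsertionKind[] {
  const kinds: InsertionKind[] = [];
  let sawKeypress = false;
  for (const ev of events) {
    if (ev === "keypress") {
      sawKeypress = true;
    } else {
      // An 'input' with no preceding 'keypress' means text arrived in bulk.
      kinds.push(sawKeypress ? "typed" : "bulk");
      sawKeypress = false;
    }
  }
  return kinds;
}

// Typing two characters: one 'keypress' plus one 'input' per character.
const typed: EventType[] = ["keypress", "input", "keypress", "input"];
// Pasting a word: a single 'input' and no 'keypress'.
const pasted: EventType[] = ["input"];

console.log(classifyInputs(typed));  // both entries are "typed"
console.log(classifyInputs(pasted)); // a single "bulk" entry
```

Under this model a speech result inserted as one 'input' event looks
exactly like a paste, which is the point of the comparison above.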

> 2. Besides just input, is it possible to perform other actions via
> speech? For example to activate a button or clear the content
> in the input element? It would be cool if some rules can be defined
> to trigger different actions or even javascript callbacks by speech.

The speech input API proposal includes a 'Speech Control' (in the
Future work section), which is aimed at exactly this kind of
action-oriented speech input. The control's event handler can decide
which voice actions, relevant to that web page, it supports.
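A minimal sketch of how a page-level handler might map recognized phrases
to page-specific actions, in the spirit of the 'Speech Control' idea. The
proposal's actual event names and payloads are not given here, so
PageState, voiceActions and handleSpeechResult are all hypothetical, and a
plain object stands in for the DOM so the sketch runs on its own.

```typescript
// Hypothetical page state standing in for real DOM elements.
interface PageState {
  query: string;      // contents of a text input
  submitted: boolean; // whether a "search" button was activated
}

// Page-specific voice commands: the page, not the UA, decides which
// utterances mean something here. All names are illustrative only.
const voiceActions: ReadonlyMap<string, (state: PageState) => void> =
  new Map<string, (state: PageState) => void>([
    ["clear", (s: PageState) => { s.query = ""; }],       // clear the input
    ["search", (s: PageState) => { s.submitted = true; }], // "press" the button
  ]);

// Hypothetical handler for a recognition result delivered to a speech
// control. Returns true if the utterance matched a supported action.
function handleSpeechResult(utterance: string, state: PageState): boolean {
  const action = voiceActions.get(utterance.trim().toLowerCase());
  if (action === undefined) {
    return false; // unsupported phrase: leave the page unchanged
  }
  action(state);
  return true;
}

// Usage
const page: PageState = { query: "old text", submitted: false };
handleSpeechResult("Clear", page);  // page.query becomes ""
handleSpeechResult("search", page); // page.submitted becomes true
handleSpeechResult("hello", page);  // returns false, nothing changes
```

Keeping the command table in the page lets each application expose only
the actions that make sense for it, such as activating a button or
clearing an input element.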

> 3. How to manage the speech input focus? What will happen
> if there are multiple elements that accept speech in a page? Is it
> possible to traverse among them only by speech?

The current proposal is to activate speech input by a user action,
e.g. a click on the speech button. The user moves between speech-enabled
elements just as they would today.

> 4. Is it possible to extend this proposal to other input mechanisms?
> Like handwriting or even visual(gesture) recognition input? Even if it's
> not necessary for now, we may need to consider the potential
> impact when we want to add this kind of thing in the future.

This sounds like a good idea, especially for handheld/mobile
devices, where gesture input can be quite useful.

> 5. I'm just wondering if it's better to use the speech related
> properties as hints to the UA rather than requirements. It
> should be ok for a UA to provide speech input feature for input
> elements without the speech property and for another UA to
> simply ignore those properties.

If a UA is capable of speech input, it definitely makes sense to
support speech input for all input elements in all web pages. That
would be similar to an IME, with no changes required in any web
page. The speech input API proposal, however, is aimed at web
applications that can make use of speech-specific events and data. We
have described some use cases in the "Why is a speech-specific API
needed?" section of the doc.

--
Cheers
Satish