[whatwg] Speech input element

Thu May 20 09:42:27 PDT 2010

On Thu, 20 May 2010 14:18:56 +0100, Bjorn Bringert <bringert at google.com> wrote:
> On Thu, May 20, 2010 at 1:32 PM, Anne van Kesteren <annevk at opera.com> wrote:
> >> On Thu, 20 May 2010 14:29:16 +0200, Bjorn Bringert <bringert at google.com> wrote:
> >>
> >> It should be possible to drive <input type="speech"> with keyboard
> >> input, if the user agent chooses to implement that. Nothing in the API
> >> should require the user to actually speak. I think this is a strong
> >> argument for why <input type="speech"> should not be replaced by a
> >> microphone API and a separate speech recognizer, since the latter
> >> would be very hard to make accessible. (I still think that there
> >> should be a microphone API for applications like audio chat, but
> >> that's a separate discussion).
> >
> > So why not implement speech support on top of the existing input types?

> Speech-driven keyboards certainly get you some of the benefits of
> <input type="speech">, but they give the application developer less
> control and less information than a speech-specific API. Some
> advantages of a dedicated speech input type:
	It's more important that users have control (e.g. over whether they
input text by voice or by typing) than that devs do. Devs don't know the
needs of every single user of their forms.

	Also, I don't see any new speech-specific 
> - Application-defined grammars. This is important for getting high
> recognition accuracy in limited domains.
	This may be true, but does this require a new type? I really don't know.
> - Allows continuous speech recognition where the app gets events on
> speech endpoints.
Please describe how exactly this is different from continuous text input.
> - Doesn't require the input element to have keyboard focus while speaking.
	Neither does <input type="text"> if the user chooses to input text into it
with voice. It requires "microphone focus" (termed "activated" in the draft).
Anything else is a usability issue in the app, not in the form spec.
> - Doesn't require a visible text input field.
	HTML does not (or at least shouldn't) define how elements will be presented.
In particular, it does not mandate a visual interface if the user doesn't want
one. See also: CSS.

	Also the spec clearly states that "The user can click the element to move
back to the not activated state." So the draft suggests a visible input element,
assuming that this was an informal note and not a requirement.

From the draft on <http://docs.google.com/Doc?docid=0AaYxrITemjbxZGNmZzc5cHpfM2Ryajc5Zmhx&hl=en>:
> Web search by voice
> Speech translation
<input type="text"> for client-side recognition, <input type="audio"> for server-side.
> Speech-enabled webmail client
Command-line interface with pronounceable commands (as is recommended for command-line
interfaces in general anyway).
> VoiceXML interpreter
I don't see how XML interpreters relate to speech-based HTML forms.
Or my definition of "interpreter" doesn't match yours (I don't write English natively).

--- code sample from draft ---
<html>
<script type="text/javascript">
function startSearch(event) {
  var query = event.target.value;
  document.getElementById("q").value = query;
  // use AJAX search API to get results for
  // q.value and put in #search_results.
}
</script><body>

<form name="search_form">
<input type="text" name="q" id="q">
<input type="speech" grammar="builtin:search" onchange="startSearch">
</form>

<div id="search_results"></div>

</body></html>
--- end of code sample ---
How is listening for changes on one element and moving them to another element
and then submitting the form better than e.g.
--- code sample ---
<html>
<!-- tell browser that form is a search box -->
<link rel="search" href="#search"> 
<body>
<form id="search"> <!-- or name="search" -->
	<input type="search" name="q" id="q">
</form>
</body>
</html>
--- end of code sample ---
Works sans scripting, scripted submit can be used if scripting is supported.
I'd understand it if it linked to some SRGS stuff, but it doesn't.
Also it breaks the @type attribute of <input>, so you would have to add /another/
attribute to tell browsers what type of information is expected to be entered
into the <input>. Speech isn't a type of information. It's a way to input
information.
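
To make the "scripted submit" option mentioned above a bit more concrete, here
is a minimal sketch of how a page could layer scripting over the plain form.
The names enhanceSearchForm and runSearch are placeholders of mine, not
anything from the draft, and I'm assuming the form has id="search" and a field
named "q" as in the sample above.

```javascript
// Hypothetical helper (not from the draft): intercept submit and hand the
// query to a script-driven search. Without scripting the form submits normally.
function enhanceSearchForm(form, runSearch) {
  form.addEventListener("submit", function (event) {
    event.preventDefault(); // stay on the page only when scripting is available
    // The value is the same whether the user typed it or dictated it.
    runSearch(form.elements.q.value);
  });
}

// Browser-only wiring; skipped where there is no DOM.
if (typeof document !== "undefined") {
  var searchForm = document.getElementById("search");
  if (searchForm) {
    enhanceSearchForm(searchForm, function (query) {
      // placeholder: fetch results for `query` and render them
      console.log("searching for: " + query);
    });
  }
}
```

The script would have to run after the form exists in the document (e.g. placed
after the form). The handler never needs to know how the text got into the
field, which is the point: the input method stays the user's choice.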

Really, you should be using CSS and JavaScript if you want
fine-grained control over the user interaction (for human users that'll
use the form). Feel free to add speech recognition capabilities to
JavaScript and improve CSS styling of voice media.

If you wanted to integrate HTML forms and SRGS, that shouldn't break <input
type="">.