[whatwg] Speech input element

Wed May 19 01:36:26 PDT 2010

On Wed, May 19, 2010 at 12:50 AM, Anne van Kesteren <annevk at opera.com> wrote:
> On Tue, 18 May 2010 10:52:53 +0200, Bjorn Bringert <bringert at google.com> wrote:
>>...
>> Advantages of the speech input element:
>>
>> - Web app developers do not need to build and maintain a speech
>> recognition service.

But browser authors would, and it's not clear they will do so in a
cross-platform, compatible way.  Client devices with small caches and
limited battery power aren't very good at running Viterbi beam search:
the algorithm mostly does random reads across wide memory spans, so a
small cache doesn't help it much.
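
To illustrate the access pattern, here is a minimal Python sketch of
Viterbi beam search over a toy hidden Markov model.  The function names
(viterbi_beam, prune), the beam_width parameter, and the two-state model
are all made up for illustration; a real recognizer searches a vastly
larger graph, which is where the scattered table lookups start to hurt.

```python
import math


def prune(hyps, beam_width):
    """Keep only the beam_width highest-scoring hypotheses."""
    kept = sorted(hyps.items(), key=lambda kv: kv[1][0], reverse=True)
    return dict(kept[:beam_width])


def viterbi_beam(observations, states, log_start, log_trans, log_emit,
                 beam_width):
    """Return (log_prob, path) for the best path that survives pruning."""
    # One initial hypothesis per state: state -> (log score, path so far).
    beam = {
        s: (log_start[s] + log_emit[s][observations[0]], [s])
        for s in states
    }
    beam = prune(beam, beam_width)

    for obs in observations[1:]:
        nxt = {}
        for prev, (score, path) in beam.items():
            # Each surviving hypothesis fans out to its successor states.
            # In a large model these lookups are scattered across the
            # transition and emission tables rather than read in sequence.
            for cur, lt in log_trans[prev].items():
                cand = score + lt + log_emit[cur][obs]
                if cur not in nxt or cand > nxt[cur][0]:
                    nxt[cur] = (cand, path + [cur])
        beam = prune(nxt, beam_width)

    best_state = max(beam, key=lambda s: beam[s][0])
    return beam[best_state]


def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model (illustrative numbers only).
states = ["Healthy", "Fever"]
log_start = logs({"Healthy": 0.6, "Fever": 0.4})
log_trans = {
    "Healthy": logs({"Healthy": 0.7, "Fever": 0.3}),
    "Fever": logs({"Healthy": 0.4, "Fever": 0.6}),
}
log_emit = {
    "Healthy": logs({"normal": 0.5, "cold": 0.4, "dizzy": 0.1}),
    "Fever": logs({"normal": 0.1, "cold": 0.3, "dizzy": 0.6}),
}

score, path = viterbi_beam(
    ["normal", "cold", "dizzy"], states, log_start, log_trans, log_emit,
    beam_width=2,
)
print(path, math.exp(score))
```

With only two states a beam width of 2 prunes nothing, so this is exact
Viterbi; narrowing the beam trades accuracy for fewer lookups per frame.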

>> - Implementations can have special UI treatment for speech input,
>> which may be different from that for general audio capture.
>
> I guess I don't really see why this cannot be added on top of the <device>
> element. Maybe it is indeed better though to separate the two. The reason
> I'm mostly asking is that one reason we went with <device> rather than
> <input> is that the result of the user operation is not something that will
> partake in form submission....

That's not a good reason.  Audio files are uploaded with <input
type=file> all the time, but it wasn't until Flash made it possible
that browser authors started considering the possibilities of
microphone upload, even though they were urged to address the issue a
decade ago:

> From: Tim Berners-Lee <timbl at w3.org>
> Date: Fri, 31 Mar 2000 16:37:02 -0500
>...
> This is a question of getting browser manufacturers to
> implement what is already in HTML....  HTML 4 does already
> include a way of requesting audio input.  For instance,
> you can write:
>
> <INPUT name="audiofile1" type="file" accept="audio/*">
>
> and be prompted for various means of audio input (a recorder,
> a mixing desk, a file icon drag and drop receptor, etc).
> Here "file" does not mean "from a disk" but "large body of
> data with a MIME type".
>
> As someone who used the NeXT machine's "lip service" many
> years ago I see no reason why browsers should not implement
> both audio and video and still capture in this way.   There
> are many occasions that voice input is valuable. We have speech
> recognition systems in the lab, for example, and of course this
> is very much needed....  So you don't need to convince me of
> the usefulness.
>
> However, browser writers have not implemented this!
>
> One needs to encourage this feature to be implemented, and
> implemented well.
>
> I hope this helps.
>
> Tim Berners-Lee

Even earlier, in January 2000, that same basic feature request had
been endorsed by more than 150 people, including:

    * Michael Swaine - in his article, "Sounds like..." -
      webreview.com/pub/98/08/21/frames - mswaine at swaine.com -
      well-known magazine columnist for and long-time editor-in-chief
      of Dr. Dobb's Journal
    * David Turner and Keith Ross of Institut Eurecom - in their
      paper, "Asynchronous Audio Conferencing on the Web" -
      www.eurecom.fr/~turner/papers/aconf/abstract.html -
      {turner,ross}@eurecom.fr
    * Integrating Speech Technology in Language Learning SIG -
      dbs.tay.ac.uk/instil - and InSTIL's ICARE committee, both
      chaired by Lt. Col. Stephen LaRocca -
      gs0416 at exmail.usma.army.mil - a language instructor at the
      U.S. Military Academy
    * Dr. Goh Kawai - goh at kawai.com - a researcher in the fields of
      computer aided language instruction and speech recognition, and
      InSTIL/ICARE founding member - www.kawai.com/goh
    * Ruth Ross - ruth at earthlab.com - IEEE Learning Technologies
      Standards Committee - www.earthlab.com/RCR
    * Phil Siviter - Phil.Siviter at brighton.ac.uk - IEEE LTSC -
      www.it.bton.ac.uk/staff/pfs/research.htm
    * Safia Barikzai - S.Barikzai at sbu.ac.uk - IEEE LTSC -
      www.sbu.ac.uk/barikzai
    * Gene Haldeman - gene at gene-haldeman.com - Computer
      Professionals for Social Responsibility, Ethics Working Group
    * Steve Teicher - steve-teicher at att.net - University of Central
      Florida; CPSR Education Working Group
    * Dr. Melissa Holland - mholland at arl.mil - team leader for the
      U.S. Army Research Laboratory's Language Technology Group
    * Tull Jenkins - jenkinst at atsc.army.mil - U.S. Army Training
      Support Centers

However, the W3C decided not to move forward with the implementation
details at http://www.w3.org/TR/device-upload because they were said
to be "device dependent," which was completely meaningless, really.

Regards,
James Salsman