<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Dec 3, 2009, at 4:06 AM, Bjorn Bringert wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>On Wed, Dec 2, 2009 at 10:20 PM, Jonas Sicking <<a href="mailto:jonas@sicking.cc">jonas@sicking.cc</a>> wrote:<br><blockquote type="cite">On Wed, Dec 2, 2009 at 11:17 AM, Bjorn Bringert <<a href="mailto:bringert@google.com">bringert@google.com</a>> wrote:<br></blockquote><blockquote type="cite"><blockquote type="cite">I agree that being able to capture and upload audio to a server would<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">be useful for a lot of applications, and it could be used to do speech<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">recognition. However, for a web app developer who just wants to<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">develop an application that uses speech input and/or output, it<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">doesn't seem very convenient, since it requires server-side<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">infrastructure that is very costly to develop and run. A<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">speech-specific API in the browser gives browser implementors the<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">option to use on-device speech services provided by the OS, or<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">server-side speech synthesis/recognition.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Again, it would help a lot if you could provide use cases and<br></blockquote><blockquote type="cite">requirements. 
This helps both with designing an API, as well as<br></blockquote><blockquote type="cite">evaluating if the use cases are common enough that a dedicated API is<br></blockquote><blockquote type="cite">the best solution.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">/ Jonas<br></blockquote><br>I'm mostly thinking about speech web apps for mobile devices. I think<br>that's where speech makes most sense as an input and output method,<br>because of the poor keyboards, small screens, and frequent hands/eyes<br>busy situations (e.g. while driving). Accessibility is the other big<br>reason for using speech.<br></div></blockquote><div>Accessibility is already handled through ARIA and the host platform's accessibility features.</div><br><blockquote type="cite"><div><br>Some ideas for use cases:<br><br>- Search by speaking a query<br>- Speech-to-speech translation<br>- Voice Dialing (could open a tel: URI to actually make the call)<br>- Dialog systems (e.g. the canonical pizza ordering system)<br>- Lightweight JavaScript browser extensions (e.g. Greasemonkey /<br>Chrome extensions) for using speech with any web site, e.g., for<br>accessibility.<br></div></blockquote><div><br></div><div>I am unsure why the site should be directly responsible for things like audio-based accessibility. What do you believe a site should be doing itself manually vs. 
the accessibility services provided by the host OS?</div><br><blockquote type="cite"><div><br>Requirements:<br><br>- Web app developer side:<br> - Allows both speech recognition and synthesis.<br></div></blockquote><div>ARIA (in conjunction with the OS accessibility services) already provides accessibility-focused text-to-speech (I am unsure about the recognition side).</div><blockquote type="cite"><div><font class="Apple-style-span" color="#000000"><br></font> - Doesn't require web app developer to develop / run his own speech<br>recognition / synthesis servers.<br></div></blockquote><div>This would seem to be "use the OS services".</div><blockquote type="cite"><div><font class="Apple-style-span" color="#000000"><br></font>- Implementor side:<br> - Easy enough to implement that it can get wide adoption in browsers.<br></div></blockquote><div>These services are not simple -- any implementation would seem to be a significant amount of work, especially if you want to a) actually be good at it and b) interact with the host OS's native accessibility features.</div><br><blockquote type="cite"><div> - Allows implementor to use either client-side or server-side<br>recognition and synthesis.<br></div></blockquote><div>I honestly have no idea what you mean by this.</div><div><br></div>--Oliver</div><div><br></div></body></html>