<div>I've been working on a web app that reads the text in a web page aloud, highlighting each word as it is read. For this to be possible, a Text-To-Speech API is needed that is able to:</div><div>(1) generate the speech audio from some text, and</div>
<div>(2) include the time indices at which each word in the text is spoken.</div><div><br></div><div>Microsoft has its Sapi.SpVoice API via ActiveXObject, which apparently does (1) but not (2). There are also web services (usable in conjunction with HTML5 Audio) that do (1), such as the <a href="http://www.ispeech.org/api">iSpeech API</a> and Google Translate's TTS &lt;<a href="http://translate.google.com/translate_tts?q=Hello%2C+World&tl=en">http://translate.google.com/translate_tts?q=Hello%2C+World&tl=en</a>&gt;, but I have found none that do (2). In any case, web services aren't ideal, since they require the audio to be transferred over the network, which could take a significant amount of time.</div>
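<div><br></div><div>To make requirement (2) concrete, here is a minimal sketch of how a page could use word timings if some TTS engine supplied them. The <code>timings</code> array and the <code>wordIndexAt</code> helper are hypothetical names made up for illustration; none of the APIs mentioned above is known to return data in this shape.</div>

```javascript
// Hypothetical timing data: an assumed shape, not the output of any real API.
// Each entry holds a word and the time (in seconds) at which it starts being spoken.
const timings = [
  { word: "Hello", start: 0.0 },
  { word: "World", start: 0.45 },
];

// Return the index of the word being spoken at time t (in seconds),
// or -1 if t falls before the first word. Assumes timings is sorted by start.
function wordIndexAt(timings, t) {
  let lo = 0;
  let hi = timings.length - 1;
  let found = -1;
  // Binary search for the last entry whose start time is <= t.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timings[mid].start <= t) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

<div>With HTML5 Audio, the page could call <code>wordIndexAt(timings, audio.currentTime)</code> from the audio element's <code>timeupdate</code> handler and highlight the element wrapping that word.</div>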
<div><br></div><div>Is anyone aware of any work done to develop a standard TTS API for the Web? Operating systems already have this functionality built in, and it's a shame that web apps can't make use of it. If Google Gears were still alive, it would have been a good place to prototype this, but alas.</div>
<div><br></div><div>Thanks,</div><div>Weston</div>