<div>I&#39;ve been working on a web app that reads the text of a web page aloud, highlighting each word as it is spoken. This requires a Text-To-Speech (TTS) API that can:</div><div>(1) generate the speech audio from some text, and </div>


<div>(2) include the time indices at which each word in the text is spoken. </div><div><br></div><div>Microsoft exposes its Sapi.SpVoice API through ActiveXObject, which does (1) but apparently not (2). There are also web services (usable in conjunction with HTML5 Audio) that do (1), such as the <a href="http://www.ispeech.org/api">iSpeech API</a> and Google Translate&#39;s TTS &lt;<a href="http://translate.google.com/translate_tts?q=Hello%2C+World&amp;tl=en">http://translate.google.com/translate_tts?q=Hello%2C+World&amp;tl=en</a>&gt;, but I have found none that do (2). In any case, web services aren&#39;t ideal, since they require the audio to be transferred over the network, which could take a significant amount of time.</div>


<div><br></div><div>Is anyone aware of any work to develop a standard TTS API for the Web? Operating systems already have this functionality built in, and it&#39;s a shame that web apps can&#39;t make use of it. If Google Gears were still alive, it would&#39;ve been a good place to prototype this, but alas…</div>


<div><br></div><div>Thanks,</div><div>Weston</div>