[whatwg] Codecs for <audio> and <video>

Wed Jul 8 02:20:16 PDT 2009

On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard <chuck at jumis.com>  
wrote:

> On 7/7/09 1:10 PM, Philip Jagenstedt wrote:
>> On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard <chuck at jumis.com>  
>> wrote:
>>
>>> Philip Jagenstedt wrote:
>>>> For all of the simpler use cases you can already generate sounds  
>>>> yourself with a data uri. For example, this is 2 samples of silence:  
>>>> "data:audio/wav;base64,UklGRigAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQQAAAAAAAAA".
>>> Yes you can use this method, and with the current audio tag and  
>>> autobuffer, it may work to some degree.
>>>
>>> It does not produce smooth transitions.
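
For reference, the data URI quoted above decodes to a canonical 44-byte PCM WAV header followed by two 16-bit mono samples at 44100 Hz. A minimal Python sketch that rebuilds it byte for byte is below; the function name `wav_data_uri` is made up for illustration and is not part of any proposed API.

```python
import base64
import struct


def wav_data_uri(samples, sample_rate=44100):
    """Pack 16-bit mono PCM samples into a WAV file and return a data: URI."""
    # Little-endian signed 16-bit samples.
    data = struct.pack("<%dh" % len(samples), *samples)
    channels, bits = 1, 16
    block_align = channels * bits // 8
    header = (
        b"RIFF" + struct.pack("<I", 36 + len(data)) + b"WAVE"
        # fmt chunk: size 16, format 1 (PCM), channels, sample rate,
        # byte rate, block align, bits per sample.
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels, sample_rate,
                                sample_rate * block_align, block_align, bits)
        + b"data" + struct.pack("<I", len(data))
    )
    payload = base64.b64encode(header + data).decode("ascii")
    return "data:audio/wav;base64," + payload


# Two samples of silence, as in the example quoted above.
silence = wav_data_uri([0, 0])
```

Every new sound needs a complete new file and a new URI, which is one reason this approach cannot give gapless transitions between buffers.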
> At some point, a Blob / Stream API could make things like this easier.
>> If the idea is to write a Vorbis decoder in JavaScript that would be  
>> quite cool in a way, but for vendors already implementing Vorbis it  
>> wouldn't really add anything. A pure JS-implementation of any modern  
>> audio codec would probably be a ridiculous amount of code and slow, so  
>> I doubt it would be that useful in practice.
>>
> Well I'd like to disagree, and reiterate my prior arguments.  Vorbis  
> decoders have been written in ActionScript and in Java.
> They are not ridiculous, in size, nor in CPU usage. They can play audio  
> streams, smoothly, and the file size is completely
> tolerable. And the idea is codec neutrality, a Vorbis decoder is just  
> one example.

OK, I won't make any assumptions about the size/speed of such an  
implementation until I see one.

>> For some use cases you could use 2 audio elements in tandem, mixing new  
>> sound to a new data URI when the first is nearing the end (although  
>> sync can't be guaranteed with the current API). But yes, there are  
>> things which can only be done by a streaming API integrating into the  
>> underlying media framework.
> Yes, the current API is inadequate. data: encoding is insufficient.
>> Here's the list of proposed features right out of a comment block in the  
>> spec:
> This list of features can be written without a spec, using <canvas>,
> using a raw data buffer, and using ECMAScript.
>
> A few of these features may need hardware level support, or a fast  
> computer.
> The <audio> tag would be invisible, and the <canvas> tag would
> provide the user interface.
>> Your use cases probably fall under audio filters and synthesis. I  
>> expect that attention will turn to gradually more complex use cases  
>> when the basic API we have now is implemented and stable cross-browser  
>> and cross-platform.
> Yes, some of these use cases qualify as filters, some qualify as  
> synthesis.
> I'm proposing that simple filters and synthesis can be accomplished with  
> modern
> ECMAScript virtual machines and a raw data buffer. My use cases are  
> qualified to current capabilities.
>
> Apart from those use cases, I'm proposing that a raw data buffer will  
> allow for
> codec neutrality.
>
> There are dozens of minor audio codecs, some simpler than others, some  
> low bitrate,
> that could be programmed in ECMAScript and would run just fine with  
> modern ECMAScript VMs.
>
> Transcoding lossy data is a sub-optimal solution. Allowing for arbitrary  
> <audio>
> codecs is a worthwhile endeavor. ECMAScript can detect if playback is  
> too slow.
>
> Additionally, in some cases, the programmer could work-around broken  
> codec implementations.
> It's forward-looking, it allows real backward compatibility and  
> interoperability across browsers.
>
> <canvas> allows for arbitrary, programmable video, <audio> should allow
> for programmable audio. Then, we can be codec neutral in our media  
> elements.
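
To make "simple filters and synthesis on a raw data buffer" concrete, here is a rough Python sketch of the kind of per-sample work being discussed: a sine oscillator, a one-pole low-pass filter, and conversion to 16-bit PCM. All function names and parameter defaults are assumptions chosen for illustration; nothing here corresponds to an existing or proposed browser interface.

```python
import math


def sine(freq, frames, sample_rate=44100, amplitude=0.5):
    """Synthesis: fill a buffer with float samples in [-1, 1]."""
    step = 2 * math.pi * freq / sample_rate
    return [amplitude * math.sin(step * n) for n in range(frames)]


def low_pass(samples, alpha=0.1):
    """Filter: one-pole low-pass, y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def to_pcm16(samples):
    """Clamp to [-1, 1] and scale to signed 16-bit integers."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]


tone = sine(440, 441)            # 441 frames = 10 ms at 44100 Hz
pcm = to_pcm16(low_pass(tone))   # raw buffer ready for an output device
```

The cost is a handful of arithmetic operations per sample, which is the basis for the claim that simple cases are within reach of a scripting VM; a full codec is a different order of work.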

While stressing that I don't think this should go into the spec until  
there's a proof-of-concept implementation that does useful stuff, is the  
idea to set audio.src=new MySynthesizer() and play()? (MySynthesizer would  
need to implement some standard interface.) You also have the question of  
push vs pull, i.e. does the audio source request data from the synthesizer  
when needed or does the synthesizer need to run a loop pushing audio data?
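
To illustrate the push vs pull distinction, here is a small Python sketch. Everything in it is hypothetical: `square_wave`, `PullSink` and `PushSink` are invented names standing in for a synthesizer and an audio output, not a proposal for the actual interface.

```python
from collections import deque


def square_wave(period=4, amplitude=1000):
    """Hypothetical synthesizer: an endless generator of square-wave samples."""
    n = 0
    while True:
        yield amplitude if (n % period) < period // 2 else -amplitude
        n += 1


class PullSink:
    """Pull model: the audio output asks the source for samples on demand."""

    def __init__(self, source):
        self.source = source

    def render(self, frames):
        # Called by the (imaginary) audio device when its buffer runs low.
        return [next(self.source) for _ in range(frames)]


class PushSink:
    """Push model: the synthesizer runs its own loop and writes into a queue."""

    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity

    def write(self, samples):
        # Accept only what fits; report how many samples were taken.
        room = self.capacity - len(self.queue)
        accepted = samples[:room]
        self.queue.extend(accepted)
        return len(accepted)

    def render(self, frames):
        # On underrun, pad with silence because the producer fell behind.
        out = [self.queue.popleft() for _ in range(min(frames, len(self.queue)))]
        return out + [0] * (frames - len(out))


pulled = PullSink(square_wave()).render(6)

push = PushSink(capacity=4)
synth = square_wave()
taken = push.write([next(synth) for _ in range(6)])  # only 4 fit
pushed = push.render(6)                              # 4 samples, then silence
```

In the pull model the output decides the timing, so the synthesizer must answer quickly; in the push model the synthesizer decides, so it has to cope with a full queue on one side and underruns on the other.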

-- 
Philip Jägenstedt
Core Developer
Opera Software