[whatwg] Codecs for <audio> and <video>

Tue Jul 7 13:45:41 PDT 2009

On 7/7/09 1:10 PM, Philip Jagenstedt wrote:
> On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard 
> <chuck at jumis.com> wrote:
>
>> Philip Jagenstedt wrote:
>>> For all of the simpler use cases you can already generate sounds 
>>> yourself with a data uri. For example, this is 2 samples of silence: 
>>> "data:audio/wav;base64,UklGRigAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQQAAAAAAAAA". 
>>>
>> Yes you can use this method, and with the current audio tag and 
>> autobuffer, it may work to some degree.
>>
>> It does not produce smooth transitions. 
At some point, a Blob / Stream API could make things like this easier.
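
As a minimal sketch of the data: URI approach quoted above, the following
builds a 16-bit mono PCM WAV file in memory and base64-encodes it. It assumes
an ECMAScript engine with typed arrays and btoa(); wavDataUri is a
hypothetical helper name, not part of any browser API.

```javascript
// Build a minimal 16-bit mono PCM WAV file and wrap it in a data: URI.
// wavDataUri is a hypothetical helper, named here for illustration only.
function wavDataUri(samples, sampleRate) {
  const dataBytes = samples.length * 2;         // 2 bytes per 16-bit sample
  const buf = new ArrayBuffer(44 + dataBytes);  // 44-byte RIFF/WAVE header
  const view = new DataView(buf);
  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };
  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);      // bytes following this field
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);                 // size of the fmt chunk
  view.setUint16(20, 1, true);                  // format 1 = uncompressed PCM
  view.setUint16(22, 1, true);                  // one channel
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true);     // bytes per second
  view.setUint16(32, 2, true);                  // bytes per sample frame
  view.setUint16(34, 16, true);                 // bits per sample
  writeTag(36, "data");
  view.setUint32(40, dataBytes, true);
  samples.forEach((s, i) => view.setInt16(44 + i * 2, s, true));

  // btoa() wants a binary string, so map each byte to one character.
  let binary = "";
  new Uint8Array(buf).forEach((b) => { binary += String.fromCharCode(b); });
  return "data:audio/wav;base64," + btoa(binary);
}

const silence = wavDataUri([0, 0], 44100);
```

With two zero samples at 44100 Hz, this reproduces the URI quoted above. The
string could be assigned to the src of an <audio> element, but every change
to the sound means re-encoding the whole file, which is why it does not give
smooth transitions.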
> If the idea is to write a Vorbis decoder in JavaScript that would be 
> quite cool in a way, but for vendors already implementing Vorbis it 
> wouldn't really add anything. A pure JS-implementation of any modern 
> audio codec would probably be a ridiculous amount of code and slow, so 
> I doubt it would be that useful in practice.
>
I disagree, and will reiterate my earlier arguments. Vorbis decoders have
already been written in ActionScript and in Java. They are not ridiculous
in size or in CPU usage: they play audio streams smoothly, and their file
size is completely tolerable. The idea is codec neutrality; a Vorbis
decoder is just one example.

> For some use cases you could use 2 audio elements in tandem, mixing 
> new sound to a new data URI when the first is nearing the end 
> (although sync can't be guaranteed with the current API). But yes, 
> there are things which can only be done by a streaming API integrating 
> into the underlying media framework.
Yes, the current API is inadequate. data: encoding is insufficient.
> Here's the list of proposed features right out of a comment block in 
> the spec:
The features on this list can be implemented without a spec, using <canvas>,
a raw data buffer, and ECMAScript.

A few of these features may need hardware-level support or a fast computer.
The <audio> tag would be invisible, and the <canvas> tag would
provide the user interface.
> Your use cases probably fall under audio filters and synthesis. I 
> expect that attention will turn to gradually more complex use cases 
> when the basic API we have now is implemented and stable cross-browser 
> and cross-platform.
Yes, some of these use cases qualify as filters and some as synthesis.
I'm proposing that simple filters and synthesis can be accomplished with
modern ECMAScript virtual machines and a raw data buffer. My use cases are
limited to what current implementations are already capable of.
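
To make "simple filters and synthesis" concrete, here is a rough sketch that
assumes samples are held in a Float32Array in the range -1..1. The function
names, the 440 Hz tone, and the smoothing factor are illustrative choices
only; how such a buffer would be handed to <audio> is exactly the missing
piece of the API, so it is not shown.

```javascript
// Synthesis: fill a raw sample buffer with a sine tone.
function synthSine(freqHz, sampleRate, count) {
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = Math.sin(2 * Math.PI * freqHz * i / sampleRate);
  }
  return out;
}

// Filter: one-pole low-pass, y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
// alpha is in (0, 1]; smaller values smooth more.
function lowPass(input, alpha) {
  const out = new Float32Array(input.length);
  let prev = 0;
  for (let i = 0; i < input.length; i++) {
    prev += alpha * (input[i] - prev);
    out[i] = prev;
  }
  return out;
}

// One second of a 440 Hz tone at 44.1 kHz, then softened.
const tone = synthSine(440, 44100, 44100);
const softened = lowPass(tone, 0.1);
```

Both loops do a handful of arithmetic operations per sample, which is the
kind of workload a modern ECMAScript VM should handle comfortably.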

Apart from those use cases, I'm proposing that a raw data buffer will
allow for codec neutrality.

There are dozens of minor audio codecs, some simpler than others, some
low bitrate, that could be programmed in ECMAScript and would run just
fine with modern ECMAScript VMs.
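
As one illustration of how small a simple codec can be, here is a sketch of
a G.711 mu-law decoder. G.711 is my choice of example, not a codec named in
this thread, and the function names are hypothetical. Each 8-bit input byte
expands to one 16-bit linear PCM sample.

```javascript
// Decode one 8-bit G.711 mu-law byte to a 16-bit linear PCM sample.
function muLawToLinear(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;             // top bit: set means negative
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole buffer of mu-law bytes into a raw PCM buffer.
function decodeMuLaw(bytes) {
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = muLawToLinear(bytes[i]);
  }
  return pcm;
}

const decoded = decodeMuLaw(Uint8Array.of(0xff, 0x80, 0x00));
```

The decoder is a few shifts and adds per sample; more elaborate codecs cost
more, but the structure (bytes in, raw sample buffer out) stays the same.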

Transcoding lossy data is a sub-optimal solution. Allowing for arbitrary
<audio> codecs is a worthwhile endeavor. ECMAScript can detect if playback
is too slow.
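
One way a script might detect that it is too slow, sketched under the
assumption that decoding happens in chunks and that wall-clock time from
Date.now() is precise enough: compare the time spent decoding a chunk with
the duration of audio the chunk produced. The names below are hypothetical.

```javascript
// Ratio of decode time to audio duration.
// A value above 1.0 means decoding is slower than real time.
function realTimeFactor(decodeMs, sampleCount, sampleRate) {
  const audioMs = (sampleCount / sampleRate) * 1000;
  return decodeMs / audioMs;
}

// Time one call to a decoder function and report whether it kept up.
// decodeFn is assumed to take an input chunk and return a sample buffer.
function decodeAndCheck(decodeFn, input, sampleRate) {
  const start = Date.now();
  const samples = decodeFn(input);
  const elapsedMs = Date.now() - start;
  const factor = realTimeFactor(elapsedMs, samples.length, sampleRate);
  return { samples: samples, tooSlow: factor > 1 };
}
```

A page that sees tooSlow on several consecutive chunks could fall back to a
natively supported format or a lower-complexity stream.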

Additionally, in some cases, the programmer could work around broken
codec implementations. It's forward-looking, and it allows real backward
compatibility and interoperability across browsers.

<canvas> allows for arbitrary, programmable video; <audio> should likewise
allow for programmable audio. Then we can be codec neutral in our media
elements.