[whatwg] Codecs for <audio> and <video>
chuck at jumis.com
Wed Jul 8 09:24:42 PDT 2009
On 7/8/09 2:20 AM, Philip Jagenstedt wrote:
> On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard
> <chuck at jumis.com> wrote:
>> At some point, a Blob / Stream API could make things like this easier.
>>> quite cool in a way, but for vendors already implementing Vorbis it
>>> wouldn't really add anything. A pure JS-implementation of any modern
>>> audio codec would probably be a ridiculous amount of code and slow,
>>> so I doubt it would be that useful in practice.
>> Well I'd like to disagree, and reiterate my prior arguments. Vorbis
>> decoders have been written in ActionScript and in Java.
>> They are not ridiculous, in size, nor in CPU usage. They can play
>> audio streams, smoothly, and the file size is completely
>> tolerable. And the idea is codec neutrality, a Vorbis decoder is just
>> one example.
> OK, I won't make any assumptions of the size/speed of such an
> implementation until I see one.
Well, again, implementations already exist that run on Sun/Oracle's Java
VM and the Flash VM.
Both use byte-code packaging, so the file size is under 100k; the
source would also weigh under 100k.
>> Transcoding lossy data is a sub-optimal solution. Allowing for
>> arbitrary <audio>
>> codecs is a worthwhile endeavor. ECMAScript can detect if playback is
>> too slow.
I want to point this out again.
While there is some struggle to define a standard codec (so we might be
spared the burden
of so very many encoders), there is a very large supply of
already-encoded media in the wild.
I've recently worked on a project that required a difficult to
Open source descriptions were available, and if it was an option, I
certainly would have
paid to have the codec written in ECMAScript, and delivered it with the
In that particular case, paying someone to write a decoder for one
particular minority codec
would have been cheaper, and more correct, than paying for the
transcoding of 60 gigs of low bit-rate audio.
Most media formats are lossy, so keeping media in its current format,
whatever the encumbrance, is the best solution.
>> Additionally, in some cases, the programmer could work-around broken
>> codec implementations.
>> It's forward-looking, it allows real backward compatibility and
>> interoperability across browsers.
>> <canvas> allows for arbitrary, programmable video, <audio> should allow
>> for programmable audio. Then, we can be codec neutral in our media
> While stressing that I don't think this should go into the spec until
> there's a proof-of-concept implementation that does useful stuff, is
> the idea to set audio.src=new MySynthesizer() and play()?
> (MySynthesizer would need to implement some standard interface.) You
> also have the question of push vs pull, i.e. does the audio source
> request data from the synthesizer when needed or does the synthesizer
> need to run a loop pushing audio data?
Well we really need to define what useful stuff is, you know, to set
There are two use cases that I think are important: a codec
implementation (let's use Vorbis),
and an accessibility implementation, working with a <canvas> element.
I don't know what would qualify for accessibility. One example is a
topographical map that makes a lower- or higher-pitched hum
based on the elevation surrounding the pointer.
Along the same lines, a hum of varying intensity signaling
proximity to a clickable element
(we're still talking about <canvas>) might be useful. If there is no
sound in the right channel,
there are no elements to click on to the right of the pointer. If
the sound is quiet, the
element is rather far away.
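
The two hums described above can be sketched as plain sample generation. This is only an illustration under stated assumptions: the names (Hum, elevationToFrequency, proximityGain, renderHum), the 110-880 Hz range, the 400 px falloff and the 0.1 gain floor are all hypothetical and are not part of any proposal in this thread.

```typescript
// Hypothetical helpers; none of these names come from a spec or proposal.
interface Hum {
  frequencyHz: number; // pitch of the hum
  leftGain: number;    // 0..1, loudness in the left channel
  rightGain: number;   // 0..1, loudness in the right channel
}

// Map elevation linearly onto an assumed pitch range (110 Hz to 880 Hz).
function elevationToFrequency(
  elevation: number,
  minElevation: number,
  maxElevation: number,
  minHz: number = 110,
  maxHz: number = 880
): number {
  const span = maxElevation - minElevation;
  const t = span === 0 ? 0 : Math.min(1, Math.max(0, (elevation - minElevation) / span));
  return minHz + t * (maxHz - minHz);
}

// Gain for one side of the pointer. null means "no clickable element on
// that side", which is silence. A far element stays faintly audible
// (assumed floor of 0.1) so it can be told apart from "nothing there".
function proximityGain(distancePx: number | null, maxDistancePx: number = 400): number {
  if (distancePx === null) return 0;
  const closeness = Math.min(1, Math.max(0, 1 - distancePx / maxDistancePx));
  return 0.1 + 0.9 * closeness;
}

// Render interleaved stereo samples (L, R, L, R, ...) for a sine hum.
function renderHum(hum: Hum, sampleRate: number, frames: number): Float32Array {
  const out = new Float32Array(frames * 2);
  for (let i = 0; i < frames; i++) {
    const s = Math.sin((2 * Math.PI * hum.frequencyHz * i) / sampleRate);
    out[2 * i] = s * hum.leftGain;
    out[2 * i + 1] = s * hum.rightGain;
  }
  return out;
}
```

The resulting samples are exactly the kind of data that a static <audio src> cannot supply, since they depend on where the pointer is at that moment.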
Site developers still need to put in the work. With a buffered audio
API, they'll at least
have the option to do so.
Can we come to an agreement as to what would constitute a reasonable
proof of concept?
This is meant to allow <canvas> to be more accessible to the visually
Obviously, <audio src> tags could be used in many cases with <canvas>,
so our test-case
should be one where <audio src> would be insufficient.
Both of these use cases can be accomplished with a raw audio buffer.
They do not need native channel mixing, nor toDataURL support.
In the long term, I think those two options would be nice, but in the
short term, they would just cause delays in adoption.
As Robert has said, there are "much more important things to work on"
( https://bugzilla.mozilla.org/show_bug.cgi?id=490705 ).
I think at this point, the model should play buffered bytes as they are
made available (if the buffer has anything, start playing it).
I believe the "buffered" attribute can be used by the ECMAScript loop to
determine how much data is buffered, and whether it should continue
decoding or take other actions.
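
A minimal sketch of such a loop follows, assuming a sink that reports how many seconds of audio it still holds. RawAudioSink, FakeSink, DecodeNext and pump are hypothetical names used for illustration only; FakeSink stands in for the media element so that the loop can run without a browser, and mono samples are assumed.

```typescript
// Hypothetical interface: what the decode loop needs from the audio element.
interface RawAudioSink {
  readonly sampleRate: number;
  bufferedSeconds(): number;            // audio queued but not yet played
  append(samples: Float32Array): void;  // append-only, mono samples assumed
}

// A decoder step: returns the next chunk of samples, or null at end of stream.
type DecodeNext = () => Float32Array | null;

// Decode until the sink holds at least targetSeconds, or the stream ends.
// A real script would call this again from a timer as playback drains the sink.
function pump(sink: RawAudioSink, decodeNext: DecodeNext, targetSeconds: number): "filled" | "ended" {
  while (sink.bufferedSeconds() < targetSeconds) {
    const chunk = decodeNext();
    if (chunk === null) return "ended";
    sink.append(chunk);
  }
  return "filled";
}

// Stand-in sink for experimenting outside a browser.
class FakeSink implements RawAudioSink {
  readonly sampleRate: number;
  private queued = 0;

  constructor(sampleRate: number) {
    this.sampleRate = sampleRate;
  }

  bufferedSeconds(): number {
    return this.queued / this.sampleRate;
  }

  append(samples: Float32Array): void {
    this.queued += samples.length;
  }

  // Simulate playback consuming the given number of seconds.
  consume(seconds: number): void {
    this.queued = Math.max(0, this.queued - seconds * this.sampleRate);
  }
}
```

If bufferedSeconds() is found at zero while the decoder still has data, the script knows decoding is not keeping up and can take other actions, such as pausing or warning the user.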
The buffered audio API should be handled by the media API in a way
similar to streaming Web radio.
There should be an origin-clean flag, for future use. One might
add audio into a currently playing stream. (regardless of toDataURL
Does this sound reasonable? What I'm requesting is an append-only raw
audio buffer, and an origin-clean flag (similar to <canvas>)
to be added to the <audio> tag, if not the Media element interface, for
future use. The audio buffer plays immediately,
if any data is available in it.
In v2, we would discuss Vlad's getAudioSampleData proposal, native
channel mixing (mix two audio streams, for whatever reason),
and other effects that allow the more complex "audio editor" use case.
For now, let's just consider an "audio player"
to support arbitrary audio codecs and address accessibility for the
We need Audio.appendBuffer, Audio.createBufferArray
and an AudioBufferArray interface of some sort, and I think it's good to go.
The naming and arguments still need to be worked out.
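
To make the request concrete, here is one way the pieces might fit together. It is a sketch only: appendBuffer, createBufferArray and AudioBufferArray are the names suggested above, but every signature, the sameOrigin argument, the SketchAudio class and the originClean, playing and queuedSamples properties are assumptions made for illustration.

```typescript
// Stand-in for the requested AudioBufferArray; its shape is assumed.
class AudioBufferArray {
  readonly samples: Float32Array;

  constructor(length: number) {
    this.samples = new Float32Array(length);
  }
}

// Stand-in for the part of <audio> that would gain the new methods.
class SketchAudio {
  private chunks: Float32Array[] = [];
  private clean = true;     // origin-clean flag, kept for future use
  private started = false;  // becomes true as soon as any data is available

  createBufferArray(length: number): AudioBufferArray {
    return new AudioBufferArray(length);
  }

  // Append-only: there is deliberately no way to remove or rewrite queued data.
  // sameOrigin is a hypothetical argument marking where the samples came from.
  appendBuffer(buffer: AudioBufferArray, sameOrigin: boolean = true): void {
    // Copy so that later writes to the caller's array cannot alter queued audio.
    this.chunks.push(buffer.samples.slice());
    if (!sameOrigin) this.clean = false;
    if (buffer.samples.length > 0) this.started = true;
  }

  get originClean(): boolean {
    return this.clean;
  }

  get playing(): boolean {
    return this.started;
  }

  get queuedSamples(): number {
    return this.chunks.reduce((total, chunk) => total + chunk.length, 0);
  }
}
```

As with <canvas>, the flag only ever goes from clean to not clean, so a later version could refuse to hand samples back to script once cross-origin data has been mixed in.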
I'd enthusiastically support such an interface in Java, Flash and
.NET/ActiveX plugins, for the legacy/IE crowd.