[whatwg] Codecs for <audio> and <video>

Wed Jul 8 09:24:42 PDT 2009

On 7/8/09 2:20 AM, Philip Jagenstedt wrote:
> On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard 
> <chuck at jumis.com> wrote:
>> At some point, a Blob / Stream API could make things like this easier.
>>> If the idea is to write a Vorbis decoder in JavaScript that would be 
>>> quite cool in a way, but for vendors already implementing Vorbis it 
>>> wouldn't really add anything. A pure JS-implementation of any modern 
>>> audio codec would probably be a ridiculous amount of code and slow, 
>>> so I doubt it would be that useful in practice.
>>>
>> Well, I'd like to disagree, and reiterate my prior arguments. Vorbis
>> decoders have been written in ActionScript and in Java.
>> They are not ridiculous in size or in CPU usage. They can play
>> audio streams smoothly, and the file size is completely
>> tolerable. And the idea is codec neutrality; a Vorbis decoder is just
>> one example.
>
> OK, I won't make any assumptions of the size/speed of such an 
> implementation until I see one.
Well, again, there exist implementations running on Sun/Oracle's Java
VM and the Flash VM.
These two use byte-code packaging, so the file size is under 100k;
deflated ECMAScript source would also weigh in under 100k.

>> Transcoding lossy data is a sub-optimal solution. Allowing for
>> arbitrary <audio> codecs is a worthwhile endeavor. ECMAScript can
>> detect whether playback is too slow.
I want to point this out again.

While there is some struggle to define a standard codec (so we might be
spared the burden of so many encoders), there is a very large supply of
already-encoded media in the wild.

I recently worked on a project that required a codec that was difficult
to obtain and install. Open-source descriptions of it were available, and
if it had been an option, I certainly would have paid to have the codec
written in ECMAScript and delivered it with the media files.

In that particular case, paying someone to write a decoder for one
minority codec would have been cheaper, and more correct, than paying
for the transcoding of 60 GB of low-bit-rate audio.

Most media formats are lossy, so any transcode loses quality; that makes
the format the media is already in, whatever its encumbrance, the best
solution.
>>
>> Additionally, in some cases, the programmer could work around broken
>> codec implementations.
>> It's forward-looking, and it allows real backward compatibility and
>> interoperability across browsers.
>>
>> <canvas> allows for arbitrary, programmable video; <audio> should allow
>> for programmable audio. Then we can be codec-neutral in our media
>> elements.
>
> While stressing that I don't think this should go into the spec until 
> there's a proof-of-concept implementation that does useful stuff, is 
> the idea to set audio.src=new MySynthesizer() and play()? 
> (MySynthesizer would need to implement some standard interface.) You 
> also have the question of push vs pull, i.e. does the audio source 
> request data from the synthesizer when needed or does the synthesizer 
> need to run a loop pushing audio data?
>
Well, we really need to define what "useful stuff" is, to set that bar.

There are two use cases that I think are important: a codec
implementation (let's use Vorbis), and an accessibility implementation
working with a <canvas> element.

I don't know exactly what would qualify for accessibility. One example is
a topographical map that makes a lower- or higher-pitched hum based on
the elevation surrounding the pointer.

Along the same lines, a hum of varying intensity signaling proximity to
a clickable element (we're still talking about <canvas>) might be useful.
If there is no sound in the right channel, there are no clickable
elements to the right of the pointer. If the sound is faint, the element
is rather far away.
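A minimal sketch of the two cues described above, assuming the page can hand raw interleaved stereo samples to some append-only buffer (that buffer is what this thread is asking for; it does not exist yet). All names here (`elevationToFrequency`, `distanceToGain`, `makeCue`, `renderCue`), the 110 to 880 Hz pitch range, the 500-pixel audible distance and the 44.1 kHz rate are hypothetical choices for illustration. The code only computes samples; it does not play them.

```typescript
// Hypothetical sonification helpers for a <canvas> UI.
// Nothing here is a standard API; it only produces raw samples.
const SAMPLE_RATE = 44100; // assumed output rate

// Map a normalized elevation (0 = lowest, 1 = highest) onto a pitch range.
// Exponential interpolation: equal elevation steps give equal pitch intervals.
function elevationToFrequency(elevation: number, lowHz = 110, highHz = 880): number {
  const e = Math.min(1, Math.max(0, elevation));
  return lowHz * Math.pow(highHz / lowHz, e);
}

// Map the distance (pixels) to a clickable element onto loudness:
// 1.0 right on top of it, fading linearly to 0 at maxDistance.
function distanceToGain(distance: number, maxDistance = 500): number {
  return Math.max(0, 1 - distance / maxDistance);
}

interface Cue {
  frequency: number; // Hz
  leftGain: number;  // 0..1
  rightGain: number; // 0..1
}

// Pitch comes from elevation; each channel's loudness comes from the distance
// to the nearest clickable element on that side (null = none there, so silence).
function makeCue(elevation: number, nearestLeft: number | null, nearestRight: number | null): Cue {
  return {
    frequency: elevationToFrequency(elevation),
    leftGain: nearestLeft === null ? 0 : distanceToGain(nearestLeft),
    rightGain: nearestRight === null ? 0 : distanceToGain(nearestRight),
  };
}

// Render `frames` frames of an interleaved stereo sine tone (L, R, L, R, ...).
function renderCue(cue: Cue, frames: number, startFrame = 0): Float32Array {
  const out = new Float32Array(frames * 2);
  const step = (2 * Math.PI * cue.frequency) / SAMPLE_RATE;
  for (let i = 0; i < frames; i++) {
    const s = Math.sin(step * (startFrame + i));
    out[2 * i] = s * cue.leftGain;
    out[2 * i + 1] = s * cue.rightGain;
  }
  return out;
}
```

For example, `renderCue(makeCue(0.5, null, 250), 4410)` yields 0.1 s of a roughly 311 Hz tone at half volume in the right channel and silence in the left, signalling that the only clickable element is 250 pixels to the right.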

Site developers still need to put in the work. With a buffered audio
API, they'll at least have the option to do so.

Can we come to an agreement as to what would constitute a reasonable
proof of concept? The accessibility case is meant to make <canvas> more
accessible to the visually impaired.

Obviously, <audio src> could be used with <canvas> in many cases, so our
test case should be one where <audio src> would be insufficient.

Both of these use cases can be accomplished with a raw audio buffer.
They need neither native channel mixing nor toDataURL support.

In the long term, I think those two features would be nice, but in the
short term they would only delay adoption.
As Robert has said, there are "much more important things to work on"
( https://bugzilla.mozilla.org/show_bug.cgi?id=490705 ).

I think that, for now, the model should play buffered bytes as they are
made available (if the buffer has anything, start playing it).

I believe the "buffered" attribute can be used by the ECMAScript loop to
detect how much data is buffered, and whether it should continue
decoding or take other actions.
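One way the ECMAScript loop could use "buffered", sketched under assumptions: the loop reads how far buffered data extends past the playback position (in a browser that would be something like `buffered.end(buffered.length - 1) - currentTime`, since `buffered` is a TimeRanges object) and decodes just enough to keep a target lead. The `Decoder` and `Sink` shapes, `framesToDecode`, `pump`, the 2-second target and the 0.5-second chunk cap are all hypothetical, chosen so the logic can run outside a browser.

```typescript
// Sketch of a push-model decode loop driven by how much audio is buffered.
// Decoder and Sink are hypothetical shapes, not browser APIs.
const SAMPLE_RATE = 44100; // assumed rate; mono samples for simplicity

// Frames to decode now so buffered audio stays `targetLead` seconds ahead
// of playback, decoding at most `maxChunk` seconds per step.
function framesToDecode(
  bufferedEnd: number, // seconds: end of the buffered range
  currentTime: number, // seconds: current playback position
  targetLead = 2.0,
  maxChunk = 0.5,
): number {
  const lead = Math.max(0, bufferedEnd - currentTime);
  const deficit = Math.max(0, targetLead - lead);
  return Math.round(Math.min(deficit, maxChunk) * SAMPLE_RATE);
}

interface Decoder {
  decode(frames: number): Float32Array | null; // null signals end of stream
}

interface Sink {
  readonly bufferedEnd: number;
  readonly currentTime: number;
  append(samples: Float32Array): void;
}

type PumpResult = "decoded" | "idle" | "ended";

// One step of the loop. "idle" means enough is buffered: yield and retry later.
function pump(decoder: Decoder, sink: Sink): PumpResult {
  const n = framesToDecode(sink.bufferedEnd, sink.currentTime);
  if (n === 0) return "idle";
  const chunk = decoder.decode(n);
  if (chunk === null) return "ended";
  sink.append(chunk);
  return "decoded";
}

// In-memory stand-ins so the loop can be exercised outside a browser.
class SilenceDecoder implements Decoder {
  private remaining: number;
  constructor(totalFrames: number) {
    this.remaining = totalFrames;
  }
  decode(frames: number): Float32Array | null {
    if (this.remaining <= 0) return null;
    const n = Math.min(frames, this.remaining);
    this.remaining -= n;
    return new Float32Array(n);
  }
}

class CountingSink implements Sink {
  private frames = 0;
  currentTime = 0;
  get bufferedEnd(): number {
    return this.frames / SAMPLE_RATE;
  }
  append(samples: Float32Array): void {
    this.frames += samples.length;
  }
}
```

In a page, `pump` would run from a timer such as `setTimeout`, and the sink's `bufferedEnd` would come from the element's buffered ranges. This is the push model: the script keeps the buffer topped up, and playback simply consumes whatever has been appended.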

The media API should handle the buffered audio API in a way similar to
streaming Web radio.

There should be an origin-clean flag, for future use: one might
theoretically add audio into a currently playing stream (regardless of
toDataURL support).

Does this sound reasonable? What I'm requesting is an append-only raw
audio buffer and an origin-clean flag (similar to <canvas>'s), to be
added to the <audio> element, if not to the media element interface, for
future use. The audio buffer plays immediately if any data is available
in it.

In v2, we would discuss Vlad's getAudioSampleData proposal, native
channel mixing (mixing two audio streams, for whatever reason), and
other effects that enable the more complex "audio editor" use case.
For now, let's just consider an "audio player" that supports arbitrary
audio codecs and addresses accessibility for the visually impaired.

We need Audio.appendBuffer, Audio.createBufferArray, and an
AudioBufferArray interface of some sort; then I think it's good to go.

The naming and arguments still need to be worked out.
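To make the request concrete, here is a hypothetical in-memory model of it. Only the names appendBuffer, createBufferArray and AudioBufferArray come from this message; the `BufferedAudioModel` class, every signature, the `fromOtherOrigin` parameter, the `playing` and `consume` members, and the interleaved Float32 layout are guesses for illustration, not a proposed final API. It models three properties: the buffer is append-only, it plays whenever it holds data, and the origin-clean flag, once cleared, never resets (as with <canvas>).

```typescript
// Hypothetical model of the proposed append-only buffer. The names
// appendBuffer, createBufferArray and AudioBufferArray come from the
// proposal; every signature and field below is a guess.

interface AudioBufferArray {
  readonly channels: number;
  readonly length: number;     // frames
  readonly data: Float32Array; // interleaved samples: length * channels
}

class BufferedAudioModel {
  readonly sampleRate: number;
  readonly channels: number;
  private queue: AudioBufferArray[] = [];
  private headOffset = 0; // frames already played from queue[0]
  private queuedFrames = 0;
  private clean = true;

  constructor(sampleRate: number, channels: number) {
    this.sampleRate = sampleRate;
    this.channels = channels;
  }

  // Like <canvas>'s flag: starts true, and once false it never resets.
  get originClean(): boolean {
    return this.clean;
  }

  get queued(): number {
    return this.queuedFrames;
  }

  // Proposed behaviour: the buffer plays whenever it holds any data.
  get playing(): boolean {
    return this.queuedFrames > 0;
  }

  createBufferArray(frames: number): AudioBufferArray {
    return { channels: this.channels, length: frames, data: new Float32Array(frames * this.channels) };
  }

  // Append-only: data is added at the end and never rewritten.
  // `fromOtherOrigin` is a hypothetical way to mark foreign audio.
  appendBuffer(buf: AudioBufferArray, fromOtherOrigin = false): void {
    if (buf.channels !== this.channels) {
      throw new RangeError("channel count mismatch");
    }
    this.queue.push(buf);
    this.queuedFrames += buf.length;
    if (fromOtherOrigin) this.clean = false;
  }

  // Stand-in for the playback engine: consumes up to `frames` frames
  // and returns how many were actually available.
  consume(frames: number): number {
    let taken = 0;
    while (taken < frames && this.queue.length > 0) {
      const head = this.queue[0];
      const n = Math.min(head.length - this.headOffset, frames - taken);
      this.headOffset += n;
      taken += n;
      if (this.headOffset === head.length) {
        this.queue.shift();
        this.headOffset = 0;
      }
    }
    this.queuedFrames -= taken;
    return taken;
  }
}
```

Keeping the buffer append-only keeps a first version small: the script can only feed audio forward, which covers both the codec and the accessibility cases, while editing operations such as mixing stay in the v2 discussion.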

I'd enthusiastically support such an interface in Java, Flash and
.NET/ActiveX plugins, for the legacy/IE crowd.

-Charles