[whatwg] Codecs for <audio> and <video>

Thu Jul 9 07:35:31 PDT 2009

On Wed, 08 Jul 2009 18:24:42 +0200, Charles Pritchard <chuck at jumis.com>  
wrote:

> On 7/8/09 2:20 AM, Philip Jagenstedt wrote:
>> On Tue, 07 Jul 2009 22:45:41 +0200, Charles Pritchard <chuck at jumis.com>  
>> wrote:
>>> At some point, a Blob / Stream API could make things like this easier.
>>>> If the idea is to write a Vorbis decoder in JavaScript that would be  
>>>> quite cool in a way, but for vendors already implementing Vorbis it  
>>>> wouldn't really add anything. A pure JS-implementation of any modern  
>>>> audio codec would probably be a ridiculous amount of code and slow,  
>>>> so I doubt it would be that useful in practice.
>>>>
>>> Well I'd like to disagree, and reiterate my prior arguments.  Vorbis  
>>> decoders have been written in ActionScript and in Java.
>>> They are not ridiculous, in size, nor in CPU usage. They can play  
>>> audio streams, smoothly, and the file size is completely
>>> tolerable. And the idea is codec neutrality; a Vorbis decoder is just
>>> one example.
>>
>> OK, I won't make any assumptions of the size/speed of such an  
>> implementation until I see one.
> Well, again, there exist implementations running on Sun/Oracle's Java
> VM and the Flash VM.
> These two use byte-code packaging, so the file size is under 100k;
> deflated ECMAScript source would also weigh under 100k.
>
>>> Transcoding lossy data is a sub-optimal solution. Allowing for  
>>> arbitrary <audio>
>>> codecs is a worthwhile endeavor. ECMAScript can detect if playback is  
>>> too slow.
> I want to point this out again.
>
> While there is some struggle to define a standard codec (so we might be  
> spared the burden
> of so very many encoders), there is a very large supply of  
> already-encoded media in the wild.
>
> I've recently worked on a project that required a difficult to  
> obtain/install codec.
> Open source descriptions were available, and if it was an option, I  
> certainly would have
> paid to have the codec written in ECMAScript, and delivered it with the  
> media files.
>
> In that particular case, paying someone to write a decoder for one  
> particular, minority codec,
> would have been cheaper, and more correct, than paying for the  
> transcoding of 60 gigs of low bit-rate audio.
>
> Most media formats are lossy, making their current format, whatever the  
> encumbrance, the best solution.

Yes, re-encoding lossy audio always lowers the quality, so this use case is
something I would agree with.

>>>
>>> Additionally, in some cases, the programmer could work-around broken  
>>> codec implementations.
>>> It's forward-looking, it allows real backward compatibility and  
>>> interoperability across browsers.
>>>
>>> <canvas> allows for arbitrary, programmable video, <audio> should allow
>>> for programmable audio. Then, we can be codec neutral in our media  
>>> elements.
>>
>> While stressing that I don't think this should go into the spec until  
>> there's a proof-of-concept implementation that does useful stuff, is  
>> the idea to set audio.src=new MySynthesizer() and play()?  
>> (MySynthesizer would need to implement some standard interface.) You  
>> also have the question of push vs pull, i.e. does the audio source  
>> request data from the synthesizer when needed or does the synthesizer  
>> need to run a loop pushing audio data?
>>
> Well we really need to define what useful stuff is, you know, to set  
> that bar.

It really doesn't matter whether you and I agree on what is useful. If one  
browser implements an audio synthesis interface and it's good enough,  
others will follow and the spec work will begin.
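
To make the push vs pull question quoted above concrete, here is a minimal
sketch of the two shapes such an interface could take. Everything in it is an
assumption for illustration: makeSineSource, FakeSink, playByPulling and
playByPushing are hypothetical names, not part of any spec or browser, and the
sink only records samples instead of playing them.

```javascript
// All names here are hypothetical; no browser exposes this interface.

// A sine generator standing in for "MySynthesizer".
function makeSineSource(frequencyHz, sampleRate) {
  let phase = 0;
  const step = (2 * Math.PI * frequencyHz) / sampleRate;
  return {
    // Return the next `count` mono samples in the range [-1, 1].
    read(count) {
      const block = new Float32Array(count);
      for (let i = 0; i < count; i++) {
        block[i] = Math.sin(phase);
        phase += step;
      }
      return block;
    },
  };
}

// A stand-in for the audio output that only records what it is given.
class FakeSink {
  constructor() {
    this.samples = [];
    this.source = null;
  }
  // Push entry point: the synthesizer calls this whenever it has data.
  append(block) {
    for (const sample of block) this.samples.push(sample);
  }
  // Pull entry point: the sink asks its source for exactly what it needs.
  render(count) {
    this.append(this.source.read(count));
  }
}

// Pull: the sink drives, e.g. from its own playback clock.
function playByPulling(sink, source, blocks, blockSize) {
  sink.source = source;
  for (let i = 0; i < blocks; i++) sink.render(blockSize);
}

// Push: the synthesizer runs the loop and decides the chunk size.
function playByPushing(sink, source, totalSamples, chunkSize) {
  for (let done = 0; done < totalSamples; done += chunkSize) {
    sink.append(source.read(Math.min(chunkSize, totalSamples - done)));
  }
}
```

Both paths deliver the same samples; the difference is only in who owns the
loop. With pull the output can ask for data exactly when it is about to run
dry, while with push the script has to watch the buffer level itself.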

> There are two use cases that I think are important: a codec  
> implementation (let's use Vorbis),
> and an accessibility implementation, working with a <canvas> element.
>
> I don't know what would qualify for accessibility. A topographical map,  
> which makes a lower or higher
> pitched hum, based on elevation (surrounding the pointer), is an example.
>
> On that same line of thinking, a hum of varying intensity signaling  
> proximity to a clickable element,
> (we're still talking about <canvas>) might be useful.  If there is no  
> sound in the right-channel,
> there are no elements to be clicked on, to the right of the pointer. If  
> it is a low-sound, then the
> element is rather far away.
>
> Site developers still need to put in the work. With a buffered audio  
> API, they'll at least
> have the option to do so.
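
The two hums described above could be generated along these lines. This is
only a sketch under stated assumptions: the function names, the 110-880 Hz
pitch range, the linear mappings and the interleaved stereo layout are all
invented for illustration and do not come from any proposal in this thread.

```javascript
// Hypothetical helpers; ranges and mappings are illustrative assumptions.

// Map an elevation linearly onto a pitch range (higher ground, higher hum).
function elevationToFrequency(elevation, minElev, maxElev, minHz = 110, maxHz = 880) {
  const t = Math.min(1, Math.max(0, (elevation - minElev) / (maxElev - minElev)));
  return minHz + t * (maxHz - minHz);
}

// Channel gains for a clickable element at offset (dx, dy) from the pointer.
// Far elements are quiet; an element to the right sounds only on the right.
function proximityGains(dx, dy, maxDistance) {
  const distance = Math.hypot(dx, dy);
  const loudness = Math.max(0, 1 - distance / maxDistance);
  if (dx > 0) return { left: 0, right: loudness };
  if (dx < 0) return { left: loudness, right: 0 };
  return { left: loudness, right: loudness };
}

// Render `frames` frames of interleaved stereo samples (L, R, L, R, ...).
function renderHum(frequencyHz, gains, sampleRate, frames) {
  const out = new Float32Array(frames * 2);
  for (let i = 0; i < frames; i++) {
    const sample = Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate);
    out[2 * i] = sample * gains.left;
    out[2 * i + 1] = sample * gains.right;
  }
  return out;
}
```

The rendered samples would then be handed to whatever raw audio buffer the
element ends up exposing.
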
>
> Can we come to an agreement as to what would constitute a reasonable  
> proof of concept?
> This is meant to allow <canvas> to be more accessible to the visually  
> impaired.
>
> Obviously, <audio src> tags could be used in many cases with <canvas>,  
> so our test-case
> should be one where <audio src> would be insufficient.
>
> Both of these use cases can be accomplished with a raw audio buffer.
> They do not need native channel mixing, nor toDataURL support.
>
> In the long term, I think those two options would be nice, but in the  
> short term, would just cause delays in adoption.
> As Robert has said, there are "much more important things to work on"
> ( https://bugzilla.mozilla.org/show_bug.cgi?id=490705 ).
>
>
> I think at this point, the model should play buffered bytes as they are  
> made available (if the buffer has anything, start playing it).
>
> I believe the "buffered" attribute can be used by the ECMAScript loop to  
> detect
> how much data is buffered, and whether it should continue decoding or  
> take other actions.
>
> The buffered audio API should be handled by the media API in a way  
> similar to streaming Web radio.
>
> There should be an origin-clean flag, for future use. One might
> theoretically add audio into a currently playing stream (regardless of
> toDataURL support).
>
>
> Does this sound reasonable? What I'm requesting is an append-only raw  
> audio buffer, and an origin-clean flag (similar to <canvas>)
> to be added to the <audio> tag, if not the Media element interface, for  
> future use. The audio buffer plays immediately,
> if any data is available in it.

It sounds like fun to implement and fun to play with, although none of the  
details really matter at this point (and I shouldn't have asked). When  
(if) a first browser implements something like this, that API will likely  
become the standard (unless it's not useful and all other browsers ignore  
it).
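
For anyone who wants to play with the idea before a browser does, the
requested behaviour can be simulated in plain ECMAScript. The sketch below is
a guess at the semantics, not a proposed API: AppendOnlyAudioBuffer,
bufferedSeconds, consume and fillBuffer are hypothetical names, the
origin-clean handling is an assumption modelled loosely on <canvas>, and
bufferedSeconds is a single number, whereas the real "buffered" attribute on
media elements returns a TimeRanges object.

```javascript
// Hypothetical simulation; nothing here is an existing browser interface.
class AppendOnlyAudioBuffer {
  constructor(sampleRate) {
    this.sampleRate = sampleRate;
    this.chunks = [];        // queued sample blocks, never rewritten
    this.queued = 0;         // samples appended but not yet played
    this.playing = false;
    this.originClean = true; // reserved for future use, as with <canvas>
  }

  // Append raw mono samples; playback starts as soon as anything is queued.
  appendBuffer(samples, sameOrigin = true) {
    if (samples.length === 0) return;
    this.chunks.push(Float32Array.from(samples));
    this.queued += samples.length;
    if (!sameOrigin) this.originClean = false; // once tainted, stays tainted
    this.playing = true;
  }

  get bufferedSeconds() {
    return this.queued / this.sampleRate;
  }

  // Simulate the output device consuming up to `count` samples.
  consume(count) {
    const out = [];
    while (count > 0 && this.chunks.length > 0) {
      const head = this.chunks[0];
      const take = Math.min(count, head.length);
      for (let i = 0; i < take; i++) out.push(head[i]);
      if (take === head.length) this.chunks.shift();
      else this.chunks[0] = head.subarray(take);
      count -= take;
      this.queued -= take;
    }
    if (this.queued === 0) this.playing = false; // underrun: nothing left
    return out;
  }
}

// Decode loop: keep appending until about `targetSeconds` are buffered.
// `decodeNextBlock` returns an array of samples, or null at end of stream.
function fillBuffer(buffer, decodeNextBlock, targetSeconds) {
  let appended = 0;
  while (buffer.bufferedSeconds < targetSeconds) {
    const block = decodeNextBlock();
    if (block === null || block.length === 0) break;
    buffer.appendBuffer(block);
    appended++;
  }
  return appended;
}
```

A script decoder would call fillBuffer from a timer and could treat a
persistently low bufferedSeconds as the sign that playback is too slow.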

> In v2, we would discuss Vlad's getAudioSampleData proposal, native  
> channel mixing (mix two audio streams, for whatever reason),
> and other effects that allow the more complex "audio editor" use case.  
> For now, let's just consider an "audio player"
> to support arbitrary audio codecs and address accessibility for the  
> visually impaired.
>
> We need Audio.appendBuffer, Audio.createBufferArray
> and an AudioBufferArray interface of some sort, and I think it's good to  
> go.
>
> The naming and arguments still need to be worked out.
>
> I'd enthusiastically support such an interface in Java, Flash and
> .NET/ActiveX plugins, for the legacy/IE crowd.
>
> -Charles

-- 
Philip Jägenstedt
Core Developer
Opera Software