[whatwg] Codecs for <audio> and <video>

Philip Jagenstedt philipj at opera.com
Tue Jul 7 13:10:54 PDT 2009

On Tue, 07 Jul 2009 17:52:29 +0200, Charles Pritchard <chuck at jumis.com> wrote:

> Philip Jagenstedt wrote:
>> For all of the simpler use cases you can already generate sounds
>> yourself with a data URI. For example, this is 2 samples of silence:
> Yes, you can use this method, and with the current audio tag and
> autobuffer it may work to some degree.
> We've used the data:audio/midi technique, and we've experimented with
> audio/wav, but the data: injection work-around does not currently work
> all that well.
> It does not produce smooth transitions. We can use raw encoding instead
> of base64 to save on CPU cycles, but it's still quite "hackish".
>> It might be worthwhile implementing the API you want as a JavaScript  
>> library and see if you can actually do useful things with it. If the  
>> use cases are compelling and require native browser support to be  
>> performant enough, perhaps it could go into a future version of HTML.
> Overall, we cannot make near-real-time effects, nor jitter-free
> compositions.
> We've used wav and midi in a JavaScript library, using the data: URL
> technique.
> The data: injection technique is inefficient; it's not workable.
> Opera has been championing Xiph codecs on this list. There are
> ActionScript and Java Vorbis players developed using the most basic of
> APIs.
> Isn't that use case compelling enough?
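For reference, the data URI technique discussed in the quoted text (e.g. "2 samples of silence") can be sketched as below. This is a minimal illustration, assuming 16-bit mono PCM in a WAV container; the function names and the 8000 Hz default sample rate are hypothetical choices, not part of any proposed API. A small base64 encoder is included so the sketch does not depend on btoa being available.

```typescript
// Hypothetical helper: standard base64 encoding of raw bytes.
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
function base64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

// Hypothetical helper: pack mono samples in [-1, 1] into a 16-bit PCM WAV
// file (44-byte header + data) and return it as a data: URI.
function pcmToWavDataUri(samples: Float32Array, sampleRate = 8000): string {
  const bytesPerSample = 2; // 16-bit signed little-endian
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));
  const writeAscii = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);     // remaining RIFF size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);               // fmt chunk size
  view.setUint16(20, 1, true);                // format 1 = PCM
  view.setUint16(22, 1, true);                // 1 channel
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);   // block align
  view.setUint16(34, 16, true);               // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp before scaling
    view.setInt16(44 + i * 2, Math.round(s * 32767), true);
  }
  return "data:audio/wav;base64," + base64(new Uint8Array(view.buffer));
}
```

In a page, the result could be handed to an audio element, e.g. `new Audio(pcmToWavDataUri(samples))`. As the thread notes, every new sound means building and decoding a whole new file, which is why this approach struggles with smooth, real-time output.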

If the idea is to write a Vorbis decoder in JavaScript, that would be quite
cool in a way, but for vendors already implementing Vorbis it wouldn't
really add anything. A pure JS implementation of any modern audio codec
would probably be a ridiculous amount of code and slow, so I doubt it
would be that useful in practice.

For some use cases you could use 2 audio elements in tandem, mixing new
sound into a new data URI when the first is nearing the end (although sync
can't be guaranteed with the current API). But yes, there are things which  
can only be done by a streaming API integrating into the underlying media  
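The two-element approach above can be sketched roughly as follows, under stated assumptions: audio is held in script as mono Float32Array buffers, new sound is mixed in by summing and clamping, and the hand-off point uses a hypothetical fixed lead time. Both function names and the 0.25 s default are illustrative only, and, as noted, nothing here guarantees sample-accurate sync.

```typescript
// Hypothetical helper: mix `overlay` into a copy of `base` starting at
// sample `offset`, clamping the sum to [-1, 1] so it can be encoded safely.
function mixInto(base: Float32Array, overlay: Float32Array, offset: number): Float32Array {
  const out = new Float32Array(Math.max(base.length, offset + overlay.length));
  out.set(base); // copy existing audio; any extension starts as silence
  for (let i = 0; i < overlay.length; i++) {
    const j = offset + i;
    out[j] = Math.max(-1, Math.min(1, out[j] + overlay[i]));
  }
  return out;
}

// Hypothetical heuristic: start the standby element once the active one has
// at most `leadSeconds` left to play. Timing is best-effort only.
function shouldHandOff(currentTime: number, duration: number, leadSeconds = 0.25): boolean {
  return duration - currentTime <= leadSeconds;
}
```

In a page, one might check the playing element's `currentTime` and `duration` from its `timeupdate` event and call `play()` on the standby element when the heuristic fires. Because `timeupdate` fires only a few times per second, the gap between the two elements will vary, which is the jitter described earlier in the thread.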

Here's the list of proposed features right out of a comment block in the

     * frame forward / backwards / step(n) while paused
     * hasAudio, hasVideo, hasCaptions, etc
     * per-frame control: get current frame; set current frame
     * queue of content
       - pause current stream and insert content at front of queue to play  
       - pre-download another stream
       - add stream(s) to play at end of current stream
       - pause playback upon reaching a certain time
       - playlists, with the ability to get metadata out of them (e.g. xspf)
     * control over closed captions:
       - enable, disable, select language
       - event that sends caption text to script
     * in-band metadata and cue points to allow:
       - Chapter markers that synchronize to playback (without having to  
         the playhead position)
       - Annotations on video content (i.e., pop-up video)
       - General custom metadata store (ratings, etc.)
     * notification of chapter labels changing on the fly:
       - onchapterlabelupdate, which has a time and a label
     * cue points that trigger at fixed intervals, so that
       e.g. animation can be synced with the video
     * general meta data, implemented as getters (don't expose the whole  
       - getMetadata(key: string, language: string) => HTMLImageElement or  
       - onmetadatachanged (no context info)
     * external captions support (request from John Foliot)
     * video: applying CSS filters
     * an event to notify people of when the video size changes
       (e.g. for chained Ogg streams of multiple independent videos)
     * balance and 3D position audio
     * audio filters
     * audio synthesis
     * feedback to the script on how well the video is playing
        - frames per second?
        - skipped frames per second?
        - an event that reports playback difficulties?
        - an arbitrary quality metric?
     * bufferingRate/bufferingThrottled (see v3BUF)
     * events for when the user agent's controls get shown or hidden
       so that the author's controls can get out of the way of the UA's
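Purely as an illustration of what script-level "audio synthesis" and "audio filters" from the list above might compute (the list itself specifies no API), here is a sketch that generates a sine tone and runs it through a one-pole low-pass filter. Both functions and their defaults (8000 Hz sample rate, 0.5 gain) are hypothetical; the resulting samples could then be delivered via a WAV data URI as discussed earlier in the thread.

```typescript
// Synthesis: a sine tone of `freq` Hz lasting `seconds`, peak amplitude `gain`.
function sineTone(freq: number, seconds: number, sampleRate = 8000, gain = 0.5): Float32Array {
  const n = Math.round(seconds * sampleRate);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    out[i] = gain * Math.sin((2 * Math.PI * freq * i) / sampleRate);
  }
  return out;
}

// Filter: one-pole low-pass, y[n] = y[n-1] + a * (x[n] - y[n-1]),
// with a = 1 - exp(-2*pi*cutoff/sampleRate). Lower cutoff = stronger smoothing.
function lowPass(input: Float32Array, cutoff: number, sampleRate = 8000): Float32Array {
  const a = 1 - Math.exp((-2 * Math.PI * cutoff) / sampleRate);
  const out = new Float32Array(input.length);
  let y = 0; // filter state starts at silence
  for (let i = 0; i < input.length; i++) {
    y += a * (input[i] - y);
    out[i] = y;
  }
  return out;
}
```

Doing this per sample in script is feasible for short buffers, but continuous, low-latency processing is exactly where a native streaming API would be needed.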

Your use cases probably fall under audio filters and synthesis. I expect
that attention will gradually turn to more complex use cases once the
basic API we have now is implemented and stable cross-browser and

Philip Jägenstedt
Core Developer
Opera Software
