[whatwg] Audio related feedback from years past

Ian Hickson ian at hixie.ch
Thu Apr 26 14:59:05 PDT 2012


I respond below to a number of e-mails regarding audio features in HTML. 
The broad picture is that for effects, I recommend we defer to the work 
being done at the W3C in the public-audio mailing list.

On Sat, 8 Aug 2009, Chris McCormick wrote:
> On Wed, Jul 08, 2009 at 09:24:42AM -0700, Charles Pritchard wrote:
> > There are two use cases that I think are important: a codec 
> > implementation (let's use Vorbis), and an accessibility 
> > implementation, working with a <canvas> element.
> 
> Here are a few more use-cases that many people would consider just as 
> important:
> 
> * Browser based music software and synthesis toys.
> * New types of 'algorithmic' music like that pioneered by Brian Eno.
> * Browser based games which want to use procedural audio instead of
> pre-rendered sound effects.
>
> What is really needed is a DSP vector processor which runs outside of 
> ECMA script, but with a good API so that the ECMAscripts can talk to it 
> directly. Examples of reference software, mostly open source, which do 
> this type of thing follow:
> 
> * Csound
> * Supercollider
> * Pure Data
> * Nyquist
> * Chuck
> * Steinberg VSTs
> 
> I am going to use the terms "signal vector", "audio buffer", and "array" 
> interchangeably below.
> 
> Four major types of synthesis would be useful, but they are pretty much 
> isomorphic, so any one of them could be implemented as a base-line:
> 
> * Wavetable (implement vector write/read/lookup operators)
> * FM & AM (implement vector + and * operators)
> * Subtractive (implement unit delay from which you can build filters)
> * Frequency domain (implement FFT and back again)
> 
> Of these, I feel that wavetable synthesis should be the first type of 
> synthesis to be implemented, since most of the code for manipulating 
> audio buffers is already going to be in the browsers and exposing those 
> buffers shouldn't be hugely difficult. Basically what this would take is 
> ensuring some things about the audio tag:
> 
> * Supports playback of arbitrarily small buffers.
> * Seamlessly loops those small buffers.
> * Allows read/write access to those buffers from ECMAscript.
> 
> Given the above, the other types of synthesis are possible, albeit 
> slowly. For example, FM & AM synthesis are possible by
> adding/multiplying vectors of sine data together into a currently
> looping audio buffer. Subtractive synthesis is possible by adding 
> delayed versions of the data in the buffer to itself. Frequency domain 
> synthesis is possible by analysing the data in the buffer with FFT (and 
> reverse FFT) and writing back new data.
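
The buffer manipulation described above is easy enough to sketch in 
plain ECMAScript; the missing piece at the time was any standard way to 
hand the resulting vector to the audio output. A rough sketch 
(playBuffer() below is hypothetical, which is exactly the gap being 
discussed):

    // Fill one small signal vector with an FM-style tone: a carrier
    // sine wave whose phase is modulated by a second sine wave.
    var sampleRate = 44100;
    var length = 4096;                      // "arbitrarily small buffer"
    var buffer = new Float32Array(length);  // one signal vector

    var carrier = 440;    // Hz
    var modulator = 110;  // Hz
    var modIndex = 2;

    for (var i = 0; i < length; i++) {
      var t = i / sampleRate;
      var mod = modIndex * Math.sin(2 * Math.PI * modulator * t);
      buffer[i] = 0.5 * Math.sin(2 * Math.PI * carrier * t + mod);
    }

    // playBuffer() is hypothetical: there was no standard way to push
    // this vector to the speakers, short of the Flash hack below.
    // playBuffer(buffer, sampleRate);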

On Mon, 1 Feb 2010, Chris McCormick wrote:
> 
> Whilst I haven't had the time to do this myself, I did hear about the 
> perfect example use-case for what I was getting at. Someone required a 
> very small flash applet just to do the last javascript-to-audio bit of 
> synthesis. Everything else was done in Javascript.
> 
> <http://stockholm.musichackday.org/index.php?page=Webloop>
> 
> "Since almost no browser is able to output sound directly from 
> javascript, I currently use a small flash applet to push the sound to 
> your speakers, I hope you don't mind."
> 
> I think I speak for all procedural audio people when I say, can't we get 
> the browsers to allow sample-block access to audio?

This kind of thing is now handled by the various MediaStream Processing 
API and Web Audio API efforts:

   https://dvcs.w3.org/hg/audio/raw-file/tip/streams/StreamProcessing.html
   https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html
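
To give a feel for it, the sample-block access being asked for looks 
roughly like this under the Web Audio API draft (a sketch assuming its 
script-processing node; the exact names have shifted between drafts):

    // Generate a continuous sine wave by filling each output block as
    // the audio system requests it.
    var context = new (window.AudioContext || window.webkitAudioContext)();
    var node = context.createScriptProcessor(4096, 1, 1);
    var phase = 0;
    var frequency = 440;

    node.onaudioprocess = function (event) {
      var output = event.outputBuffer.getChannelData(0);
      for (var i = 0; i < output.length; i++) {
        output[i] = 0.25 * Math.sin(phase);
        phase += 2 * Math.PI * frequency / context.sampleRate;
      }
    };

    node.connect(context.destination);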


On Tue, 9 Feb 2010, Toni Ruottu wrote:
> 
> I have been working on a "game music" composing application called 
> Kunquat, which is completely unrelated to the web. Discussing various 
> details of that desktop application has, however, got me thinking 
> about related issues. At times we have discussed implementing similar 
> pieces of music software for the browser environment. With Google 
> Chrome OS coming up, being able to create music in a browser 
> environment is becoming increasingly important, but similar interfaces 
> are also needed for implementing games, or writing players for formats 
> unsupported by browsers.
> 
> I am excited about HTML5 supporting audio files on web sites; at the
> same time I am worried about the more general need to generate sound,
> which seems to have been forgotten in the specification. Let's
> consider a case where one wants to create a simple web instrument
> that will produce a sine wave, send it to the speakers, and let the
> user alter the parameters used for producing the continuous sound.
> With the current model, one wishing to do so is in trouble. First of
> all she will need a JavaScript library for turning the wave into an
> Ogg file. Then she needs to turn the Vorbis file into a data URL and
> add it as an audio element to the web page. This is still tolerable,
> but trying to add the next chunk of sound with another audio element
> at exactly the right time for the sound to be continuous is the real
> killer.
> 
> The lack of a simple audio output method leads to hacks, such as the 
> one used in jsnes ( http://benfirshman.com/projects/jsnes/ ). Jsnes is 
> a web application used for playing old NES games. It currently outputs 
> sound by having a separate Flash application read the sound from a 
> variable. Other than that, the application is JavaScript. Starting 
> Flash to play the sound seems to drain lots of resources, at least on 
> my computer. I bought my computer about a year ago, yet I cannot play 
> the games with sound turned on in Chrome, which is supposed to have 
> the best JS engine currently available. Without sound the game runs 
> fine with plenty of CPU cycles left, but turning sound on changes the 
> situation completely. Thus I believe that a standard way of producing 
> sound would help both software users and developers.

This sounds like exactly the kind of thing the efforts listed above now 
address.
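
For instance, the sine-wave instrument described above reduces to a few 
lines under the Web Audio API proposal (a sketch assuming its 
oscillator and gain nodes; the exact names have varied between drafts):

    // A minimal "web instrument": a continuous sine wave whose pitch
    // and volume can be changed from event handlers while it plays.
    var context = new (window.AudioContext || window.webkitAudioContext)();
    var osc = context.createOscillator();
    var gain = context.createGain();

    osc.type = 'sine';
    osc.frequency.value = 440;  // adjustable while the tone is playing
    gain.gain.value = 0.2;

    osc.connect(gain);
    gain.connect(context.destination);
    osc.start(0);

    // e.g. wired to a slider:
    // slider.oninput = function () { osc.frequency.value = +slider.value; };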


On Tue, 11 May 2010, Eoin Kilfeather wrote:
> 
> A Google search on the discussion list returns the unanswered question 
> from Keith Bauer 
> (http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024340.html) 

Actually that e-mail received a reply back in February 2010:

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2010-February/025028.html


> I'm working on an application which has two separate channels of sound 
> panned fully left and right in a stereo audio file. The application is 
> for language learners and it would be very useful to set the audio pan 
> position fully left or right to effectively "mute" one speaker. As the 
> draft spec stands the only solution that presents itself is to split the 
> stereo file into two and load two <audio> elements in the HTML. I 
> haven't tried this yet but I suspect it will be a headache. Any chance 
> of reviving panning?

For the time being, I recommend relying on the APIs I mention above (or 
rather, their more widely implemented descendants) for panning too. 
However, if there are compelling use cases that would just need panning 
and nothing else, maybe it makes sense to add it to the core media API 
too. 
Can you elaborate on your use case? Are they really two separate audio 
files that just happened to have been mixed together?
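
In the meantime, muting one channel of a stereo <audio> element is 
fairly direct under the Web Audio API draft; a sketch, assuming a 
channel splitter and merger feeding off the media element:

    // Route only the left channel of a stereo <audio> element to the
    // output, duplicated into both ears; swap the splitter output
    // index to 1 to keep the right channel instead.
    var context = new (window.AudioContext || window.webkitAudioContext)();
    var source = context.createMediaElementSource(
        document.querySelector('audio'));
    var splitter = context.createChannelSplitter(2);
    var merger = context.createChannelMerger(2);

    source.connect(splitter);
    splitter.connect(merger, 0, 0);  // left channel -> left output
    splitter.connect(merger, 0, 1);  // left channel -> right output too
    merger.connect(context.destination);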


On Fri, 28 May 2010, Charles Pritchard wrote:
> 
> I'm exploring programmable MIDI, and would like to generate some 
> discussion.
> 
> Currently: <audio src="data:audio/midi;base64,...."></audio> is a valid 
> way of generating a MIDI file; and if the browser actually supports 
> MIDI, it can result in a playable stream.
>
> Live midi requires a data stream.

I haven't added anything to enable MIDI control or playback from a Web 
page. It would be interesting to find out how much demand there is for 
this and whether the specs mentioned above will take care of this 
particular issue. So far, I have not seen much demand for it, but that 
doesn't mean we shouldn't eventually handle it.
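
For reference, the static approach described in the quoted message 
amounts to something like the following, where midiBytes is a 
placeholder for a complete Standard MIDI File held as a binary string; 
there is no way to stream new events into it once created:

    // Embed a finished MIDI file in a data: URL and hand it to an
    // <audio> element. Playback only works if the browser supports
    // audio/midi at all.
    var audio = document.createElement('audio');
    audio.src = 'data:audio/midi;base64,' + btoa(midiBytes);
    document.body.appendChild(audio);
    audio.play();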


On Sat, 8 Aug 2009, Chris McCormick wrote:
> 
> I'd like to reiterate the previously expressed sentiment that only 
> implementing pre-rendered audio playback is like having a browser that 
> only supports static images loaded from the server instead of animations 
> and <canvas> tags.

So what you're saying is that we should wait ten years from when we added 
<audio> before adding a way to do dynamic audio, right? Like we did with 
<img> and <canvas>. :-)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

