[whatwg] default audio upload format (was Fwd: The Media Capture API Working Draft)

David Singer singer at apple.com
Fri Sep 3 14:19:21 PDT 2010

I agree that if the server says it accepts something, then it should cover at least the obvious bases, and transcoding at the server side is not very hard.  However, I do think that there needs to be some way to protect the server (and the user, in fact) from mistakes and the like.  If the server was hoping for up to 10 seconds of 8 kHz mono voice to use as a security voice-print, and the UA doesn't cut off at 10 seconds, records at 48 kHz stereo, and the user forgets to hit 'stop', quite a few systems might be surprised by (and maybe charge for) the size of the resulting file.
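
To put rough numbers on that, here is a small sketch of the arithmetic for uncompressed PCM.  The 16-bit sample size and the 10-minute "forgot to hit stop" duration are assumptions made for illustration only; neither figure comes from the draft or from this thread.

```python
def pcm_bytes(sample_rate_hz, channels, seconds, bits_per_sample=16):
    """Size in bytes of raw (headerless) linear PCM audio.

    bits_per_sample defaults to 16, which is an assumption for this sketch.
    """
    return sample_rate_hz * channels * (bits_per_sample // 8) * seconds


# What the server was hoping for: 10 seconds of 8 kHz mono.
expected = pcm_bytes(8_000, 1, 10)

# What it might get: 48 kHz stereo, left running for 10 minutes (assumed).
runaway = pcm_bytes(48_000, 2, 600)

print(expected)             # 160000
print(runaway)              # 115200000
print(runaway // expected)  # 720
```

Even over the same 10 seconds, 48 kHz stereo is 12 times the data of 8 kHz mono at the same sample size; the missing cut-off accounts for the rest.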

It's also a pain at the server to have to sample-rate convert, downmix to mono, and so on, if the terminal could do it.
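
As an illustration of the kind of work involved, here is a deliberately crude sketch of a stereo-to-mono downmix followed by a 48 kHz to 8 kHz rate reduction.  The function names are hypothetical, and block averaging is only a rough stand-in for the low-pass filter a real sample-rate converter would use; this is not meant as a recommended implementation.

```python
def downmix_to_mono(frames):
    """Average each (left, right) integer sample pair into one mono sample."""
    return [(left + right) // 2 for left, right in frames]


def decimate(samples, factor):
    """Crude rate reduction: replace each block of `factor` samples by its mean.

    Trailing samples that do not fill a whole block are dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) // factor
        for i in range(0, usable, factor)
    ]


# 12 stereo frames at 48 kHz become 2 mono samples at 8 kHz (factor 6).
stereo = [(100, 300)] * 12
mono_8k = decimate(downmix_to_mono(stereo), 6)
print(mono_8k)  # [200, 200]
```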

On Sep 3, 2010, at 14:08 , Roger Hågensen wrote:

> On 2010-09-01 21:34, David Singer wrote:
>> seems like a comma-separated list is the right way to go, and that audio/* should mean what it says -- any kind of audio (whether that is useful or not remains to be seen).
>> I would suggest that this is likely to be used for short captures, and that uncompressed (such as a WAV file or AVI with PCM or u-law audio) should be the recommended format.
>> If your usage is for longer captures or more specific situations, then indicate a suitable codec.
>> Shouldn't there be statements about channels (mono, stereo, more), sampling rate (8 kHz speech, 16 kHz wideband speech, 44.1 kHz CD-quality, 96 kHz bat-quality) and so on?
> Hmm! Channels, bits, and frequency should be optional in my opinion (with a recommendation for a default: stereo 16-bit 44.1 kHz, which is the legacy standard for audio in most formats I guess, or maybe 48 kHz, as most soundcards seem to be these days?).
> In most cases a service will either A. use it as it's received (since most computer systems can play back pretty much anything), or B. it's transcoded/converted into one or more formats by the service. (like Youtube does etc.)
> In other words I am assuming that if the server accepts, for example, the WAV format, then it actually fully supports the WAV format (at least the PCM audio part). Ditto with MP3, AAC, Ogg, FLAC, Speex etc.
> So any quality, channels, bits, frequency specified in the accept would just be what the server prefers (suggested default, or for best quality/best effort scenario), but the full format should be supported and accepted if asked for.
> Now whether the service takes advantage of surround rear recording is up to the service; if it simply discards that, takes the front channels and mixes them to mono, then that is up to the service and the user to decide/negotiate about rather than the browser.
> -- 
> Roger "Rescator" Hågensen.
> Freelancer - http://EmSai.net/

David Singer
Multimedia and Software Standards, Apple Inc.
