[whatwg] Web-based dynamic audio apps - WAS: Re: video tag : loop for ever

Dr. Markus Walther walther at svox.com
Fri Oct 17 03:38:56 PDT 2008


Eric Carlson wrote:
>>
>> Imagine e.g. an audio editor in a browser and the task "play this
>> selection of the oscillogram"...
>>
>> Why should such use cases be left to the Flash 10 crowd
>> (http://www.adobe.com/devnet/flash/articles/dynamic_sound_generation.html)?
>>
>>
>> I for one want to see them become possible with open web standards!
>>
>   I am anxious to see audio-related web apps appear too, I just don't
> think that including 'start' and 'end' attributes will make them
> significantly easier to write.

I did a tiny prototype of the above use case - an audio editor in a
browser - and it would have been significantly easier to write had
Apple's Safari not had a bad and still-unfixed bug in its
implementation of 'end' (http://bugs.webkit.org/show_bug.cgi?id=19305).
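
For reference, the selection-playback part amounts to little more than
this (a sketch, not the prototype's actual code; the file name and the
times are made up):

  // Play a selection of the oscillogram using the 'start'/'end'
  // content attributes as currently specced (times in seconds).
  var audio = document.createElement('audio');
  audio.src = 'recording.wav';           // hypothetical clip
  audio.setAttribute('start', '1.25');   // selection start
  audio.setAttribute('end', '2.75');     // selection end - the buggy part
  document.body.appendChild(audio);
  audio.play();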

> 
>> In addition, cutting down on number of HTTP transfers is generally
>> advocated as a performance booster, so the ability to play sections of a
>> larger media file using only client-side means might be of independent
>> interest.
>>
>   The 'start' and 'end' attributes, as currently defined in the spec,
> only limit the portion of a file that is played - not the portion of a
> file that is downloaded.

I know that, but for me that's not the issue at all.

The issue is _latency_. How long it takes from a user action to audible
playback - that is what matters to any end user.

You can't do responsive audio manipulation in the browser without fast,
low-latency client-side computation. All the server-side proposals miss
this crucial point.

For another use case, consider web-based tools for DJs, for mixing and
combining audio clips. There are a lot of clips on the web, but if
manipulating them is not realtime enough, people won't care.

For another use case, consider web-based games with dynamic audio, etc.

Robert O'Callahan wrote:
> On Fri, Oct 17, 2008 at 5:24 AM, Dr. Markus Walther <walther at svox.com
> <mailto:walther at svox.com>> wrote:
>
>     Imagine e.g. an audio editor in a browser and the task "play this
>     selection of the oscillogram"...
>
>     Why should such use cases be left to the Flash 10 crowd
>
> (http://www.adobe.com/devnet/flash/articles/dynamic_sound_generation.html)?
>
>
> If people go in that direction they won't be using cue ranges etc,
> they'll be using dynamic audio generation, which deserves its own API.

And I proposed the beginnings of such an API in several postings on this
list under the topic 'audio canvas', but it seemingly met with little
interest. Now Flash 10 has some of the things I proposed... maybe that's
a louder voice?

> OK, in principle you could use <audio> with data:audio/wav, but that
> would be crazy. Then again, this is the Web so of course people will do
> that.

I did exactly that in my tiny audio-editor prototype for
proof-of-concept purposes - I guess I must be crazy :-) Actually, it
was partly a workaround for the browser bugginess mentioned above.
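
For the curious, the trick is roughly this (a sketch, not the actual
prototype code; 16-bit mono PCM, with a made-up test signal and rate):

  // Wrap raw 16-bit mono PCM samples in a WAV header and hand the
  // result to <audio> as a data: URI.
  function encodeWav(samples, sampleRate) {
    var bytes = [];
    function str(s) { for (var i = 0; i < s.length; i++) bytes.push(s.charCodeAt(i)); }
    function u16(v) { bytes.push(v & 255, (v >> 8) & 255); }
    function u32(v) { u16(v & 65535); u16((v >> 16) & 65535); }
    var dataLen = samples.length * 2;
    str('RIFF'); u32(36 + dataLen); str('WAVE');
    str('fmt '); u32(16); u16(1); u16(1);          // PCM, mono
    u32(sampleRate); u32(sampleRate * 2);          // byte rate
    u16(2); u16(16);                               // block align, bits
    str('data'); u32(dataLen);
    for (var i = 0; i < samples.length; i++) {
      var s = Math.max(-1, Math.min(1, samples[i]));  // clamp to [-1, 1]
      u16(s < 0 ? s * 0x8000 : s * 0x7FFF);
    }
    var bin = '';
    for (var j = 0; j < bytes.length; j++) bin += String.fromCharCode(bytes[j]);
    return 'data:audio/wav;base64,' + btoa(bin);
  }

  // Half a second of a 440 Hz sine at 8 kHz - purely a test signal.
  var rate = 8000, samples = [];
  for (var i = 0; i < rate / 2; i++)
    samples.push(Math.sin(2 * Math.PI * 440 * i / rate));
  new Audio(encodeWav(samples, rate)).play();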

Give me an API with at least

  float getSample(long samplePosition)
  void  putSample(long samplePosition, float sampleValue)
  void  play(long samplePositionStart, unsigned long numSamples)

and sanity will be restored ;-)

The current race for the fastest JavaScript engine on the planet will
then take care of the rest.
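
To make that concrete, script-driven synthesis would look something
like this (hypothetical, of course - only the three methods above are
proposed; the element, rate and signal are made up):

  var rate = 44100;                        // assumed sample rate
  var audio = document.getElementsByTagName('audio')[0];

  // Write one second of a 440 Hz sine straight into the sample buffer.
  for (var i = 0; i < rate; i++)
    audio.putSample(i, Math.sin(2 * Math.PI * 440 * i / rate));

  // Read-modify-write: fade out the last 1000 samples.
  for (var i = rate - 1000; i < rate; i++)
    audio.putSample(i, audio.getSample(i) * (rate - i) / 1000);

  // Play exactly that second, sample-accurately.
  audio.play(0, rate);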

Silvia Pfeiffer wrote:
>
> Linking to a specific time point or section in a media file is not
> something that needs to be solved by HTML. It is in fact a URI issue
> and is being developed by the W3C Media Fragments working group.
>
> If you use a URI such as http://example.com/mediafile.ogv#time=12-30
> in the src attribute of the video element, you will not even have to
> worry about "start" and "end" attributes for the video element.

Unless a) Media Fragments can be set dynamically for an already
downloaded media file _without triggering re-download_, b) the time
specification can be accurate to the individual sample in the case of
audio, c) the W3C finishes this quickly enough, and d) browsers take
the W3C recommendation seriously, it is not an alternative for my use
cases.
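
To spell out a), something like the following would have to reuse the
bytes already downloaded instead of refetching (file name and times
borrowed from the example above):

  var video = document.getElementsByTagName('video')[0];
  video.src = 'http://example.com/mediafile.ogv#time=12-30';
  video.play();
  // Later, a different selection of the very same file - for Media
  // Fragments to replace 'start'/'end', this must not hit the network:
  video.src = 'http://example.com/mediafile.ogv#time=45-60';
  video.play();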

It's all about dynamic audio and the future. By the time the spec hits
the market, static media will no longer be the only thing on the web.

Jonas Sicking wrote:
>
> The problem with relying on cues is that audio plays a lot faster than
> we can guarantee that cue-callbacks will happen. So if you, for example,
> create an audio file with a lot of sound effects back to back, it is
> possible that a fraction of a second of the next sound will play before
> the cue-callback is able to stop it.

If I understand this correctly, cue-callback delay could make it
impossible to play the sample-precise audio intervals that the above
use cases need.

But _then_ replacing 'start' and 'end' with cue ranges and 'currentTime'
is NOT possible, because the two mechanisms are no longer guaranteed to
be equivalent in terms of precision.
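
The race is easy to see in the script-level emulation one would be
forced into (a sketch; the boundary values are made up):

  var audio = document.getElementsByTagName('audio')[0];
  var clipEnd = 1.500;                     // assumed clip boundary (s)
  audio.currentTime = 1.000;               // assumed clip start
  audio.play();
  var poll = setInterval(function () {
    if (audio.currentTime >= clipEnd) {    // already past the boundary
      audio.pause();
      clearInterval(poll);
    }
  }, 10);   // 10 ms granularity = ~441 samples of slop at 44.1 kHz

A cue-range callback would merely replace the timer; the slop remains.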

It seems the arguments are converging more towards keeping 'start' and
'end' in the spec.

-- Markus


