[whatwg] Codecs for <audio> and <video>

Wed Jul 1 14:45:10 PDT 2009

On Wed, Jul 1, 2009 at 4:06 PM, Jonas Sicking<jonas at sicking.cc> wrote:
[snip]
> I think the first bullet has been demonstrated to be false. The
> relative quality between theora and h.264 is still being debated, but
> the arguments are over a few percent here or there. Arguments that
> theora is simply not good enough seems based on poor or outdated
> information at this point.

I'm commenting here because I don't want my own posts to be a source of
misinformation.

Depending on how and what you compare, it's more than a few percent.

It turns out that H.264 as used in many places on the web is within
spitting distance of the newer Theora encoder, due to encode-side and
decode-side computational complexity, compatibility concerns, and
the selection of encoder software. For these same reasons there are
many 'older' formats still in wide use which Theora clearly
outperforms. The reality of what people are using puts the lie to
broad claims that Theora is generally unusable because it
under-performs the best available H.264 encoders in the lab.

Different uses and organizations will have different requirements.
Which is a good reason why HTML5 never required solutions to support
only one codec.

I do not doubt that there are uses for which Theora is clearly inferior,
because of the mixture of tolerance for licensing, computational load,
intolerance for bitrate, requirements to operate at bits-per-pixel
levels below the range that Theora operates well at, etc., but it is
an enormous jump to go from "there are some uses" to applying the claim
to the general case, or to go from "it needs some more bitrate to
achieve equivalent subjective quality" to remarks that the bitrate
inflation would endanger the Internet.  It was this kind of
overgeneralization that my commentary on Theora quality was targeting.

(And it should be absolutely unsurprising that at the limit Theora
does somewhat worse than H.264 in terms of quality/bits; it's an
older, less CPU-hungry design which is, from 50,000 ft, almost a strict
subset of H.264.)

At the same time, we have clearly defined cases where H.264/AAC is
absolutely unacceptable. Not merely inferior, but completely
unworkable due to the licensing issues.

Different uses and organizations will have different requirements.
Different codecs will be superior depending on your requirements.
Which is a good reason why HTML5 never required solutions to support
only one codec.

But what I think is key is that the inclusion of Theora as a baseline
should do nothing to inhibit the parties which are already invested in
H.264, or who have particular requirements which make it especially
attractive, from continuing to offer and use it.

The advantage of a baseline isn't necessarily that it's the best at
anything in particular, but that it's workable and mostly universal.
If, when talking about a baseline, you find yourself debating details
of efficiency vs. the state of the art, you've completely missed the
point.

This is a field which is still undergoing rapid development. Even if
codec-science were to see no improvements, we will still see the state
of the art advance tremendously in the next few years simply due to
increasing tolerance for CPU-hungry techniques invented many years ago
but still under-used. Anything we use today is going to look pretty
weak compared to the options available 10 years from now.

It's important for a codec to be efficient, but the purpose of the
baseline is to be compatible. As such the relevant arguments should be
largely limited to workability, of which efficiency is only one part.

It was suggested here that MJPEG be added as a baseline.  I considered
this as an option for Wikipedia video support some years ago before we
had the Theora in Java playback working. I quickly determined that it
was unworkable for over-the-web use because of the bitrate: we're
talking about on the order of >10x the required bitrate over Theora
before considering the audio (which would also be >10x the bitrate of
Vorbis).

At least for general public web use I think the hard workability
threshold could be fairly set as "can a typical consumer broadband
connection stream a 'web resolution' (i.e. somewhat sub-standard
definition) video in real time with decent quality". Even though that's
a fairly vague criterion, it seems clear that Ogg/Theora is well inside
this limit while MJPEG is well outside it.  Obviously different
parties will have different demands.
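
To make that threshold concrete, here is a rough sketch of the
arithmetic. Every number in it is a hypothetical placeholder I picked
for illustration (the Theora/Vorbis bitrates, the link speed, and the
headroom fraction are assumptions, not measurements); the only figure
taken from the discussion above is the "on the order of 10x" factor,
used here as a lower bound for both video and audio.

```python
def fits_link(video_kbps, audio_kbps, link_kbps, headroom=0.8):
    """Return True if video + audio fit within a fraction of the link.

    `headroom` is an assumed safety margin for protocol overhead and
    other traffic; it is not a measured value.
    """
    return video_kbps + audio_kbps <= link_kbps * headroom


# Hypothetical figures, for illustration only.
THEORA_VIDEO_KBPS = 500   # assumed 'web resolution' Theora stream
VORBIS_AUDIO_KBPS = 64    # assumed Vorbis audio stream
LINK_KBPS = 1500          # assumed consumer broadband downstream
MJPEG_FACTOR = 10         # lower bound of the ">10x" estimate above

# Theora + Vorbis at the assumed rates.
theora_ok = fits_link(THEORA_VIDEO_KBPS, VORBIS_AUDIO_KBPS, LINK_KBPS)

# The same content at 10x the bitrate for both video and audio.
mjpeg_ok = fits_link(
    THEORA_VIDEO_KBPS * MJPEG_FACTOR,
    VORBIS_AUDIO_KBPS * MJPEG_FACTOR,
    LINK_KBPS,
)

print("Theora/Vorbis fits:", theora_ok)
print("10x bitrate fits:  ", mjpeg_ok)
```

With these assumed numbers the Theora/Vorbis stream fits with room to
spare and the 10x stream does not; different assumptions will move the
line, but a 10x gap is hard to absorb on a typical connection.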

As far as I'm concerned, the spec might as well recommend a lossless
codec as MJPEG; at least lossless has advantages for the set of
applications which are completely insensitive to bitrate.