[whatwg] Video source selection based on quality (was: <video> feedback)

Tue Feb 9 23:47:30 PST 2010

On Wed, Feb 10, 2010 at 1:03 PM, Ian Hickson <ian at hixie.ch> wrote:
[..]
> On Sat, 12 Dec 2009, Hugh Guiney wrote:
>>
>> So, in my first foray into preparing Theora/Vorbis content, for use with
>> <video>, I realized that I wasn't sure with what settings to encode my
>> materials. Should I:
>>
>> A.) Supply my visitors with the best possible quality at the expense of
>> loading/playback speed for people on slower connections
>>
>> B.) Just account for the lowest common denominator and give everyone a
>> low quality encode
>>
>> or
>>
>> C.) Go halfway and present a medium quality encode acceptable for "most
>> people"?
>>
>> A. is not legacy-proof, B. is not future-proof, and C. is neither.
>> C. may sound like the most sensible solution, but even if I were to put
>> up something that worked for "most people" *right now*, as computers
>> become more capable and connections become faster, more visitors are
>> going to want higher-quality videos, meaning I'd have to stay on top of
>> the relevant trends and update my pages accordingly.
>>
>> Ideally, I would like to be able to simply encode a few different
>> quality variations of the same file and serve each version to its
>> corresponding audience.
>>
>> There are a few ways I could do this. One of the most obvious ways would
>> be to present different versions of the site, e.g. one for "slow
>> connections" and one for "fast connections" and have the user pick via a
>> splash page before entering, as was popular in the '90s. But this is almost
>> certainly a faux pas today: it puts a wall between the user and my
>> content, and requires me to maintain two different versions of the site.
>> Hardly efficient.
>>
>> Another way would be to itemize each version of the file in a list, with
>> details next to them such as frame and file size, so the user could pick
>> accordingly. While this would probably be fine for downloads, it
>> completely defeats the point of embedded media.
>>
>> Alternatively, I could devise a script that prompts users for their
>> connection speed and/or quality preference, which (assuming they know
>> it) would then go through the available resources on the server and
>> return the version of the file I'd have allocated to that particular
>> response. But that would require either branching for every file
>> alternative of every video on my site in the script—or specifying the
>> quality in some other way that can be programmatically exploited;
>> perhaps using microdata, but then I'd be stuffing the fallback content
>> with name-value pairs, which isn't particularly accessible.
>>
>> Or, I could invent my own HTTP header and try to get everyone to use it.
>> Which is a lot to do for something like this, and isn't guaranteed to
>> work.
>>
>> None of these options seem particularly viable to me. Right now, the
>> HTML5 spec allows UAs to choose between multiple versions of a media
>> resource based on type. In the interest of making media more accessible
>> to users of varying bandwidth and processing power, and easier to
>> maintain for authors, I propose allowing the relative quality of each
>> resource to be specified for multiple-source media.
>>
>> You will notice that in Flash animations, there is a context menu option
>> to change the rendered quality between "High", "Medium", and "Low". Each
>> setting degrades or upgrades the picture, and requires less or more
>> computing power to process respectively. Additionally, some Flash video
>> authors elect to construct their own quality selection UI/scripting
>> within the video itself, allowing them to have a finer degree of control
>> over the presentation of the image.
>>
>> Similarly, YouTube has the ability to switch between standard quality,
>> high quality, and high definition videos based on users' preferences. In
>> the "Playback Setup" section of "Account Settings", you will find the
>> following options:
>>
>> "Video Playback Quality
>> Choose the default setting for viewing videos
>> * Choose my video quality dynamically based on the current connection speed.
>> * I have a slow connection. Never play higher-quality video.
>> * I have a fast connection. Always play higher-quality video when it's
>> available."
>>
>> If HTML video is to compete with Flash, or become implemented on as
>> wide a scale as YouTube <http://www.youtube.com/html5>, it makes sense
>> to allow for some sort of quality choice mechanism, as users will have
>> come to expect that functionality.
>>
>> This could be done by allowing an attribute on <source> elements that
>> takes a relative value, such as (or similar to) those specified in
>> HTTP <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.9>.
>> This attribute could be called "quality" or "qvalue" or just "q" (my
>> personal preference would be in that order, decreasing), and be used as
>> such:
>>
>> <video controls>
>>   <source src='video-hd.ogv' quality='1.0' type='video/ogg; codecs="theora, vorbis"'>
>>   <source src='video-hq.ogv' quality='0.5' type='video/ogg; codecs="theora, vorbis"'>
>>   <source src='video-sd.ogv' type='video/ogg; codecs="theora, vorbis"'>
>> </video>
>>
>> In this case, video-hd.ogv (a high definition encode) would be the
>> author's preferred version, video-hq.ogv (a high quality standard
>> definition encode) would be less preferred than video-hd.ogv, but more
>> preferred than video-sd, and video-sd (a standard definition encode)
>> would be less preferred than both, since it lacks a quality attribute
>> and would thus be the equivalent of specifying "quality='0.001'".
>>
>> The UA could then have a playback setup that would allow the user to
>> specify how it should handle content negotiation for multiple-source
>> media. This could be based solely on the quality attribute if provided,
>> or if @type is also provided, also based on what content-type the user
>> prefers.
>
> Thank you for this detailed problem description and discussion of a
> suggested solution.
>
> I think my recommendation would be something similar to what you suggest
> above regarding an HTTP header, but more specific to the Content-Type
> header: a new MIME parameter similar to "codecs" that describes the power
> needed for playback, in terms of network bandwidth, CPU, etc. This could
> just be boiled down to a number, e.g. "1" for today's "low" and "2" for
> today's "high", with the number being increased over the years as we get
> better and better.
>
> Alternatively, we could extend Media Queries to specify the kind of CPU
> and bandwidth expected to be needed for a media resource. This would fit
> right into the Media Queries model.
>
> Or, of course, we could add an attribute to <source>, as you suggest.
>
> The best thing to do is to approach browser vendors directly (e.g. on
> their relevant mailing lists, like webkit-dev for WebKit, or the Mozilla
> newsgroups for Firefox), and see if they would be interested in doing
> something like this. The WHATWG FAQ gives some detail on this:
>
>   http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F

I've taken one particular issue out of Ian's large feedback thread
(which, incidentally, I found a most interesting read).

I'd like to address the issue of video source selection, where a
content provider wants to stream the best quality video to the user,
based either on maximising use of the pipe between the provider and
the user or on explicit user choice.

Firstly, I think that explicit user choice isn't a problem.

As a content provider, you have several means of doing this user choice:

1) You can provide in a single (admittedly javascript-based) video
player interface an option to the user to switch between source files
of different quality (bitrate, width x height, audio samplerate, and
whatever other choices you make for differently encoded content). This
is what YouTube does in their latest players, e.g. the 360p / 480p /
720p choice (strictly speaking this is a measure of width x height
rather than of quality, but since the display size does not change in
YouTube, it effectively acts as a quality setting).

2) If users log in to use your content, you can ask them to provide a
default setting, just like YouTube does in their Account Settings
(as Hugh describes above).

3) Even if users don't log in, you can have a button on the side of
the video (and a once-off splashscreen if you so like), which allows
users to set and change their preference and leave a cookie to
remember their choice.
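
As a rough illustration of option 3, a server-side handler could read
the stored preference from a cookie and pick the matching file. This
is only a sketch: the cookie name "video_quality", the quality labels
and the file names are all assumptions made up for this example.

```python
from http.cookies import SimpleCookie

# Hypothetical quality labels and file names, chosen only for illustration.
SOURCES = {"high": "video-hd.ogv", "medium": "video-hq.ogv", "low": "video-sd.ogv"}
DEFAULT_QUALITY = "medium"


def source_for_request(cookie_header: str) -> str:
    """Return the source file matching the user's stored preference."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("video_quality")  # hypothetical cookie name
    quality = morsel.value if morsel is not None else DEFAULT_QUALITY
    # Fall back to the default when the cookie holds an unknown value.
    return SOURCES.get(quality, SOURCES[DEFAULT_QUALITY])


def remember_choice(quality: str) -> str:
    """Build the Set-Cookie header value that stores the user's choice."""
    cookie = SimpleCookie()
    cookie["video_quality"] = quality
    cookie["video_quality"]["max-age"] = 365 * 24 * 3600  # keep for one year
    return cookie["video_quality"].OutputString()
```

A user without the cookie simply gets the default encode until they
press the button for the first time.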

All of this is based on the premise that the user either knows what
their pipe and their computer can take, or experiments with it until
he/she is happy.

Secondly, choosing the best video encoding format for a given
user-server connection (and user device capabilities) is actually a
really difficult decision to make automatically.

Let's say we declare the quality in the <source> elements in some form
or other (either an additional attribute or by addition to the media
queries). Now we have to take this information into account in the
source selection algorithm, since we are asking the UA to make a
choice of which media source to use based on the quality information.
The source selection algorithm goes through the list of <source>
elements from top to bottom and stops at the first one that it is able
to play. It does not check whether the list contains a better choice
further down. Thus, we have to require authors to order the list so
that the highest quality content is at the top and lower qualities
follow further down.

For example:
<video>
  <source src='video-hd.ogv' media='quality:1.0' type='video/ogg; codecs="theora, vorbis"'>
  <source src='video-hq.ogv' media='quality:0.5' type='video/ogg; codecs="theora, vorbis"'>
  <source src='video-sd.ogv' type='video/ogg; codecs="theora, vorbis"'>
</video>
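
To make the ordering requirement concrete, here is a simplified model
of the first-match behaviour described above. It is not the spec's
actual algorithm text; the Source class, its quality field and the
can_play callback are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Source:
    src: str
    mime_type: str
    quality: Optional[float] = None  # hypothetical quality hint; None if absent


def select_source(sources: Sequence[Source],
                  can_play: Callable[[Source], bool]) -> Optional[Source]:
    """Return the first source in document order that the UA can play."""
    for source in sources:
        if can_play(source):
            return source  # stop here; later entries are never examined
    return None
```

Because the loop returns on the first hit, a lower quality file listed
above a higher quality one would always win, which is why the author
has to sort the list from highest to lowest quality.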

Now, we need to devise an algorithm for UAs to determine which quality
to choose based on the given computer/device and connection. This is
not trivial, but let's assume we are able to do so and set
* quality:1.0 to any connection > 5 Mbit/s, CPU > 2 GHz, and
* quality:0.5 to any connection > 1 Mbit/s, CPU > 1.5 GHz.

This would be measured once during source selection and thus the
choice made. But it's actually not a guarantee that it will work. If
your connection degrades or your CPU gets busy with other
applications, the choice may need to be revised. YouTube doesn't
currently allow for this, so this kind of solution would replicate
what YouTube does at this point - which doesn't seem to be such a bad
thing, since YouTube is acceptable for most people.
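
The one-off measurement with the example thresholds above could look
roughly like the following. The function names are invented for this
sketch, the numbers are just the example values from this mail, and
treating a source without a quality value as always acceptable is an
assumption.

```python
from typing import Optional, Sequence, Tuple


def max_quality(bandwidth_mbit: float, cpu_ghz: float) -> float:
    """Map a one-off measurement to the highest quality value allowed."""
    if bandwidth_mbit > 5 and cpu_ghz > 2.0:
        return 1.0
    if bandwidth_mbit > 1 and cpu_ghz > 1.5:
        return 0.5
    return 0.0  # only sources without a quality value qualify


def pick_source(sources: Sequence[Tuple[str, Optional[float]]],
                bandwidth_mbit: float, cpu_ghz: float) -> Optional[str]:
    """Pick the first (src, quality) pair within the measured limit.

    The list is expected to be ordered from highest to lowest quality.
    """
    limit = max_quality(bandwidth_mbit, cpu_ghz)
    for src, quality in sources:
        if quality is None or quality <= limit:
            return src
    return None
```

Note that both conditions have to hold: a fast connection on a slow
CPU still falls through to a lower quality file.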

An improvement over this would be the introduction of an adaptive
stream scaling scheme over HTTP, similar to e.g. Microsoft's Smooth
Streaming and Apple's HTTP Live Streaming (also note: Adobe is in the
process of developing HTTP streaming support). There is no such thing
available for Ogg yet, but the Ogg community is interested in
developing/using something that is open and fulfills the needs for
HTML5. It may well be that an activity should be taken up by the
WHATWG (or W3C? or IETF?) to develop a media-format independent
adaptive streaming standard over HTTP. The point about adaptive
streaming is that it does not require any new HTTP headers to deliver
the data or any new software on the HTTP server - the choice is made
client-side by switching between different encodings of the same
resource on the server. This requires declaration of the available
alternative files to the client - which could either be done inside
HTML5 or through some extra resource. Apple's scheme, for example,
uses m3u-based files (m3u8), while MS's scheme uses SMIL-like manifest
files (ism/ismc, alongside the ismv media files).

Apple's scheme is already going through the IETF for standardisation
as an informational RFC, but not through a working group. Apple's scheme is
based on massive creation of small chunks (e.g. 10s duration) on the
server - an overhead that could possibly be avoided by using W3C Media
Fragment URIs. There are lots of things to discuss for such an
activity and the WHATWG may not be the best forum for discussing this
- though in the end it's up to the browser vendors to implement it, so
maybe it would.

Note that adaptive HTTP streaming deliberately avoids introducing new
HTTP parameters and server requirements, because these are really
difficult to roll out, in particular since they also create new
requirements on HTTP proxy infrastructure.

If we develop such an adaptive streaming approach, the source
selection algorithm would then select the default resource to stream
from, while being given the option for adaptive streaming through the
extra information (e.g. delivered through an extra attribute on the
<source> elements, e.g. @adaptive="alternatives.xml"). There could
then be dynamic switching between the files listed as alternatives in
the @adaptive file.
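
The client-side switching could be sketched as follows. This is a
minimal model only: the bitrates, file names and the safety margin are
made-up values, and a real client would measure throughput per
downloaded chunk rather than receive it as a list.

```python
from typing import List, Sequence, Tuple

# Hypothetical alternatives as (required kbit/s, file name) pairs.
ALTERNATIVES = [(400, "video-sd.ogv"), (1200, "video-hq.ogv"), (5000, "video-hd.ogv")]


def choose_alternative(measured_kbps: float,
                       alternatives: Sequence[Tuple[int, str]],
                       safety: float = 0.8) -> str:
    """Pick the highest-bitrate alternative that fits the measured throughput."""
    budget = measured_kbps * safety  # leave headroom for fluctuations
    playable = [alt for alt in alternatives if alt[0] <= budget]
    if not playable:
        return min(alternatives)[1]  # nothing fits: use the smallest encode
    return max(playable)[1]


def plan_playback(throughput_samples: Sequence[float],
                  alternatives: Sequence[Tuple[int, str]]) -> List[str]:
    """Re-evaluate the choice for every chunk as the throughput changes."""
    return [choose_alternative(kbps, alternatives) for kbps in throughput_samples]
```

The server only has to host the files; every decision is taken by the
client, which is the property that makes this approach attractive.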

Incidentally, it may make more sense to expose the actual components
of "quality" explicitly in media queries, just like they are
explicitly exposed both in m3u8 and ismv, in particular bandwidth and
resolution.

Further, it needs to be considered that current media queries (see
http://dev.w3.org/csswg/css3-mediaqueries/) are actually NOT about
defining the features of a given resource, but about defining the
features of a device. Some of the most relevant queries for a/v are:
* min/max-device-width/height (rendering surface of output device)
* min/max-width/height (targeted display area of output device)
* aspect-ratio / device-aspect-ratio
* min/max-resolution (pixel density of output device)
* tv / handheld / screen / aural / braille (devices)

Thus, width and height are already covered by media queries, and
adding mainly "bitrate" and "CPU" may be sufficient to describe the
device qualities needed to choose between media files loaded by
default.

Note that YouTube uses width/height encoding parameters for
distinguishing between different "quality" video encodings, so the
media queries parameters width/height could potentially be used here
in the same way.

Regards,
Silvia.