[whatwg] Video source selection based on quality (was: <video> feedback)

Mon Feb 15 15:07:16 PST 2010

Thanks for your insight Silvia.

On Wed, Feb 10, 2010 at 2:47 AM, Silvia Pfeiffer
<silviapfeiffer1 at gmail.com> wrote:
> Firstly, I think that explicit user choice isn't a problem.
>
> As a content provider, you have several means of doing this user choice:
>
> 1) You can provide in a single (admittedly javascript-based) video
> player interface an option to the user to switch between source files
> of different quality (bitrate, width x height, audio samplerate, and
> whatever other choices you make for differently encoded content).
> This is what YouTube does in their latest players, e.g. 360p / 480p / 720p
> choice (though this is not really a quality measure, but only a
> measure of width x height, but since the display size is not changed
> in YouTube, it actually is a quality setting).

I can *maybe* see this feature being a video player UI component (more
on why in a bit), though not a JS-based one. I imagine people with
slower computers/connections and/or in more restrictive environments,
who probably stand to benefit the most from this, would be more likely
to have JS off. Additionally, it would require document authors to
take on the responsibility of scripting their own content selection
algorithms (unless there's a standard library that everyone just
copies and pastes), which seems unnecessary given the fact that
resource selection is already capable of being done by the browser
and/or server.

Also, the choices don't *only* measure width x height, but also
indicate how frames are scanned: "p" for progressive and "i" for
interlaced, which has a direct impact on both file size and perceived
image quality. That said, the label "p" is redundant on YouTube since,
AFAIK, it automatically de-interlaces whatever you upload.

The problems with making this a video UI component are that:

It'd be heavily abbreviated. YouTube is using an industry convention
that has only entered consumer parlance due to HDTV marketing (just
Google "720p vs. 1080i" or "1080p TrueHD"), which ONLY specifies
height and scan type, since the rest of the information is implied due
to engineering and broadcast standards, e.g. 1080p implies 1920x1080
29.97 or 23.976 fps progressive scan; 720p implies the same but at
1280x720. But "360p" doesn't imply anything, because there's no
standard that defines it, and once you hit the SD level (480p and
below), there are two different display sizes depending on the pixel
aspect ratio.
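
To make the SD ambiguity concrete, here is a rough Python sketch. The
two pixel aspect ratios (10:11 and 40:33) and the 704x480 stored frame
are commonly cited NTSC figures that I'm assuming for illustration;
they aren't taken from any standard named in this thread, and
display_size is just a name made up for the example.

```python
from fractions import Fraction

# Assumed, commonly cited NTSC pixel aspect ratios (illustrative only).
PAR_4_3 = Fraction(10, 11)   # non-square pixels for a 4:3 picture
PAR_16_9 = Fraction(40, 33)  # non-square pixels for a 16:9 picture


def display_size(stored_width, stored_height, par):
    """Return (width, height) in square pixels for a stored frame."""
    return (round(stored_width * par), stored_height)


# The same 704x480 stored frame yields two different display sizes.
print(display_size(704, 480, PAR_4_3))
print(display_size(704, 480, PAR_16_9))
```

Under those assumptions a single "480p" label covers both a 640-wide
and an 853-wide picture, which is the ambiguity I mean.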

But even if we had a standard, YouTube further dilutes the meaning of
these abbreviations since they now also have a toggle button (depicted
as two arrows at a right angle) that expands or contracts the player
but leaves the quality setting the same. So if you select "360p", and
decide you want it to fill more of your screen, it will, but then it's
no longer 360 pixels tall because it's been scaled.

The alternative would be to specify the video information in full, or
in a partially-abbreviated form. But then you'd have to cram stuff
like "1920x1080p24 (Scaled to 1280x720)" into the UI, which crowds the
other controls and hinders the viewing experience.

The other thing is that so much goes into video. Yes,

> (bitrate, width x height, audio samplerate

go into it, but

> whatever other choices you make for differently encoded content).

covers a huge spectrum, as I previously outlined in the original
thread: <http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024520.html>.

All of these criteria are potentially useful, so maybe the goal for now
should be to rank them by how important they'd be to the average user
and implement a high-ranking subset.

> 2) If users log in to use your content, you can ask them to provide a
> default setting, just like YouTube does it in their Account Settings
> (as Hugh describes below).

This may be fine for video portal sites, but not every page utilizes
logins. Most people just want to share a video they made or like with
their audience, the same way they would an image. And they may be
using a free blogging service that doesn't allow them to implement
additional features. I also find it impractical to require a login
system to be in place just to ask users to select a content quality
preference.

> 3) Even if users don't log in, you can have a button on the side of
> the video (and a once-off splashscreen if you so like), which allows
> users to set and change their preference and leave a cookie to
> remember their choice.

I'd be OK with cookies as an interim solution but they're not ideal,
since they'd require setting new preferences for every site visited
while the clients' connection and computational speeds would stay more
or less the same.

> All of this is based on the premise that the user either knows what
> their pipe and their computer can take, or experiments with it until
> he/she is happy.

Well, they don't always have to know off the top of their head. P2P
programs often allow one to run automated tests to estimate the speed
of his/her current connection. If similar functionality were
incorporated into browsers, it could be a "set it and forget it"
experience for those users who do not know. If the resulting video
quality isn't to their liking, and the content author offers
alternatives, the user could adjust the settings as you said and see
if the quality/playback speed tradeoff is worth it to them.
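
As a minimal sketch of what such an automated test boils down to,
assuming the browser times the download of a test file of known size
(the function name and the units are my own choices, not anything a
browser actually exposes):

```python
def estimate_mbit_per_s(bytes_received, elapsed_seconds):
    """Estimate throughput in Mbit/s from a timed test download."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # bytes -> bits, then per second, then scaled to megabits.
    return bytes_received * 8 / elapsed_seconds / 1_000_000


# Hypothetical measurement: 2,500,000 bytes received in 4 seconds.
print(estimate_mbit_per_s(2_500_000, 4.0))
```

A real test would want several samples and some smoothing, but the
stored result could then serve as the user's default until they
override it.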

> Secondly, choosing the best video encoding format for a given
> user-server connection (and user device capabilities) is actually a
> really difficult decision to make automatically.
>
> Let's say we declare the quality in the <source> elements in some form
> or other (either an additional attribute or by addition to the media
> queries). Now we have to take this information into account in the
> source selection algorithm, since we are asking the UA to make a
> choice of which media source to use based on the quality information.
> The source selection algorithm goes through the list of <source>
> elements from top to bottom and stops at the first one that it is able
> to play. It does not check whether in that list there would be a
> better choice. Thus, we have to require from authors to build the list
> in a way that the highest quality content is put at the top of the
> list, while lower qualities are put further down.

Which is why I think it'd ultimately be best to update the source
selection algorithm to function non-linearly. I realize that may be
asking a lot at this stage in the game, but then again, it'd have to
be changed to incorporate quality values anyway.
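
Here's a rough sketch of the difference, in Python for brevity. The
Source class, its quality field, the can_play callback, and the
max_quality cap are all hypothetical names for illustration; none of
this is spec text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Source:
    src: str
    mime_type: str
    quality: float  # hypothetical author-declared value, 0.0 to 1.0


def first_playable(sources: List[Source],
                   can_play: Callable[[Source], bool]) -> Optional[Source]:
    """Linear behaviour: stop at the first source the UA can play."""
    for source in sources:
        if can_play(source):
            return source
    return None


def best_playable(sources: List[Source],
                  can_play: Callable[[Source], bool],
                  max_quality: float) -> Optional[Source]:
    """Non-linear variant: look at every source and return the
    highest-quality playable one that does not exceed max_quality."""
    candidates = [s for s in sources
                  if can_play(s) and s.quality <= max_quality]
    return max(candidates, key=lambda s: s.quality, default=None)


# Authors could list sources in any order.
sources = [
    Source("video-sd.ogv", "video/ogg", 0.0),
    Source("video-hd.ogv", "video/ogg", 1.0),
    Source("video-hq.ogv", "video/ogg", 0.5),
]
is_ogg = lambda s: s.mime_type == "video/ogg"
print(first_playable(sources, is_ogg).src)
print(best_playable(sources, is_ogg, max_quality=0.5).src)
```

With the non-linear version, the order of the list stops mattering, so
authors wouldn't have to sort it by quality themselves.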

> For example:
> <video>
>  <source src='video-hd.ogv' media='quality:1.0' type='video/ogg;
> codecs="theora, vorbis"'>
>  <source src='video-hq.ogv' media='quality:0.5' type='video/ogg;
> codecs="theora, vorbis"'>
>  <source src='video-sd.ogv' type='video/ogg; codecs="theora, vorbis"'>
> </video>
>
> Now, we need to devise an algorithm for UAs to determine which quality
> to choose based on the given computer/device and connection. This is
> not trivial, but let's assume we are able to do so and set
> * quality:1.0 to any connection >5Mbit, CPU >  2GHz, and
> * quality:0.5 to any connection > 1Mbit, CPU > 1.5GHz.
>
> This would be measured once during source selection and thus the
> choice made. But it's actually not a guarantee that it will work. If
> your connection degrades or your CPU gets busy with other
> applications, the choice may need to be revised. YouTube doesn't
> currently allow for this, so this kind of solution would replicate
> what YouTube does at this point - which doesn't seem to be such a bad
> thing, since YouTube is acceptable for most people.

Let's let the user worry about CPU, RAM, network traffic, etc. If they
have video playback problems after a best available choice has been
made, they can either change their settings, close other applications,
leave the site, or upgrade their computer. None of that can be done
automatically (well, unless the settings adjusted themselves
dynamically, selecting lesser qualities if the playback speed fell
below a certain point). But I wouldn't ever want to make something
like that a required feature for conformance.
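
Purely to illustrate that optional behaviour, here's a small Python
sketch. The connection and CPU thresholds are the ones from your
example; the 0.0 level standing in for the unlabelled fallback, the
playback_ratio measure, and the 0.9 cutoff are my own assumptions.

```python
# Highest quality first; 0.0 stands in for the unlabelled fallback source.
QUALITY_LEVELS = [1.0, 0.5, 0.0]


def initial_quality(mbit_per_s, cpu_ghz):
    """One-off choice at source selection, using the example thresholds."""
    if mbit_per_s > 5 and cpu_ghz > 2.0:
        return 1.0
    if mbit_per_s > 1 and cpu_ghz > 1.5:
        return 0.5
    return 0.0


def next_quality(current, playback_ratio, threshold=0.9):
    """Step down one level when playback falls behind.

    playback_ratio is an assumed measure: frames shown / frames due.
    The lowest level is kept once it has been reached.
    """
    if playback_ratio >= threshold:
        return current
    index = QUALITY_LEVELS.index(current)
    return QUALITY_LEVELS[min(index + 1, len(QUALITY_LEVELS) - 1)]


quality = initial_quality(mbit_per_s=6, cpu_ghz=2.4)
print(quality)
print(next_quality(quality, playback_ratio=0.7))
```

This only ever steps down; deciding when it's safe to step back up is
the harder part, and another reason I'd keep it optional.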

> An improvement over this would be the introduction of an adaptive
> stream scaling scheme over HTTP, similar to e.g. Microsoft's Smooth
> Streaming and Apple's HTTP Live Streaming (also note: Adobe is in the
> process of developing HTTP streaming support). There is no such thing
> available for Ogg yet, but the Ogg community is interested in
> developing/using something that is open and fulfills the needs for
> HTML5. It may well be that an activity should be taken up by the
> WHATWG (or W3C? or IETF?) to develop a media-format independent
> adaptive streaming standard over HTTP. The point about adaptive
> streaming is that it does not require any new HTTP headers to deliver
> the data or any new software on the HTTP server - the choice is made
> client-side by switching between different encodings of the same
> resource on the server. This requires declaration of the available
> alternative files to the client - which could either be done inside
> HTML5 or through some extra resource. Apple's scheme, for example,
> uses m3u-based files (m3u8), while MS's scheme uses SMIL-like files
> (ismv).
>
> Apple's scheme is already going through the IETF for standardisation
> as an informal RFC, but not through a working group. Apple's scheme is
> based on massive creation of small chunks (e.g. 10s duration) on the
> server - an overhead that could possibly be avoided by using W3C Media
> Fragment URIs. There are lots of things to discuss for such an
> activity and the WHATWG may not be the best forum for discussing this
> - though in the end it's up to the browser vendors to implement it, so
> maybe it would.
>
> Note that adaptive HTTP streaming deliberately avoids introducing new
> HTTP parameters and server requirements, because these are really
> difficult to roll out, in particular since they also create new
> requirements on HTTP proxy infrastructure.

So, it's basically RTSP, only not. That actually sounds perfect for
this, seeing as it wouldn't require much extra work on the part of the
content authors. But, then there's the issue of which scheme to
support; I imagine they're competing? :P

> If we develop such an adaptive streaming approach, the source
> selection algorithm would then select the default resource to stream
> from, while being given the option for adaptive streaming through the
> extra information (e.g. delivered through an extra attribute on the
> <source> elements, e.g. @adaptive="alternatives.xml"). There could
> then be dynamic switching between the files listed as alternatives in
> the @adaptive file.

I'd move to specify any extra information in-context or server-side
rather than by reference, simply because if we introduce an XML
settings file, it's one more syntax to learn and one more file to
host. Maybe the resource pointed to by <source> is itself the settings
file, akin to an Apache type map. Or @src is a comma-delimited list of
fallbacks, like CSS font properties such as font-family. Or <source>
becomes non-empty and we introduce <alternative> elements as its
allowed children.

> Incidentally, it may make more sense to expose the actual components
> of "quality" explicitly in media queries, just like they are
> explicitly exposed both in m3u8 and ismv, in particular bandwidth and
> resolution.
>
> Further, it needs to be considered that current media queries (see
> http://dev.w3.org/csswg/css3-mediaqueries/) are actually NOT about
> defining the features of a given resource, but about defining the
> features of a device. Some of the most relevant queries for a/v are :
> * min/max-device-width/height (rendering surface of output device)
> * min/max-width/height (targeted display area of output device)
> * aspect ratio / device-aspect-ratio
> * min/max-resolution (pixel density of output device)
> * tv / handheld / screen / aural / braille (devices)
>
> Thus, the width/height are already defined through media queries.
> Thus, mainly adding "bitrate" and "CPU" may be sufficient to define
> device qualities to distinguish default loaded media files.

I'd agree that those two are high-priority, but altogether I think it
would need to cover a bit more than those, namely:
"display-aspect-ratio" (or have "aspect-ratio" defined more explicitly
so that it accounts for pixel aspect ratio), "codec" (and possibly
"container"), and something to account for audio properties—at the
bare minimum.

> Note that YouTube uses width/height encoding parameters for
> distinguishing between different "quality" video encodings, so the
> media queries parameters width/height could potentially be used here
> in the same way.

I would stay away from that, for the reasons stated above.
Width/height are insufficient to describe the quality of a video. You
could encode a 1080p video with a really low bitrate and have it come
out smaller than a 480p video with a really high bitrate.
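
A quick back-of-the-envelope check in Python, assuming constant-bitrate
streams and ignoring container overhead (the bitrates and the
ten-minute duration are made-up figures):

```python
def file_size_mb(bitrate_kbit_per_s, duration_s):
    """Approximate size in megabytes (10^6 bytes) of a constant-bitrate stream."""
    return bitrate_kbit_per_s * 1000 * duration_s / 8 / 1_000_000


# Ten minutes of video at two hypothetical encoding settings.
low_bitrate_1080p = file_size_mb(1000, 600)   # 1080p at 1000 kbit/s
high_bitrate_480p = file_size_mb(2500, 600)   # 480p at 2500 kbit/s
print(low_bitrate_1080p, high_bitrate_480p)
```

The frame size never enters the calculation, which is why width/height
alone tell you nothing about what the connection has to carry.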

-Hugh