[whatwg] HTTP adaptive streaming of video (was Re: Video feedback)
watsonm at netflix.com
Tue Jun 21 03:01:23 PDT 2011
On Jun 21, 2011, at 3:32 AM, Silvia Pfeiffer wrote:
> Moving this to a different subject, since it's all about adaptive streaming now.
> On Tue, Jun 21, 2011 at 1:43 AM, Mark Watson <watsonm at netflix.com> wrote:
>> On Jun 20, 2011, at 5:28 PM, Silvia Pfeiffer wrote:
>>> On Tue, Jun 21, 2011 at 12:07 AM, Mark Watson <watsonm at netflix.com> wrote:
>>>> On Jun 20, 2011, at 11:52 AM, Silvia Pfeiffer wrote:
>>>>> On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson <watsonm at netflix.com> wrote:
>>>>> The way in which HTML deals with different devices and their different
>>>>> capabilities is through media queries. As an author you provide your
>>>>> content with different versions of media-dependent style sheets and
>>>>> content, so that when you view the page with a different device, the
>>>>> capabilities of the device select the right style sheet and content
>>>>> for display on that device. Opera has an example on how to use this
>>>>> here: http://dev.opera.com/articles/view/everything-you-need-to-know-about-html5-video-and-audio/
>>>>> (search for "Media Query").
>>>>> I believe that this mechanism should also work for adaptive streaming,
>>>>> such that you provide multiple alternative media resources through the
>>>>> <source> element, each of which has a @media attribute that says what
>>>>> device capabilities that particular resource is adequate for. Except
>>>>> that the "media resource" provides alternative bitrate files for that
>>>>> case. I do not see a need to move this functionality into the adaptive
>>>>> streaming file.
>>>>> Nice to get started on this discussion about adaptive streaming. ;-)
>>>> So, what I said above is that there is no rationale for a hierarchy. What I mean is that if I have ten encodings of a video, I should just list those ten in a flat list somewhere, annotated with their properties. The device knows best what it can support, what's appropriate etc. The key point is that I get this list without having to download the actual media.
>>>> It's not a good idea to split up that list of ten into sub-lists "intended" for different devices, because then I am making assumptions about what kinds of devices there are and what they need. But it is the devices that know best. In DASH people often proposed splitting the list into "Handheld", "SD" and "HD", but then there are devices that happily cope with resolutions that span those categories. Consider a few such devices and you find you need finer granularity. Since we're talking about a tiny amount of descriptive metadata, it's much simpler just to list them all in one flat list.
>>>> So then there is the question of where this "flat" list should be: in the HTML or in an adaptive streaming manifest?
>>>> Here we have a genuine functional overlap. HTML provides information which drives a resource selection function which considers many things, such as container types, codecs and everything which can be expressed in Media Queries. Adaptive streaming manifests also provide the same information for the same "selection" purpose plus additional information supporting adaptive streaming as well. Much of the information which drives selection is also needed for adaptation. Also there is no strict split between "adaptation" and "selection": the capabilities of clients may differ in terms of what they can seamlessly switch and what they can't.
>>>> So, in integrating HTML and adaptive streaming we have to define the interactions between these overlapping selection functions - we cannot get away from this functional overlap.
>>>> I think it would be a bad idea to try and re-invent adaptive streaming in HTML itself. A lot of work has been done on this over the past few years and anything HTML starts from scratch will be way behind. For my part I would like to see adaptive streaming defined in a way which is independent of the presentation layer technology, so adaptive streams can be constructed which play both in HTML and in other places.
>>>> The consequence is that we should not assume that an "adaptive stream" (for want of a better term) will be split up into multiple sources when used in HTML. Of course people can do this: if you want to provide 4 separate adaptive streams and use media queries to have the client select which one to play, that's fine, but we must also consider the case where everything is in one manifest.
>>> Note that this is not what I suggested. I just believe we can use both
>>> approaches: Media Queries and adaptive streams. With media queries we
>>> can more easily pre-select the first stream that is picked to be more
>>> appropriate for the device that is being used, and we can make more
>>> appropriate alternative streams. For example, if we use markup as
>>> <video controls>
>>> <source src="manifest1_ogv" media="(min-device-height:720px)" type="video/ogg">
>>> <source src="manifest2_ogv" media="(max-device-height:720px)" type="video/ogg">
>>> <source src="manifest1_mp4" media="(min-device-height:720px)" type="video/mp4">
>>> <source src="manifest2_mp4" media="(max-device-height:720px)" type="video/mp4">
>>> </video>
>>> then we can have Ogg Theora or MP4 videos of different bandwidth and
>>> screen size in manifest1 and manifest2, and the device will itself
>>> first decide which of the two fits its screen height and then stick
>>> with the streams in that manifest.
>> Right, and this is exactly the problem. What about a device where it's appropriate to range from 420px to 1080px?
> Surely you'd start with the highest resolution and try to keep it as
> high as possible. So, it would pick the manifest1 option. Within that
> option would be a whole bunch of alternatives at different bitrates
> that provide enough choice for adaptability.
Ah, are you saying that manifest1 is a *superset* of manifest2? That is, does it contain all the lower-rate streams as well?
>>> If a group of files are from the start
>>> excluded from being useful for a particular device because they are
>>> unfit, it will make the switching much faster, so this is a good
>> I'm not sure how it will make switching faster.
>> Certainly, the first thing a client should do with a manifest is exclude the options which are not useful for that device, based on its capabilities.
> It would be faster because this step would not be necessary, since it
> has already been taken and been removed from the list of choices.
>> Whether that is done in HTML based on Media Queries, or within the adaptive streaming client based on information in the manifest makes no difference in terms of speed, except that your manifests above are smaller than a manifest containing all the streams. But manifests are anyway pretty small.
> It does make a difference, because adaptation has to choose from all
> the available acceptable options and if the set of acceptable
> alternatives has been made smaller, the algorithm to choose can focus
> on fewer parameters, be simpler, and therefore faster.
Firstly, we're talking about processing small data sets with at most tens of entries and a handful of parameters, so we're talking microseconds of processing time here, maybe a few milliseconds on constrained devices.
Secondly, there are two steps. A first step "prunes" the available choices based on device capabilities. This step takes the same time whether it's done based on media queries at the HTML level or based on manifest annotations (possibly media queries would be slower, because in their generality they are rather more complex than the annotations in a manifest).
The second step is what you describe above, where the adaptation process makes continuous choices amongst the remaining options. This is the same speed in both cases.
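To make the two steps concrete, here is a minimal sketch. Everything in it is an assumption made up for illustration: the Stream fields, the sample manifest entries, and the simple "highest bitrate that fits" rule are hypothetical and do not come from any particular manifest format or client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """One entry in a flat manifest (hypothetical fields)."""
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate
    container: str     # e.g. "mp4" or "ogg"


def prune(streams, max_height, containers):
    """Step 1, done once: drop streams the device cannot use."""
    return [
        s for s in streams
        if s.height <= max_height and s.container in containers
    ]


def choose(candidates, throughput_kbps):
    """Step 2, repeated during playback: pick the highest bitrate that
    fits the measured throughput, falling back to the lowest one."""
    fitting = [s for s in candidates if s.bitrate_kbps <= throughput_kbps]
    if fitting:
        return max(fitting, key=lambda s: s.bitrate_kbps)
    return min(candidates, key=lambda s: s.bitrate_kbps)


# Illustrative flat list; the values are invented for this sketch.
MANIFEST = [
    Stream(240, 400, "mp4"),
    Stream(480, 1000, "mp4"),
    Stream(720, 2500, "mp4"),
    Stream(1080, 5000, "mp4"),
    Stream(480, 1000, "ogg"),
    Stream(720, 2500, "ogg"),
]

# A device that plays MP4 only, up to 720 lines.
usable = prune(MANIFEST, max_height=720, containers={"mp4"})
current = choose(usable, throughput_kbps=3000)
```

The pruning is a single pass over a handful of entries, and the choice in step 2 only ever sees the pruned list, regardless of whether the pruning happened at the HTML level or inside the streaming client.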
>> Having separate sources for different container formats makes more sense, because it's unlikely anyone would ever support adaptive switching between container formats (though it's certainly logically meaningful and technically possible). But still, if I want to create an adaptive stream that is useful also in non-HTML contexts I would put all the container formats into one manifest.
> You can always concatenate them if that is required.
Well, that depends on the Manifest format - bytewise concatenation would only work if the format supported it. Nevertheless, combination of multiple manifests is always easy. And I might still prefer to manage a single, combined, manifest for each content item, rather than have a bunch of sliced and diced versions to manage.
But actually, all I am saying is that it should not be a *design assumption* that there must be separate manifests for these different cases. Any design which supports a single combined manifest also supports smaller, targeted, ones which can be used as you propose above. In our system we effectively custom-craft a manifest for each individual device on request. That model needs to be supported, but not required.
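As a rough illustration of why a combined manifest also covers the targeted case, the sketch below merges two small manifests and then cuts a per-device slice out of the result. It is not a description of any real system: the dictionary keys, the file names and the device profile are all hypothetical.

```python
def combine(*manifests):
    """Merge manifests into one flat list, keeping the first copy of
    each entry (entries are identified by their hypothetical "url")."""
    seen = set()
    merged = []
    for manifest in manifests:
        for entry in manifest:
            if entry["url"] not in seen:
                seen.add(entry["url"])
                merged.append(entry)
    return merged


def targeted(combined, device):
    """Return the entries of a combined manifest that suit one device."""
    return [
        entry for entry in combined
        if entry["container"] in device["containers"]
        and device["min_height"] <= entry["height"] <= device["max_height"]
    ]


# Two invented manifests that overlap at 720 lines.
SD = [
    {"url": "v_240.mp4", "container": "mp4", "height": 240},
    {"url": "v_480.mp4", "container": "mp4", "height": 480},
    {"url": "v_720.mp4", "container": "mp4", "height": 720},
]
HD = [
    {"url": "v_720.mp4", "container": "mp4", "height": 720},
    {"url": "v_1080.mp4", "container": "mp4", "height": 1080},
]

combined = combine(SD, HD)

# A device for which 420px to 1080px is appropriate, which fits
# neither the "SD" nor the "HD" list on its own.
device = {"containers": {"mp4"}, "min_height": 420, "max_height": 1080}
slice_for_device = targeted(combined, device)
```

Here the device's slice spans both original lists, which is the case a fixed split into separate manifests cannot express without a further category.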