[whatwg] A standard for adaptive HTTP streaming for media resources

Thu Aug 19 18:08:25 PDT 2010

On Tue, 25 May 2010, Silvia Pfeiffer wrote:
> 
> We've in the past talked about how there is a need to adapt the bitrate 
> version of an audio or video resource that is being delivered to a user 
> agent based on the available bandwidth on the network, the available CPU 
> cycles, and possibly other conditions.
> 
> One proposal has been to do this using @media queries, providing 
> links to alternative versions of a media resource through <source> 
> elements inside the media element. But this is a very inflexible solution, since the 
> side conditions for choosing a bitrate version may change over time and 
> what is good at the beginning of video playback may not be good 2 
> minutes later (in particular if you're on a mobile device driving 
> through town).
> 
> Further, we have discussed the need for supporting a live streaming 
> approach such as RTP/RTSP - but RTP/RTSP has its own "non-Web" issues 
> that will make it difficult to make it part of a Web application 
> framework - in particular it requires a custom server and won't just work 
> with an HTTP server.
> 
> In recent times, vendors have indeed started moving away from custom 
> protocols and custom servers and have moved towards more intelligence in 
> the UA and special approaches to streaming over HTTP.
> 
> Microsoft developed "Smooth Streaming", Apple developed "HTTP Live 
> Streaming" and Adobe recently launched "HTTP Dynamic Streaming". (Also 
> see a comparison at). As these vendors are working on it for MPEG files, 
> so are some people for Ogg. I'm not aware that anyone is looking at it for 
> WebM yet.
> 
> Standards bodies haven't held back either. The 3GPP organisation has 
> defined 3GPP adaptive HTTP Streaming (AHS) in their March 2010 release 9 
> of 3GPP. Now, MPEG has started consolidating approaches for adaptive 
> bitrate streaming over HTTP for MPEG file formats.
> 
> Adaptive bitrate streaming over HTTP is the correct approach towards 
> solving the double issues of adapting to dynamic bandwidth availability, 
> and of providing a live streaming approach that is reliable.
> 
> Right now, no standard exists that has been proven to work in a 
> format-independent way. This is particularly an issue for HTML5, where 
> we want at least support for MPEG4, Ogg Theora/Vorbis, and WebM.
> 
> I know that it is not difficult to solve this issue in a 
> format-independent way, which is why solutions are springing up 
> everywhere. They are, however, not compatible and create a messy 
> environment where people have to install solutions for multiple 
> different approaches to make sure they are covered for different 
> platforms, different devices, and different formats. It's a clear 
> situation where a new standard is necessary.
> 
> The standard basically needs to provide three different things:
> * authoring of content in a specific way
> * description of the alternative files on the server and their
> features for the UA to download and use for switching
> * a means to easily switch mid-way between these alternative files

On Mon, 24 May 2010, Chris Holland wrote:
> 
> I don't have something decent to offer for the first and last bullets 
> but I'd like to throw-in something for the middle bullet:
> 
> HTTP is vastly under-utilized today when it comes to URIs 
> and the various Accept* headers.
> 
> Today developers might embed an image in a document as chris.png. Web 
> daemons know to find that resource and serve it; in this sense, 
> chris.png is a resource locator.
> 
> Technically one might reference the image as a resource identifier named 
> "chris". The user's browser may send "image/gif" as the only value of an 
> accept header, signaling the following to the server: "I'm supposed to 
> download an image of chris here, but I only support gif, so don't bother 
> sending me a .png". In a perhaps more useful scenario the user agent may 
> tell the server "don't bother sending me an image, I'm a screen reader, 
> do you have anything my user could listen to?". In this sense, the 
> document's author didn't have to code against or account for every 
> possible "context" out there, the author merely puts a reference to a 
> higher-level representation that should remain forward-compatible with 
> evolving servers and user-agents.
> 
> By passing a list of accepted mimetypes, the accept http header provides 
> this ability to serve context-aware resources, which starts to feel like 
> a contender for catering to your middle bullet.
> 
> To that end, new mime-types could be defined to encapsulate media 
> type/bit rate combinations.
> 
> Or the accept header might remain confined to media types and acceptable 
> bit rate information might get encapsulated into a new header, such as: 
> X-Accept-Bitrate .
> 
> If you combined the above approach with existing standards for http byte 
> range requests, there may be a mechanism there to cater to your 3rd 
> bullet as well: when network conditions deteriorate, the client could 
> interrupt the current stream and issue a new request "where it left off" 
> to the server. Although this likely wouldn't work, because a byte range 
> request would mean nothing on files of two different sizes. For 
> played-back media, time codes would be needed to define the range.
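
A minimal sketch of the header-based negotiation Chris describes, written in Python purely for illustration. The X-Accept-Bitrate header is his proposal and not an existing standard; the variant table, the kbit/s unit, and the function names are assumptions made for this example.

```python
from typing import Optional

# Hypothetical variant table a server might keep for one media resource.
# Bitrates are assumed to be in kbit/s.
VARIANTS = [
    {"type": "video/mp4", "bitrate": 100, "file": "video_100.mp4"},
    {"type": "video/mp4", "bitrate": 500, "file": "video_500.mp4"},
    {"type": "video/ogg", "bitrate": 500, "file": "video_500.ogv"},
    {"type": "video/mp4", "bitrate": 900, "file": "video_900.mp4"},
]


def accepted_types(accept_header: str) -> set:
    """Return the media types in an Accept header, ignoring parameters such as q-values."""
    return {part.split(";")[0].strip().lower()
            for part in accept_header.split(",") if part.strip()}


def pick_variant(headers: dict, variants=VARIANTS) -> Optional[dict]:
    """Pick the highest-bitrate variant the client accepts and says it can sustain."""
    types = accepted_types(headers.get("Accept", "*/*"))
    # A missing (or zero) X-Accept-Bitrate is treated as "no limit".
    max_kbps = int(headers.get("X-Accept-Bitrate", "0")) or None
    candidates = [
        v for v in variants
        if ("*/*" in types
            or v["type"] in types
            or v["type"].split("/")[0] + "/*" in types)
        and (max_kbps is None or v["bitrate"] <= max_kbps)
    ]
    return max(candidates, key=lambda v: v["bitrate"], default=None)


choice = pick_variant({"Accept": "video/mp4", "X-Accept-Bitrate": "600"})
print(choice["file"] if choice else "no acceptable variant")
```

This only covers choosing a file at request time. It does not address resuming "where it left off" in a different file, which is the problem raised at the end of the message above.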

On Tue, 25 May 2010, Silvia Pfeiffer wrote:
> 
> That's not quite sufficient, actually. You need to know which byte range 
> to retrieve or which file segment. Apple solved it by introducing a m3u8 
> file format, Microsoft by introducing a SMIL-based server manifest file, 
> Adobe by introducing a XML-based Flash Media Manifest file F4M. That 
> kind of complexity cannot easily be transferred through HTTP headers.
>
> The idea of the manifest file is to provide matching transition points 
> between the different files of different bitrate to segments or byte 
> ranges. This information has to somehow come to the UA (amongst other 
> information as available in typical manifest files). I don't think that 
> can be achieved without a manifest file.
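
To illustrate what such a manifest has to carry, here is a rough Python sketch that maps shared switch points to byte ranges in each bitrate version. The manifest layout, file sizes, and offsets are invented for the example and do not follow m3u8, the Smooth Streaming manifest, or F4M.

```python
from bisect import bisect_right

# Hypothetical manifest: switch points (seconds) shared by all versions, plus
# the byte offset at which each segment starts in every bitrate version.
# All numbers are made up for illustration.
MANIFEST = {
    "switch_points": [0.0, 2.0, 4.0, 6.0],
    "duration": 8.0,
    "variants": {
        100: {"file": "video_100.mp4",
              "offsets": [0, 25_000, 50_000, 75_000], "size": 100_000},
        500: {"file": "video_500.mp4",
              "offsets": [0, 125_000, 250_000, 375_000], "size": 500_000},
    },
}


def segment_request(manifest, bitrate, t):
    """Return (file, Range header value) for the segment that contains time t."""
    if not 0 <= t < manifest["duration"]:
        raise ValueError("time outside the resource")
    points = manifest["switch_points"]
    i = bisect_right(points, t) - 1          # index of the segment containing t
    variant = manifest["variants"][bitrate]
    offsets = variant["offsets"]
    start = offsets[i]
    # The last segment runs to the end of the file; HTTP ranges are inclusive.
    end = (offsets[i + 1] if i + 1 < len(offsets) else variant["size"]) - 1
    return variant["file"], f"bytes={start}-{end}"


# The same playback time maps to different byte ranges in each version.
print(segment_request(MANIFEST, 100, 4.5))
print(segment_request(MANIFEST, 500, 4.5))
```

The point of the sketch is that the time-to-byte mapping differs per file, so the UA needs this table (or something equivalent) before it can switch versions at a matching point.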

On Fri, 28 May 2010, Jeroen Wijering wrote:
> 
> Indeed, one such key condition is the current dimensions of the video 
> window. Tracking this condition allows user-agents to:
> 
> *) Not waste bandwidth, e.g. by pushing a 720p video in a 320x180 video 
> tag.
>
> *) Respond to changes in the video display, e.g. when the video is 
> switched to fullscreen playback.
> 
> Providing the different media options using <source> elements might 
> still work out fine, if there's a clearly defined API that covers all 
> scenarios. A rough example:
> 
> <video>
>   <source bitrate="100" height="120" src="video_100.mp4" type="video/mp4; codecs='avc1.42E01E, mp4a.40.2'; keyframe-interval='00:02'" width="160">
>   <source bitrate="500" height="240" src="video_500.mp4" type="video/mp4; codecs='avc1.42E01E, mp4a.40.2'; keyframe-interval='00:02'" width="320">
>   <source bitrate="900" height="540" src="video_900.mp4" type="video/mp4; codecs='avc1.42E01E, mp4a.40.2'; keyframe-interval='00:02'" width="720">
> </video>
> 
> This example would tell the user-agent that the three MP4 files have a 
> keyframe-interval of 2 seconds - which of course raises the issue that 
> fixed keyframe-intervals would be required.
> 
> The user-agent can subsequently use e.g. the Media Fragments API to 
> request chunks, switching between sources as the conditions change.
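
A rough Python sketch of the client-side logic this implies, using the three sources from the example. The selection policy, the kbit/s unit for the bitrate attribute, and the function names are assumptions made for illustration; the `#t=start,end` form is the temporal syntax from the W3C Media Fragments URI work, and how a user agent would turn it into HTTP requests is left open here.

```python
import math

# The three sources from the example: (bitrate, width, height, src).
# Bitrate is assumed to be in kbit/s.
SOURCES = [
    (100, 160, 120, "video_100.mp4"),
    (500, 320, 240, "video_500.mp4"),
    (900, 720, 540, "video_900.mp4"),
]
KEYFRAME_INTERVAL = 2.0  # seconds, as declared by keyframe-interval='00:02'


def choose_source(display_height, bandwidth_kbps, sources=SOURCES):
    """Pick the smallest source that fills the display, capped by bandwidth."""
    # If nothing fits the bandwidth, fall back to the lowest-bitrate source.
    affordable = [s for s in sources if s[0] <= bandwidth_kbps] or [min(sources)]
    big_enough = [s for s in affordable if s[2] >= display_height]
    return min(big_enough) if big_enough else max(affordable)


def next_chunk_url(src, current_time, interval=KEYFRAME_INTERVAL):
    """Build a temporal fragment URI for the chunk starting at the next keyframe."""
    start = math.floor(current_time / interval + 1) * interval
    return f"{src}#t={start:g},{start + interval:g}"


# A 320x180 video window with plenty of bandwidth does not need the 540-line file.
source = choose_source(display_height=180, bandwidth_kbps=1000)
print(next_chunk_url(source[3], current_time=3.2))
```

Because switches only happen on keyframe boundaries, this relies on all sources sharing the same fixed keyframe interval, which is the constraint noted above.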

It seems to me that we are not lacking in solutions in this space -- it 
would behoove us to try to leverage the existing solutions rather than 
making up new ones. Have the above solutions been tried in browsers?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'