[whatwg] video tag: pixel aspect ratio

Thu Nov 13 20:39:54 PST 2008

On Wed, 15 Oct 2008, Sander van Zoest wrote:
> > >
> > > <source pixelratio="10:11"> <!-- 525 composite NTSC -->
> > > <source pixelratio="59:54"> <!-- 625 composite PAL -->
> > > <source pixelratio="1018:1062"> <!-- 1920x1035 HDTV SMPTE RP 187-1995 -->
> >
> > Currently pixelratio is a floating point number, as in:
> >
> >   <source pixelratio="0.909090909"> <!-- 525 composite NTSC -->
> >   <source pixelratio="1.09259259"> <!-- 625 composite PAL -->
> >   <source pixelratio="0.958568738"> <!-- 1920x1035 HDTV SMPTE RP 187-1995 -->
> >
> > Is that not enough?
> 
> I hate to say it, but if it was enough, I wouldn't be commenting here. 
> It simply isn't accurate enough to store it as a float. Every respected 
> container stores the ratio as X x Y. See the PASP atom for example.

On Wed, 15 Oct 2008, Anne van Kesteren wrote:
> 
> How is it not accurate? In terms of precision it shouldn't really matter...

I agree with Anne here. I've left the spec as is.

Note that the whole point here is to discourage people from using the 
attribute. Also, it is intended only for use by people who are trying to 
fix a broken video; they might not know the right value, so it doesn't 
really matter so much whether the value that is used is precise or not, 
just so long as it improves the video relative to what it would be if the 
value in the video data file itself was assumed.

On Wed, 15 Oct 2008, Sander van Zoest wrote:
> 
> We are talking video here. Precision is at its core. If you consider the 
> majority of the broken video out on the net today a good example of what 
> you want more of, then I see no reason to accurately define PAR. [...] 
> you can not accurately convert video on the fly if you do not have the 
> exact ratio. All this stems from the conversion from analog to digital 
> and in the analog world we did a lot of funky tricks to make things work 
> better on hardware of those days, but as our computers and electronics 
> in general get faster and faster, putting in inaccuracies can cause 
> some seriously ill side effects, now and especially in the future.

The whole point here is that the video is wrong, and the author is trying 
to apply a last-minute hack to improve it a bit. If performance is an 
issue, then encode the video correctly and don't use the attribute at all.

On Wed, 15 Oct 2008, Jonas Sicking wrote:
> 
> I think if we make the syntax really simple like:
> 
> 1. Find the first ':'
> 2. Parse the value before as an integer
> 3. Parse the value after as an integer
> 
> then adding a new syntax is pretty cheap. Of course having separate 
> attributes is even cheaper. But if the "10:11" syntax is really common 
> then I think it might help authoring to use it.

The whole point is to discourage authors from using it, so actually we 
sort of _want_ it to not help authoring. :-)
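
For reference, the three steps Jonas lists could be sketched roughly as 
below. This is only an illustration, not anything in the spec: the function 
name parse_pixel_ratio is made up, and the error handling (returning None 
for a missing ':', non-digit parts, or a zero term) is an assumption, since 
the three steps do not say what to do with bad input.

```python
from fractions import Fraction
from typing import Optional


def parse_pixel_ratio(value: str) -> Optional[Fraction]:
    """Hypothetical parser for an "X:Y" pixel ratio string."""
    # Step 1: find the first ':'.
    before, sep, after = value.partition(":")
    if not sep:
        return None  # assumed error handling: no ':' means no ratio

    before, after = before.strip(), after.strip()
    # Only plain ASCII digits are accepted (assumption; no signs, no decimals).
    for part in (before, after):
        if not (part.isascii() and part.isdigit()):
            return None

    # Steps 2 and 3: parse the values before and after as integers.
    numerator, denominator = int(before), int(after)
    if numerator == 0 or denominator == 0:
        return None  # assumed: a zero term is treated as invalid
    return Fraction(numerator, denominator)


print(parse_pixel_ratio("10:11"))  # 525 composite NTSC
print(parse_pixel_ratio("59:54"))  # 625 composite PAL
```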

On Wed, 15 Oct 2008, Eduard Pascual wrote:
> 
> The issue is that, for most (probably all) cases, the limitations of 
> representation of floating numbers are a guarantee that the value will 
> be wrong. 0.909090909 is not the same as 0.909090909090909090..., 
> and that still wouldn't be the same as 10:11.

Assume that we have a video that is 10000 video pixels square. That's a 
100 megapixel video.

If we pretend that the aspect ratio of the pixels is 11:11, then the video 
will, at 1:1 zoom, take 10000 horizontal pixels (and 10000 vertical 
pixels).

If we assume that the pixel ratio is 10:11 = 0.9090909090909090909090909, 
and we keep the vertical number of pixels at 10000, then it will be 
(rounding to 0dp) 9091 pixels wide. If we assume that the pixel ratio is 
just 0.90909 (10:11 truncated to five digits) then it will STILL be 
(rounding to 0dp) 9091 pixels wide. In practice, videos are much smaller 
and displayed on much smaller displays. So, with all due respect, why is 
it not enough?
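
The arithmetic above is easy to check. A small sketch, assuming the width 
is simply the 10000-pixel height times the pixel ratio, rounded to the 
nearest integer (the helper name displayed_width is made up for this 
example):

```python
from fractions import Fraction

HEIGHT = 10000  # vertical pixels, as in the 100 megapixel example


def displayed_width(pixel_ratio, height=HEIGHT):
    """Width in display pixels for a given pixel ratio, rounded to 0dp."""
    return round(height * pixel_ratio)


exact = displayed_width(Fraction(10, 11))  # the exact rational 10:11
truncated = displayed_width(0.90909)       # the five-digit float
print(exact, truncated)  # prints "9091 9091"
```

Both values round to the same width; only a coarser value such as 0.909 
gives 9090 instead.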

> Although the difference may seem negligible, simple image scaling 
> algorithms tend to yield horrible results for slight scaling (you can 
> try to render a PDF document at 101% zoom to see what I'm speaking 
> about; especially one with images). The alternative, bilinear or 
> trilinear scaling, could be too much of a strain for video: there is a 
> highly noticeable difference between running such an algorithm once and 
> running it 25 or 30 times each second. If we add to the mix that pages 
> might be including several videos; that most probably there is also 
> audio playing along with the video, the computation cost of decoding, 
> and the fact that not every user on the web (probably, not even the 
> majority of them) uses a high end computer; then scaling needs to be 
> quick and simple enough to achieve decent rendering without mass-frying 
> CPUs.

Scaling a video to 9090 pixels wide or 9091 pixels wide isn't going to 
show the artifacts you are talking about.

> I still don't understand why the spec has to define each and every 
> parser algorithm (IMO, it should only define the syntax, and then the 
> implementation should define its own algorithm that parses that syntax 
> as defined)

We have to define the parsers because the syntax alone doesn't define the 
error handling behavior.

> but if that's the issue then a microsyntax can be perfectly avoided by 
> splitting the argument into two separate ones, such as pixelratiox and 
> pixelratioy.

We want to make this as cheap as possible, and adding more attributes 
doesn't make it cheaper.

For example, having two attributes introduces costs such as:

On Wed, 15 Oct 2008, Sander van Zoest wrote:
> 
> The only issue I have with splitting it into separate ones, is that we 
> need to ensure that both exist or none exist, having just X or just Y is 
> clearly confusing and should not be allowed.

On Wed, 15 Oct 2008, Kristof Zelechovski wrote:
>
> We could also say that specifying only one coordinate has no effect.  I 
> think the requirement that both attributes have to be present or absent 
> cannot be specified in a document type definition (if someone would like 
> to have one nevertheless).

On Wed, 15 Oct 2008, Peter Kasting wrote:
> 
> The entire problem is that it is not simple.  It is less simple to spec, 
> less simple to declare, less simple to parse, and less simple to test, 
> and there is zero real-world gain in it.  It is not a "hack" to note 
> that the floating-point precision available here is far higher than what 
> any display could ever manage and therefore any rounding errors occur at 
> levels many orders of magnitude too small to have any effect at all.

Going back to the original topic:

On Wed, 15 Oct 2008, Ralph Giles wrote:
> 
> It is enough. Sander and Eduard have provided excellent arguments why 
> the pixel aspect ratio, and especially the frame rate, should be 
> represented as rationals in video formats. But as an override for 
> already broken video streams, compliance with best practice does not 
> justify another data type in html5.
> 
> To put Anne's comment another way, one needs a gigapixel display device 
> before the difference between 1.0925 (rounded to only 5 figures) and 
> 59/54 affects the behaviour of the scaling algorithm at all. There 
> aren't so many aspect ratios in common use--you're welcome to choose the 
> one nearest to the floating point value given if you think it's 
> important.

Agreed.
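
To illustrate Ralph's last point, a user agent that cared could snap the 
float to the nearest well-known ratio. A rough sketch, assuming a 
hand-picked table; the COMMON_RATIOS list below only contains the ratios 
mentioned in this thread plus 1:1, and the function name is made up:

```python
from fractions import Fraction

# Illustrative table only: the ratios quoted in this thread, plus square pixels.
COMMON_RATIOS = [
    Fraction(1, 1),        # square pixels
    Fraction(10, 11),      # 525 composite NTSC
    Fraction(59, 54),      # 625 composite PAL
    Fraction(1018, 1062),  # 1920x1035 HDTV SMPTE RP 187-1995
]


def nearest_common_ratio(value: float) -> Fraction:
    """Return the table entry closest to the given floating-point ratio."""
    return min(COMMON_RATIOS, key=lambda ratio: abs(float(ratio) - value))


print(nearest_common_ratio(1.0925))   # prints "59/54"
print(nearest_common_ratio(0.90909))  # prints "10/11"
```

Note that Fraction reduces 1018/1062 to 509/531, so that entry prints in 
its reduced form.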

On Wed, 15 Oct 2008, Sander van Zoest wrote:
> 
> [...] having non-square pixels is not broken. If we go this route, we 
> might as well get rid of the distinction altogether.

I think you may misunderstand what the attribute is for. It's only 
intended as an override for broken video files that don't have the correct 
pixel aspect ratio in the first place.

On Wed, 15 Oct 2008, Sander van Zoest wrote:
> 
> Following that logic, why add the attribute at all? If it is going to be 
> wrong, and require guessing to the nearest fraction by the user agent. 
> It is rarely going to be used. Why not just force people to transcode 
> the content to make it work correctly? Have them put it in an Ogg 
> container while they are at it?

This is intended to be a quick fix attribute for cases where the video is 
under someone else's control, e.g. if a blog wants to embed a YouTube 
video file that has the wrong ratio.

On Wed, 15 Oct 2008, Eric Carlson wrote:
>
> I agree that incorrectly encoded videos are annoying, but I don't think 
> we should have this attribute at all because I don't think it passes the 
> "will it be commonly used" smell test.
> 
> I am also afraid that it will be difficult to use correctly, since you 
> frequently have to use clean aperture in conjunction with pixel aspect 
> ratio to get the correct display size.

For the videos we're talking about, just getting near the right ratio is 
probably all we can ask for -- we're not talking about professional video 
data here. We're talking misencoded YouTube videos where an embedder wants 
to fix the most egregious error before showing his friends the cat jumping 
off the side of the pool or something.

I agree that this is just a hack attribute, and I agree that it isn't 
going to be widely used. But I think it will be used enough to justify its 
existence. There are a surprisingly large number of misencoded videos on 
the Web, and plenty of people who care.

On Wed, 15 Oct 2008, Sander van Zoest wrote:
> 
> Certainly not. I forgot about the required crop. I am now even more 
> convinced it doesn't belong in the spec. Let the container handle this 
> detail.

Ideally it would; this is just intended for the case where it failed.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'