[whatwg] Question on Limits in Adaption Agency Algorithm

Mon Jul 1 15:53:14 PDT 2013

On Sat, 3 Nov 2012, Yasuhiko Minamide wrote:
> 
> This is about Adaption Agency Algorithm in 12.2.5.4.7 The "in body" 
> insertion mode.
> 
> Limits of loops in the adoption agency algorithm were introduced in 
> http://html5.org/tools/web-apps-tracker?from=5641&to=5642. However, the 
> limit for the inner loop introduces an unexpected behaviour for the 
> following fragment of an HTML document.
> 
> <b><i><a><s><tt><div></b>abc</b></div></tt></s></a>xyz</i>
> 
> This document is parsed into the following DOM tree.
> 
> <b>
>   <i>
>     <a>
>       <s>
>         <tt></tt>
>       </s>
>     </a>
>     "xyz"
>   </i>
> </b>
> <a>
>   <s>
>     <tt>
>       <div>
>         <b></b>
>         "abc"
>       </div>
>     </tt>
>   </s>
> </a>    
> 
> "xyz" is inserted as a child of <i> and the order between "abc" and "xyz" is 
> reversed in the tree. We would like to know whether this is an intended 
> behaviour of the specification.
> 
> We are aware that the similar reversal occurs in markup-in-tables.

On Sat, 3 Nov 2012, Adam Barth wrote:
> 
> The DOM you get when you hit the limits in the adoption agency algorithm 
> don't make a lot of intuitive sense.  Unfortunately, the limits are 
> necessary so that implementations don't end up having to do quadratic 
> work.  If this behavior is causing you trouble, you might want to run 
> your content through some sort of pre-processor that re-writes these 
> cases into valid HTML, which should get parsed in intuitive ways by user 
> agents.

On Sat, 8 Dec 2012, Ian Hickson wrote:
> 
> Yeah that's definitely not intentional.
> 
> Does anyone have any preference for how this is fixed?

On Wed, 12 Dec 2012, James Graham wrote:
> On Wed, 12 Dec 2012, Ian Hickson wrote:
> > On Wed, 12 Dec 2012, Henri Sivonen wrote:
> > > 
> > > Does it need to be fixed? That is, is it breaking real sites?
> > 
> > It reverses the order of text nodes. That's ridiculously unintuitive. 
> > Yes, I think it needs solving, even if it isn't hit by any sites.
> > 
> > (If it's hit by sites, it seems likely that they are breaking because 
> > of it. If it isn't, then we can safely change it regardless.)
> 
> Although changing it does introduce the possibility of unforeseen 
> regressions. Not that I have a strong opinion here, really.

I still think this should be fixed, but having studied it further, I'm not 
sure how to fix it. The problem is that the <i> element isn't cloned (note 
how the "abc" text doesn't end up in the <i>), but when we close all the 
other elements, we eventually get left with the <i> on the stack, and so 
the "xyz" text ends up appended to it, but it's still back in its original 
position, earlier in the tree.

One option would be to remove from the stack of open elements any elements 
that we are skipping when we bail out of the AAA.

Can anyone see a problem with doing that?

On Mon, 5 Nov 2012, Yasuhiko Minamide wrote:
> 
> I'm wondering whether the adoption agency algorithm without the limits 
> really has a quadratic complexity (with respect to the size of the 
> stack).
> 
> Even if we do not impose a limit on the inner loop, each node in the 
> stack of open elements is processed at most once.
> 
> - The inner loop processes the nodes between the formatting element
>   and the furthest block.
> - Then, the formatting element is moved below the furthest block.
> 
> Then, the nodes above the furthest block will not be processed anymore.
> 
> If we do not impose the limit on the outer loop, the step 4 may cause
> the quadratic behaviour. However, I think that it can be avoided by
> slightly revising the algorithm.
> 
> I'm working on the automated test generation for the HTML5 parser 
> specification and had the question when we try to understand the 
> specification precisely.

On Tue, 18 Dec 2012, Yasuhiko Minamide wrote:
> 
> The easiest fix will be just to remove the limit of the inner loop. 
> Unfortunately, this makes the complexity of current implementations of 
> the adoption agency algorithm in O(n^2). This is because "remove node 
> from the stack of open elements" is in O(n) on implementations as far as 
> I have checked.
> 
>  9.5 If node is not in the list of active formatting elements, then remove 
>      node from the stack of open elements and then go back to the step labeled 
>      inner loop.
> 
> * WebKit : stack of open elements is implemented as a singly-linked list.
> * Firefox : stack of open elements is implemented as an array.
>             (need to move elements in the array to remove)
> 
> I'm not sure whether this is ok or not. 
> (I thought the operation remove node was implemented in O(1). For 
> example, by using doubly linked list.)

The limits are really intended to reduce the memory consumption of the 
algorithm, not its running time. Elements are expensive, and this 
algorithm can create exponentially more elements than tags in the markup, 
when it's not limited.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'