[whatwg] parsing nested forms
Ian Hickson
ian at hixie.ch
Mon Dec 1 19:06:15 PST 2008
On Thu, 6 Nov 2008, Tommy Thorsen wrote:
>
> Before I get to the real issue, I think I should give you a little bit
> of background. I'm working for a company which makes a web browser.
> We've been having some problems with our algorithm for parsing illegal
> HTML, so we decided to scrap the whole module and implement the
> algorithm exactly as outlined in the HTML5 spec. So far this has been a
> great success. We're already way better than we used to be, but there
> are some situations where the html5 parsing algorithm does not quite
> give us the result we expected.
This is great feedback!
> Yesterday I noticed that we were not displaying the site
> http://bankrate.com correctly. The problem we had on that page boils
> down to the following markup:
>
> <div id="firstdiv">
> A
> <div id="seconddiv">
> <form id="firstform">
> <div id="thirddiv">
> <form id="secondform"></form>
> </div>
> </form>
> </div>
> B
> </div>
>
> I'll walk you through it. Everything is normal until we reach the start
> tag for "secondform". It is ignored, since we're already in a form
> (the form element pointer points to "firstform"). Then we see the end
> tag that was meant for "secondform". We pop elements from the stack of
> open elements until we find a form element (which is "firstform"),
> popping off "thirddiv" in the process. The next token we get is the end
> div tag that was meant for "thirddiv". Since "thirddiv" is already
> gone, we pop "seconddiv" instead, and now we're sort of off-balance. The
> result is that A and B do not end up as children of the same div.
>
> I've applied a fix to our code which makes us handle this particular
> case better. I haven't tested it very thoroughly, but the change is to
> implement the 'An end tag whose tag name is "form"' section in "in body"
> as if it said:
>
> ------
> An end tag whose tag name is "form"
>
> Let /node/ be the element that the form element pointer points to.
> Set the form element pointer to null.
>
> If the stack of open elements does not have an element in scope with the
> same tag name as that of the token, then this is a parse error; ignore the
> token.
>
> Otherwise, run these steps:
>
> 1. Generate implied end tags.
> 2. If the current node is not an element with the same tag name as that
> of the token, then this is a parse error.
> 3. Remove /node/ from the stack of open elements.
> ------
>
> This seems to give us pretty much the same behaviour as Opera for the
> simple example above. Can any of you see any potential problems with
> this approach? In any case, I do believe that the specification needs to
> be changed one way or another, so that it handles this case better.
I concur that this is closer to what we need. I have updated the spec
accordingly.
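The difference between the two "form" end-tag behaviours can be seen in a minimal Python sketch that replays the bankrate.com markup quoted above. It is a toy under heavy, stated assumptions, not the spec's or any browser's code: the token list is hand-written (whitespace dropped) instead of coming from a tokenizer; only div, form and text tokens are handled; "has an element in scope" is reduced to "is anywhere on the stack", since no scope-boundary elements occur in this markup; "generate implied end tags" is treated as a no-op, since no p, li, etc. are present. The names Node, TOKENS, build and parent_of are hypothetical, and the old behaviour is our reading of the walkthrough, not a quotation of the earlier spec text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(eq=False)  # identity comparison, so list.remove() removes this exact node
class Node:
    tag: str
    id: Optional[str] = None
    children: list[Union[Node, str]] = field(default_factory=list)


# Hand-written token stream for the quoted markup (hypothetical, whitespace dropped).
TOKENS = [
    ("start", "div", "firstdiv"), ("text", "A"),
    ("start", "div", "seconddiv"), ("start", "form", "firstform"),
    ("start", "div", "thirddiv"), ("start", "form", "secondform"),
    ("end", "form"), ("end", "div"), ("end", "form"),
    ("end", "div"), ("text", "B"), ("end", "div"),
]


def in_scope(stack: list[Node], tag: str) -> bool:
    # Simplification: no scope-boundary elements appear in this markup,
    # so "in scope" reduces to "somewhere on the stack of open elements".
    return any(n.tag == tag for n in stack)


def build(tokens, proposed: bool) -> Node:
    body = Node("body")
    stack: list[Node] = [body]            # stack of open elements
    form_pointer: Optional[Node] = None   # form element pointer
    for tok in tokens:
        kind, name = tok[0], tok[1]
        if kind == "text":
            stack[-1].children.append(name)
        elif kind == "start":
            if name == "form" and form_pointer is not None:
                continue                  # nested <form>: parse error, token ignored
            el = Node(name, tok[2])
            stack[-1].children.append(el)
            stack.append(el)
            if name == "form":
                form_pointer = el
        elif name == "div":
            if in_scope(stack, "div"):
                while stack.pop().tag != "div":
                    pass                  # pop until a div has been popped
        elif name == "form":
            node, form_pointer = form_pointer, None
            if not in_scope(stack, "form"):
                continue                  # parse error; ignore the token
            if proposed:
                if node is not None:      # assumption: guard against a null pointer
                    stack.remove(node)    # remove only the form element itself
            else:
                while stack.pop().tag != "form":
                    pass                  # old behaviour: pop until a form is popped
    return body


def parent_of(root: Node, text: str) -> Optional[str]:
    # Return the id (or tag, for body) of the element whose children include `text`.
    for child in root.children:
        if isinstance(child, str) and child == text:
            return root.id if root.id is not None else root.tag
        if isinstance(child, Node):
            found = parent_of(child, text)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    for proposed in (False, True):
        tree = build(TOKENS, proposed)
        print(proposed, parent_of(tree, "A"), parent_of(tree, "B"))
    # False firstdiv body
    # True firstdiv firstdiv
```

With the old behaviour, the end tag meant for "secondform" pops "thirddiv" along with "firstform", so every later div end tag closes one level too high and B lands in body. Removing only the form element keeps "thirddiv" on the stack, and A and B both end up in "firstdiv".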
> I think I have a couple of other instances where we've had to deviate
> from the specification in order to tackle problems discovered by our
> testers, and if any of you are interested in this kind of feedback, I'll
> dig them out and post them on this list.
Yes, please do!
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'