[whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

Mon Dec 12 21:05:41 PST 2011

Yes, that's the same issue.  It appears to be fallout from removing
the "in foreign content" insertion mode.

Adam

On Mon, Dec 12, 2011 at 7:36 PM, David Flanagan <dflanagan at mozilla.com> wrote:
> I think this is the same problem I reported here: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-October/033533.html
> See Hixie's response to that message.  I think this is a known problem, though I don't know if a bug has been filed on it.
>
>    David
>
> ----- Original Message -----
> From: "Adam Barth" <w3c at adambarth.com>
> To: "whatwg" <whatwg at lists.whatwg.org>
> Cc: "Henri Sivonen" <hsivonen at iki.fi>
> Sent: Monday, December 12, 2011 6:23:23 PM
> Subject: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>
>
> I'm trying to understand how the HTML parsing spec handles the following case:
>
> <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>
>
> According to the html5lib test data, we should parse that as follows:
>
> | <!DOCTYPE html>
> | <html>
> |   <head>
> |   <body>
> |     <math math>
> |       <math mi>
> |         "foo"
> |     <table>
>
> However, I'm not sure whether that's what the spec actually does.
>
> Consider the point at which we parse the "f" character token (from "foo").
>  The insertion mode will be "in table".  The spec will execute as
> follows:
>
> -> If the current node is a MathML text integration point and the
> token is a character token
>  * Process the token according to the rules given in the section
> corresponding to the current insertion mode in HTML content.
>
> -> A character token
>  * Let the pending table character tokens be an empty list of tokens.
>  * Let the original insertion mode be the current insertion mode.
>  * Switch the insertion mode to "in table text" and reprocess the token.
>
> -> Any other character token
>  * Append the character token to the pending table character tokens list.
>
> ... the "o" and "o" will be processed similarly and end up in the
> pending table character tokens list.
>
> Now, consider the </mi> token.  We're still at a MathML text
> integration point, but the current token is neither a start tag token
> (with certain names) nor a character token, so we process the token
> according to the rules given in the section for parsing tokens in
> foreign content.
>
> -> Any other end tag
>  * Run these steps:
>    ...
>
> The net result of which is popping the stack of open elements, but not
> flushing out the pending table character tokens list.  The list will
> eventually be flushed when we process the </table> token, resulting
> in these character tokens getting foster parented:
>
> | <!DOCTYPE html>
> | <html>
> |   <head>
> |   <body>
> |     <math math>
> |       <math mi>
> |     "foo"
> |     <table>
>
> Thanks,
> Adam
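
For anyone who wants to step through the trace above, here is a rough toy model of it in Python. It is only a sketch of the token flow as described in the quoted message, not the spec's tree construction algorithm: the names Node, dump and simulate are made up for illustration, the starting state (with <math> already foster parented before <table>) is assumed rather than parsed, and only the three steps discussed above are modelled.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # child Nodes, or plain str for text


def dump(node, depth=0):
    """Render the children of `node` in the html5lib-style '| ' format."""
    lines = []
    for child in node.children:
        pad = "  " * depth
        if isinstance(child, str):
            lines.append(f'| {pad}"{child}"')
        else:
            lines.append(f"| {pad}<{child.name}>")
            lines.extend(dump(child, depth + 1))
    return lines


def simulate():
    body, table = Node("body"), Node("table")
    math, mi = Node("math math"), Node("math mi")
    # Assumed state after <table><math><mi>: <math> was foster parented
    # before <table>, but everything is still on the stack of open elements.
    body.children = [math, table]
    math.children = [mi]
    stack = [body, table, math, mi]
    mode = "in table"
    pending = []  # pending table character tokens

    # Character tokens "f", "o", "o": <mi> is a MathML text integration
    # point, so each one is handled by the current insertion mode.
    for ch in "foo":
        if mode == "in table":
            pending = []
            original_mode = mode
            mode = "in table text"
        pending.append(ch)  # "in table text": any other character token

    # </mi> and </math>: handled by the foreign content rules, which pop
    # the stack of open elements but never touch the pending list.
    for name in ("math mi", "math math"):
        while stack.pop().name != name:
            pass

    # </table>: "in table text" finally flushes the pending list. The text
    # is not all whitespace, so it is foster parented before <table>.
    text = "".join(pending)
    if text.strip():
        body.children.insert(body.children.index(table), text)
    mode = original_mode
    stack.pop()  # </table> pops the table element itself
    return body


if __name__ == "__main__":
    print("\n".join(dump(simulate())))
```

In this model the text ends up as a child of <body> between <math> and <table>, as in the second tree above, because nothing on the </mi> path looks at the pending list.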