[whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

Mon Dec 12 19:36:47 PST 2011

I think this is the same problem I reported here: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-October/033533.html
See Hixie's response to that message.  I think this is a known problem, though I don't know if a bug has been filed on it.

    David

----- Original Message -----
From: "Adam Barth" <w3c at adambarth.com>
To: "whatwg" <whatwg at lists.whatwg.org>
Cc: "Henri Sivonen" <hsivonen at iki.fi>
Sent: Monday, December 12, 2011 6:23:23 PM
Subject: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

I'm trying to understand how the HTML parsing spec handles the following case:

<!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

According to the html5lib test data, we should parse that as follows:

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <math math>
|       <math mi>
|         "foo"
|     <table>

However, I'm not sure whether that's what the spec actually does.

Consider the point at which we parse the "f" character token (from
"foo").  The insertion mode will be "in table".  The spec will execute
as follows:

-> If the current node is a MathML text integration point and the
token is a character token
  * Process the token according to the rules given in the section
corresponding to the current insertion mode in HTML content.

-> A character token
  * Let the pending table character tokens be an empty list of tokens.
  * Let the original insertion mode be the current insertion mode.
  * Switch the insertion mode to "in table text" and reprocess the token.

-> Any other character token
  * Append the character token to the pending table character tokens list.

... the "o" and "o" will be processed similarly and end up in the
pending table character tokens list.
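
The buffering described above can be sketched as a toy model. This is
only an illustration of the steps as quoted in this message, not a real
HTML parser; the class name TableTextBuffer and its attribute names are
made up for the example, and the spec's special handling of U+0000 is
left out.

```python
class TableTextBuffer:
    """Toy model of the quoted "in table" / "in table text" steps."""

    def __init__(self):
        self.insertion_mode = "in table"
        self.original_insertion_mode = None
        self.pending_table_character_tokens = []

    def process_character(self, ch):
        if self.insertion_mode == "in table":
            # "A character token" in the "in table" insertion mode.
            self.pending_table_character_tokens = []
            self.original_insertion_mode = self.insertion_mode
            self.insertion_mode = "in table text"
            # The token is then reprocessed in the new mode (below).
        # "Any other character token" in the "in table text" mode.
        self.pending_table_character_tokens.append(ch)


parser = TableTextBuffer()
for ch in "foo":
    parser.process_character(ch)

print(parser.insertion_mode)
print(parser.pending_table_character_tokens)
```

After the three characters, the model is in "in table text" and all of
"foo" sits in the pending list; nothing has been inserted into the tree.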

Now, consider the </mi> token.  We're still at a MathML text
integration point, but the current token is neither a start token
(with certain names) nor a character token, so we process the token
according to the rules given in the section for parsing tokens in
foreign content.

-> Any other end tag
  * Run these steps:
    ...

The net result of which is popping the stack of open elements, but not
flushing out the pending table character tokens list.  The list will
eventually be flushed when we process the </table> token, resulting in
these character tokens getting foster parented:

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <math math>
|       <math mi>
|     "foo"
|     <table>
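
The timing of the flush can be sketched in the same spirit. This is a
minimal model under stated assumptions: it starts from the state after
"foo" has been buffered, treats <math> and <mi> as the only foreign
elements, and reduces the dispatcher to "is the current node foreign?".
The function name process_end_tags and the log format are hypothetical;
the real tree construction dispatcher is considerably more involved.

```python
FOREIGN = {"math", "mi"}  # elements this toy model treats as MathML


def process_end_tags(end_tags):
    # Assumed state once "foo" has been buffered (see the message above).
    stack = ["html", "body", "table", "math", "mi"]
    mode, pending, log = "in table text", list("foo"), []
    for name in end_tags:
        if stack[-1] in FOREIGN:
            # Foreign content rules: pop up to the matching element.
            # The pending list is never consulted on this path.
            while stack.pop() != name:
                pass
            log.append(f"</{name}>: popped, still pending {pending}")
        else:
            # HTML content: "in table text" flushes before reprocessing.
            if mode == "in table text":
                text = "".join(pending)
                log.append(f"</{name}>: foster-parent {text!r}")
                pending, mode = [], "in table"
            while stack.pop() != name:
                pass
            log.append(f"</{name}>: popped")
    return log


for line in process_end_tags(["mi", "math", "table"]):
    print(line)
```

In this model the text is only foster parented once </table> is reached,
after both MathML elements have already been popped, which is why "foo"
ends up as a sibling of <math> rather than a child of <mi>.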

Thanks,
Adam