[whatwg] Reconstructing formatting elements (8.2.5)

Kartikaya Gupta lists.whatwg at stakface.com
Fri Feb 27 14:20:58 PST 2009


I have a question about how formatting elements are reconstructed when dealing with tainted tables. Specifically, the fine folks running westjet.com stuck some malformed HTML on their site that I've boiled down to the following snippet:

<table>
 <tr>
  <a href="foo"><td></a></td>
  <td> </td>
 </tr>
</table>

When I parse this using the validator.nu HTML5 parser implementation, the <a> element gets added to the list of active formatting elements. Each run of whitespace that comes later triggers a reconstruction of the active formatting elements, so the <a> gets cloned once per run. The resulting DOM ends up like so:

<HTML><HEAD></HEAD><BODY><A href="foo"></A><A href="foo">
  </A><A href="foo">
 </A><A href="foo">
</A><TABLE>
 <TBODY><TR>
  <TD></TD><TD> </TD></TR></TBODY></TABLE><A href="foo">
</A></BODY></HTML>
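
To make the cloning concrete, here is a toy Python model of what I believe is happening. It is not the validator.nu code: the function names, the token list, and the simplified stack handling are all assumptions made for illustration. It only tracks the tokens that follow the first <td></a></td>, and it leaves out the single space inside the second <td>, since that one does not produce a clone in the DOM above.

```python
def reconstruct(active_formatting, open_elements):
    """Simplified 'reconstruct the active formatting elements' step.

    Re-opens every trailing list entry that is no longer on the stack of
    open elements and returns how many clones that took.
    """
    start = len(active_formatting)
    while start > 0 and active_formatting[start - 1] not in open_elements:
        start -= 1
    clones = 0
    for name in active_formatting[start:]:
        open_elements.append(name)  # the clone goes onto the stack
        clones += 1
    return clones


def count_clones(tokens):
    """Count <a> clones for a token list (hypothetical, heavily simplified)."""
    active_formatting = ["a"]  # the <a> that was never removed from the list
    open_elements = ["html", "body", "table", "tbody", "tr"]
    clones = 0
    for token in tokens:
        if token == "ws":
            # Whitespace in a tainted table is handled as in body content,
            # which reconstructs the active formatting elements first.
            clones += reconstruct(active_formatting, open_elements)
        else:
            # Assumption: any table-structure tag clears the stack back to
            # a table context, which pops the re-opened <a> again.
            while open_elements[-1] in active_formatting:
                open_elements.pop()
    return clones


# Tokens after the first cell: whitespace runs and the tags between them.
TOKENS = ["ws", "<td>", "</td>", "ws", "</tr>", "ws", "</table>", "ws"]
print(count_clones(TOKENS))  # 4
```

The four clones match the four extra <A href="foo"> elements in the DOM above. The point of the model is that nothing ever removes "a" from the list, so every whitespace run finds it missing from the stack and re-opens it.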

This cloning seems to be correct behavior according to what is specced in HTML5. However, none of the major browsers clone the <a> element at all. [1] It looks like they remove the <a> from the list of active formatting elements at some point, but that step seems to be missing from the spec. Thoughts?

[1] Live DOM viewer link: http://software.hixie.ch/utilities/js/live-dom-viewer/?<table>%0A%20<tr>%0A%20%20<a%20href%3D"foo"><td></a></td>%0A%20%20<td>%20</td>%0A%20</tr>%0A</table>%0A


