[whatwg] Tag Soup: Blocks-in-inlines
Mikko Rantalainen
mikko.rantalainen at peda.net
Thu Jan 26 07:10:43 PST 2006
Lachlan Hunt wrote:
> <!DOCTYPE html>
> <em><p><span><h1>X</em>Y</span>Z</h1></p>
>
> Mozilla:
> BODY
> + EM
> + P
> + SPAN
> + H1
> + EM
> + #text: X
> + #text: YZ
>
> That looks reasonably like what the author would want from that rubbish,
> except that the Z is within the span, but it's not in the markup. (If
> you swap <span> with <strong>, the result is even more perplexing, but
> the Z is not put within the STRONG element.)
I don't like this style because it messes badly with parents and
children. It should be clear from the source that the CSS selector
"em p span h1" should match the string "X". However, in Mozilla
it doesn't.
> Safari:
> BODY
> + EM
>   + P
>     + SPAN
>       + H1
>         + #text: X
>         + #text: Y
>         + #text: Z
>
> In this case, it's all emphasised, instead of just the X as in
> Mozilla. (If you swap <span> with <strong>, the result is almost the
> same, except that an additional empty STRONG element is added as a
> child of the EM, after the P, for no apparent reason.)
Why not just a single text node?
I think a simple way to recover what the author meant is to use just
the following rules:
1) An opening tag always starts a new element.
2) A matching closing tag closes the element.
3) A non-matching closing tag (the element on top of the stack
doesn't match the closing tag) closes all still-open
elements until a match is found. Exceptions to
this rule:
3.1) There's no matching element in the stack.
The closing tag is ignored.
3.2) The closing tag is for an inline element and closing
it would require closing a block-level element.
The closing tag is ignored.
4) At the end of the file, all still-open elements are closed.
Unless I've made a mistake, these rules are usually able to recover
the meaning the author intended. Applying these rules to the example
<em><p><span><h1>X</em>Y</span>Z</h1></p>
gives us
EM
+ P
  + SPAN
    + H1
      + #text: XYZ
which is about the same as Safari's interpretation, but with a single
text node.
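A minimal Python sketch of rules 1 to 4, applied to the same example. It assumes a toy tokenizer (start tags, end tags and text only, no attributes, comments or doctypes) and a hypothetical, incomplete set of block-level element names, since the rules don't say which elements count as block-level. The names tokenize, build_tree and dump are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical, incomplete set: the rules don't define which elements
# are block-level, so this list is an assumption for the sketch.
BLOCK_ELEMENTS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li"}

# Toy tokenizer: start tags, end tags and text only. Not a real HTML tokenizer.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)")

Token = Tuple[str, str]  # ("start" | "end" | "text", tag name or text)


def tokenize(src: str) -> List[Token]:
    tokens: List[Token] = []
    for m in TOKEN_RE.finditer(src):
        slash, name, text = m.groups()
        if text is not None:
            tokens.append(("text", text))
        else:
            tokens.append(("end" if slash else "start", name.lower()))
    return tokens


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)

    def append_text(self, text: str) -> None:
        # Merge adjacent text so consecutive runs form one text node.
        if self.children and isinstance(self.children[-1], str):
            self.children[-1] += text
        else:
            self.children.append(text)


def build_tree(tokens: List[Token]) -> Element:
    root = Element("body")
    stack = [root]  # open elements; the root is never popped
    for kind, value in tokens:
        if kind == "text":
            stack[-1].append_text(value)
        elif kind == "start":
            # Rule 1: an opening tag always starts a new element.
            elem = Element(value)
            stack[-1].children.append(elem)
            stack.append(elem)
        else:
            # Innermost open element with this name (root excluded).
            match = next((i for i in range(len(stack) - 1, 0, -1)
                          if stack[i].name == value), None)
            if match is None:
                continue  # Rule 3.1: no matching element, ignore the tag.
            above = stack[match + 1:]
            if value not in BLOCK_ELEMENTS and any(
                    e.name in BLOCK_ELEMENTS for e in above):
                continue  # Rule 3.2: inline end tag would close a block.
            # Rules 2 and 3: close the match and everything opened after it.
            del stack[match:]
    # Rule 4: elements still open at end of input are implicitly closed.
    return root


def dump(node: Element, depth: int = 0) -> List[str]:
    lines = ["  " * depth + node.name.upper()]
    for child in node.children:
        if isinstance(child, str):
            lines.append("  " * (depth + 1) + "#text: " + child)
        else:
            lines.extend(dump(child, depth + 1))
    return lines


tree = build_tree(tokenize("<em><p><span><h1>X</em>Y</span>Z</h1></p>"))
print("\n".join(dump(tree)))
# BODY
#   EM
#     P
#       SPAN
#         H1
#           #text: XYZ
```

build_tree consumes each token exactly once and never peeks ahead. The single "XYZ" node comes from append_text merging text that lands in the same parent.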
As an added bonus, the simple algorithm above never needs to look ahead
at tags still to come, so it doesn't prevent incremental rendering.
However, it isn't this easy in the real world, because rule 1 must
handle elements like META, LINK and IMG, which have no end tag and
never contain other elements. I think the best approach is to close
those elements automatically, immediately after their start tag. If an
explicit closing tag is found later, it will be ignored under rule 3
(exception 3.1), because there's no matching element in the stack.
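A sketch of the void-element handling, again in Python and only one possible way to do it: VOID_ELEMENTS and BLOCK_ELEMENTS are hypothetical, incomplete sets, and elements are represented as (name, children) tuples. The IMG element is attached to its parent but never pushed onto the stack, so the later </img> finds no match and is dropped by exception 3.1.

```python
from typing import List, Tuple

# Hypothetical, incomplete sets; only META, LINK and IMG are named as examples.
VOID_ELEMENTS = {"meta", "link", "img", "br", "hr", "input"}
BLOCK_ELEMENTS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}


def build(tokens: List[Tuple[str, str]]):
    """Build a (name, children) tree; void elements are closed at once."""
    root = ("body", [])
    stack = [root]  # open elements; the root is never popped
    for kind, value in tokens:
        if kind == "text":
            stack[-1][1].append(value)
        elif kind == "start":
            elem = (value, [])
            stack[-1][1].append(elem)
            if value not in VOID_ELEMENTS:
                stack.append(elem)  # void elements never stay open
        else:
            match = next((i for i in range(len(stack) - 1, 0, -1)
                          if stack[i][0] == value), None)
            if match is None:
                continue  # rule 3.1: e.g. a stray </img>
            above = stack[match + 1:]
            if value not in BLOCK_ELEMENTS and any(
                    e[0] in BLOCK_ELEMENTS for e in above):
                continue  # rule 3.2: inline end tag would close a block
            del stack[match:]  # rules 2 and 3
    # Rule 4: open elements need no extra work once input ends.
    return root


tokens = [("start", "p"), ("start", "img"), ("text", "caption"),
          ("end", "img"), ("end", "p")]
print(build(tokens))
# ('body', [('p', [('img', []), 'caption'])])
```

Without the VOID_ELEMENTS check, "caption" would end up inside the IMG element.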
--
Mikko