[whatwg] Tag Soup: Blocks-in-inlines

Wed Jan 25 04:21:22 PST 2006

On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> I'm not saying it won't break anything, but every single change we make
> to the parsing could possibly break any number of the billions of pages
> on the web in any number of browsers.

But using your method (swapping the inline node and the block node) would
break webpages that are currently valid and correct.  If breaking things is
unavoidable, I prefer breaking things that are written incorrectly.
My idea is very extreme but simple and efficient:
    Parse the page regardless of what is between "</" and ">".  Treat
    what's written inside the close tag merely as a visual clue.

Example: <span><div>X</span>Y</div>
+ span
  + div
    + #text: X
  + #text: Y
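A minimal Python sketch of this rule follows. It assumes a toy tokenizer that only understands bare tags such as <span> and </div> plus text (no attributes, comments or entities), and the names Node, parse_ignoring_end_tag_names and dump are hypothetical. Every end tag simply closes the current element, whatever name is written inside it; run on the example, it reproduces the tree above.

```python
import re

# Hypothetical toy tokenizer: bare start/end tags or runs of text only.
# Characters that match neither alternative are skipped.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")


class Node:
    def __init__(self, name, text=None):
        self.name = name  # element name, or "#text" for text nodes
        self.text = text
        self.children = []


def parse_ignoring_end_tag_names(html):
    """Build a tree where every end tag closes the current element,
    whatever name is written between "</" and ">"."""
    root = Node("#root")
    stack = [root]  # open elements; stack[-1] is the current node
    for m in TOKEN_RE.finditer(html):
        slash, name, text = m.groups()
        if text is not None:
            stack[-1].children.append(Node("#text", text))
        elif slash:
            # The tag name is only a visual clue: close the current element.
            if len(stack) > 1:
                stack.pop()
        else:
            el = Node(name.lower())
            stack[-1].children.append(el)
            stack.append(el)
    return root


def dump(node, depth=0):
    """Render the children of node as "+ name" lines, two spaces per level."""
    lines = []
    for child in node.children:
        label = f"#text: {child.text}" if child.name == "#text" else child.name
        lines.append("  " * depth + "+ " + label)
        lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse_ignoring_end_tag_names("<span><div>X</span>Y</div>"))))
# + span
#   + div
#     + #text: X
#   + #text: Y
```

For correctly nested markup such as <p><em>A</em>B</p>, this gives the same tree as matching end tags by name, which is the point of the proposal.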

For correctly written webpages, this should pose no problems.
Incorrect webpages deserve it, since they are the ones asking the UA to
use "standards mode".

On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> Anne van Kesteren wrote:
> > Quoting Lachlan Hunt <lachlan.hunt at lachy.id.au>:
> >> 1.
> >> <em><p>X</em>Y</p>
> >>
> >> BODY
> >>   + P
> >>     + EM
> >>       + #text: X
> >>     + #text: Y
> >>
> >> The theory is that any inline elements
> >
> > This gives problems for new elements I assume... We already have a
> > problem with
> > <header><h1>test</h1></header>...
>
> I don't see how this affects new elements, it should only affect known
> inline elements.
>
> >> 2.
> >> <em><p>XY</p></em>
> >>
> >> BODY
> >>   + P
> >>     + EM
> >>       + #text: X
> >>       + #text: Y
> >
> > And this likely breaks existing content. Perhaps not for EM, but
> > certainly for
> > other inline elements, like <span>.
>
> I'm not saying it won't break anything, but every single change we make
> to the parsing could possibly break any number of the billions of pages
> on the web in any number of browsers.  However, the chances are that
> such pages are already broken in several browsers (probably
> built for IE only, whose quirks we are definitely not keeping), so I
> don't see this as a huge problem.
>
> There's nothing wrong with saner parsing at the expense of a few broken
> pages which I'm sure will still remain readable (even if they don't look
> perfect) and/or be easily fixed by their authors.  Trying to remain 100%
> compatible with 100% of the web is physically impossible.
>
> However, span does show some interesting behaviour which should be made
> more consistent with other inline elements.
>
> <!DOCTYPE html><span><p>X</span>Y</p>
>
> Firefox:
> HTML
>    + HEAD
>    + BODY
>      + SPAN
>        + P
>          + #text: X
>      + #text: Y
>
> Opera 9/Win:
> HTML
>    + BODY
>      + SPAN
>        + P
>          + #text: X
>          + #text: Y
>
> IE6:
> HTML
>    + HEAD
>      + TITLE
>    + BODY
>      + SPAN
>        + P
>          + #text: X
>          + #text: Y
>        + #text: Y (Highlighted in red in the DOM view)
>
> --
> Lachlan Hunt
> http://lachy.id.au/
>
>
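Lachlan's cases 1 and 2 can be sketched in the same way. This is a hypothetical Python illustration, not his specification. The INLINE and BLOCK sets are assumed, only bare tags and text are tokenized, and an end tag closes the most recently opened element with that name (plus anything opened after it), or is ignored if none is open. When a block start tag arrives while inline elements are open, those inlines are closed (and dropped if still empty), the block takes their place, and the inlines are reopened inside it.

```python
import re

# Hypothetical toy tokenizer: bare start/end tags or runs of text only.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")

# Assumed element classes; the proposal only distinguishes inline vs. block.
INLINE = {"a", "b", "em", "i", "span", "strong"}
BLOCK = {"div", "p"}


class Node:
    def __init__(self, name, text=None):
        self.name = name  # element name, or "#text" for text nodes
        self.text = text
        self.children = []


def parse_with_inline_block_swap(html):
    body = Node("body")
    stack = [body]  # open elements; each one's parent sits just below it
    for m in TOKEN_RE.finditer(html):
        slash, name, text = m.groups()
        if text is not None:
            stack[-1].children.append(Node("#text", text))
            continue
        name = name.lower()
        if slash:
            # Close the latest open element with this name and everything
            # opened after it; ignore the end tag if no such element is open.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == name:
                    del stack[i:]
                    break
        elif name in BLOCK:
            # Close the inline elements currently open at the top of the stack.
            hoisted = []
            while len(stack) > 1 and stack[-1].name in INLINE:
                inline = stack.pop()
                if not inline.children:
                    # Still empty: remove it so no empty inline is left behind.
                    stack[-1].children.remove(inline)
                hoisted.append(inline.name)
            block = Node(name)
            stack[-1].children.append(block)
            stack.append(block)
            # Reopen the hoisted inlines inside the block, outermost first.
            for inline_name in reversed(hoisted):
                el = Node(inline_name)
                stack[-1].children.append(el)
                stack.append(el)
        else:
            el = Node(name)
            stack[-1].children.append(el)
            stack.append(el)
    return body


def dump(node, depth=0):
    """Render the children of node as "+ name" lines, two spaces per level."""
    lines = []
    for child in node.children:
        label = f"#text: {child.text}" if child.name == "#text" else child.name
        lines.append("  " * depth + "+ " + label)
        lines.extend(dump(child, depth + 1))
    return lines


for src in ("<em><p>X</em>Y</p>", "<em><p>XY</p></em>"):
    print(src)
    print("\n".join(dump(parse_with_inline_block_swap(src))))
```

Case 1 yields P containing EM (with X) followed by Y, matching the quoted tree. In case 2 the text lands inside EM as a single "#text: XY" node, because this sketch does not split the text into separate X and Y nodes the way the quoted tree lists them.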