[whatwg] Tag Soup: Blocks-in-inlines

Billy Wong billyswong at gmail.com
Wed Jan 25 07:17:51 PST 2006

On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> Billy Wong wrote:
> > On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> >> I'm not saying it won't break anything, but every single change we make
> >> to the parsing could possibly break any number of the billions of pages
> >> on the web in any number of browsers.
> >
> > But using your method (swapping inline node and block node) would
> > break presently valid and correct webpages.
> Such pages are invalid because inline-level elements are not allowed to
> contain block-level elements.  HTML pages containing the following:
> <span>
>    <div>...</div>
> </span>
> could be considered well-formed (if you apply the concept of
> well-formedness to HTML, even though it's not formally defined for it),
> but it's certainly not valid according to any official DTD.
Sorry, I did not notice that this is invalid; I am new here.  Why are
inline-level elements not allowed to contain block-level elements?
I am confused.

> > If breaking things is unavoidable, I prefer breaking things which are
> > written incorrectly.
> No-one is intending to break anything that is written correctly.

I should change my line to "break things that are not well-formed
rather than those that are well-formed".

> > My idea is very extreme but simple and efficient:
> >     Parse the page regardless of what is between "</" and ">".  Treat
> > what is written inside the close tag as merely a visual clue.
> >
> > Example: <span><div>X</span>Y</div>
> > + span
> >   + div
> >     + #text: X
> >   + #text: Y
> I'm kind of confused by what you're trying to do there.  You seem to be
> implicitly closing the div immediately before the span.  But then the Y
> doesn't seem to be a child of the span at all in the markup, it looks
> like it should be a child of the div, yet in your DOM, it's not a child
> of the div, but is of the span.
> The DOM looks equivalent to this markup:
>    <span><div>X</div>Y</span>
It is my fault for not explaining it more clearly.  Here, what is
written inside a close tag does not matter to the parser.  So your
observation is correct.  I do not read the tag name and guess what to do
when </span> is given instead of </div>; I treat any </xyz> after <div>
as </div>.  If somebody writes a webpage that is not well-formed, the
error will be displayed in such a disturbing way that no one can ignore
it.  If the error is a mistake (which I presume to be the only reason
for a page not being well-formed), web developers can find the source of
the problem more easily: the error will be observable *from* the
starting point of the error *to* its ending point.  If this seems too
insane to everyone, well, as I said before, this idea is "very extreme".
I do not suggest that it is the best choice.
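
To make the rule concrete, here is a minimal sketch in Python of a parser
that closes the innermost open element on every close tag, whatever name
the tag carries.  The names TOKEN, Node, parse_ignoring_close_names and
dump are made up for this illustration, and the tokenizer is a deliberate
simplification (no attributes, comments or void elements); it is not how
any real browser parses HTML.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified tokenizer: an open tag, a close tag, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>|([^<]+)")


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def parse_ignoring_close_names(markup: str) -> Node:
    """Build a tree where any close tag closes the innermost open element."""
    root = Node("#root")
    stack = [root]
    for match in TOKEN.finditer(markup):
        slash, name, text = match.groups()
        if text is not None:
            # Text always attaches to the innermost open element.
            stack[-1].children.append(Node("#text: " + text))
        elif slash:
            # Close tag: the name is ignored on purpose; never pop the root.
            if len(stack) > 1:
                stack.pop()
        else:
            # Open tag: create the element and descend into it.
            node = Node(name.lower())
            stack[-1].children.append(node)
            stack.append(node)
    return root


def dump(node: Node, depth: int = 0) -> list:
    """Render the tree as indented "+ name" lines, like the example above."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + "+ " + child.name)
        lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse_ignoring_close_names("<span><div>X</span>Y</div>"))))
```

Under this sketch, the mis-nested example and the well-formed
<span><div>X</div>Y</span> produce the same tree, which is the point:
well-formed pages are unaffected, and only the mis-nested ones change.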

> which is insane.  It would make a little more sense if it were like this:
>    + span
>      + div
>        + #text: X
>    + #text: Y
> In other words, it would be equivalent to this markup:
> <span><div>X</div></span>Y
> That is actually quite sane and is what OpenSP does with invalid HTML,
> regardless of which elements are used (presumably according to some SGML
> rules), but it would not be compatible with the current state of the web
> at all, and so is not a real option.
> > For correctly written webpages, this should pose no problems.  As for
> > incorrect webpages, they deserve it from the point they ask the UA to
> > use "standards mode".
> In theory, that sounds nice, but you have to remember:
>    "to a rough approximation, all the content on the Web is erroneous,
>     invalid, or non-conformant." -- Hixie
> So, to say "they deserve it" to 100% of the web (roughly speaking) isn't
> really an option, unfortunately.  It's ok to say it to the most
> pathological of cases that depend on one particular browser's insane and
> undefined error recovery techniques, yet already breaks in everything
> else, but not to the whole web.
First, my idea would not, and should not, break the whole web.  If it
were really deployed, it would only break webpages that are not
well-formed in this particular way.
Second, this discussion started out being about error handling in
HTML5.  I believe in the motto "Make the wrong look wrong".  Since the
introduction of CSS and its ability to do "div span { blahblahblah; }",
we can't go back to IE's insectual approach.  If the error-handling
mechanism makes people feel that mixing up open and close tags is
"okay", and the mechanism then occasionally fails to meet their
expectations, they will blame the browser and never notice their own
fault.  Unless we can find a perfect mechanism that never "breaks" their
expectations, the problem will go on.  And I suppose the mechanism we
are discussing here would be used only from HTML5 onward, which the web
as a whole is not using these days.
Of course, if someone can suggest a mechanism that does not "break"
things, I would love it.

> --
> Lachlan Hunt
> http://lachy.id.au/
