[whatwg] Tag Soup: Blocks-in-inlines
lachlan.hunt at lachy.id.au
Wed Jan 25 05:02:12 PST 2006
Billy Wong wrote:
> On 1/25/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
>> I'm not saying it won't break anything, but every single change we make
>> to the parsing could possibly break any number of the billions of pages
>> on the web in any number of browsers.
> But using your method (swapping inline node and block node) would
> break presently valid and correct webpages.
Such pages are invalid because inline-level elements are not allowed to
contain block-level elements. HTML pages containing the following:
could be considered well-formed (if you apply the concept of
well-formedness to HTML, even though it's not formally defined for it),
but it's certainly not valid according to any official DTD.
> If breaking things is unavoidable, I prefer breaking things which are written incorrectly.
No-one is intending to break anything that is written correctly.
> My idea is very extreme but simple and efficient:
> Parse the page regardless of what is between "</" and ">". Treat what's
> written inside the close tag as merely a visual clue.
> Example: <span><div>X</span>Y</div>
> + span
>   + div
>     + #text: X
>   + #text: Y
I'm kind of confused by what you're trying to do there. You seem to be
implicitly closing the div immediately before the span. But then the Y
doesn't seem to be a child of the span at all in the markup, it looks
like it should be a child of the div, yet in your DOM, it's not a child
of the div, but is of the span.
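
To make the proposed rule concrete, here is a minimal sketch of a parser in which any close tag simply closes the innermost open element, whatever name it carries. It is an illustration only: the names `parse_ignoring_close_names` and `dump` are made up for this example, the regex tokenizer is deliberately naive (no comments, no void elements, no quoted ">" inside attributes), and it does not represent how any browser or OpenSP actually recovers from errors.

```python
import re

# Naive tokenizer (an assumption for this sketch): close tags, open tags, text.
TOKEN = re.compile(r"</[^>]*>|<[A-Za-z][^>]*>|[^<]+")


def parse_ignoring_close_names(markup):
    """Build a tree where every close tag closes the innermost open element."""
    root = {"name": "#root", "children": []}
    stack = [root]
    for token in TOKEN.findall(markup):
        if token.startswith("</"):
            # The name written inside the close tag is ignored entirely.
            if len(stack) > 1:
                stack.pop()
        elif token.startswith("<"):
            # Open tag: take the first word as the element name.
            name = token[1:-1].split()[0].lower()
            node = {"name": name, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # Anything else is text, attached to the innermost open element.
            stack[-1]["children"].append({"name": "#text", "text": token})
    return root


def dump(node, depth=0):
    """Return the tree as a list of indented '+ name' lines."""
    lines = []
    for child in node.get("children", []):
        if child["name"] == "#text":
            label = "#text: " + child["text"]
        else:
            label = child["name"]
        lines.append("  " * depth + "+ " + label)
        lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse_ignoring_close_names("<span><div>X</span>Y</div>"))))
```

Run on the example from the quoted message, the `</span>` closes the div and the `</div>` closes the span, so X ends up inside the div while Y becomes a direct child of the span.
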
The DOM looks equivalent to this markup:

  <span><div>X</div>Y</span>

which is insane. It would make a little more sense if it were like this:
+ #text: X
+ #text: Y
In other words, it would be equivalent to this markup:
That is actually quite sane and is what OpenSP does with invalid HTML,
regardless of which elements are used (presumably according to some SGML
rules), but it would not be compatible with the current state of the web
at all, and so is not a real option.
> To correctly written webpages, this should pose no problems. To
> incorrect webpages, they deserve it since the point they ask the UA to
> use "standard mode".
In theory, that sounds nice, but you have to remember:
"to a rough approximation, all the content on the Web is erroneous,
invalid, or non-conformant." -- Hixie
So, to say "they deserve it" to 100% of the web (roughly speaking) isn't
really an option, unfortunately. It's ok to say it to the most
pathological of cases that depend on one particular browser's insane and
undefined error recovery techniques, yet already break in everything
else, but not to the whole web.