[whatwg] Configure Apache to send the right MIME type for XHTML

Wed Mar 7 11:45:18 PST 2007

On Wed, 07 Mar 2007 20:04:08 +0100, Elliotte Harold <elharo at metalab.unc.edu> wrote:

> Yes, and I use it. However it constantly surprises people in the markup
> it generates, as hanging out for a day or two on the tagsoup-friends
> mailing list will show. That's not its fault. There's just no one
> obvious way to fix all the broken markup that's out there. TagSoup picks
> one approach. HTML 5 picks another. Both will surprise people a lot of
> the time. At the parser level that can't be helped.
>
> However at the document level it can be helped. When the document author
> takes the care to generate a well-formed document, they are rarely
> surprised by the resulting tree the parser builds. The tree is explicit
> in the markup. Explicit markup is more obvious and less surprising than
> the implicit fill-in both TagSoup and HTML 5 do.

There's nothing surprising in the DOM that TagSoup generates when parsing a valid HTML4 document.

> Hmm, that brings up another question. Does the HTML 5 fixup algorithm
> ever change the *tree* for a well-formed (but invalid) document?

There is no notion of well-formedness for HTML. A document is simply either conformant to HTML5 or not.

> For instance, if it finds an li element that is a child of a p, what would
> it do? Either ignoring the <li></li> tags, skipping the li element
> completely, or filling in a ul element would all change the tree.

That would be a non-conformant document. According to HTML5, such a document will be parsed into a tree which, when serialized, produces text that differs from the original (non-conformant) markup.

> I suspect it does one of these three things (or something similar like
> filling in an ol element) but without opening the spec or writing a
> sample program, I can't tell you which.
>
> By contrast with a real XML parser, I can tell you what's going to
> happen without cracking open the spec. HTML5, TagSoup, and XML parse
> trees are all deterministic and thus predictable; but only the XML tree
> is *obvious*.
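
Elliotte's claim about XML can be shown with a short sketch. It uses Python's standard-library ElementTree parser, which is my choice of tool for illustration and not something from this thread. A well-formed but invalid fragment (an li directly inside a p) yields a tree that mirrors the markup exactly, and markup that is not well-formed is rejected rather than repaired:

```python
import xml.etree.ElementTree as ET

# Well-formed but not valid XHTML: an li directly inside a p.
doc = "<p><li>item</li></p>"
root = ET.fromstring(doc)

# The tree is exactly what the markup says: p has a single child, li.
print(root.tag, [child.tag for child in root])

# Serializing returns the same markup, because nothing was filled in.
print(ET.tostring(root, encoding="unicode"))

# Markup that is not well-formed raises an error instead of being fixed up.
try:
    ET.fromstring("<p><li>item</p>")
except ET.ParseError as err:
    print("rejected:", err)
```

An HTML5 parser given the same li-inside-p fragment would instead apply its tree-construction rules, which is the case discussed above.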

HTML5 unambiguously defines what should happen.

To summarize:
1. Parsers in today's browsers generate a predictable DOM for valid HTML4 documents.
2. A conformant HTML5 parser generates a predictable DOM for both conformant and non-conformant HTML5 documents.

Also, the result of parsing a valid HTML4 document with today's browsers, as well as the result of parsing a conformant HTML5 document with a conformant HTML5 parser, is both predictable and obvious (i.e. predicting the resulting DOM doesn't require mentally running the complex HTML5 parsing algorithm).

So, if you stick to valid HTML4 (or, in the future, conformant HTML5), you'll get both predictable and obvious results.

-- 
Alexey Feldgendler <alexey at feldgendler.ru>
[ICQ: 115226275] http://feldgendler.livejournal.com