[whatwg] a few comments to Webforms 2.0 Call For Comments

Sun Aug 22 04:04:37 PDT 2004

I agree with everything in Henri's e-mail explaining why spec-mandated 
DOCTYPE-triggered mode switching makes no sense.

On Sun, 1 Aug 2004, Henri Sivonen wrote:
>
> On Aug 1, 2004, at 06:10, Matthew Thomas wrote:
> 
> > On 31 Jul, 2004, at 11:58 PM, Henri Sivonen wrote:
> > > ...
> > > First of all, the solution needs to apply to XHTML as well as HTML. If we
> > > still assume XML is to be taken seriously (and not as tag soup), doctype
> > > sniffing on the XML side is totally, utterly bogus.
> > 
> > That's a presumptive definition of "seriously".
> 
> The presumption is that if lower-level spec defines two things that are
> equivalent, a higher-level spec should not try to give different meanings to
> the two things. So I'm being presumptuous only in the sense that I think
> layered spec design general best practice should be followed.
> 
> In formal terms, if two XML documents have the same canonical form and an app
> treats them differently (and the difference is not due to opting not to
> process external entities), the app is broken, IMHO.
> 
> In practical terms, if two XML documents cause the same content to be reported
> (qnames ignored) to SAX2 ContentHandler and an app treats the documents
> differently, the app is broken, IMHO.
> 
> A spec that would explicitly or implicitly require an implementation to be
> broken is itself broken.

Indeed. And DOCTYPEs are basically optional -- in XML, everything has to 
be based on namespaces.
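
To make Henri's SAX2 test concrete, here is a rough sketch using Python's
xml.sax module. The sample document, the Recorder class and the events_for
helper are made up for illustration; only the xml.sax calls are real. It
records what a namespace-aware ContentHandler is told (qnames ignored) for
the same document with and without a doctype declaration, with external
entity loading switched off.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             feature_namespaces)

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document, once with and once without a doctype.
BODY = ('<html xmlns="%s"><body><p class="greeting">Hello</p></body></html>'
        % XHTML_NS)
DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
WITH_DOCTYPE = DOCTYPE + BODY
WITHOUT_DOCTYPE = BODY


class Recorder(ContentHandler):
    """Records the content events, ignoring qnames."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair; qname is dropped.
        self.events.append(("start", name, sorted(attrs.items())))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))

    def characters(self, content):
        # Merge adjacent chunks so parser buffering does not matter.
        if self.events and self.events[-1][0] == "chars":
            self.events[-1] = ("chars", self.events[-1][1] + content)
        else:
            self.events.append(("chars", content))


def events_for(text):
    recorder = Recorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    # Do not fetch anything the doctype points at.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return recorder.events


if __name__ == "__main__":
    same = events_for(WITH_DOCTYPE) == events_for(WITHOUT_DOCTYPE)
    print("same content reported:", same)
```

An application that only looks at these events has nothing left to sniff:
the doctype never reaches it.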

It should also be possible to construct a DOM tree "by hand" using the DOM 
API and get exactly the same rendering as if the DOM tree had been obtained 
by parsing a document.
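
As an illustration only (rendering itself can't be shown in a few lines),
the following sketch uses Python's xml.dom.minidom to build a small XHTML
tree by hand and to parse the equivalent markup, then compares the two
structurally. The shape and build_by_hand helpers and the sample markup are
invented for the example. Note that the hand-built tree never had a doctype
at all, so there is nothing for a sniffer to look at.

```python
from xml.dom import Node
from xml.dom.minidom import getDOMImplementation, parseString

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample markup, equivalent to the tree built below.
MARKUP = ('<html xmlns="%s"><body><p class="greeting">Hello</p></body></html>'
          % XHTML_NS)


def shape(node):
    """Reduce a node to nested tuples of namespace, local name,
    attributes and children, ignoring namespace declarations."""
    if node.nodeType == Node.TEXT_NODE:
        return ("text", node.data)
    attrs = tuple(sorted(
        (name, value) for name, value in node.attributes.items()
        if not name.startswith("xmlns")
    ))
    children = tuple(
        shape(child) for child in node.childNodes
        if child.nodeType in (Node.ELEMENT_NODE, Node.TEXT_NODE)
    )
    return (node.namespaceURI, node.localName, attrs, children)


def build_by_hand():
    """Construct the same tree purely through DOM calls."""
    doc = getDOMImplementation().createDocument(XHTML_NS, "html", None)
    body = doc.createElementNS(XHTML_NS, "body")
    p = doc.createElementNS(XHTML_NS, "p")
    p.setAttribute("class", "greeting")
    p.appendChild(doc.createTextNode("Hello"))
    body.appendChild(p)
    doc.documentElement.appendChild(body)
    return doc


if __name__ == "__main__":
    parsed = parseString(MARKUP)
    built = build_by_hand()
    same = shape(parsed.documentElement) == shape(built.documentElement)
    print("same tree:", same)
```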

> > In the long run, it *may* be the case that treating XHTML as tag soup is the
> > only "serious" way of doing it.
> 
> WHAT WG should not try to push things to that direction.

Indeed. And I disagree that treating XHTML as tag soup would ever be the 
right way to do it -- if that is what people want, they should use HTML, 
which is already at that stage.

XML has well-defined parsing rules. There's no reason not to follow them.

> > > The reason why it is bogus is that including a DTD by reference and
> > > pasting it inline are supposed to be equivalent for validating XML
> > > processor and in the latter case you don't see a public identifier for the
> > > DTD. Hence, using the public identifier for any purpose other than
> > > locating the DTD is just plain wrong. Of course, sane real-world XHTML
> > > user agents use non-validating XML processors which makes the inclusion of
> > > the doctype declaration rather pointless.
> > 
> > So do any real-world XHTML UAs handle a DTD pasted inline, or is this just a
> > theoretical argument?
> 
> Mozilla processes the internal DTD subset, but that was not my point.

UAs must, per XML, handle internal subsets.
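
A quick way to see a non-validating processor doing this, sketched with
Python's xml.etree.ElementTree (which sits on top of expat). The entity name
and the document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose internal subset declares one entity.
DOC = '<!DOCTYPE p [<!ENTITY who "world">]><p>Hello, &who;!</p>'

# The processor reads the internal subset and expands the reference
# before the application ever sees the text.
root = ET.fromstring(DOC)

if __name__ == "__main__":
    print(root.tag, repr(root.text))
```

The declaration lives entirely inside the document here, so there is no
public identifier anywhere for a UA to key off.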

> My point was that if you have
> #include "foo.h"
> you should not bind any black magic to the name foo.h, because it should be
> permissible to paste the contents of foo.h inline or copy the contents of
> foo.h to bar.h and say
> #include "bar.h"

Indeed.

> However, considering that as a Web author you cannot trust that everyone
> parsing your pages uses an XML processor that resolves external entities,
> including a doctype in XML intended for the Web is mostly pointless and often
> done out of a cargo cultish habit.

Hear, hear. This is one of the many reasons that WHATWG specs actually 
subtly discourage the use of DOCTYPEs.

> > > ...
> > > Now, similar argumentation does not work on the HTML side if we agree not
> > > to pretend that real SGML is being processed. Doctype sniffing is a tag
> > > soup solution to a tag soup problem.
> > 
> > That's an extrapolation from a single data point. The only use of doctype
> > sniffing *so far* has been to handle quirky style/layout expectations of old
> > pages (and in the case of table style inheritance, they wouldn't even need
> > to be tag-soup pages). In the long run, doctype sniffing may become a
> > general-purpose method of changing *any* undesired behavior (whether
> > de-facto or de-jure) of old syntax in new spec versions.
> 
> Doctype sniffing was devised after the HTML 4 and CSS2 specs had been written
> as a heuristic to distinguish legacy documents from documents whose authors
> might expect conforming behavior.
> 
> The circumstances and requirements that led to doctype sniffing were different
> from the circumstances and requirements for specs that have not yet been
> finalized. With WF2 there is no need to come up with an extension to an old
> heuristic. Now that the issue has been raised in the speccing phase we can
> have a more explicit incantation. For example: <meta
> name="mpt-approved-radio-buttons" content="true"> or <meta
> name="what-wg-behavior" content="do-the-right-thing">

Exactly. DOCTYPE sniffing was always meant to be a heuristic; a way of 
detecting whether the page author had written the page before or after 
browsers started seriously looking at spec compliance.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'