[whatwg] a few comments to Webforms 2.0 Call For Comments

Henri Sivonen hsivonen at iki.fi
Sun Aug 1 09:38:17 PDT 2004

On Aug 1, 2004, at 06:10, Matthew Thomas wrote:

> On 31 Jul, 2004, at 11:58 PM, Henri Sivonen wrote:
>> ...
>> First of all, the solution needs to apply to XHTML as well as HTML. 
>> If we still assume XML is to be taken seriously (and not as tag 
>> soup), doctype sniffing on the XML side is totally, utterly bogus.
> That's a presumptive definition of "seriously".

The presumption is that if a lower-level spec defines two things to be 
equivalent, a higher-level spec should not try to give different 
meanings to those two things. So I'm being presumptuous only in the 
sense that I think this kind of layered spec design should be general 
best practice.

In formal terms, if two XML documents have the same canonical form and 
an app treats them differently (and the difference is not due to opting 
not to process external entities), the app is broken, IMHO.

In practical terms, if two XML documents cause the same content to be 
reported (qnames ignored) to a SAX2 ContentHandler and an app treats 
the documents differently, the app is broken, IMHO.

A spec that would explicitly or implicitly require an implementation to 
be broken is itself broken.

> In the long run, it *may* be the case that treating XHTML as tag soup 
> is the only "serious" way of doing it.

The WHATWG should not try to push things in that direction.

>> The reason why it is bogus is that including a DTD by reference and 
>> pasting it inline are supposed to be equivalent for validating XML 
>> processor and in the latter case you don't see a public identifier 
>> for the DTD. Hence, using the public identifier for any purpose other 
>> than locating the DTD is just plain wrong. Of course, sane real-world 
>> XHTML user agents use non-validating XML processors which makes the 
>> inclusion of the doctype declaration rather pointless.
> So do any real-world XHTML UAs handle a DTD pasted inline, or is this 
> just a theoretical argument?

Mozilla processes the internal DTD subset, but that was not my point.

My point was that if you have
#include "foo.h"
you should not bind any black magic to the name foo.h, because it 
should be permissible to paste the contents of foo.h inline or copy the 
contents of foo.h to bar.h and say
#include "bar.h"

However, considering that as a Web author you cannot trust that 
everyone parsing your pages uses an XML processor that resolves 
external entities, including a doctype in XML intended for the Web is 
mostly pointless and often done out of a cargo cultish habit.
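The foo.h point has a direct XML analogue, sketched here with Python's 
stdlib xml.sax (a non-validating expat-backed processor; helper names 
are my own): an entity defined in the internal DTD subset and its 
replacement text pasted inline report the same content, and in the 
second case no public identifier is ever seen -- so nothing should hang 
behavior off the identifier.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class TextCollector(ContentHandler):
    """Collects the character data reported for a document."""
    def __init__(self):
        self.chunks = []
    def characters(self, content):
        self.chunks.append(content)

def text_of(doc):
    handler = TextCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return "".join(handler.chunks)

# Entity defined in the internal subset vs. its replacement text pasted
# inline: same reported content either way.
with_entity = '<!DOCTYPE p [ <!ENTITY greet "hello"> ]><p>&greet;</p>'
inline = "<p>hello</p>"

assert text_of(with_entity) == text_of(inline) == "hello"
```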

>> ...
>> Now, similar argumentation does not work on the HTML side if we agree 
>> not to pretend that real SGML is being processed. Doctype sniffing is 
>> a tag soup solution to a tag soup problem.
> That's an extrapolation from a single data point. The only use of 
> doctype sniffing *so far* has been to handle quirky style/layout 
> expectations of old pages (and in the case of table style inheritance, 
> they wouldn't even need to be tag-soup pages). In the long run, 
> doctype sniffing may become a general-purpose method of changing *any* 
> undesired behavior (whether de-facto or de-jure) of old syntax in new 
> spec versions.

Doctype sniffing was devised after the HTML 4 and CSS2 specs had been 
written as a heuristic to distinguish legacy documents from documents 
whose authors might expect conforming behavior.

The circumstances and requirements that led to doctype sniffing were 
different from the circumstances and requirements for specs that have 
not yet been finalized. With WF2 there is no need to come up with an 
extension to an old heuristic. Now that the issue has been raised in 
the speccing phase, we can have a more explicit incantation, for 
example: <meta name="mpt-approved-radio-buttons" content="true"> or 
<meta name="what-wg-behavior" content="do-the-right-thing">

>> Still, doctype sniffing is already confusing and convoluted enough 
>> for casual authors. (See http://iki.fi/hsivonen/doctype.html for 
>> subtle differences between user agents.)  I think perturbing it 
>> further is a bad idea.
> Sure, but it may be unavoidable, just like it is with natural 
> languages.

The code that implements doctype sniffing is (unlike natural languages) 
controlled by a very small group of people. Their actions don't just 
happen unavoidably.

> (Try running "The Canterbury Tales" through a Modern English 
> spellchecker or grammar checker, for example.)

English is the new lingua franca despite the (supposedly unavoidable) 
damage that has been allowed to happen to its spelling--not thanks to 
allowing the spelling to get to the state it is in now.

Henri Sivonen
hsivonen at iki.fi
