[whatwg] text/html conformance checkers and doctype

Henri Sivonen hsivonen at iki.fi
Fri Jul 29 11:06:05 PDT 2005


On Jul 29, 2005, at 02:19, Ian Hickson wrote:

> On Thu, 28 Jul 2005, Henri Sivonen wrote:
>>>>
>>>> The advantage of allowing case-insensitivity and white space
>>>> variance is that it would be more uniform with HTML4 doctypes. That
>>>> is, it would be easier to write software that deals with both.
>>>
>>> You seem to be mixing authoring requirements and implementation
>>> requirements.
>>
>> No. I am interested in requirements for conformance checker
>> implementations and, therefore, authoring.
>
> Ah. I hadn't considered people wanting to write one conformance checker to
> check HTML4 and HTML5. I would imagine the DOCTYPE would be the least of
> your problems if you did. :-)

I would like to add HTML (both 4 and 5) support to 
http://hsivonen.iki.fi/validator/ .

A key component consists of schemas (both in RELAX NG and in 
Schematron), which I do not intend to develop myself. The schemas are 
the largest chunk of work.

For XHTML 1.0 I am currently offering James Clark's schemas as presets. 
They do not enforce attribute data types, though. I may switch the 
presets over to the stricter versions modified by Petr Nalevka. I'm 
envisioning applying the XHTML 1.0 schemas to HTML4.

To use XML tools with HTML, the lower left part of
http://hsivonen.iki.fi/img/what-wg-conformance-checker.png
needs to become real. I am currently pondering the box labeled 
"Tag-level HTML parser". Assuming that the supported syntax for HTML4 
is constrained to exclude minimizations that don't work in browsers, 
the biggest issue with decoupling the parser from the HTML version 
seems to be the doctype.
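To illustrate what decoupling could look like, here is a minimal sketch (not part of my validator; the names `parse_doctype`, `classify` and `Doctype` are hypothetical). It assumes one tokenizer reads any doctype with case-insensitive keywords and arbitrary white space, and a separate step then picks the HTML4 or HTML5 schema. It also assumes that prefix matching on the HTML 4.01 public identifier is good enough, which real-world doctypes may not bear out.

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    name: str
    public_id: Optional[str]
    system_id: Optional[str]


# Hypothetical version-neutral pattern: keywords are case-insensitive and any
# run of white space may separate tokens; identifiers use ' or " quotes.
_DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>]+)'
    r'(?:'
    r'\s+PUBLIC\s+(?P<q1>["\'])(?P<public>.*?)(?P=q1)'
    r'(?:\s+(?P<q2>["\'])(?P<system>.*?)(?P=q2))?'
    r'|'
    r'\s+SYSTEM\s+(?P<q3>["\'])(?P<system_only>.*?)(?P=q3)'
    r')?'
    r'\s*>',
    re.IGNORECASE,
)


def parse_doctype(text: str) -> Optional[Doctype]:
    """Tokenize a doctype without deciding which HTML version it belongs to."""
    m = _DOCTYPE_RE.fullmatch(text.strip())
    if m is None:
        return None  # not a doctype this sketch understands
    system = m.group('system')
    if system is None:
        system = m.group('system_only')
    return Doctype(m.group('name').lower(), m.group('public'), system)


def classify(dt: Optional[Doctype]) -> str:
    """Pick a schema family from an already tokenized doctype (assumed rules)."""
    if dt is None or dt.name != 'html':
        return 'unrecognized'
    if dt.public_id is None and dt.system_id is None:
        return 'html5'
    if dt.public_id is not None and dt.public_id.startswith('-//W3C//DTD HTML 4.01'):
        return 'html4'
    return 'other'
```

With this split, the tag-level parser only produces a `Doctype` tuple and stays ignorant of versions; `classify` is the only place that knows which doctypes map to which schema.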

> In WF2 there is no conformance checker product class so no need to
> put anything there, IMHO.

Since WA1 is a larger departure from HTML4 and less likely to be 
implemented soon in browsers, I think it would be desirable to also 
provide checking for WF2 without the rest of WA1 stuff.

>> E.g. an author could reasonably expect to be able to use one or more
>> whitespace characters instead of one space between "DOCTYPE" and "html",
>> because that's how it has been before and still is between attributes (I
>> hope). Why forbid it?
>
> Why forbid:
>
>    <!doctype html "foo" "bar">
>
> ...? :-)

Because it triggers the quirks mode in Firefox and Safari.
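As a sketch of the rule I have in mind (my assumption, not anything the spec currently says), a checker could tolerate case and white space variance in the WA1 doctype but reject any extra tokens after "html". It does not model which doctypes actually trigger quirks mode in any browser; `check_doctype` and the messages are hypothetical.

```python
import re
from typing import List

# Assumed rule: "<!DOCTYPE html>" with case-insensitive keywords and any run
# of white space around "html"; nothing else is allowed inside the doctype.
_HTML5_DOCTYPE = re.compile(r'<!DOCTYPE\s+html\s*>', re.IGNORECASE)
_STARTS_LIKE_HTML5 = re.compile(r'<!DOCTYPE\s+html\b', re.IGNORECASE)


def check_doctype(text: str) -> List[str]:
    """Return a list of conformance errors (empty if the doctype is OK)."""
    if _HTML5_DOCTYPE.fullmatch(text):
        return []
    if _STARTS_LIKE_HTML5.match(text):
        # e.g. <!doctype html "foo" "bar">, which the post says triggers
        # quirks mode in Firefox and Safari
        return ['unexpected tokens after "html"']
    return ['not an HTML5 doctype']
```

This keeps the authoring requirement uniform with HTML4 (white space and case are flexible) while still forbidding the variants that change browser behavior.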

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/



