[whatwg] On "validation"
hsivonen at iki.fi
Thu Mar 16 08:46:13 PST 2006
From the spec:
> The term "validation" specifically refers to a subset of
> conformance checking that only verifies that a document complies
> with the requirements given by an SGML or XML DTD. Conformance
> checkers that only perform validation are non-conforming, as there
> are many conformance requirements described in this specification
> that cannot be checked by SGML or XML DTDs.
> To put it another way, there are three types of conformance criteria:
> 1. Criteria that can be expressed in a DTD.
> 2. Criteria that cannot be expressed by a DTD, but can still
> be checked by a machine.
> 3. Criteria that can only be checked by a human.
> A conformance checker must check for the first two. A simple
> DTD-based validator only checks for the first class of errors and
> is therefore not a conforming conformance checker according to this
> specification.
There are three things I don't like about this note:
First, it perpetuates the "Validation means only DTD validation" mantra.
Second, it mentions SGML and XML DTDs casually together.
Third, it can be read to imply that using a DTD as part of a
conformance checker is a good idea.
Besides the SGML and XML specifications, there are other
specifications used in the context of XML that define "valid" in
their own context, with a meaning other than the one in the SGML or
XML specifications. RELAX NG, Schematron and W3C XML Schema are
examples of specifications that use the word "valid" as a technical
term that does not involve any kind of DTD. RELAX NG and Schematron
have even made it through ISO.
(The closest definition of validation in WA1.0/WF2.0 is validation of
form field values.)
Despite what the W3C Validator has led people to believe, if a data
object is valid as per SGML, it might still not even be well-formed
as per XML. Since HTML5 is not based on SGML, I think any implication
that SGML DTDs could in any way be relevant to HTML5 (or XHTML in
general) should be avoided.
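
To make the gap between SGML validity and XML well-formedness concrete, here is a minimal Python sketch. It assumes the standard library's xml.etree.ElementTree as the XML parser, and the sample document and the helper name is_well_formed are illustrative only. The first document is valid HTML 4.01 Strict (omitted html, head and body tags, an omitted </p>, and a br without an end tag are all permitted there), yet an XML parser rejects it.

```python
import xml.etree.ElementTree as ET

# Valid HTML 4.01 Strict for an SGML parser: the html, head and body tags
# and the </p> end tag may be omitted, and br never has an end tag.
SGML_VALID = b"""<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<title>Example</title>
<p>First line<br>second line
"""

# The same paragraph written so that it is well-formed XML.
XML_WELL_FORMED = b"<p>First line<br/>second line</p>"


def is_well_formed(data):
    """Return True if data parses as well-formed XML (no validation is done)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(SGML_VALID))
print(is_well_formed(XML_WELL_FORMED))
```

The check says nothing about validity in either sense; it only shows that passing an SGML validator does not guarantee that an XML parser will accept the same bytes.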
The implication that XML DTDs could be used for partial conformance
checking is, in my opinion, harmful because:
* the way DTDs are normally used and the only way that is
sanctioned by the XML spec contaminates the document instance
* the document itself can smuggle grammar rules of its own into the
* DTDs don't support namespaces
* DTDs are hopelessly inadequate in expressing the conformance requirements
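
As an illustration of the last two points, here is a hedged Python sketch of a machine-checkable rule that an XML DTD cannot state. The rule is borrowed from XHTML 1.0, which notes that SGML-style exclusions such as "an a element must not contain another a element at any depth" are not possible in XML DTDs. It is used here only as an example and is not taken from the spec under discussion. The function name find_nested_anchors and the sample documents are hypothetical. The check compares namespace-qualified names, so it gives the same answer whatever prefix the document uses, whereas a DTD only sees the literal names "a" and "h:a".

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
A_TAG = "{%s}a" % XHTML_NS  # ElementTree spells expanded names as {uri}local


def find_nested_anchors(source):
    """Return the href of every XHTML a element that has an a ancestor."""
    root = ET.fromstring(source)
    offenders = []

    def walk(element, inside_anchor):
        is_anchor = element.tag == A_TAG
        if is_anchor and inside_anchor:
            offenders.append(element.get("href", ""))
        for child in element:
            walk(child, inside_anchor or is_anchor)

    walk(root, False)
    return offenders


# Same structure twice: once with a default namespace, once with a prefix.
doc_default_ns = (
    '<p xmlns="http://www.w3.org/1999/xhtml">'
    '<a href="outer"><em><a href="inner">text</a></em></a></p>'
)
doc_prefixed = (
    '<h:p xmlns:h="http://www.w3.org/1999/xhtml">'
    '<h:a href="outer"><h:em><h:a href="inner">text</h:a></h:em></h:a></h:p>'
)

print(find_nested_anchors(doc_default_ns))
print(find_nested_anchors(doc_prefixed))
```

Both calls report the inner link, even though it sits inside an em rather than directly inside the outer a. A DTD content model can only constrain direct children, which is why this kind of rule falls into the second class of criteria in the quoted note.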
Suggested replacement text:
Note: XML DTDs cannot express all the conformance requirements of this
specification. Therefore, a validating XML processor and a DTD
cannot constitute a conformance checker. Also, since neither of the
two authoring formats defined in this specification is an application
of SGML, a validating SGML system cannot constitute a conformance
checker.
Since a large part of HTML5 involves aligning the spec with the
real world, perhaps the term "HTML5 validation" should be defined to
mean the same as "HTML5 conformance checking". :-)
When I tell a friend or an acquaintance about my thesis, the
discussion usually goes more or less like this:
Me: So the working title of my master's thesis is "A Conformance
Checking Service for Web Applications 1.0 Documents".
Friend: Come again?
Me: "A Conformance Checking Service for Web Applications 1.0 Documents".
Friend: Web 1.0 applications?
Me: "Web Applications 1.0" is the name of a spec. The nickname is
HTML5, but that's a politically hot name.
And if the friend is interested enough, it continues like this:
Friend: So what is it you are doing exactly?
Me: I'm developing a service that takes a document, which means a
finite sequence of bytes and a Content-Type header, and checks if it
meets the requirements of the spec.
Friend: So basically you are developing an HTML validator.
Me: Roughly, yes, but it is called a conformance checker.
And then once:
Me: The working title of my thesis is "A Conformance Checking Service
for Web Applications 1.0 Documents".
SemWeb guy: You mean an HTML validator?