[whatwg] Create my own DTD and specify in DOCTYPE? Re: Validation

Eduard Pascual herenvardo at gmail.com
Tue Jul 21 14:02:03 PDT 2009

On Tue, Jul 21, 2009 at 10:02 PM, <Darxus at chaosreigns.com> wrote:
> On 07/21, Tab Atkins Jr. wrote:
>> HTML5 is not an SGML or XML language.  It does not use a DOCTYPE in
> I thought HTML5 conformed to XML?
>> any way.  The "<!DOCTYPE HTML>" incantation required at the top of
>> HTML5 pages serves the sole purpose of tricking older browsers into
>> rendering the document as well as possible.  No checking is made
>> against a DTD, official or otherwise.
> I understand that, but the spec says an HTML5 document must include
> <!DOCTYPE html>.  And I would like, for my own purposes, to be able to
> instead use <!DOCTYPE html SYSTEM
> "http://www.chaosreigns.com/DTD/html5.dtd"> without violating HTML5.

First things first: DTDs are a quite limited mechanism for describing
what a specific XML or SGML language allows.
The decision not to give HTML5 a DTD was influenced by two essential
factors: first, and most obvious, HTML5 is neither XML nor SGML
(sure, it provides an XML serialization; but if you are using it
you might use XML Schema instead of DTD anyway); and second, less
obvious but no less important: many of the requirements,
restrictions, and so on defined in HTML5 can't be properly described
via DTDs. So, what would be the point of defining a DTD which can't be
used to actually validate the document?

It is possible to go nuts treating your HTML documents as pure XML
(you'd need to ensure that they are well-formed and so on), and use a
DTD (or an XML Schema) with them. To spice it up, toss in an XSLT
stylesheet with the "HTML" output mode that just outputs the root of
the document, and voilà, you get your pure XML (served as text/xml or
application/xml) document treated as pure HTML by all browsers
(including IE6 onwards). IMHO, quite overkill. The issue here is that
both DTDs and Schemas have limitations, so neither would be enough
to properly validate the document.

This leads to a deeper question, then: if DTDs or similar tools
don't really help to validate the document, what is the problem we
are trying to solve with them? IIRC, the original mail stated that the
goal was to differentiate between versions (hypothetical HTML7 and
HTML9) in order to ensure browser compatibility (with a hypothetical
IE10, which would support HTML7 but not HTML9). Well, if your
validator needs to distinguish between these two versions, there are
already several mechanisms within your reach: you may use custom HTTP
headers, or add a "data-html-version=7" or "data-html-version=9"
attribute to your <body> tag. In both cases, your documents would
still comply with (current) HTML requirements and document model, and
your validator would have a way to differentiate them. Problem solved.
With no changes to HTML5. And without having to write a DTD that can
get close but will never be able to work properly.
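
To illustrate the <body> attribute variant, here is a minimal sketch in
Python of how a home-grown validator might read the declared version
before choosing its rule set. It assumes the attribute name
"data-html-version" suggested above; the class and function names are
made up for this example, and the parser is the one from Python's
standard library, not a full HTML5 parser.

```python
from html.parser import HTMLParser


class VersionSniffer(HTMLParser):
    """Records the data-html-version attribute of the first <body> tag."""

    def __init__(self):
        super().__init__()
        self.version = None
        self.seen_body = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us lowercased tag and attribute names.
        if tag == "body" and not self.seen_body:
            self.seen_body = True
            self.version = dict(attrs).get("data-html-version")


def declared_html_version(markup):
    """Return the declared version as a string, or None if absent."""
    sniffer = VersionSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.version
```

A validator could then call declared_html_version() on each page and
pick the matching set of checks, falling back to a default when the
function returns None.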

Furthermore, either of these approaches has additional benefits.
Let's make a slight change to the original scenario: suppose that IE10
complies with most of HTML7, but fails to render properly one or two
new elements; and maybe even supports some features introduced by
HTML8 (IMHO, partial support of multiple iterations of the language is
more likely to match reality than perfectly implementing one of them
but providing zero support for the following ones). Why should you
restrain yourself from using those features of HTML8 that are
supported on IE10? With the @data-* approach, you don't have to: you
may instead put something like "data-html-subset=IE10-compatible"
in your <body> and there you go. Your validator should be made aware
of what is supported in each "subset" you are using, and you will be
able to squeeze the most from each browser you wish to support, and
automate the validation as intended in the original use case.
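
As a rough sketch of what "made aware" could mean, the Python below
keeps a table of element names per subset and reports any element a
page uses that its declared subset does not list. Everything in the
table is an assumption for illustration: the element list for
"IE10-compatible" is invented, and a real validator would also need to
look at attributes, content models and so on.

```python
from html.parser import HTMLParser

# Hypothetical support table: subset name -> element names it allows.
SUPPORTED = {
    "IE10-compatible": {"html", "head", "title", "body", "p", "a", "video"},
}


class SubsetChecker(HTMLParser):
    """Collects every element name used and the declared subset."""

    def __init__(self):
        super().__init__()
        self.subset = None
        self.used = set()

    def handle_starttag(self, tag, attrs):
        self.used.add(tag)
        if tag == "body" and self.subset is None:
            self.subset = dict(attrs).get("data-html-subset")


def unsupported_elements(markup, table=SUPPORTED):
    """Return a sorted list of elements outside the declared subset."""
    checker = SubsetChecker()
    checker.feed(markup)
    checker.close()
    if checker.subset not in table:
        raise ValueError(f"unknown or missing subset: {checker.subset!r}")
    return sorted(checker.used - table[checker.subset])
```

An empty list means the page stays within its subset; supporting
another browser is then a matter of adding one more entry to the table.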

Eduard Pascual
