[html5] Identifying HTML 5 documents? (vs. alternate flavors)

Henri Sivonen hsivonen at iki.fi
Mon Feb 4 08:24:17 PST 2008

On Feb 4, 2008, at 17:28, Jim Correia wrote:

> I know there has been some discussion about this on the forum. But
> after having read through the draft spec and the FAQ, I'm still a
> little unclear about how I can auto-detect that a document is using
> HTML 5.

The short answer is that HTML5 by design tries to discourage you from  
trying to do that.

> (Or more precisely, that the author of the document intended
> it to be conformant to HTML 5.)

HTML5 is designed so that this doesn't need to be asserted to the  
other party when sending HTML5 content to a consuming client. In the  
case of an author who is conformance checking his own stuff (as  
opposed to communicating with another party), the theory goes that the  
author simply chooses to use a tool that only supports HTML5 or that  
is configured to support HTML5.

This might be a bit inconvenient if during a transition period the  
author also wants to target legacy flavors of HTML in some of his  

> I have a conformance checker tool which needs to autodetect the flavor
> of HTML in use so it can determine which particular set of conformance
> tests to apply to the document.

Do I guess correctly that this will be part of a text editor for Mac?

> (We may be talking about a single
> document, or traversing a directory tree and processing all documents
> in the tree. In either case, the document type should be
> auto-detected.)

Wouldn't that kind of approach fail to detect that a set of documents  
isn't fully HTML5-compliant if a document in the set is autodetected  
as non-HTML5 and passes checks as whatever it was detected as?

> For HTML syntax, the shortened form of the doctype "<!DOCTYPE HTML>" is
> required. This is sufficiently different from all previous doctypes
> that it can be mapped to HTML 5. But since there is no version
> information included in the doctype, what happens when the successor
> to HTML 5 comes out?

When the successor of HTML5 comes out, authors are supposed to create  
content according to the requirements of the successor and no longer  
according to HTML5.

This assumes, of course, that whoever defines the successor of HTML5  
defines the successor reasonably, so that conforming HTML5 documents  
remain conforming and mean the same thing according to the successor.  
The obvious problem with that assumption is that so far definers of  
HTML flavors have had a tendency to deprecate or obsolete features. We  
can hope that the definers of the successors of HTML5 don't seek to  
deprecate or obsolete anything unless the deprecated or obsoleted bit  
is so harmful that telling every author that their documents no longer  
conform is of paramount importance.

> For XHTML syntax, the doctype is to be omitted. In this situation, how
> should I autodetect that we are using XHTML 5 as opposed to some other
> flavor?

By design, you shouldn't. Validator.nu defaults to XHTML5 + SVG 1.1 +  
MathML 2.0 for application/xhtml+xml. I suggest doing the same  
for .xhtml (assuming that the tool in question is a text editor  
operating on local files): defaulting to the latest Web-relevant  
compound document format combination supported by the checker.
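
As a rough illustration of that kind of defaulting, here is a minimal
sketch in Python. It assumes a checker working on local files that picks
a preset from the file extension, with an explicit user override taking
precedence. The preset labels, the extension table, and the function name
`default_preset` are all hypothetical and are not taken from Validator.nu
or any other tool.

```python
from pathlib import Path

# Hypothetical preset labels; a real checker would map these to its schemas.
HTML5_PRESET = "HTML5"
XHTML5_PRESET = "XHTML5 + SVG 1.1 + MathML 2.0"

# Hypothetical extension table: each extension defaults to the latest
# format combination the checker supports, with no attempt to sniff
# the document for a version.
DEFAULT_PRESETS = {
    ".html": HTML5_PRESET,
    ".htm": HTML5_PRESET,
    ".xhtml": XHTML5_PRESET,
    ".xht": XHTML5_PRESET,
}


def default_preset(path, override=None):
    """Return the preset to check `path` against, or None if unknown.

    An explicit `override` (for example, a legacy flavor chosen by the
    author for a particular project) always wins over the default.
    """
    if override is not None:
        return override
    return DEFAULT_PRESETS.get(Path(path).suffix.lower())
```

With this shape, an author who still targets a legacy flavor during a
transition period configures that explicitly, and everything else is
checked against the newest supported combination.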

Henri Sivonen
hsivonen at iki.fi
