[html5] Identifying HTML 5 documents? (vs. alternate flavors)

Fri Feb 8 06:33:08 PST 2008

On Feb 8, 2008, at 4:44 AM, Henri Sivonen wrote:

> On Feb 4, 2008, at 18:39, Jim Correia wrote:
>> On Feb 4, 2008, at 11:24 AM, Henri Sivonen wrote:
>>> On Feb 4, 2008, at 17:28, Jim Correia wrote:
>>>
>>>> I know there has been some discussion about this on the forum. But
>>>> after having read through the draft spec and the FAQ, I'm still a
>>>> little unclear about how I can auto-detect that a document is using
>>>> HTML 5.
>>>
>>> The short answer is that HTML5 by design tries to discourage you  
>>> from trying to do that.
>>
>> I can understand that discouraging user-agents from doing this  
>> might be a good thing. At the same time, it appears to make life  
>> more difficult for those of us who produce authoring tools which  
>> must support legacy formats alongside HTML 5.
>
> If the spec had a centrally-prescribed way for authoring tools to do  
> spec versioning, people would be tempted to suggest all sorts of  
> version-based conditional behavior in browsers.

People may suggest it anyway, and some browser vendors may even oblige
them. Meanwhile, without a sanctioned way to clearly identify HTML 5,
life has been made difficult for those of us who want to do the right
thing, in order to avoid some hypothetical wrongness on someone
else's part.

(If browser vendors want a version identifier, there's nothing
stopping them from inventing one. Or several. It is not as if
proprietary browser-specific ...)

> I suppose we could add a modeline attribute on the root element if  
> its content were a non-standard tool-specific configuration  
> identifier to prevent general consuming apps from performing mode  
> switching on it.
>
> http://lists.w3.org/Archives/Public/public-html/2007JanMar/0433.html

Thanks for the pointer. In that message, for point 4, you wrote:

	If HTML6 is a superset of HTML5, writing HTML5 and checking with an
	HTML6 conformance checker won't be a problem. If HTML6 deprecates or
	obsoletes parts of HTML5, then we won't want to make it too easy for
	people to keep using the bad stuff without mentioning it to them, will
	we?

My experience of shipping software to users and supporting those
users tells me this is going to be a problem. Suppose they are using
version 12 of my tool, which does HTML 5 conformance checking, and
checking their documents reports no errors. But they have used
elements or attributes which are deprecated in HTML6. They upgrade to
version 13 of my tool, which supports HTML6, and now checking those
very same documents reports hundreds of errors. They won't have read
the release notes, or the documentation, or... Instead, they'll write
to my technical support address and complain that the conformance
checker is broken, because yesterday there were no errors in their
documents and today there are hundreds.

(We did go through a painful period of this once in the past. Before I
took over this part of the product, the checker was quite inadequate,
but people used it in ignorant bliss. When we shipped an updated
checker that found and reported many conformance issues that needed
fixing, the reaction was that we had broken the checker, not that the
documents had always been broken.)

	If someone wants to keep checking against the definitions of HTML5 in
	the era of HTML6, I think it is reasonable to put the burden of choosing a
	different version from a pop-up menu in the conformance checker UI on
	the person who wants to do legacy checking.

I agree that it is reasonable that a tool which supports HTML6  
conformance checking should default to HTML6.

The issue of deprecated features, and of surprised users getting
errors in previously error-free documents, still stands.

Another usability issue is users who work on multiple trees and need
mixed conformance checking (without constantly reconfiguring the
tool) until they can move their legacy HTML5 documents with
deprecated elements over to the HTML6 standard.

>>> Wouldn't that kind of approach fail to detect that a set of  
>>> documents isn't fully HTML5-compliant if a document in the set is  
>>> autodetected as non-HTML5 and passes checks as whatever it was  
>>> detected as?
>>
>> I'm not sure I understand the question.
>
> Suppose I want to see if the .html files in a directory hierarchy  
> are HTML5-compliant. If the documents can declare themselves as
> non-HTML5 and avoid being checked as HTML5, I get the wrong answer.

Now I see what you are getting at. I don't currently support that
operation, and it is not something I typically do. But I see your
point: one could check a tree full of documents against HTML 5 and,
if they were compliant, post-process them to change the doctype (or
remove it for the XML serialization).

> If there are issues we don't foresee now but we see when the  
> successor of HTML5 is being defined, we can make the successor have  
> a distinguishing feature at that time.

After reading through the message you pointed to, as well as others
found via searching, it sounds as though we've been around this block
a time or two by now, and that the spec authors are rather inflexible
about this point (and no new arguments have swayed them)?

I also posted this to the help mailing list, and after doing so
wondered whether the specs or implementors list (which hasn't seen
traffic in a long time) might have been a more appropriate forum.

>> Again, this is a similar problem to HTML5. Without a heuristic that
>> says "XHTML syntax, no doctype: probably XHTML 5", it seems like
>> there isn't a good way to infer an author's intent when the
>> document lives in a tree of documents targeting various
>> specifications.
>
> Other XML editors solve this using an editor-specific PI.

I currently offer something similar to this for document fragments.  
(It takes the form of a comment, not a PI, since it has to work in  
HTML syntax as well as XHTML syntax.)
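For illustration, the kind of detection discussed here (an explicit
tool-specific comment first, then the doctype, then "XML syntax, no
doctype: probably XHTML 5") might look roughly like the following
Python sketch. The marker name `mytool-flavor` and the returned flavor
labels are hypothetical; this is a rough sketch under those
assumptions, not how any particular tool actually does it.

```python
import re

# HYPOTHETICAL marker syntax; the tool's real comment format isn't given here.
# A comment (not a PI) is used so it works in both HTML and XHTML syntax.
MODELINE = re.compile(r"<!--\s*mytool-flavor:\s*([\w.-]+)\s*-->", re.IGNORECASE)
DOCTYPE = re.compile(r"<!DOCTYPE\s+html(?P<rest>[^>]*)>", re.IGNORECASE)
XML_DECL = re.compile(r"\s*<\?xml\b")
XHTML_NS = re.compile(r"""xmlns\s*=\s*["']http://www\.w3\.org/1999/xhtml["']""")


def guess_flavor(source: str) -> str:
    """Guess which HTML flavor a document targets; "unknown" if unsure."""
    # 1. An explicit tool-specific marker wins over any heuristic.
    marker = MODELINE.search(source)
    if marker:
        return marker.group(1).lower()

    # 2. Legacy doctypes carry a public identifier naming the version.
    doctype = DOCTYPE.search(source)
    if doctype:
        rest = doctype.group("rest").upper()
        if "XHTML 1.0" in rest:
            return "xhtml1"
        if "HTML 4.01" in rest:
            return "html4"
        # A bare <!DOCTYPE html> is the HTML 5 doctype.
        if not rest.strip():
            return "html5"
        return "unknown"

    # 3. XML syntax with no doctype: "probably XHTML 5".
    if XML_DECL.match(source) or XHTML_NS.search(source):
        return "xhtml5"
    return "unknown"
```

Anything that comes back as "unknown" would still need a per-project
default, which is exactly where the tree-of-mixed-documents problem
resurfaces.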

The problem with this approach is that, if people are required to add
an editor-specific PI to their documents for any reason, there is
resistance or outright refusal. The reasons can include:

	- Only a small number of us on a larger team use your tool.
	- I'm not allowed to commit editor-specific hacks to the repository.
	- etc.

Jim